Lab 6: R Function Lab

Author

Cecilia Wang (PID:18625854)

Background

All functions in R have at least 3 things:

  • A name that we use to call the function.
  • Input arguments.
  • The body: the lines of R code that do the work.
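To see these three parts concretely, base R's formals() and body() functions can pull the arguments and the body out of any function. Here is a minimal sketch using a made-up square() function that is not part of the lab:

```r
# A small example function: the name is `square`,
# the single input argument is `x`, and the body is `x^2`
square <- function(x) {
  x^2
}

square(4)        # call the function by its name; returns 16
formals(square)  # lists the input arguments (here just x)
body(square)     # shows the body: the code that does the work
```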

Our first function

Let’s write a silly wee function called add() that adds some numbers (the input arguments):

# Add x and y (vectors are added element by element)
add <- function(x, y) { x + y }

Now we can use this function

add(100,1)
[1] 101
add(x=c(100,1,100),y=1)
[1] 101   2 101
add(x=10, y=10)
[1] 20
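The second call works because R adds vectors element by element and recycles the shorter vector to match the longer one. A small sketch of this recycling rule, using the same add():

```r
add <- function(x, y) {
  x + y
}

# y = 1 is recycled to c(1, 1, 1) to match the length of x
add(x = c(100, 1, 100), y = 1)          # 101   2 101

# A length-2 y is recycled to c(1, 2, 1, 2) against a length-4 x
add(x = c(10, 20, 30, 40), y = c(1, 2))  # 11 22 31 42
```

If the longer length is not a multiple of the shorter one, R still recycles but also issues a warning.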

Q. What if I give multi-element vectors to both x and y?

add(x = c(100, 1), y = c(100, 1))
[1] 200   2

Q. What if I give three inputs to the function?

R gives an error message, because add() only defines two input arguments (x and y) and has nowhere to put the third.
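One way to look at that error without stopping the script is to catch it with tryCatch(). A minimal sketch, reusing add():

```r
add <- function(x, y) {
  x + y
}

# Catch the error so we can inspect its message instead of halting
msg <- tryCatch(
  add(1, 1, 1),
  error = function(e) conditionMessage(e)
)
print(msg)  # the message mentions an "unused argument"
```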

Q. What if I give only one input to the function?

To allow this, we can write a new version, addnew(), where y has a default value:

addnew <- function(x, y = 1) { x + y }

addnew(x=100)
[1] 101
addnew(c(100,1),100)
[1] 200 101

If an input argument has no default value, the user is required to supply it when they use the function. We can give an input argument a “default” value by setting it equal to some value in the function definition, e.g., y=1.
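A short sketch contrasting a required and an optional argument, using add() (no default for y) and addnew() (y defaults to 1):

```r
add <- function(x, y) { x + y }          # y is required
addnew <- function(x, y = 1) { x + y }   # y is optional, defaults to 1

# Calling add() without y fails, because y has no default
msg <- tryCatch(add(100), error = function(e) conditionMessage(e))
print(msg)

# addnew() falls back to y = 1, and the default can still be overridden
addnew(100)          # 101
addnew(100, y = 5)   # 105
```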

A second function

Let’s try something more interesting: a sequence-generating tool.

The sample() function is a useful starting point.

sample(1:10,size=4)
[1] 7 1 6 3
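Because sample() is random, the numbers above will differ on every run. Calling set.seed() first makes the draws reproducible. A sketch, where the seed value 42 is an arbitrary choice (the outputs in this lab were not generated with a seed, so they will not match):

```r
# Setting the same seed before each call makes the draws repeat exactly
set.seed(42)
first <- sample(1:10, size = 4)

set.seed(42)
second <- sample(1:10, size = 4)

identical(first, second)   # TRUE
anyDuplicated(first) == 0  # TRUE: without replacement, no value repeats
```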

Q. Generate 9 random numbers taken from the input vector x=1:10

sample(1:10, size=9)
[1]  3  8 10  5  1  7  6  2  4

Q. Generate 12 random numbers taken from the input vector x=1:10

sample(1:10, size=12, replace=TRUE)
 [1]  1  9  2  3  2  1  9  3  1 10  9  3
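The replace=TRUE argument is what makes this possible: 1:10 only has 10 values, so drawing 12 without replacement cannot work. A sketch showing both cases:

```r
# Asking for 12 values from only 10 without replacement is impossible,
# so sample() stops with an error
err <- tryCatch(sample(1:10, size = 12), error = function(e) e)
inherits(err, "error")   # TRUE
conditionMessage(err)

# With replace = TRUE each value is put back after being drawn,
# so 12 draws are fine (and values can repeat)
draws <- sample(1:10, size = 12, replace = TRUE)
length(draws)            # 12
```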

Q. Write a sample() call that generates a nucleotide sequence of length 6.

sample(x=c("A","C","G","T"), size=6, replace=TRUE)
[1] "C" "G" "C" "G" "G" "A"

Q. Write a first function, generate_dna(), that returns a DNA sequence of user-specified length:

generate_dna <- function(len = 6) {
  # The four DNA bases to draw from
  bases <- c("A", "T", "C", "G")
  # Sample len bases with replacement, so bases can repeat
  sample(bases, size = len, replace = TRUE)
}
generate_dna(10)
 [1] "C" "A" "G" "C" "G" "T" "T" "C" "G" "G"

Key points: every function in R looks fundamentally the same in terms of its structure. It has 3 things: a name, input(s), and a body.

name <- function(input){
body
}

Functions can have multiple inputs. These can be required arguments or optional arguments, where optional arguments have a set default value.
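As an illustration of one required and one optional argument in the same function, here is a sketch of a hypothetical helper, gc_content(), which is not part of the lab. It assumes the DNA sequence is a single uppercase string such as "AGTATA":

```r
# Hypothetical helper: fraction of G or C bases in a DNA string
# seq is required (no default); digits is optional (defaults to 2)
gc_content <- function(seq, digits = 2) {
  bases <- strsplit(seq, split = "")[[1]]   # "AGTC" -> c("A", "G", "T", "C")
  round(mean(bases %in% c("G", "C")), digits = digits)
}

gc_content("AGTATA")               # 0.17 (1 of 6 bases is G or C)
gc_content("AGTATA", digits = 4)   # 0.1667
```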

Q. Modify and improve our generate_dna() function to return its generated sequence in a more standard format like “AGTATA” rather than a vector like “A”, “C”, “G”, “T”.

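generate_dna <- function(len = 6, fasta = TRUE) {
  # Draw len random bases, with replacement
  ans <- sample(x = c("A", "C", "G", "T"), size = len, replace = TRUE)

  if (fasta) {
    # "\n" ends the message so the printed result starts on a new line
    cat("single-element vector output\n")
    # Collapse the vector of bases into one string, e.g. "AGTATA"
    ans <- paste(ans, collapse = "")
  } else {
    cat("Multi-element vector output\n")
  }
  return(ans)
}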


generate_dna(10)
single-element vector output
[1] "ACACAGCTGA"
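To see the difference between the two output formats, we can compare length() and nchar() of each. A sketch that repeats the improved generate_dna() so it runs on its own (the example sequences in the comments are illustrative only, since the output is random):

```r
generate_dna <- function(len = 6, fasta = TRUE) {
  ans <- sample(x = c("A", "C", "G", "T"), size = len, replace = TRUE)
  if (fasta) {
    cat("single-element vector output\n")
    ans <- paste(ans, collapse = "")
  } else {
    cat("Multi-element vector output\n")
  }
  return(ans)
}

one_string <- generate_dna(8)                 # e.g. "GATTACAG"
many_bases <- generate_dna(8, fasta = FALSE)  # e.g. "G" "A" "T" ...

length(one_string)   # 1: a single string
nchar(one_string)    # 8: characters inside that string
length(many_bases)   # 8: one element per base
```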

The paste() function: its job is to join up or stick together (a.k.a. paste) input strings.

paste("alice","loves R", sep=" ")
[1] "alice loves R"
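paste() has two joining arguments that are easy to mix up: sep joins the separate inputs element by element, while collapse squashes a whole vector into one string (the trick generate_dna() uses). A short sketch:

```r
# sep: glue corresponding elements of separate inputs together
paste("alice", "loves R", sep = " ")        # "alice loves R"
pairs <- paste(c("A", "C"), c("1", "2"), sep = "-")   # "A-1" "C-2"

# collapse: squash a vector into a single string
dna <- paste(c("A", "G", "T", "A", "T", "A"), collapse = "")   # "AGTATA"

# Both together: first join with sep, then collapse the result
both <- paste(c("A", "C"), c("1", "2"), sep = "-", collapse = "+")   # "A-1+C-2"

# paste0() is paste() with sep = ""
paste0("A", "C")                            # "AC"
```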

Flow control means where the R brain goes in your code, i.e. which lines get run and which get skipped depending on a condition:

goodmood <- TRUE

if(goodmood){
  cat("Great!")
} else {
  cat("Bummer!")
}
Great!
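The same if/else logic can be wrapped in a function so the condition becomes an input argument, which is exactly how the fasta argument of generate_dna() works. A sketch with a hypothetical mood_message() function:

```r
# Hypothetical example: the condition is now an optional input argument
mood_message <- function(goodmood = TRUE) {
  if (goodmood) {
    "Great!"
  } else {
    "Bummer!"
  }
}

mood_message()       # "Great!"
mood_message(FALSE)  # "Bummer!"
```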

A protein generating function

Q. Write a function that generates a protein sequence of user-specified length.

Q. Use that function to generate random protein sequences of length 6 to 12.

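generate_protein <- function(len, fasta = TRUE) {
  # The 20 standard amino acids (one-letter codes) to sample from
  aa <- c("A","D","R","N","C","Q","E","G","H","I",
          "L","K","M","F","P","S","T","W","Y","V")

  # Draw a random amino acid sequence of length len
  ans <- sample(aa, size = len, replace = TRUE)

  # Optionally collapse the vector into a single string
  if (fasta) {
    ans <- paste(ans, collapse = "")
  }
  return(ans)
}

# Print one FASTA record for each length from 6 to 12
for (i in 6:12) {
  # FASTA ID line ">id"
  cat(">", i, "\n", sep = "")
  # Protein sequence line
  cat(generate_protein(i), "\n", sep = "")
}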
>6
ASTYGF
>7
MIGMGQD
>8
ACNFATFH
>9
GETRYQKEH
>10
ALLVMNCNEY
>11
NWNDGLTKHPR
>12
DSKQAWVFTDWE
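One possible alternative to the for loop (not the method used in this lab) is to let sapply() call the function once per length and collect the results in a character vector, then build the FASTA text in one go. A sketch that defines generate_protein() with the fasta collapse step so the block runs on its own:

```r
generate_protein <- function(len, fasta = TRUE) {
  aa <- c("A","D","R","N","C","Q","E","G","H","I",
          "L","K","M","F","P","S","T","W","Y","V")
  ans <- sample(aa, size = len, replace = TRUE)
  if (fasta) {
    ans <- paste(ans, collapse = "")
  }
  ans
}

# Apply the function to each length; sapply() returns a character vector
seqs <- sapply(6:12, generate_protein)
names(seqs) <- 6:12

# Build FASTA records: an ">id" line followed by the sequence line
fasta_lines <- paste0(">", names(seqs), "\n", seqs)
cat(fasta_lines, sep = "\n")
```

Keeping the sequences in a vector also makes them easy to reuse later, for example to copy into a sequence search.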

Q. Are any of your sequences unique, i.e. not found anywhere in nature?

The sequences of length 8 to 12 are unique, i.e. not found anywhere in nature.