The biology and statistics
What's that data structure?!
and OOO
Extractions!
if( is.logical(Question) ){
print( "LOGIC!" )
}else{ print("FUNCTIONS!") }
Regex / ggplot
5

What are two properties of a normal distribution?

Symmetrical and bell shaped

Continuous

Fully defined by a mean and a standard deviation: these are the only two parameters of the probability density function for normal distributions.

Note: Normal distributions do not necessarily have a mean of 0 and a standard deviation of 1. That describes one specific normal distribution, called the "standard normal".
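
A minimal sketch of the note above, using base R's dnorm() and pnorm(). The mean of 10 and standard deviation of 2 are arbitrary values assumed for illustration; they are not part of the question.

```r
# Illustrative parameters (assumed, not from the question)
mu <- 10
sigma <- 2

# Symmetry: the density is the same at equal distances on either side of the mean
left  <- dnorm(mu - 3, mean = mu, sd = sigma)
right <- dnorm(mu + 3, mean = mu, sd = sigma)
symmetric <- isTRUE(all.equal(left, right))

# Standardizing: subtracting the mean and dividing by the standard deviation
# maps a value onto the standard normal (mean 0, standard deviation 1)
x <- 13
z <- (x - mu) / sigma
same_prob <- isTRUE(all.equal(pnorm(x, mean = mu, sd = sigma), pnorm(z)))
```

Both `symmetric` and `same_prob` should be TRUE: the standard normal is just one member of the family, reached by rescaling.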

5

Besides being familiar with the object, how could you recognize the class of the iris object based on the outputs below? I am looking for 2 or more distinct reasons.

Printing the contents shows a table organized into rows and columns. This by itself lets you narrow down the base R options to data.frames and matrices.

The presence of multiple object classes organized into named columns is sufficient to identify it as list-type (and rule out a matrix).

The presence of a row (obs.) descriptor is indicative of a data.frame class object.

The dollar sign symbols in the str() output are used to mark the named elements of list-type objects.
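
The same reasoning can be checked directly in R. This is a small sketch using only base R functions on the built-in iris object; the variable names are illustrative.

```r
iris_class   <- class(iris)          # "data.frame"
is_list_type <- is.list(iris)        # TRUE: a data.frame is a list of columns
is_mat       <- is.matrix(iris)      # FALSE: a matrix holds a single class
col_classes  <- sapply(iris, class)  # four "numeric" columns and one "factor"
```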


5

Conceptually describe the answer: What values end up getting extracted in the following?

iris[1:10 , 2:3][,1]

The first ten values from the second column of `iris`.
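
One way to confirm this is to compare the chained extraction with a more direct one. A short sketch (the names `chained` and `direct` are illustrative):

```r
# Step 1: rows 1-10 and columns 2-3 (Sepal.Width, Petal.Length) as a data.frame
# Step 2: the first column of that result, which is Sepal.Width
chained <- iris[1:10, 2:3][, 1]

# The same values, extracted directly
direct <- iris$Sepal.Width[1:10]

same <- identical(chained, direct)  # TRUE
```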

5

What value will the following print?

x <- 1
y <- 2

if(x == y){
    print(1)
}else if( x > y){
    print(2)
}else{
    print(3)
}

3

5

Write a pattern that makes the following code return values that contain an "e" eventually followed by an "m". 

li <- c("Lorem", "ipsum", "dolor","sit",
        "amet", "consectetur","adipiscing",
        "elit", "sed","do", "eiusmod")
li[grep(____,li)]
Output:
[1] "Lorem"   "eiusmod"
# an e 
# followed by any character 0 or more times 
# followed by an m
li[grep("e.*m", li)]
5

How does read depth data (and count-based data in general) differ from normal distributions?

Read depths are discrete (integers) and positive values only.

Extras: Because of samtools's filtering process, the frequency of 0 depth was also far below what would have been predicted.

Even deeper: Read depth (and count-based data in general) is not normally distributed, though it can superficially resemble a normal distribution. It is more appropriately described using negative binomial models.
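
A rough sketch of these differences using simulated values rather than real read depths. The seed, sample size, and distribution parameters are all arbitrary assumptions chosen for illustration.

```r
set.seed(1)  # arbitrary seed so the simulation is repeatable

# Simulated counts from a negative binomial (assumed parameters)
depth <- rnbinom(1000, mu = 30, size = 5)
# Simulated continuous values from a normal distribution (assumed parameters)
normal <- rnorm(1000, mean = 30, sd = 5)

counts_nonnegative <- all(depth >= 0)              # counts are never negative
counts_whole       <- all(depth == round(depth))   # counts are integers
normal_continuous  <- any(normal != round(normal)) # normal draws are not

# Negative binomial counts are typically overdispersed: variance exceeds the mean
overdispersed <- var(depth) > mean(depth)
```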

5

What is the class of obj1? Note: iris by itself is a data.frame.

obj1 <- iris[1:10,1:2]

data.frame, because more than one column was extracted.
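
For contrast, a short sketch of what happens when only one column is extracted with `[`, and how the `drop` argument changes it. The object names are illustrative.

```r
two_cols <- iris[1:10, 1:2]               # more than one column
one_col  <- iris[1:10, 1]                 # a single column is simplified
kept     <- iris[1:10, 1, drop = FALSE]   # drop = FALSE prevents simplification

class(two_cols)  # "data.frame"
class(one_col)   # "numeric"
class(kept)      # "data.frame"
```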

5

What value does the following create?

paste0(
    c(LETTERS[5],
      letters[c(24,20,18,1,3,20,9,15,14,19)]),
    collapse = ""
)

"Extractions"

5

Why does the following produce an error?

x <- c(1, 2, 3)
y <- c(3, 2, 1)

if(x == y){
    print(1)
}else if( x > y){
    print(2)
}else{
    print(3)
}

Because if statements require a length 1 logical vector, and the first comparison, x == y, produces a length 3 logical vector.
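
One possible repair is to reduce each comparison to a single TRUE or FALSE with all() or any(). Which reduction is appropriate depends on what the code is meant to test, so the choices below are assumptions for illustration only.

```r
x <- c(1, 2, 3)
y <- c(3, 2, 1)

elementwise <- x == y  # FALSE TRUE FALSE: one value per element

# all() and any() collapse a logical vector to length 1
if (all(x == y)) {
    result <- 1
} else if (any(x > y)) {
    result <- 2
} else {
    result <- 3
}
result  # 2, because not all elements are equal but 3 > 1
```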


5

Write a pattern that makes the following code return all values that begin with an "s" or end with an "m". 

li <- c("Lorem", "ipsum", "dolor","sit",
        "amet", "consectetur","adipiscing",
        "elit", "sed","do", "eiusmod")
li[grep(____,li)]
Output:
[1] "Lorem" "ipsum" "sit"   "sed" 
# s at the beginning of the string
# OR
# m at the end of the string
li[grep("^s|m$",li)]
5

What was the major difference between the first and second generations of sequencing?

Sanger sequencing only read a single sequence at a time. Separate solutions had to be used for separate sequences.

NGS is a massively (millions of clusters) parallel sequencing process. 

5

What is the class of each obj# in the output below?

obj1 <- letters[1:4]
obj2 <- data.frame(lower=obj1,pos = 1:4)
obj3 <- obj2$pos
obj4 <- obj2[2:3, ]
obj5 <- obj2[2:3,1]

character
data.frame
integer (numeric or double would suffice for credit)
data.frame
character


5

Write code that prints the rows of mtcars where both of the following are TRUE:
- The vs column is 1 (straight engine shape)
- The hp column is less than 100 (horsepower)

This should result in 8 rows and 11 columns. 

mtcars[mtcars$vs == 1 & mtcars$hp < 100, ]

5

What are the three (not a typo) values the following prints to the console?

xyz <- function(x , y = 3, z = 5){ 
    out <- (x - y)/z 
    return(out) 
} 
xyz(18) 
xyz(12,2) 
out <- xyz(z=2,5)
xyz(z=2,13)

xyz(18)  # 3
xyz(12,2)  # 2
out <- xyz(z=2,5) #Nothing
xyz(z=2,13) # 5
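
The last two calls rely on R matching named arguments first and then filling the remaining parameters by position. A short sketch of that rule, with the function restated so the example is self-contained:

```r
xyz <- function(x, y = 3, z = 5){
    out <- (x - y)/z
    return(out)
}

# z is matched by name; 13 then fills the first unmatched parameter, x
a <- xyz(z = 2, 13)
b <- xyz(x = 13, y = 3, z = 2)
same <- identical(a, b)  # TRUE

# Assignment stores the value instead of printing it
out <- xyz(z = 2, 5)
out  # 1, printed only because the object is now called by name
```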


5

Write a pattern that makes the following code return all values with the following substitution: 

- text where a "p" is followed by an "i" should be replaced with "!!!".
- In the event there is more than one "i" following the "p", the match should stop at the first "i". 

li <- c("Lorem", "ipsum", "dolor","sit",
        "amet", "consectetur","adipiscing",
        "elit", "sed","do", "eiusmod")
gsub(_____,"!!!",li)
Output:
 [1] "Lorem"       "ipsum"       "dolor"      
 [4] "sit"         "amet"        "consectetur"
 [7] "adi!!!scing" "elit"        "sed"        
[10] "do"          "eiusmod" 
# p   = a p
# .*? = followed by any character 0 or more times, but as few times as possible
# i   = an i
gsub("p.*?i", "!!!", li)
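
To see why the `?` matters, compare the lazy pattern with its greedy counterpart on the one word that contains a "p" followed by more than one "i". A minimal sketch:

```r
word <- "adipiscing"

lazy   <- gsub("p.*?i", "!!!", word)  # stops at the first i after the p
greedy <- gsub("p.*i", "!!!", word)   # runs on to the last i in the string

lazy    # "adi!!!scing"
greedy  # "adi!!!ng"
```

The greedy version swallows "pisci", which is why the question asks for the match to stop at the first "i".
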
7

What was the most problematic issue with genome assembly that long-read sequencing overcame to allow for the newest generation of genome assemblies such as the human telomere-to-telomere genome?

Repetitive regions were longer than the sequenced fragments (reads), so scientists were unable to stitch the genome together from shorter reads across regions of repetitive sequence.

7

Put the following functions in order of execution in the following code. The last to complete should be last in the reordered values. 

Code
avgStrEngMpg <- mean(mtcars[mtcars$vs==1 , "mpg"])

Functions (listed in order of appearance)
<-
mean()
[
$
==


$

==

[

mean()

<-
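
The order becomes easier to see when each step is given its own line. This is a sketch with hypothetical intermediate names; the original one-liner does the same work from the inside out.

```r
vs_col      <- mtcars$vs                  # 1. $  pulls out the vs column
is_straight <- vs_col == 1                # 2. == builds a logical vector
mpg_vals    <- mtcars[is_straight, "mpg"] # 3. [  extracts the matching mpg values
avg         <- mean(mpg_vals)             # 4. mean() summarizes them
avgStrEngMpg <- avg                       # 5. <- stores the finished value

same <- identical(avgStrEngMpg, mean(mtcars[mtcars$vs==1 , "mpg"]))  # TRUE
```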

7

How many extractions?

iris[iris$Sepal.Length < 4.5 , 1:2 ]$Sepal.Length[1:3]

4 extractions

Explanations
iris$Sepal.Length

iris[iris$Sepal.Length<4.5 , 1:2 ]

iris[iris$Sepal.Length<4.5 , 1:2 ]$Sepal.Length

iris[iris$Sepal.Length<4.5 , 1:2 ]$Sepal.Length[1:3]

7

What will be printed when all below is run (use brain not R)?

Question <- function(x){
    if(x==1){
        print("LOGIC!")
    }else if(x==2){
        print("FUNCTION!")
    }else{
        print("Not anticipated!")
    }
    return(x)
}

if( is.logical(Question) ){
    Question(1)
}else{ 
    Question(2) 
}

6 pts (because the function said to print):

[1] "FUNCTION!" 

1 pt (because it returned unassigned data): 

[1] 2
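
The two printed lines come from two different mechanisms: an explicit print() call and R auto-printing a visible return value. A small sketch of the difference, using hypothetical helper functions `Loud` and `Quiet` that are not part of the question:

```r
# Returns its value visibly, so an unassigned call auto-prints it
Loud <- function(x){
    print("FUNCTION!")
    return(x)
}

# Same, but invisible() suppresses the auto-printing of the return value
Quiet <- function(x){
    print("FUNCTION!")
    invisible(x)
}

# capture.output() collects what would have appeared in the console
loud_lines  <- capture.output(Loud(2))   # two lines
quiet_lines <- capture.output(Quiet(2))  # one line
```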

7

Fix the following code so that it makes the subsequent plot.

Code to fix:

ggplot(iris,
       mapping = aes(iris$Sepal.Length,
                     "species"))+
    geom_point()


Desired output:

ggplot(iris, mapping = aes(Species,Sepal.Length))+
    geom_point()
10

Why would I want to visualize biological measurements on a scatterplot or another two-dimensional graphic as opposed to using one-dimensional graphics like histograms?

The answer is not just "conciseness" or similar.

Because you cannot visualize interactions between measurements on a one-dimensional plot. This means that you might miss seeing important correlations.

10

What is the class of obj1? Provide code that calculates the number of FALSEs (146).

obj1 <- iris$Sepal.Length<4.5

Logical vector.
Many ways to count the FALSEs:
sum(!obj1)
table(obj1)["FALSE"]
length(obj1)-sum(obj1)
(1-mean(obj1))*length(obj1)

10

Write code that:
- prints all columns across the rows of `iris` where
- the petal length is greater than 6.6 or the petal width is greater than 2.4.
- This should be 6 rows total.
- I have shown these rows in red in the plot below. 

iris[iris$Petal.Length>6.6|iris$Petal.Width>2.4 , ]

10

Nolan often fidgets while teaching class. This sometimes results in objects flying into the air.

Assuming the highly realistic scenario of him flinging a frictionless object directly upwards at an initial velocity of 25 m/s on a post-apocalyptic Earth that has an acceleration due to gravity of exactly -10 m/s^2, how long would he have to wait for the object to land* back in his unmoved hand? This answer is an integer. 

For a bonus point, how high did it travel?

You may use R. 

#Here is a function that calculates displacement in 
#   1-dimension at a constant acceleration.
disp <- function(t, vi, a){
    d <- 0.5*a*t^2+vi*t
    return(d)
}

* Assume that the rogue planetoid that added mass to the Earth (and presumably liquified most of the planet's surface) also vaporized the roof of the building.

disp(1:10,25,-10) # 5 seconds results in 0 displacement

disp(5/2,25,-10) # 31.25 m
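
The same answers can be reached without scanning 1:10. Setting the displacement 0.5*a*t^2 + vi*t to 0 and dividing by t (for t not equal to 0) gives t = -2*vi/a, and the peak occurs when the velocity vi + a*t reaches 0, at t = -vi/a. A sketch of that calculation, with `t_land` and `t_peak` as illustrative names:

```r
disp <- function(t, vi, a){
    d <- 0.5*a*t^2+vi*t
    return(d)
}

vi <- 25   # initial velocity in m/s
a  <- -10  # acceleration in m/s^2

t_land <- -2 * vi / a  # 5 seconds: the nonzero time with 0 displacement
t_peak <- -vi / a      # 2.5 seconds: velocity is 0 at the top

disp(t_land, vi, a)  # 0
disp(t_peak, vi, a)  # 31.25
```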


10

Fix the following code so that it makes the subsequent plot.

Code to fix:

ggplot(iris,
       mapping = aes(Species,Sepal.Length))+
    geom_point(size = 3)+
    geom_violin(color="black")+
    scale_color_viridis_b()+
    theme_dark()

Desired output:

ggplot(iris,
       mapping = aes(Species,Sepal.Length,
                     color=Sepal.Width))+
    geom_violin(color="black")+
    geom_point(size = 3)+
    scale_color_viridis_b()+
    theme_dark()

### OR ###

ggplot(iris,
       mapping = aes(Species,Sepal.Length))+
    geom_violin(color="black")+
    geom_point(aes(color=Sepal.Width), size = 3)+
    scale_color_viridis_b()+
    theme_dark()