Spring 2024 Midterm review for Bio321G Jeopardy Template

The biology and statistics

What's that data structure?!
and OOO

Extractions!

if( is.logical(Question) ){
print( "LOGIC!" )
}else{ print("FUNCTIONS!") }

Regex / ggplot

What are two properties of a normal distribution?

Symmetrical and bell shaped

Continuous

Defined by a mean and a standard deviation, which are the two parameters of the probability density function for normal distributions.

Note: Normal distributions do not necessarily have a mean of 0 and a standard deviation of 1. That describes one specific normal distribution called the "standard normal".
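
A minimal sketch of these properties using base R's dnorm() and pnorm(). The mean of 10 and standard deviation of 2 are arbitrary illustration values, not course data.

```r
# Arbitrary parameters chosen only for illustration
mu    <- 10
sigma <- 2

# Symmetry: points equally far from the mean have the same density
left  <- dnorm(mu - 3, mean = mu, sd = sigma)
right <- dnorm(mu + 3, mean = mu, sd = sigma)
all.equal(left, right)

# Half of the probability lies below the mean
below_mean <- pnorm(mu, mean = mu, sd = sigma)
below_mean

# The "standard normal" is the special case mean = 0, sd = 1,
# which is also what dnorm() assumes when no parameters are given
all.equal(dnorm(1), dnorm(1, mean = 0, sd = 1))
```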

Besides being familiar with the object, how could you recognize the class of the iris object based on the below outputs? I am looking for 2 or more distinct reasons.

Printing the contents shows a table organized into rows and columns. This by itself narrows the base R options down to data.frames and matrices.

The presence of multiple object classes organized into named columns is sufficient to identify it as list-type (and rule out a matrix).

The presence of a row (obs.) descriptor is indicative of a data.frame class object.

The dollar signs in the str() output mark the named elements of list-type objects.
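
If R were allowed, the same reasoning could be checked directly. This is only a sketch; the variable names are made up for the illustration.

```r
cls     <- class(iris)          # "data.frame"
listy   <- is.list(iris)        # TRUE: data.frames are built on lists
matrixy <- is.matrix(iris)      # FALSE
col_cls <- sapply(iris, class)  # one class per named column
col_cls
```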

Conceptually describe the answer: What values end up getting extracted in the following?

iris[1:10 , 2:3][,1]

The first ten values from the second column of `iris`.
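
One way to see this is to split the chained extraction into its two steps. The names step1 and step2 are hypothetical.

```r
step1 <- iris[1:10, 2:3]  # data.frame: 10 rows of Sepal.Width and Petal.Length
step2 <- step1[, 1]       # numeric vector: the first column of step1

# Same values as taking the first ten entries of the second column of iris
identical(step2, iris$Sepal.Width[1:10])
```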

What value will the following print?

x <- 1
y <- 2

if(x == y){
    print(1)
}else if( x > y){
    print(2)
}else{
    print(3)
}

[1] 3

Neither x == y nor x > y is TRUE, so the else block runs.

Write a pattern that makes the following code return values that contain an "e" eventually followed by an "m".

li <- c("Lorem", "ipsum", "dolor","sit",
        "amet", "consectetur","adipiscing",
        "elit", "sed","do", "eiusmod")
li[grep(____,li)]

Output:
[1] "Lorem"   "eiusmod"

# an e 
# followed by any character 0 or more times 
# followed by an m
li[grep("e.*m", li)]

How does read depth data (and count-based data in general) differ from normal distributions?

Read depths are discrete (integers) and positive values only.

Extras: The frequency of 0 depth was also far below what would have been predicted, due to samtools's filtering process.

Even deeper: Read depth (and count-based data in general) is specifically not normally distributed, though it can superficially resemble a normal distribution. Instead, it is more appropriately described using negative binomial models.
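
A rough simulation sketch of these properties using base R's rnbinom(). The seed, the mean of 30, and size = 5 are made-up values, not estimates from any real sequencing data.

```r
set.seed(1)  # arbitrary seed so the sketch is repeatable

# Hypothetical "read depths" drawn from a negative binomial
depth <- rnbinom(1000, size = 5, mu = 30)

all(depth == round(depth))  # discrete: every value is a whole number
min(depth) >= 0             # never negative
var(depth) > mean(depth)    # spread exceeds the mean for these settings
```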

What is the class of obj1? Note: iris by itself is a data.frame.

obj1 <- iris[1:10,1:2]

data.frame; because more than one column was extracted.
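
For contrast, a short sketch of what happens when only one column is extracted with the same bracket style, and how the drop argument changes that.

```r
two_cols <- class(iris[1:10, 1:2])              # "data.frame"
one_col  <- class(iris[1:10, 1])                # "numeric": a single column is simplified
kept_df  <- class(iris[1:10, 1, drop = FALSE])  # "data.frame": drop = FALSE keeps the table
c(two_cols, one_col, kept_df)
```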

What value does the following create?

paste0(
    c(LETTERS[5],
      letters[c(24,20,18,1,3,20,9,15,14,19)]),
    collapse = ""
)

"Extractions"

Why does the following produce an error?

x <- c(1, 2, 3)
y <- c(3, 2, 1)

if(x == y){
    print(1)
}else if( x > y){
    print(2)
}else{
    print(3)
}

Because an if statement requires a length 1 logical vector, and the first comparison, x == y, produces a length 3 logical vector.
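
One possible repair, assuming the intent is to ask whether all or any of the pairs match, is to collapse the comparison to a single value with all() or any(). The messages stored in res are made up for the illustration.

```r
x <- c(1, 2, 3)
y <- c(3, 2, 1)

same <- x == y  # FALSE TRUE FALSE (length 3)

# all() and any() each return a single TRUE or FALSE, which if() accepts
if (all(same)) {
    res <- "every pair matches"
} else if (any(same)) {
    res <- "at least one pair matches"
} else {
    res <- "no pairs match"
}
res
```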

Write a pattern that makes the following code return all values that begin with an "s" or end with an "m".

li <- c("Lorem", "ipsum", "dolor","sit",
        "amet", "consectetur","adipiscing",
        "elit", "sed","do", "eiusmod")
li[grep(____,li)]

Output:
[1] "Lorem" "ipsum" "sit"   "sed"

# s at the beginning of the string
# OR
# m at the end of the string
li[grep("^s|m$",li)]

What was the major difference between the first and second generations of sequencing?

Sanger sequencing only read a single sequence at a time. Separate solutions had to be used for separate sequences.

NGS is a massively (millions of clusters) parallel sequencing process.

What is the class of each obj# in the output below?

obj1 <- letters[1:4]
obj2 <- data.frame(lower=obj1,pos = 1:4)
obj3 <- obj2$pos
obj4 <- obj2[2:3, ]
obj5 <- obj2[2:3,1]

character
data.frame
integer (numeric or double would suffice for credit)
data.frame
character

Write code that prints the rows of mtcars where both of the following are TRUE:
- The vs column is 1 (straight engine shape)
- The hp column is less than 100 (horsepower)

This should result in 8 rows and 11 columns.

mtcars[mtcars$vs==1&mtcars$hp<100,]
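
The same extraction can be written in two steps, which makes it easy to check the expected size. The name keep is hypothetical.

```r
# Logical vector with one TRUE/FALSE per row of mtcars
keep <- mtcars$vs == 1 & mtcars$hp < 100

sum(keep)                  # number of rows that pass both tests
result <- mtcars[keep, ]
dim(result)                # rows, then columns
```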

What are the three (not a typo) values the following prints to the console?

xyz <- function(x , y = 3, z = 5){ 
    out <- (x - y)/z 
    return(out) 
} 
xyz(18) 
xyz(12,2) 
out <- xyz(z=2,5)
xyz(z=2,13)

xyz(18) # 3
xyz(12,2) # 2
out <- xyz(z=2,5) #Nothing
xyz(z=2,13) # 5
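
The last two calls work because named arguments are matched first and the remaining unnamed values fill the leftover parameters in order. A small sketch that spells this out (by_position and by_name are hypothetical names):

```r
xyz <- function(x, y = 3, z = 5){
    out <- (x - y)/z
    return(out)
}

# z is matched by name, so the unnamed 5 falls into x and y keeps its default
by_position <- xyz(z = 2, 5)

# The same call with every argument named
by_name <- xyz(x = 5, y = 3, z = 2)

identical(by_position, by_name)
by_position  # (5 - 3)/2
```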

Write a pattern that makes the following code return all values with the following substitution:

- text where a "p" is followed by an "i" should be replaced with "!!!".
- In the event there is more than one "i" following the "p", the match should stop at the first "i".

li <- c("Lorem", "ipsum", "dolor","sit",
        "amet", "consectetur","adipiscing",
        "elit", "sed","do", "eiusmod")
gsub(_____,"!!!",li)

Output:
 [1] "Lorem"       "ipsum"       "dolor"      
 [4] "sit"         "amet"        "consectetur"
 [7] "adi!!!scing" "elit"        "sed"        
[10] "do"          "eiusmod"

# p = a p
# .*? = followed by any character 0 or more times, but stop once a match is found
# i = an i
gsub("p.*?i", "!!!", li)
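
To see why the question mark matters, compare the greedy and lazy versions of the pattern on the one word that has more than one "i" after the "p".

```r
word <- "adipiscing"

greedy <- gsub("p.*i",  "!!!", word)  # .* runs to the LAST i: "adi!!!ng"
lazy   <- gsub("p.*?i", "!!!", word)  # .*? stops at the FIRST i: "adi!!!scing"

c(greedy = greedy, lazy = lazy)
```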

What was the most problematic issue with genome assembly that long-read sequencing overcame to allow for the newest generation of genome assemblies such as the human telomere-to-telomere genome?

Repetitive sequences used to be larger than sequence fragments, so scientists were unable to stitch the genome together from shorter reads across regions of repetitive sequence.

Put the following functions in order of execution in the following code. The last to complete should be last in the reordered values.

Code
avgStrEngMpg <- mean(mtcars[mtcars$vs==1 , "mpg"])

Functions (listed in order of appearance)
<-
mean()
[
$
==

$
==
[
mean()
<-
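
Writing the one-liner as separate steps shows the order in which each function finishes. The intermediate names are hypothetical.

```r
vs_col       <- mtcars$vs                   # $ finishes first
is_straight  <- vs_col == 1                 # then ==
mpg_values   <- mtcars[is_straight, "mpg"]  # then [
avg          <- mean(mpg_values)            # then mean()
avgStrEngMpg <- avg                         # <- finishes last

identical(avgStrEngMpg, mean(mtcars[mtcars$vs == 1, "mpg"]))
```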

How many extractions?

iris[iris$Sepal.Length < 4.5 , 1:2 ]$Sepal.Length[1:3]

4 extractions

Explanations
iris$Sepal.Length

iris[iris$Sepal.Length<4.5 , 1:2 ]

iris[iris$Sepal.Length<4.5 , 1:2 ]$Sepal.Length

iris[iris$Sepal.Length<4.5 , 1:2 ]$Sepal.Length[1:3]
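
The four extractions can also be run one at a time. The names e1 to e4 are hypothetical.

```r
e1 <- iris$Sepal.Length    # 1: $ pulls out a numeric vector
e2 <- iris[e1 < 4.5, 1:2]  # 2: [ keeps matching rows and the first two columns
e3 <- e2$Sepal.Length      # 3: $ again, now on the smaller data.frame
e4 <- e3[1:3]              # 4: [ keeps the first three values

identical(e4, iris[iris$Sepal.Length < 4.5, 1:2]$Sepal.Length[1:3])
```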

What will be printed when all below is run (use brain not R)?

Question <- function(x){
    if(x==1){
        print("LOGIC!")
    }else if(x==2){
        print("FUNCTION!")
    }else{
        print("Not anticipated!")
    }
    return(x)
}

if( is.logical(Question) ){
    Question(1)
}else{ 
    Question(2) 
}

6 pts (because the function said to print):

[1] "FUNCTION!"

1 pt (because it returned unassigned data):

[1] 2
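
The difference between what a function prints and what it returns is easier to see when the result is assigned. This sketch uses a made-up function, shout(), rather than Question().

```r
# Hypothetical helper used only for this illustration
shout <- function(x){
    print("printed inside the function")
    return(x)
}

kept <- shout(2)  # the message prints; the returned 2 is stored, not printed
shout(2)          # the message prints, then the unassigned return value prints as [1] 2
```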

Fix the following code so that it makes the subsequent plot.

Code to fix:

ggplot(iris,
       mapping = aes(iris$Sepal.Length,
                     "species"))+
    geom_point()

Desired output:

ggplot(iris, mapping = aes(Species,Sepal.Length))+
    geom_point()

Why would I want to visualize biological measurements on a scatterplot or another two dimensional graphic as opposed to doing something with 1-dimensional graphics like histograms?

The answer is not just "conciseness" or similar.

Because you cannot visualize interactions between measurements on a one dimensional plot. This means that you might miss seeing important correlations.
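
As a small illustration using the built-in iris data (my choice of example, not necessarily the one from class): summaries of each column on its own say nothing about how the two measurements move together, while pairing them does.

```r
# Each column on its own (the kind of information a histogram summarizes)
summary(iris$Petal.Length)
summary(iris$Petal.Width)

# The relationship only appears once the two measurements are paired
r <- cor(iris$Petal.Length, iris$Petal.Width)
r  # strongly positive
```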

What is the class of obj1? Provide code that calculates the number of FALSEs (146).

obj1 <- iris$Sepal.Length<4.5

Logical vector.
Many ways:
sum(!obj1)
table(obj1)["FALSE"]
length(obj1)-sum(obj1)
(1-mean(obj1))*length(obj1)
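
These all work because TRUE and FALSE are treated as 1 and 0 in arithmetic. A brief sketch of that coercion (n_true and n_false are hypothetical names):

```r
as.integer(c(TRUE, FALSE, TRUE))  # 1 0 1

obj1 <- iris$Sepal.Length < 4.5
n_true  <- sum(obj1)   # each TRUE counts as 1
n_false <- sum(!obj1)  # ! flips every value before summing

n_true + n_false == length(obj1)
```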

Write code that:
- prints all columns across the rows of `iris` where
- the petal length is greater than 6.6 or the petal width is greater than 2.4.
- This should be 6 rows total.
- I have shown these rows in red in the plot below.

iris[iris$Petal.Length>6.6|iris$Petal.Width>2.4 , ]

Nolan often fidgets while teaching class. This sometimes results in objects flying into the air.

Assuming the highly realistic scenario of him flinging a frictionless object directly upwards at an initial velocity of 25 m/s on a post-apocalyptic Earth that has an acceleration due to gravity of exactly -10 m/s^2, how long would he have to wait for the object to land* back in his unmoved hand? This answer is an integer.

For a bonus point, how high did it travel?

You may use R.

#Here is a function that calculates displacement in 
#   1-dimension at a constant acceleration.
disp <- function(t, vi, a){
    d <- 0.5*a*t^2+vi*t
    return(d)
}

* Assume that the rogue planetoid that added mass to the Earth (and presumably liquified most of the planet's surface) also vaporized the roof of the building.

disp(1:10,25,-10) # 5 seconds results in 0 displacement

disp(5/2,25,-10) # 31.25 m
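
The same answers can be reached without scanning 1:10. Setting the displacement 0.5*a*t^2 + vi*t to zero and dividing out t gives t = -2*vi/a, and the peak occurs when the velocity vi + a*t reaches zero, at t = -vi/a. A sketch with hypothetical names t_land and t_peak:

```r
disp <- function(t, vi, a){
    d <- 0.5*a*t^2 + vi*t
    return(d)
}

vi <- 25
a  <- -10

t_land <- -2*vi/a   # non-zero time at which displacement returns to 0
t_peak <- -vi/a     # time at which the upward velocity reaches 0

t_land
disp(t_land, vi, a)  # back at the starting height
disp(t_peak, vi, a)  # maximum height in meters
```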

Fix the following code so that it makes the subsequent plot.

Code to fix:

ggplot(iris,
       mapping = aes(Species,Sepal.Length))+
    geom_point(size = 3)+
    geom_violin(color="black")+
    scale_color_viridis_b()+
    theme_dark()

Desired output:

ggplot(iris,
       mapping = aes(Species,Sepal.Length,
                     color=Sepal.Width))+
    geom_violin(color="black")+
    geom_point(size = 3)+
    scale_color_viridis_b()+
    theme_dark()

### OR ###

ggplot(iris,
       mapping = aes(Species,Sepal.Length))+
    geom_violin(color="black")+
    geom_point(aes(color=Sepal.Width), size = 3)+
    scale_color_viridis_b()+
    theme_dark()