Week 1 Content
(Sanger/HGP/Etc)
Week 2 Content
(Illumina/Contigs)
Week 3 Content
(Phylogenetics/genomics)
Week 4 Content
(SNPs, Odds ratios)
100

What is a ddNTP?

A dideoxynucleotide: a nucleotide lacking the 3' hydroxyl group needed to continue strand polymerization, so the strand terminates once it is incorporated.

100

What is the N50, conceptually?

The contig length at which contigs of that size or larger contain at least 50% of the assembled sequence.
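
The definition above can be sketched in a few lines of Python. This is a minimal illustration, assuming N50 is measured against the total length of the assembled contigs (not an independent genome size estimate); the contig lengths are made up for the example.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths."""
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("need at least one contig")


# Hypothetical contig lengths (total = 300), not data from the course.
example_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(example_contigs))  # 70, since 80 + 70 = 150 is half of 300
```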

100

Find the Hamming distance between these two organisms:

Human: 5' AGGTACCTAGA 3'
Mouse:  5' CGGATGGTAGC 3'

6
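
To check the count, here is a minimal Python sketch that compares the two sequences position by position. The function name is just an illustrative choice.

```python
def hamming_distance(a, b):
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs sequences of equal length")
    return sum(x != y for x, y in zip(a, b))


human = "AGGTACCTAGA"
mouse = "CGGATGGTAGC"
print(hamming_distance(human, mouse))  # 6
```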

100

Can SNPs be used as evidence for biological races?

No. Differences between regional groups account for only around 15% of SNP variation. The other 85% is variation among individuals that is found globally, so it's plausible for two individuals from the same region to be more different than two individuals from different regions.

200


Green = A
Blue = T
Yellow = G
Red = C
Find the synthesized strand sequence

5' ATGGCCCTTAAG 3'

200

Blue = A
Red = T
Green = G
Yellow = C

What is the synthesized strand's sequence?

5' AATGAC 3'

200


What is the amino acid sequence for the following sequence of RNA:

5' CCGUGAGGAAUGGGUGUGAGUUAGUUGGGU 3'

Met(start)-Gly-Val-Ser
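
The reading above (scan for the first AUG, read codons in frame, stop at the first stop codon) can be sketched in Python. This is an illustrative sketch: the table is the standard genetic code built from a compact string, the function name is made up, and the output uses one-letter amino acid codes (M = Met, G = Gly, V = Val, S = Ser).

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_from_first_start(rna):
    """Translate from the first AUG until a stop codon or the end of the RNA."""
    start = rna.find("AUG")
    if start == -1:
        return ""  # no start codon found
    protein = []
    for i in range(start, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate_from_first_start("CCGUGAGGAAUGGGUGUGAGUUAGUUGGGU"))  # MGVS
```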

200

What are the types of SNPs, and what is their significance/effect?

Silent SNPs either fall in noncoding regions or are synonymous, meaning they still code for the original amino acid. They may still affect regulation or translation speed.

Non-synonymous SNPs in a coding region can be missense (changes the amino acid) or nonsense (changes it to a stop codon).

300


Green = A
Blue = T
Yellow = G
Red = C
Find the template strand sequence

5' CTTAAGGGCCAT 3'

300

Blue = A
Red = T
Green = G
Yellow = C

What is the template strand's sequence?

5' GTCATT 3'
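
Both template strand answers are the reverse complement of the corresponding synthesized strand, read 5' to 3'. A minimal Python sketch of that relationship (the function name is illustrative):

```python
# Map each DNA base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the complementary strand, written 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


print(reverse_complement("ATGGCCCTTAAG"))  # CTTAAGGGCCAT
print(reverse_complement("AATGAC"))        # GTCATT
```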

300

What is a realistic length of a real gene in the genome?

400+ amino acids.

Not ~20! That is the average length expected if codons were assembled at random, based on the proportion of the total codon pool that stop codons make up (3 of 64).
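
The ~20 figure can be reproduced with a quick calculation. This sketch assumes every codon is equally likely and independent of its neighbors, so the number of codons read up to and including the first stop follows a geometric distribution with mean 1/p.

```python
from fractions import Fraction

STOP_CODONS = 3     # UAA, UAG, UGA
TOTAL_CODONS = 64   # 4 bases ** 3 positions

p_stop = Fraction(STOP_CODONS, TOTAL_CODONS)
# Mean of a geometric distribution: expected codons until the first stop.
expected_codons = 1 / p_stop

print(expected_codons)         # 64/3
print(float(expected_codons))  # roughly 21.3
```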

300

Find the odds ratio and determine whether the A allele is protective, a risk factor, or shows no association.

       Healthy    Heart Disease
A      117        53
G      440        200

No association; the odds ratio is approximately 1 (about 0.997).
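
The odds ratio here is the odds of heart disease among A alleles divided by the odds among G alleles. A minimal Python sketch with the counts from the table (the function and parameter names are illustrative):

```python
def odds_ratio(cases_with, controls_with, cases_without, controls_without):
    """Odds of disease with the allele divided by odds of disease without it."""
    return (cases_with / controls_with) / (cases_without / controls_without)


# A allele: 53 heart disease, 117 healthy. G allele: 200 heart disease, 440 healthy.
or_a = odds_ratio(53, 117, 200, 440)
print(round(or_a, 3))  # 0.997
```

A value well above 1 would suggest a risk factor and a value well below 1 would suggest a protective allele; a value this close to 1 suggests no association.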

400

What is the difference between hierarchical shotgun sequencing and whole genome shotgun sequencing?

Whole genome shotgun sequencing fragments the entire genome into small pieces, Sanger sequences each of these fragments, and assembles them by aligning their overlaps.

Hierarchical shotgun sequencing first maps larger fragments to chromosomes, then fragments those larger pieces and sequences the resulting smaller fragments.

400

Describe paired-end sequencing and why it helps with assembly of contigs.

Paired-end sequencing allows for techniques like scaffolding, where a known number of base pairs separates the two reads of a pair, giving us a reference for placing contigs relative to each other. This is helpful when trying to sort out repeats. In addition, if one read of the pair is found in one contig and the other read is found in another contig, we can easily order and link those two contigs.

400

Give 3 features hinting that something may be a gene sequence.

CpG islands
Promoters
Codon preference
Intron/exon boundaries

400

Describe a haplotype block and why it is important in SNP products like 23andMe.

A haplotype block is a set of nearby SNP alleles that tend to be inherited together. This means that by knowing one SNP in that area, we can potentially infer several others. It acts as a shortcut, so you do not have to check every SNP in the genome.

500

How many years would you be washing your hands if you decided to sing the entire human genome to the tune of the 'Alphabet Song', if the song takes 20 s to complete and has 26 letters?

73.18 years (assuming a 3-billion-base genome and 365-day years)
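
The arithmetic behind the answer can be checked with a short Python sketch. The genome size of 3 billion bases and the 365-day year are assumptions chosen because they reproduce the stated answer; the question itself does not give them.

```python
GENOME_BASES = 3_000_000_000          # assumed genome size (one letter per base)
LETTERS_PER_SONG = 26
SECONDS_PER_SONG = 20
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumes 365-day years

songs_needed = GENOME_BASES / LETTERS_PER_SONG
years = songs_needed * SECONDS_PER_SONG / SECONDS_PER_YEAR
print(round(years, 2))  # 73.18
```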

500

How does a quality score help differentiate whether a nucleotide difference is due to a sequencing error or a mutation?

Quality scores are calculated based on luminescence after the addition of fluorescent bases. If it's a sequencing error, one base from one strand will fluoresce a different color from all the other strands, which fluoresce the same color. This lowers the quality score. A mutation in the organism's genome will be present in all fragments being sequenced, so there is no disagreement in the fluorescence and the quality score is unaffected.

500

Describe a process used for phylogenetics, a process used for phylogenomics, and how they differ.


(Hint: Hamming distance alone is not a process, it's a tool that can be used for sequence comparisons in either process)

MSA/SuperMSA is the example covered for phylogenetics, whereas the feature-frequency profile is the example covered for phylogenomics.

MSA is "multiple sequence alignment" and involves aligning a gene or multiple genes between organisms. This is phylogenetics, since it looks at and compares a specific part of the genome. Feature-frequency profiling involves creating many combinations of 20 base pairs and seeing how many times they occur across the entirety of one organism's genome versus another's. Since this looks at the organism's whole genome, it is phylogenomics.
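
The counting step of feature-frequency profiling can be sketched in Python. This is only a rough illustration of counting fixed-length features across a whole sequence; it does not reproduce the full method (for example, how profiles are normalized or turned into distances), and the toy sequences and the short feature length are made up so the output is easy to read.

```python
from collections import Counter


def feature_frequency_profile(genome, k=20):
    """Count every overlapping k-base feature in a sequence."""
    return Counter(genome[i:i + k] for i in range(len(genome) - k + 1))


def shared_feature_count(profile_a, profile_b):
    """Sum, over all features, the count the two profiles have in common."""
    return sum((profile_a & profile_b).values())


# Toy sequences with k=3 for readability (the course example uses 20 bases).
profile_a = feature_frequency_profile("ATGATGCC", k=3)
profile_b = feature_frequency_profile("ATGCCATG", k=3)
print(profile_a["ATG"], profile_b["ATG"])          # 2 2
print(shared_feature_count(profile_a, profile_b))  # 4
```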

500

Describe how the use of haplotype blocks can be problematic with services like 23andMe. Why isn't it foolproof?

The inferences of other SNPs based on the tag SNP are often regionally biased. For a group of European descent, a certain base at the tag SNP may reliably predict the other SNPs, but that same tag SNP in another regional group may come with a different combination of SNPs at the other positions.
