Mapping genomes/genes
Sequencing
Genome annotation
Human genome
Random
100

What is one main requirement for ANY DNA marker used in constructing genetic maps?

Must be polymorphic

100

Which of the following sequencing technologies does NOT require a library preparation step? Pyrosequencing, Illumina sequencing, SOLiD (Applied Biosystems), Sanger sequencing

Sanger sequencing

100

Explain what the Hidden Markov Model (HMM) is.

A statistical model used to infer hidden states (in our case, whether a position is in an exon, intron, splice site, etc.) from given observations (patterns in the DNA sequence).
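As a rough illustration of the idea, here is a minimal Viterbi sketch for a two-state HMM that labels each base as exon or intron. The states, the start/transition/emission probabilities, and the function name are all made-up assumptions for the example; real gene finders use many more states and trained parameters.

```python
import math

# Hypothetical toy parameters (not from class): exons are GC-rich, introns AT-rich.
states = ["exon", "intron"]
start = {"exon": 0.5, "intron": 0.5}
trans = {"exon": {"exon": 0.9, "intron": 0.1},
         "intron": {"exon": 0.1, "intron": 0.9}}
emit = {"exon": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
        "intron": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}}

def viterbi(obs):
    """Return the most likely hidden-state path for the observed bases."""
    # best[s] = log-probability of the best path that ends in state s
    best = {s: math.log(start[s]) + math.log(emit[s][obs[0]]) for s in states}
    back = []  # one back-pointer table per position after the first
    for sym in obs[1:]:
        prev, best, ptr = best, {}, {}
        for s in states:
            # choose the predecessor state that maximizes the path score
            p = max(states, key=lambda r: prev[r] + math.log(trans[r][s]))
            best[s] = prev[p] + math.log(trans[p][s]) + math.log(emit[s][sym])
            ptr[s] = p
        back.append(ptr)
    # trace back from the best final state
    path = [max(states, key=lambda s: best[s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

path = viterbi("GCGCGCGCATATATAT")
```

The observations are the bases; the returned path is the inferred hidden annotation, one label per base.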

100

True or false: unlike many other organisms with a more random distribution of gene-rich areas, the human genome is relatively uniform, with genes evenly spaced throughout.

False (it’s the other way around)

100

Which method discussed in class would be best suited for analyzing unique gene expression patterns of individual cells?

10X genomics (Chromium)

200

What are RAPDs, and how are they detected?

Random amplified polymorphic DNAs. Generate a population of short random primers and use them in PCR on multiple individuals. Run the amplified fragments on a gel and compare which bands differ between individuals.

(Pros: requires little gDNA; a few primer sets generate many markers

Cons: does not determine the marker sequence (only presence/absence); can be difficult to reproduce)

200

Explain the difference between FASTA and FASTQ format.

Both contain read sequences, but FASTQ also contains a per-base quality score for each read.
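A small sketch of the difference, using a made-up read. The record contents and the `parse_fastq` helper are illustrative assumptions, and the quality decoding assumes the common Phred+33 encoding.

```python
# Toy records: the read name, bases, and quality string are invented for illustration.
fasta_text = ">read1\nACGT\n"          # FASTA: header line + sequence
fastq_text = "@read1\nACGT\n+\nII5!\n"  # FASTQ: header, sequence, '+', qualities

def parse_fastq(text):
    """Parse 4-line FASTQ records into (name, sequence, phred_scores) tuples."""
    lines = text.strip().split("\n")
    records = []
    for i in range(0, len(lines), 4):
        name = lines[i][1:]   # drop the leading '@'
        seq = lines[i + 1]
        qual = lines[i + 3]   # line i + 2 is the '+' separator
        # Assumes Phred+33 encoding: score = ASCII code - 33
        scores = [ord(c) - 33 for c in qual]
        records.append((name, seq, scores))
    return records

records = parse_fastq(fastq_text)
```

Each quality character maps to one base, so the sequence and score lists always have the same length.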

200

What are two shortcomings of EST (expressed sequence tag) sequencing?

Strong bias for abundant mRNA, mis-priming (incorrect 3’ ends), premature stop of the reverse transcriptase (incorrect 5’ ends)

200

Fill in the blanks: Less than 2% of the human genome codes for _____. _______ sequences make up at least 50% of the human genome. The human genome has a _______ (greater/smaller) portion of repeat sequences than the worm and the fly.

Proteins, repeated, greater

200

Dot matrix analysis is a _______ (graphical, heuristic, dynamic-programming) method of sequence alignment.

Graphical
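A minimal sketch of a dot matrix, assuming the simplest version with no window or threshold filtering. The function names and example sequences are hypothetical.

```python
def dot_matrix(seq_a, seq_b):
    """Rows follow seq_a, columns follow seq_b; a cell is True where the bases match."""
    return [[a == b for b in seq_b] for a in seq_a]

def render(matrix):
    """Draw matches as '*' and mismatches as '.', one row per line."""
    return "\n".join("".join("*" if cell else "." for cell in row) for row in matrix)

matrix = dot_matrix("ACGTAC", "ACGAAC")
picture = render(matrix)
```

Regions of similarity show up as diagonal runs of matches, which is why the method is read by eye rather than scored.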

300

Name one benefit of using YACs over BACs, and one disadvantage.

Can carry larger DNA fragments, but is more prone to rearrangements

300

Explain MGI technology sequencing.

Short-read sequencing that uses rolling circle amplification to generate DNA nanoballs (DNBs), which are then loaded onto a patterned chip surface with one DNB per binding site. cPAS (combinatorial probe-anchor synthesis) sequencing is then used to determine the sequence of each nanoball.

300

Describe one way to directly identify histone-bound DNA across the genome and explain the methods involved.

Possible answers: 

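ChIP-seq: DNA and proteins are crosslinked, then the chromatin is sheared to produce small DNA fragments. Histone-bound DNA is immunoprecipitated with an antibody against the target histone, the crosslinks are reversed to release the DNA from the proteins, and the DNA is sequenced.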

CUT&RUN: Cells are permeabilized, and an antibody binds to target histones or chromatin proteins. Protein A/G-MNase is recruited and selectively cleaves bound DNA, which is released, purified, and sequenced.

300

How were genetic maps used in the HGP?

In the initial stages of the HGP, linkage mapping was used to determine the relative positions of genes and markers. The maps were built from recombination frequencies, that is, from how often pairs of genetic markers were inherited together. They let researchers break the genome into more manageable segments and anchor later physical maps and sequence.
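To make the link between recombination frequency and map distance concrete, here is a minimal sketch. The offspring counts are hypothetical, and the conversion of 1% recombination to roughly 1 centimorgan (cM) is the usual approximation for closely linked markers.

```python
def recombination_frequency(recombinant, total):
    """Fraction of offspring in which two markers were separated by recombination."""
    if total <= 0:
        raise ValueError("total must be positive")
    return recombinant / total

# Hypothetical cross: 12 recombinant offspring out of 100 scored.
rf = recombination_frequency(12, 100)

# Approximation for closely linked markers: 1% recombination is about 1 cM.
map_distance_cm = rf * 100
```

Markers that are rarely separated get a small distance and are placed close together on the map.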

300

What are some examples of experimental designs besides twin studies that can be used to quantify the genetic component of complex diseases?

Adoptee studies, familial clustering, studies of populations with similar genetics but different environments (such as the Pima tribe)

400

List 3 factors that affect the number of SNPs BETWEEN species.

Population size (large = retains more diversity, small = susceptible to drift)

Mating system

Migration

400

For AVITI sequencing, answer these questions:

  1. Does it rely on single-molecule sequencing or clonal amplification?

  2. What type of clonal amplification is used, if applicable?

  3. What kind of sequencing is used (by ligation, by synthesis, etc.)?

1. Clonal amplification

2. Rolling circle amplification

3. “Sequencing by avidity”

400

What is the goal of ENCODE? Name two methods they used to achieve this and the specific purposes they served.

Goal: Delineate all functional elements in the human genome.

Possible answers for methods:

DNase-seq + FAIRE-seq + ATAC-seq: used to identify open chromatin

ChIP-seq: used to find protein-DNA interactions

RNA-seq: used to find transcribed protein-coding and non-coding RNAs

RRBS: DNA methylation

5C: long-range interactions  

400

Name 2 goals of the HGP besides mapping/sequencing the human genome.

Possible answers: Reduce the cost and time of sequencing, address ethical and social considerations, advance applications in other fields such as medicine

400

What is CAP biotinylation and how can it be used in genome annotation?

Process of tagging the 5’ cap of RNA molecules with a biotin group, allowing for selective capture. It can be used in RNA-seq to specifically enrich mRNA (or other capped RNA species) and reduce background from other RNAs.

500

Explain 2 core differences between the clone contig approach and the directed shotgun approach.

Possible answers -

Clone contig: fragmented into larger pieces (100 - 200 kb), cloned into vectors such as BACs or YACs, large clones are mapped to create a physical map, each clone individually undergoes shotgun sequencing and then undergoes full assembly

Directed shotgun: randomly fragmented into smaller pieces (500 - 1500 bp), fragments are directly sequenced with high-throughput methods, then full assembly

500

Explain 2 core differences between shotgun sequencing and map-based sequencing.

Possible answers -

Shotgun: fragmented into smaller pieces (500 bp - 10 kb), sequenced without prior mapping, assembled using overlapping regions, used in WGS

Map-based: fragmented into larger pieces (100 - 300 kb), cloned into BACs that are mapped and ordered before sequencing, sequences are determined for each clone before assembling entire sequence based on physical map

500

Describe 2 ways to identify accessible chromatin across the genome and explain the methods involved in both.

ATAC-seq: Tn5 transposase performs tagmentation which inserts sequencing adapters in accessible DNA, tagged DNA is amplified and sequenced.

DNase-seq: DNase I cuts accessible DNA, digested fragments are purified and adapters are added, DNA is sequenced.

500

Explain in depth how the human genome was sequenced.

Used a hierarchical shotgun approach:

  1. BAC library construction: genome was fragmented into large pieces, inserted into bacterial artificial chromosomes (BACs), and these constructs were introduced into bacteria

  2. Organization of contigs: BAC clones were mapped relative to each other using markers 

  3. Shotgun cloning and sequencing: each BAC was sheared into smaller fragments, cloned into plasmids, and sequenced with Sanger sequencing

  4. Assembly: individual shotgun sequences were assembled bioinformatically 
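The assembly step can be sketched as a greedy merge of overlapping reads. This is a toy illustration only: the reads, function names, and minimum overlap are assumptions, and real assemblers also handle sequencing errors, reverse complements, and repeats.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # no overlaps left; remaining contigs stay separate
        merged = reads[i] + reads[j][n:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]
    return reads

# Hypothetical shotgun reads from one BAC insert.
contigs = greedy_assemble(["ATGGCGT", "GCGTACA", "ACAGGT"])
```

In the hierarchical approach this kind of assembly is done per BAC, and the physical map then tells you how the BAC sequences fit together.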

500

What is primer walking used for? Explain the steps involved in this method.

A technique for sequencing regions that are too long to cover in a single Sanger read (in class, it was shown being used to sequence cDNA plasmids).

  1. Sanger sequence initial region

  2. Design a new primer near the end of the region just sequenced

  3. Repeat until entire plasmid is read
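The steps above can be simulated in a few lines. This is a simplified sketch: the template, read length, and primer length are hypothetical, and each read is assumed to start immediately after its primer, whereas real reads begin a short distance downstream and primers are placed a little inside the known sequence.

```python
def primer_walk(template, read_len=8, primer_len=4):
    """Simulate primer walking; return the reconstructed sequence and the primers designed."""
    # Step 1: initial read, e.g. primed from known vector sequence.
    assembled = template[:read_len]
    primers = []
    while len(assembled) < len(template):
        # Step 2: design the next primer from the end of the known sequence.
        primer = assembled[-primer_len:]
        primers.append(primer)
        # Step 3: the next read extends from where the known sequence ends.
        start = len(assembled)
        assembled += template[start:start + read_len]
    return assembled, primers

# Hypothetical 20-bp insert standing in for a cDNA plasmid.
template = "ATGGCGTACAGGTTCAAGCT"
sequence, primers = primer_walk(template)
```

Each round depends on the previous read, which is why primer walking is slow compared with shotgun approaches.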