What is one main requirement for ANY DNA marker used in constructing genetic maps?
Must be polymorphic
Which of the following sequencing technologies does NOT require a library preparation step? Pyrosequencing, Illumina sequencing, SOLiD (Applied Biosystems), Sanger sequencing
Sanger sequencing
Explain what the Hidden Markov Model (HMM) is.
Statistical model used to infer the most likely sequence of hidden states (in our case, whether a position is in an exon, intron, splice site, etc.) from a sequence of observations (patterns in the DNA)
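A toy sketch of how hidden states can be decoded from observed bases, assuming a two-state model (exon/intron) and the Viterbi algorithm, which is one standard way to find the most likely hidden-state path. All probabilities and the names STATES, START_P, TRANS_P, and EMIT_P are invented for illustration and are not from class or from any real gene finder.

```python
import math

# Hypothetical parameters: exons are GC-rich, introns are AT-rich.
STATES = ("exon", "intron")
START_P = {"exon": 0.5, "intron": 0.5}
TRANS_P = {
    "exon": {"exon": 0.9, "intron": 0.1},
    "intron": {"exon": 0.1, "intron": 0.9},
}
EMIT_P = {
    "exon": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "intron": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
}


def viterbi(observations, states=STATES, start_p=START_P,
            trans_p=TRANS_P, emit_p=EMIT_P):
    """Return the most likely hidden-state path for the observed bases."""
    # best[t][s]: log-probability of the best path that ends in state s at position t
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
             for s in states}]
    back = [{}]  # back[t][s]: previous state on that best path
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev = max(states,
                       key=lambda p: best[t - 1][p] + math.log(trans_p[p][s]))
            best[t][s] = (best[t - 1][prev]
                          + math.log(trans_p[prev][s])
                          + math.log(emit_p[s][observations[t]]))
            back[t][s] = prev
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path
```

With these made-up parameters, viterbi("GGGGAAAA") labels the GC-rich half as exon and the AT-rich half as intron.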
True or false: unlike many other organisms with a more random distribution of gene-rich areas, the human genome is relatively uniform, with genes evenly spaced throughout.
False (it’s the other way around: human genes are unevenly distributed, with gene-rich and gene-poor regions)
Which method discussed in class would be best suited for analyzing unique gene expression patterns of individual cells?
10x Genomics (Chromium)
What are RAPDs, and how are they detected?
Random amplified polymorphic DNAs. Generate a population of short random primers and use them to PCR-amplify genomic DNA from multiple individuals. Run the amplified fragments on a gel and compare where bands differ between individuals.
(Pros: requires little gDNA, and a few primer sets generate many markers.
Cons: does not determine marker sequence (only presence/absence) and can be difficult to reproduce.)
Explain the difference between FASTA and FASTQ format.
Both contain read sequences, but FASTQ also contains their corresponding quality scores
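A minimal sketch of the difference, assuming 4-line FASTQ records and the common Phred+33 quality encoding (older files may use a different offset). The function names and the example records are made up for illustration.

```python
def parse_fasta(text):
    """Return (header, sequence) tuples from FASTA text."""
    records = []
    header, seq_parts = None, []
    for line in text.strip().splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(seq_parts)))
            header, seq_parts = line[1:], []
        elif line:
            seq_parts.append(line)  # sequence may wrap over several lines
    if header is not None:
        records.append((header, "".join(seq_parts)))
    return records


def parse_fastq(text, offset=33):
    """Return (header, sequence, phred_scores) tuples from 4-line FASTQ records."""
    lines = [ln.strip() for ln in text.strip().splitlines()]
    records = []
    for i in range(0, len(lines), 4):
        header, seq, _plus, qual = lines[i:i + 4]
        # Each quality character encodes one Phred score: Q = ord(char) - offset
        scores = [ord(ch) - offset for ch in qual]
        records.append((header[1:], seq, scores))
    return records


fasta_example = ">read1\nACGT\n"
fastq_example = "@read1\nACGT\n+\nII5!\n"
```

Parsing fastq_example gives the same sequence as the FASTA record plus one quality score per base (here 40, 40, 20, 0), which is exactly the extra information FASTA lacks.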
What are two shortcomings of EST (expressed sequence tag) sequencing?
Strong bias for abundant mRNA, mis-priming (incorrect 3’ ends), premature stop of the reverse transcriptase (incorrect 5’ ends)
Fill in the blanks: Less than 2% of the human genome codes for _____. _______ sequences make up at least 50% of the human genome. The human genome has a _______ (greater/smaller) portion of repeat sequences than the worm and the fly.
Proteins, repeated, greater
Dot matrix analysis is a _______ (graphical, heuristic, dynamic-programming) method of sequence alignment.
Graphical
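A small sketch of what a dot matrix looks like, assuming exact matching over a sliding window (real dot-plot tools usually add scoring thresholds and filtering). The function name and the window parameter are illustrative.

```python
def dot_matrix(seq_a, seq_b, window=1):
    """Return rows of a dot plot: '*' where the windows match, '.' otherwise."""
    rows = []
    for i in range(len(seq_a) - window + 1):
        row = ""
        for j in range(len(seq_b) - window + 1):
            match = seq_a[i:i + window] == seq_b[j:j + window]
            row += "*" if match else "."
        rows.append(row)
    return rows


# Similar regions show up as diagonal runs of '*'.
for row in dot_matrix("ACGTAC", "ACGTAC", window=2):
    print(row)
```

Increasing the window removes isolated chance matches, which makes the diagonals easier to see by eye.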
Name one benefit of using YACs over BACs, and one disadvantage.
YACs can carry larger DNA fragments, but they are more prone to rearrangements
Explain MGI technology sequencing.
Short-read sequencing that uses rolling circle amplification to generate DNA nanoballs (DNBs), which are then placed on a chip surface with one DNB per binding site. cPAS (combinatorial probe-anchor synthesis) sequencing is then used to determine the sequence of each nanoball.
Describe one way to directly identify histone-bound DNA across the genome and explain the methods involved.
Possible answers:
ChIP-seq: DNA and proteins are crosslinked, then the chromatin is sheared into small fragments; histone-bound DNA is immunoprecipitated with a histone-specific antibody, and the DNA is released from the proteins and sequenced.
CUT&RUN: Cells are permeabilized, and an antibody binds to target histones or chromatin proteins. Protein A/G-MNase is recruited and selectively cleaves bound DNA, which is released, purified, and sequenced.
How were genetic maps used in the HGP?
In the initial stages of the HGP, linkage mapping was used to determine the relative positions of genes. Linkage maps were constructed by studying how frequently certain genetic markers were inherited together, so their distances were based on recombination frequencies. This let researchers break the genome down into more manageable segments.
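A worked sketch of how recombination frequency translates into map distance, assuming the usual approximation that 1% recombination is about 1 centimorgan (this only holds for closely linked markers, since double crossovers go undetected at larger distances). The function names and offspring counts are made up.

```python
def recombination_frequency(parental, recombinant):
    """Fraction of offspring that carry a recombinant marker combination."""
    total = parental + recombinant
    if total == 0:
        raise ValueError("no offspring scored")
    return recombinant / total


def map_distance_cm(rf):
    """Approximate map distance in centimorgans (valid for small rf only)."""
    return rf * 100


# Example: 90 parental-type and 10 recombinant offspring for two markers.
rf = recombination_frequency(parental=90, recombinant=10)
print(rf, map_distance_cm(rf))
```

Markers that are rarely separated by recombination get a small distance and are placed close together on the map.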
What are some examples of experimental designs besides twin studies that can be used to quantify the genetic component of complex diseases?
Adoptee studies, familial clustering, studies of populations with similar genetics but different environments (such as the Pima tribe)
List 3 factors that affect the number of SNPs BETWEEN species.
Population size (large = retains more diversity, small = susceptible to drift)
Mating system
Migration
For AVITI sequencing, answer these questions:
Does it rely on single-molecule sequencing or clonal amplification?
What type of clonal amplification, if applicable?
What kind of sequencing is used (by ligation, by synthesis, etc.)?
1. Clonal amplification
2. Rolling circle amplification
3. “Sequencing by avidity”
What is the goal of ENCODE? Name two methods they used to achieve this and the specific purposes they served.
Goal: Delineate all functional elements in the human genome.
Possible answers for methods:
DNase-seq + FAIRE-seq + ATAC-seq: used to identify open chromatin
ChIP-seq: used to find protein-DNA interactions
RNA-seq: used to find transcribed protein-coding and non-coding RNAs
RRBS: used to profile DNA methylation
5C: used to detect long-range chromatin interactions
Name 2 goals of the HGP besides mapping/sequencing the human genome.
Possible answers: Improve the cost/time of sequencing, address ethical and social considerations, advance other applications such as medical fields
What is CAP biotinylation and how can it be used in genome annotation?
Process of tagging the 5’ cap of RNA molecules with a biotin group, allowing for selective capture. It can be used in RNA-seq to specifically enrich mRNA (or other capped RNA species) and reduce background from other RNAs.
Explain 2 core differences between the clone contig approach and the directed shotgun approach.
Possible answers:
Clone contig: genome is fragmented into larger pieces (100-200 kb) and cloned into vectors such as BACs or YACs; the large clones are mapped to create a physical map; each clone is shotgun sequenced individually before the full sequence is assembled
Directed shotgun: genome is randomly fragmented into smaller pieces (500-1500 bp); fragments are sequenced directly with high-throughput methods, followed by full assembly
Explain 2 core differences between shotgun sequencing and map-based sequencing.
Possible answers:
Shotgun: genome is fragmented into smaller pieces (500 bp-10 kb), sequenced without prior mapping, and assembled using overlapping regions; used in WGS
Map-based: genome is fragmented into larger pieces (100-300 kb) and cloned into BACs that are mapped and ordered before sequencing; the sequence of each clone is determined before the entire sequence is assembled based on the physical map
Describe 2 ways to identify accessible chromatin across the genome and explain the methods involved in both.
ATAC-seq: Tn5 transposase performs tagmentation, cutting accessible DNA and inserting sequencing adapters at the cut sites; the tagged DNA is amplified and sequenced.
DNase-seq: DNase I cuts accessible DNA, digested fragments are purified and adapters are added, DNA is sequenced.
Explain in depth how the human genome was sequenced.
Used a hierarchical shotgun approach:
BAC library construction: genome was fragmented into large pieces, inserted into bacterial artificial chromosomes (BACs), and these constructs were introduced into bacteria
Organization of contigs: BAC clones were mapped relative to each other using markers
Shotgun cloning and sequencing: each BAC was sheared into smaller fragments, cloned into plasmids, and sequenced with Sanger sequencing
Assembly: individual shotgun sequences were assembled bioinformatically
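The assembly step can be illustrated with a toy greedy overlap merger. This is only a sketch under strong assumptions (error-free reads, no repeats, a single strand) and is not the assembler actually used for the HGP; the function names and reads are invented.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b, or 0."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining overlaps: leave separate contigs
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Three overlapping shotgun reads from one (made-up) BAC insert.
print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA"]))
```

In the hierarchical approach this kind of assembly is done per BAC, and the physical map then tells you how the assembled BAC sequences fit together.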
What is primer walking used for? Explain the steps involved in this method.
Technique used for sequencing regions too long to cover in a single Sanger read (in class, it was shown being used to sequence cDNA plasmids).
Sanger sequence the initial region, starting from a primer in known sequence
Design a new primer based on the end of the newly sequenced region
Repeat until the entire plasmid is read
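The steps above can be simulated with a toy loop. This is a sketch with unrealistically short reads and primers (real Sanger reads are hundreds of bases and primers are roughly 20 nt), and it assumes each read starts immediately after the primer; primer_walk and its parameters are hypothetical names.

```python
def primer_walk(template, read_len=8, primer_len=4):
    """Simulate primer walking; return (assembled sequence, primers designed)."""
    # Round 1: read the initial region from a primer in known flanking sequence.
    known = template[:read_len]
    primers = []
    while len(known) < len(template):
        # Design the next primer from the end of what has been sequenced so far.
        primer = known[-primer_len:]
        primers.append(primer)
        # The next read extends from the primer into unknown sequence.
        start = len(known)
        known += template[start:start + read_len]
    return known, primers


sequence, primers = primer_walk("ATGCGTACCTGAGGCTTAAC")
print(sequence, primers)
```

Each round depends on the previous read, which is why primer walking is accurate but slow compared with shotgun approaches.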