Formats
Sequencing
Databases/Projects
-seq applications
Misc
100
A text file format for NGS reads that stores both the DNA sequence and quality information for each base. Each read is represented by four lines. The first two are a header line beginning with “@” that carries a unique identifier for the read, and a line of DNA bases written as text (G, A, T, C); together these closely resemble the FASTA format. The second pair consists of a separator line beginning with “+” (optionally repeating the identifier) and a string of ASCII symbols, equal in length to the number of bases in the read, that encodes the Phred quality score of each base.
What is FASTQ format?
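To make the four-line layout and the quality encoding concrete, here is a minimal FASTQ reader sketch. It assumes unwrapped records (exactly four lines per read) and the Phred+33 offset used by current Illumina output. Older files may use a different offset. The names `read_fastq`, `phred_scores` and `error_probability` are hypothetical helpers written for illustration, not part of any library.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str  # identifier from the "@" header line, without the "@"
    seq: str   # bases, e.g. "GATC"
    qual: str  # ASCII-encoded quality string, same length as seq


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from an unwrapped, four-lines-per-read FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], seq, qual)


def phred_scores(qual: str, offset: int = 33) -> List[int]:
    """Decode quality characters: Q = ASCII code - offset (assumed Phred+33)."""
    return [ord(ch) - offset for ch in qual]


def error_probability(q: int) -> float:
    """Invert Q = -10 * log10(P): a base with Q = 20 has a 1-in-100 error chance."""
    return 10 ** (-q / 10)


if __name__ == "__main__":
    data = io.StringIO("@read1\nGATC\n+\nII#5\n")
    for rec in read_fastq(data):
        print(rec.name, rec.seq, phred_scores(rec.qual))
```

With the Phred+33 offset, “I” decodes to Q40 and “#” decodes to Q2. This is why a run of “#” characters at the end of a read usually marks low-quality bases.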
100
An experimental design that uses a barcoding scheme to tag individual samples with different adapter sequences. The samples are mixed together in a single sequencing library, and bioinformatics methods then sort the reads back into samples by identifying the different barcodes in the output data file.
What is multiplexing?
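The “sort out the samples” step, usually called demultiplexing, can be sketched as matching the start of each read against a barcode table. This sketch assumes that barcodes are inline at the 5' end of the read, that all barcodes have the same length, and that a small Hamming-distance tolerance is acceptable. On Illumina instruments the index is normally read separately and demultiplexed by the vendor's software, so `demultiplex` here is a hypothetical illustration of the idea, not that tool's algorithm.

```python
from collections import defaultdict


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads, barcodes, max_mismatches=0):
    """Bin reads by an inline barcode at the start of each read.

    reads: iterable of (read_id, sequence) tuples
    barcodes: dict of barcode -> sample name; all barcodes share one length
    Returns a dict of sample -> [(read_id, sequence_without_barcode)].
    Reads with no match, or with more than one match, go to "undetermined".
    """
    bc_len = len(next(iter(barcodes)))
    bins = defaultdict(list)
    for read_id, seq in reads:
        prefix = seq[:bc_len]
        hits = [sample for bc, sample in barcodes.items()
                if len(prefix) == bc_len and hamming(prefix, bc) <= max_mismatches]
        if len(hits) == 1:
            bins[hits[0]].append((read_id, seq[bc_len:]))  # trim the barcode
        else:
            bins["undetermined"].append((read_id, seq))
    return dict(bins)


if __name__ == "__main__":
    samples = {"ACGT": "sampleA", "TGCA": "sampleB"}
    reads = [("r1", "ACGTGGGG"), ("r2", "TGCACCCC"), ("r3", "AAAAAAAA")]
    print(demultiplex(reads, samples))
```

Sending ambiguous reads to “undetermined” rather than guessing is a deliberate choice. Barcode sets are usually designed so that this case is rare at a one-mismatch tolerance.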
100
This resource, the Kyoto Encyclopedia of Genes and Genomes, is a collection of databases dealing with genomes, biological pathways, diseases, drugs, and chemical substances.
What is KEGG?
100
NGS application for the study of gene expression.
What is RNA-seq?
100
A consensus sequence representing all of the DNA in the genome of a species.
What is the reference genome?
200
This format is used by the UCSC Genome Browser to store dense, continuous data, such as read coverage, in a compact way for display as graph tracks.
What is WIG/Wiggle format?
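The compactness comes largely from not writing a position for every value. In the fixedStep variant, a single declaration line gives the chromosome, a 1-based start, and a step, and the data lines that follow hold only values. Below is a minimal writer sketch. The function name `to_fixed_step_wig` and the choice to set `span` equal to `step`, so that each value covers its whole window, are illustrative assumptions.

```python
def to_fixed_step_wig(chrom, start, values, step=1, name="coverage"):
    """Render evenly spaced values as a fixedStep Wiggle track.

    start is the 1-based position of the first value, and each value covers
    `step` bases. Later positions are never written; they are implied by
    start + k * step, which is what keeps the file compact.
    """
    lines = [
        f'track type=wiggle_0 name="{name}"',
        f"fixedStep chrom={chrom} start={start} step={step} span={step}",
    ]
    lines.extend(str(v) for v in values)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_fixed_step_wig("chr1", 1001, [3, 5, 2], step=10), end="")
```

For irregularly spaced data, the related variableStep form writes a position next to each value instead.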
200
This program has become the standard tool for quality assessment of FASTQ files (or SAM/BAM files).
What is FastQC?
200
This is a major bioinformatics initiative/database that uses ontologies to unify the representation of gene and gene product attributes across all species.
What is the Gene Ontology?
200
NGS application for the study of transcription factor (TF) binding sites and histone modifications.
What is ChIP-seq?
200
The number of sequence reads in a sequencing project that align to positions that overlap a specific base on a target genome, or the average number of aligned reads that overlap all positions on the target genome.
What is coverage/sequencing depth?
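Both readings of the definition, per-base depth and average depth, can be computed from aligned read positions. This sketch assumes that alignments are given as 0-based, half-open (start, end) intervals, as in BED. It uses a difference array so that each read costs O(1) to add. It also includes the common back-of-envelope estimate C = N × L / G, where N is the number of reads, L the read length, and G the genome size. The function names are hypothetical.

```python
def per_base_depth(alignments, genome_length):
    """Number of aligned reads overlapping each position of the target.

    alignments: (start, end) pairs, 0-based half-open like BED intervals.
    """
    diff = [0] * (genome_length + 1)
    for start, end in alignments:
        diff[start] += 1  # read begins covering here
        diff[end] -= 1    # ...and stops covering here
    depth, running = [], 0
    for pos in range(genome_length):
        running += diff[pos]
        depth.append(running)
    return depth


def mean_depth(depth):
    """Average number of aligned reads over all positions."""
    return sum(depth) / len(depth)


def expected_coverage(n_reads, read_length, genome_size):
    """Expected average depth from sequencing effort: C = N * L / G."""
    return n_reads * read_length / genome_size


if __name__ == "__main__":
    d = per_base_depth([(0, 4), (2, 6), (2, 3)], 8)
    print(d, mean_depth(d))
```

Real data rarely reaches the expected value uniformly. Comparing `mean_depth` with `expected_coverage` is a quick way to spot reads lost to mapping or duplication.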
300
This text file format is used in bioinformatics for storing gene sequence variations.
What is Variant Call Format (VCF)?
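A VCF data line has eight fixed tab-separated columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO), optionally followed by FORMAT and per-sample genotype columns. Header lines start with “#”. The following is a minimal parsing sketch that ignores the sample columns. It assumes that the INFO field uses the usual `key=value` entries separated by “;”, with bare keys treated as flags. The helpers `parse_vcf_line` and `iter_vcf` are hypothetical, not a library API.

```python
def parse_vcf_line(line):
    """Parse the eight fixed columns of one VCF data line into a dict."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, vid, ref, alt, qual, flt, info = fields[:8]
    info_dict = {}
    if info != ".":
        for entry in info.split(";"):
            key, sep, value = entry.partition("=")
            info_dict[key] = value if sep else True  # bare key = flag
    return {
        "chrom": chrom,
        "pos": int(pos),  # 1-based position of the first REF base
        "id": None if vid == "." else vid,
        "ref": ref,
        "alt": alt.split(","),  # multiple alternate alleles are comma-separated
        "qual": None if qual == "." else float(qual),
        "filter": flt,
        "info": info_dict,
    }


def iter_vcf(lines):
    """Skip '##' meta lines, the '#CHROM' header and blanks; parse the rest."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        yield parse_vcf_line(line)


if __name__ == "__main__":
    text = ["##fileformat=VCFv4.2\n",
            "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
            "chr1\t12345\trs1\tA\tG,T\t50\tPASS\tDP=14;DB\n"]
    for rec in iter_vcf(text):
        print(rec)
```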
300
The NGS method developed by Solexa, a company later acquired by Illumina Inc. This method uses “sequencing by synthesis” chemistry to simultaneously sequence millions of ∼300-bp-long DNA template molecules.
What is Illumina Sequencing?
300
This is Illumina’s collection of reference sequences and annotation files for commonly analyzed organisms.
What is iGenomes?
300
NGS application for the study of all protein-coding exons in a genome.
What is exome-seq?
300
A computational tool, part of the Tuxedo suite, used for RNA-seq transcript assembly.
What is Cufflinks?
400
A tab-delimited table format that describes genome features by their start and end positions on a genomic reference sequence, together with other basic attributes (such as feature type, score, and strand) to which researchers can attach biological information.
What is GFF/GTF format?
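Each GTF line has nine tab-separated columns: seqname, source, feature, start, end, score, strand, frame, and a free-text attribute column. Coordinates are 1-based and inclusive at both ends, so a feature's length is end − start + 1. This sketch parses the GTF (GFF2-style) attribute syntax, `key "value";`. GFF3 instead writes attributes as `key=value`, which this sketch does not handle. Repeated keys such as multiple `tag` entries are simply overwritten, which is a simplifying assumption. The function names and the sample line below are hypothetical.

```python
GTF_COLUMNS = ("seqname", "source", "feature", "start", "end",
               "score", "strand", "frame")


def parse_gtf_line(line):
    """Split one GTF line into its eight fixed columns plus an attribute dict."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("GTF lines have exactly nine tab-separated columns")
    rec = dict(zip(GTF_COLUMNS, fields[:8]))
    rec["start"], rec["end"] = int(rec["start"]), int(rec["end"])
    attrs = {}
    for part in fields[8].split(";"):
        part = part.strip()
        if not part:
            continue
        key, _, value = part.partition(" ")
        attrs[key] = value.strip().strip('"')  # accept quoted or bare values
    rec["attributes"] = attrs
    return rec


def feature_length(rec):
    """Length in bases; +1 because both GTF coordinates are inclusive."""
    return rec["end"] - rec["start"] + 1


if __name__ == "__main__":
    line = ('chr1\texample\texon\t11869\t12227\t.\t+\t.\t'
            'gene_id "g1"; transcript_id "t1"; exon_number 1;')
    rec = parse_gtf_line(line)
    print(rec["feature"], rec["attributes"], feature_length(rec))
```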
400
A technology that obtains sequence reads from both ends of a DNA fragment template.
What is paired-end sequencing?
400
This project, launched in January 2008, was an international research effort to establish the most detailed catalogue of human genetic variation. Scientists planned to sequence the genomes of at least 1000 anonymous participants from a number of different ethnic groups. In 2010, the project finished its pilot phase, which was described in detail in a publication in the journal Nature. In 2012, the sequencing of 1092 genomes was announced in a Nature publication. In 2015, two papers in Nature reported the project's results and completion and outlined opportunities for future research.
What is the 1000 Genomes Project?
400
NGS pipeline used to identify mutations in DNA samples from individual patients or experimental organisms.
What is variant detection/variant calling?
400
A computational tool, part of the Tuxedo suite, used for RNA-seq differential expression analysis.
What is Cuffdiff?
500
An extremely simple text file format that lists positions on a reference genome with respect to chromosome ID and start and stop positions. NGS reads can be represented in this format, but only with respect to their position on the reference genome; no information about sequence variants or base quality is stored in this file.
What is BED format?
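One detail that trips up many BED users is the coordinate convention. BED starts are 0-based and ends are exclusive (half-open), whereas GFF/GTF and VCF use 1-based positions. The following is a minimal sketch that reads only the three required columns and converts between the two conventions. The names `BedInterval`, `parse_bed_line`, `to_one_based`, `interval_length` and `overlaps` are hypothetical. The parser assumes that “track”, “browser” and comment lines have already been filtered out.

```python
from typing import NamedTuple, Tuple


class BedInterval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def parse_bed_line(line: str) -> BedInterval:
    """Keep the three required columns; optional name/score/strand are ignored."""
    chrom, start, end = line.rstrip("\n").split("\t")[:3]
    return BedInterval(chrom, int(start), int(end))


def to_one_based(iv: BedInterval) -> Tuple[str, int, int]:
    """Convert to 1-based, fully closed coordinates (as in GFF/GTF and VCF)."""
    return iv.chrom, iv.start + 1, iv.end


def interval_length(iv: BedInterval) -> int:
    """Half-open intervals need no +1: length is simply end - start."""
    return iv.end - iv.start


def overlaps(a: BedInterval, b: BedInterval) -> bool:
    """Half-open intervals overlap iff each starts before the other ends."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


if __name__ == "__main__":
    iv = parse_bed_line("chr1\t0\t100\tread1\n")
    print(iv, interval_length(iv), to_one_based(iv))
```

The half-open convention is what makes `interval_length` and `overlaps` free of off-by-one adjustments. Adjacent intervals such as [0, 100) and [100, 150) touch but do not overlap.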
500
When a DNA fragment is sequenced, we obtain this: a string of nucleotide bases (represented by the letter symbols G, A, T, C).
What is a sequence read?
500
This project began in 2005. Its aim was to catalogue the genetic mutations responsible for cancer, using genome sequencing and bioinformatics.
What is The Cancer Genome Atlas (TCGA)?
500
An algorithmic approach to finding the best correspondence between the consecutive letters of one sequence and those of another.
What is sequence alignment?
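A classic instance of this idea is global alignment by dynamic programming (Needleman–Wunsch). Each cell of a score table holds the best score for aligning two prefixes, and a traceback recovers one optimal alignment. The scoring values used here (match +1, mismatch −1, linear gap −2) are arbitrary illustrative assumptions. Read aligners used in NGS pipelines rely on much faster, index-based heuristics, so this is a teaching sketch rather than how those tools work.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for one optimal global alignment."""
    n, m = len(a), len(b)

    def sub(x, y):
        return match if x == y else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] against nothing: all gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a

    # Traceback from the bottom-right corner, preferring diagonal moves.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    s, top, bottom = needleman_wunsch("GATTACA", "GATACA")
    print(s)
    print(top)
    print(bottom)
```

The table costs O(n × m) time and memory. This is fine for short sequences but is the reason genome-scale read mapping uses indexing and heuristics instead.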
500
A normalization method for paired-end RNA-seq reads that controls for different gene lengths and different library sizes.
What is FPKM?
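FPKM stands for Fragments Per Kilobase of transcript per Million mapped fragments. It counts fragments (read pairs) rather than individual reads, which is why it suits paired-end data; RPKM is the single-end analogue. The arithmetic is shown below. The sketch assumes the plain annotated gene length, whereas tools such as Cufflinks may use an adjusted “effective” length. In `fpkm_table`, it also assumes that the library size is the sum of the counted fragments, while in practice the total number of mapped fragments is often used. Both function names are hypothetical.

```python
def fpkm(fragment_count, gene_length_bp, total_mapped_fragments):
    """FPKM = fragments / (gene length in kb * mapped fragments in millions).

    Dividing by length corrects for longer genes collecting more fragments;
    dividing by library size corrects for differences in sequencing depth.
    """
    per_kb = gene_length_bp / 1_000
    per_million = total_mapped_fragments / 1_000_000
    return fragment_count / (per_kb * per_million)


def fpkm_table(counts, lengths):
    """FPKM for every gene, taking library size as the sum of the counts (assumption)."""
    total = sum(counts.values())
    return {gene: fpkm(c, lengths[gene], total) for gene, c in counts.items()}


if __name__ == "__main__":
    # 1000 fragments on a 2 kb gene in a 20-million-fragment library:
    # 1000 / (2 * 20) = 25
    print(fpkm(1000, 2000, 20_000_000))
```

Because of the length term, two genes with equal fragment counts differ in FPKM by the inverse ratio of their lengths.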