Genomics Tools & Methods
Data Analysis & Visualization
Machine Learning & Bioinformatics
Cancer Genomics
Workflow, Quality Control, & Reproducibility
100

A commonly used programming language for bioinformatics and data analysis.

What is Python (or R)?

100

A dimensionality reduction technique that transforms high-dimensional data into a lower-dimensional space.

What is principal component analysis (PCA)?

100

An unsupervised machine learning algorithm that groups data points into a predefined number of clusters.

What is K-means clustering?

100

A condition in which cells begin to grow uncontrollably due to genetic mutations that affect normal cell division and repair.

What is cancer?

100

A high-performance computing cluster at UF, used for large-scale simulations, data analysis, and computational research across various disciplines.

What is HiPerGator?

200

A framework that produces raw gene expression counts and streamlines cross-species tumor analyses.

What is FREYA (FRamework for Expression AnalYsis Across species)?

200

A statistical method used to correct for unwanted technical variation in high-throughput sequencing data, ensuring biological differences are accurately detected.

What is batch correction?

200

A type of machine learning that uses labeled data to learn to classify new data, commonly used in cancer genomics for classifying tumor types based on gene expression.

What is supervised learning?

200

A widely used database containing multi-omics data for various human cancers.

What is The Cancer Genome Atlas (TCGA)?

200

A platform that combines version control with collaborative code sharing, allowing teams to manage code and track issues.

What is GitHub?

300

A standard text-based file format for storing nucleotide or protein sequences, in which each record begins with a header line starting with '>'.

What is FASTA?
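As a study aid, here is a minimal sketch of reading FASTA text in plain Python. The function name `parse_fasta` and the example records are hypothetical, and the sketch assumes the record ID is the first whitespace-delimited token of each header line. Real pipelines usually rely on an established parser instead.

```python
from typing import Dict, Optional


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA-formatted text into a {record_id: sequence} mapping."""
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header line: take the first token after '>' as the ID (assumption).
            tokens = line[1:].split()
            current_id = tokens[0] if tokens else ""
            records[current_id] = ""
        elif current_id is not None:
            # Sequence lines may wrap; concatenate them onto the current record.
            records[current_id] += line
    return records


# Hypothetical example input with one nucleotide and one protein record.
example = """>seq1 made-up nucleotide record
ACGTAC
GTTA
>seq2
MKVL
"""

parsed = parse_fasta(example)
```

Here `parsed` maps each ID to its full sequence, with wrapped sequence lines joined together.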

300

A type of plot commonly used to visualize outliers, where data points outside the whiskers indicate potential outliers.

What is a box plot?
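The whisker rule behind this clue can be sketched in a few lines. This assumes the common Tukey convention, where whiskers extend at most 1.5 times the interquartile range beyond the quartiles. Plotting libraries differ in how they compute quartiles, so the exact cutoffs may vary. The function name `find_outliers` and the sample values are hypothetical.

```python
import statistics
from typing import List, Sequence


def find_outliers(values: Sequence[float], k: float = 1.5) -> List[float]:
    """Return values lying outside the Tukey fences Q1 - k*IQR and Q3 + k*IQR."""
    # 'inclusive' treats the data as the full population; other conventions exist.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Made-up measurements with one obviously extreme value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]
outliers = find_outliers(data)
```

With this data only the value 50 falls beyond the upper fence, so it is the one point a box plot would draw outside the whiskers.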

300

A type of machine learning technique where the model is trained to predict continuous values, often used in genomics to predict gene expression levels based on genetic features.

What is regression?

300

A technique commonly used in genomics to compare gene expression levels between different experimental conditions or treatment groups, often applied to identify genes that are upregulated or downregulated in cancer.

What is differential expression analysis?

300

A shell scripting language that allows you to automate tasks in Unix-like operating systems.

What is Bash?

400

A widely used tool for batch correction.

What is ComBat (or ComBat-Seq)?

400

A type of scatterplot that shows statistical significance (P-value) versus magnitude of change (log fold change).

What is a volcano plot?
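To make the two axes concrete, the sketch below computes the coordinates of a single gene on a volcano plot: log2 fold change on the x-axis and -log10 of the P-value on the y-axis. The function names, the input values, and the cutoffs (|log2FC| >= 1, P < 0.05) are illustrative assumptions, not fixed standards.

```python
import math
from typing import Tuple


def volcano_point(mean_control: float, mean_treated: float, p_value: float) -> Tuple[float, float]:
    """Return (log2 fold change, -log10 P-value) for one gene."""
    log2_fc = math.log2(mean_treated / mean_control)
    neg_log10_p = -math.log10(p_value)
    return log2_fc, neg_log10_p


def classify(log2_fc: float, p_value: float, fc_cutoff: float = 1.0, p_cutoff: float = 0.05) -> str:
    """Label a gene using illustrative (assumed) significance thresholds."""
    if p_value < p_cutoff and log2_fc >= fc_cutoff:
        return "up"
    if p_value < p_cutoff and log2_fc <= -fc_cutoff:
        return "down"
    return "not significant"


# Made-up gene: mean expression 10 in control, 40 in treated, P = 0.001.
x, y = volcano_point(10.0, 40.0, 0.001)
label = classify(x, 0.001)
```

Genes that are both strongly changed and highly significant land in the upper left and upper right corners of the plot, which is what gives it the volcano shape.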

400

A deep learning approach originally designed for image processing but also effective for analyzing genomic sequences, often used for motif discovery and variant effect prediction.

What is a Convolutional Neural Network (CNN)?

400

The process by which cancer cells spread from the original (primary) site to other parts of the body.

What is metastasis?

400

A file that defines a set of rules and dependencies, typically executed with the 'make' command to automate tasks.

What is a Makefile?

500

A machine learning approach for generating ancestry-unbiased genetic signatures.

What is PhyloFrame?

500

An R/Bioconductor package commonly used to analyze differential gene expression from RNA-Seq count data, normalizing for factors like sequencing depth and library composition.

What is DESeq2?

500

This dynamic programming algorithm finds the optimal local alignment between two sequences; BLAST approximates it with a faster seed-and-extend heuristic.

What is the Smith-Waterman algorithm (local alignment)?
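For review, the core recurrence of Smith-Waterman local alignment can be sketched as follows. This version returns only the best local alignment score, uses a linear gap penalty, and omits traceback. The scoring values (match = 2, mismatch = -1, gap = -2) are arbitrary assumptions chosen for illustration.

```python
def smith_waterman_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score between sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] holds the best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap      # gap in b
            left = H[i][j - 1] + gap    # gap in a
            # Clamping at 0 is what makes the alignment local rather than global.
            H[i][j] = max(0, diag, up, left)
            best = max(best, H[i][j])
    return best


identical = smith_waterman_score("ACGT", "ACGT")
unrelated = smith_waterman_score("AAAA", "TTTT")
shared_core = smith_waterman_score("TTACGTTT", "GGACGGG")
```

The last example scores only the shared "ACG" core, because the flanking mismatches would lower the score and local alignment is free to leave them out.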

500

A tumor suppressor gene that plays a critical role in regulating the cell cycle and preventing cancer. Mutations in this gene can increase the risk of various cancers, including breast, lung, and colon cancer.

What is TP53?

500

A Python-based tool that allows you to define computational workflows as a series of rules, commonly used for data processing pipelines.

What is Snakemake?
