Unix Commands
Central Dogma of Molecular Biology
HPC Basics
Sequencing
Familiar Software
100

This command prints your current working directory to the console.

What is pwd?

100

This nucleotide pairs with G/guanine in DNA.

What is C/cytosine?
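
Because C always pairs with G (and A with T), one DNA strand determines the other. Here is a minimal Python sketch of the reverse complement. It assumes a sequence containing only A, C, G and T, with no N or other ambiguity codes, and `reverse_complement` is a hypothetical helper name:

```python
# Watson-Crick base pairs in DNA: A-T and C-G
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence containing only A/C/G/T."""
    # Reverse the strand, then swap each base for its pairing partner,
    # so the result reads 5' to 3' like the original.
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


print(reverse_complement("ATGC"))  # GCAT
```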

100

This key on the keyboard can be used to autocomplete the names of files/directories.

What is TAB?

100

This file contains header lines that start with >, each followed by one or more lines of sequence.

What is a FASTA file?
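
Here is a minimal Python sketch of reading FASTA text. It assumes that each header is unique (a repeated header would overwrite the earlier record) and that sequences may be wrapped over several lines. `parse_fasta` is a hypothetical helper, not a library function:

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Map each header (without the leading '>') to its sequence, joining wrapped lines."""
    records: dict[str, str] = {}
    header = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes off the previous record
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)  # save the last record
    return records


example = """>seq1 first example
ACGT
TTGA
>seq2
GGCC
"""
print(parse_fasta(example))  # {'seq1 first example': 'ACGTTTGA', 'seq2': 'GGCC'}
```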

100

This software has both web-based and command-line versions, and allows you to quickly search databases for similarity to a given sequence using short seed alignments that get extended and scored.

What is BLAST?
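
BLAST's actual heuristics and statistics, including gapped extension and E-values, are considerably more involved. This toy Python sketch only illustrates the seed-and-extend idea. It assumes exact DNA word matches of length k, a simple match/mismatch score, and an X-drop cutoff, and it extends to the right only. The function names and scoring values are hypothetical, not BLAST's:

```python
def find_seeds(query: str, subject: str, k: int = 4) -> list[tuple[int, int]]:
    """Return (query_pos, subject_pos) pairs where a length-k word matches exactly."""
    # Index every k-letter word of the query by its start position(s).
    index: dict[str, list[int]] = {}
    for i in range(len(query) - k + 1):
        index.setdefault(query[i:i + k], []).append(i)
    # Slide along the subject and record every exact word hit.
    hits = []
    for j in range(len(subject) - k + 1):
        for i in index.get(subject[j:j + k], []):
            hits.append((i, j))
    return hits


def extend_right(query: str, subject: str, qi: int, sj: int, k: int,
                 match: int = 1, mismatch: int = -2, x_drop: int = 3) -> tuple[int, int]:
    """Extend a seed rightward without gaps; return (best_score, alignment_length)."""
    score = best = k * match  # the seed itself is an exact match
    length = best_len = k
    while qi + length < len(query) and sj + length < len(subject):
        same = query[qi + length] == subject[sj + length]
        score += match if same else mismatch
        length += 1
        if score > best:
            best, best_len = score, length
        elif best - score > x_drop:
            break  # score fell too far below the best seen: stop extending
    return best, best_len


query, subject = "ACGTACGT", "TTACGTACGA"
for qi, sj in find_seeds(query, subject):
    print(qi, sj, extend_right(query, subject, qi, sj, k=4))
```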

200

This command is used both to move and rename files or directories.

What is mv?

200

This is transcribed from DNA.

What is RNA?
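
Transcription produces RNA complementary to the template strand, with U in place of T. As a result, the mRNA matches the coding (sense) strand except that each T becomes U. Here is a minimal Python sketch, assuming both strands are written 5' to 3' and contain only A/C/G/T. The helper names are hypothetical:

```python
# RNA base that pairs with each DNA template base (A pairs with U in RNA)
RNA_PAIR = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe_coding(coding_strand: str) -> str:
    """mRNA from the coding (sense) strand, 5'->3': same sequence with T replaced by U."""
    return coding_strand.upper().replace("T", "U")


def transcribe_template(template_strand: str) -> str:
    """mRNA from the template (antisense) strand given 5'->3'.

    RNA polymerase reads the template 3'->5', so walk it in reverse and
    pair each base, giving an mRNA written 5'->3'.
    """
    return "".join(RNA_PAIR[base] for base in reversed(template_strand.upper()))


print(transcribe_coding("ATGGCT"))    # AUGGCU
print(transcribe_template("AGCCAT"))  # AUGGCU
```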

200

These are strings or integers that you include after a command to set different parameters, filenames, or output types, among other things.

What are options/flags/arguments?

200

This file contains sequence data including quality information.

What is a FASTQ file?
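
Each FASTQ record spans four lines: an @ header, the sequence, a + separator line, and a quality string with one character per base. Here is a minimal Python sketch for decoding one record. It assumes the common Phred+33 encoding (score = ASCII code minus 33) and unwrapped four-line records, and `parse_fastq_record` is a hypothetical helper name. A Phred score Q corresponds to an error probability of 10^(-Q/10), so Q = 40 means a 1 in 10,000 chance that the base call is wrong:

```python
def parse_fastq_record(lines: list[str]) -> tuple[str, str, list[int]]:
    """Parse one 4-line FASTQ record into (read_id, sequence, Phred quality scores)."""
    header, seq, plus, qual = (line.rstrip("\n") for line in lines)
    if not header.startswith("@") or not plus.startswith("+"):
        raise ValueError("not a FASTQ record")
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    # Phred+33: each quality character's ASCII code minus 33 is its score
    scores = [ord(ch) - 33 for ch in qual]
    return header[1:], seq, scores


record = ["@read1\n", "ACGT\n", "+\n", "II#5\n"]
print(parse_fastq_record(record))  # ('read1', 'ACGT', [40, 40, 2, 20])
```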

200

This software detects adapters and checks sequence quality, trimming adapter sequence and low-quality bases from reads and filtering out reads that fail its quality checks.

What is fastp?
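
fastp's own algorithms (adapter detection, sliding-window cutting, and its filters) are set through its command-line options and are not reproduced here. This Python sketch illustrates only the general idea of quality trimming followed by a length filter, assuming Phred scores that are already decoded to integers. `trim_read`, the threshold of 20, and the minimum length of 3 are all hypothetical choices:

```python
def trim_read(seq: str, quals: list[int], min_q: int = 20, min_len: int = 3):
    """Trim low-quality bases from the 3' end; return None if the read gets too short.

    Illustrative only: a simple trailing-base trim, not fastp's actual algorithm.
    """
    end = len(seq)
    # Walk back from the 3' end while the last remaining base is below the threshold
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    if end < min_len:
        return None  # filter out reads left too short to be useful
    return seq[:end], quals[:end]


print(trim_read("ACGTACGT", [30, 30, 30, 30, 30, 10, 10, 10]))  # ('ACGTA', [30, 30, 30, 30, 30])
print(trim_read("ACGT", [10, 10, 10, 10]))                        # None
```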

300

This command is used to print the number of lines, words, or characters in a file.

What is wc?
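
The three counts can be mimicked in Python to show what `wc` is measuring. This sketch assumes the text is already loaded as a string, and `wc_counts` is a hypothetical helper. Here, `len(text)` counts characters, like `wc -m`. By contrast, `wc -c` counts bytes, which differs for non-ASCII text:

```python
def wc_counts(text: str) -> tuple[int, int, int]:
    """Approximate `wc`: (lines, words, characters) for a string.

    Like wc, "lines" counts newline characters, so a final line with no
    trailing newline is not counted; words are whitespace-separated runs.
    """
    return text.count("\n"), len(text.split()), len(text)


print(wc_counts("hello world\nsecond line\n"))  # (2, 4, 24)
```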

300

This nucleotide is found in RNA but not DNA, and pairs with Adenine.

What is Uracil?

300

These are the nodes on an HPC cluster that you connect to when you first log in. They are connected to the internet, but you shouldn't run computationally intensive jobs on them.

What are login/head nodes?

300

This process, which uses DNA polymerase to amplify a specific target sequence defined by the primers that anneal to it, was invented by Kary Mullis, who shared the 1993 Nobel Prize in Chemistry with Michael Smith (honored for site-directed mutagenesis).

What is PCR?
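
Each cycle can at most double the number of target copies, so yield grows exponentially with the number of cycles: N = N0 × (1 + E)^n, where E is the per-cycle efficiency. Here is a back-of-the-envelope Python sketch, assuming a constant efficiency (an idealisation). `pcr_copies` is a hypothetical helper:

```python
def pcr_copies(initial_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Idealised PCR yield: each cycle multiplies the target by (1 + efficiency).

    efficiency=1.0 means perfect doubling; real reactions fall short and
    eventually plateau as reagents are depleted.
    """
    return initial_copies * (1 + efficiency) ** cycles


print(pcr_copies(1, 30))  # 1073741824.0
```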

300

This software allows you to run packages and apps that are stored in containers so that you do not need to download/install them yourself.

What is Singularity?

400

This shell operator appends the output of the command before it to the end of the file named after it.

What is >> ?
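
In the shell, `echo "text" >> file.txt` adds a line to the end of `file.txt` and creates the file if needed. A single `>` overwrites the file instead. Python file modes make the same distinction. Here is a minimal sketch, with `append_line` as a hypothetical helper:

```python
import os
import tempfile


def append_line(path: str, line: str) -> None:
    """Append one line to a file, creating it if missing (like `echo line >> path`)."""
    # Mode "a" appends; mode "w" would truncate the file first, like `>`.
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(line + "\n")


# Demo in a throwaway directory: the second call adds to the first instead of replacing it
with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "log.txt")
    append_line(log_path, "first")
    append_line(log_path, "second")
    with open(log_path, encoding="utf-8") as fh:
        contents = fh.read()

print(contents, end="")  # prints "first" and "second" on separate lines
```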

400

In eukaryotic genes, these are the regions that remain in the mature mRNA after introns are spliced out; they contain the codons that get translated into protein.

What are exons?
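
Splicing removes introns and joins the exons into mature mRNA. This toy Python sketch uses a made-up pre-mRNA with two exons that are entirely coding. Exon coordinates are given as 0-based, half-open (start, end) pairs. `splice`, `codons`, and the sequence are all hypothetical. Real exons can also contain untranslated regions (UTRs):

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join exon segments (0-based, half-open coordinates) in order of start position."""
    return "".join(pre_mrna[start:end] for start, end in sorted(exons))


def codons(mrna: str) -> list[str]:
    """Split an mRNA into consecutive triplets from the first base, dropping any partial codon."""
    return [mrna[i:i + 3] for i in range(0, len(mrna) - 2, 3)]


# Made-up example: exon 1, an intron beginning GU and ending AG, then exon 2
pre_mrna = "AUGAAA" + "GUAAGUUUUCAG" + "GCCUAA"
mature = splice(pre_mrna, [(0, 6), (18, 24)])
print(mature)          # AUGAAAGCCUAA
print(codons(mature))  # ['AUG', 'AAA', 'GCC', 'UAA']
```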

400

This command lists the jobs in the scheduler's queue (running and pending), along with each job's user, current state, and how long it has been running.

What is squeue?

400

This technology uses chain termination: as DNA polymerase synthesizes new strands, it occasionally incorporates a fluorescently labelled terminator nucleotide that stops extension. Gel/capillary electrophoresis then separates the fragments by size, producing a chromatogram with a base call at each position of the sequence.

What is Sanger sequencing?
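
A toy Python simulation shows the read-out logic of chain termination. Each terminated fragment ends in a dye-labelled base. Sorting the fragments by length, which electrophoresis does physically, lets you read the sequence from their final bases. This sketch assumes perfect termination at every position and ignores the template/complement relationship. The function names are hypothetical:

```python
import random


def chain_terminated_fragments(synthesized: str) -> list[str]:
    """One fragment per possible stop position, each ending in a labelled terminator base."""
    fragments = [synthesized[:i] for i in range(1, len(synthesized) + 1)]
    random.shuffle(fragments)  # fragments come out of the reaction unordered
    return fragments


def call_bases(fragments: list[str]) -> str:
    """Order fragments by size (as electrophoresis does) and read each terminal base."""
    return "".join(frag[-1] for frag in sorted(fragments, key=len))


fragments = chain_terminated_fragments("ACGTTGCA")
print(call_bases(fragments))  # ACGTTGCA
```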

400

This is the name of a command-line text editor, available on clusters, that we have used for writing scripts on MCC.

What is nano?

500

This command searches for a regular expression (i.e., a pattern) in a file (or in text piped to it) and prints matching lines (by default).

What is grep?
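
The core of grep's default behavior is to keep each line where the pattern matches anywhere. Here is a minimal Python sketch of that filter. `grep` here is a hypothetical helper name, and Python's `re` syntax is not identical to grep's basic or extended regular expressions:

```python
import re


def grep(pattern: str, lines: list[str], invert: bool = False) -> list[str]:
    """Return lines matching a regular expression, like `grep` (or `grep -v` when invert=True)."""
    regex = re.compile(pattern)
    # regex.search matches anywhere in the line, as grep does by default
    return [line for line in lines if bool(regex.search(line)) != invert]


fasta_lines = [">seq1", "ACGT", ">seq2", "GGCC"]
print(grep(r"^>", fasta_lines))               # ['>seq1', '>seq2']
print(len(grep(r"^>", fasta_lines)))          # 2, like `grep -c '^>' file.fa`
print(grep(r"^>", fasta_lines, invert=True))  # ['ACGT', 'GGCC']
```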

500

These are the proteins that DNA wraps around to form nucleosomes within chromatin.

What are histones?

500

MCC has 128 of these, each with its own processors/cores and memory; we submit jobs to them to process large datasets or run software.

What are nodes/compute nodes?

500

This sequencing platform/company dominates next-gen sequencing of shorter reads, and is commonly used for RNAseq and genome resequencing.

What is Illumina?

500

This is the name of a job scheduling program that allocates resources for jobs, sends them to the compute nodes, and lets you monitor jobs as they run.

What is SLURM?
