This command prints your current location (working directory) to the console.
What is pwd?
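A quick way to see pwd in action is to change directories and print the location each time. This is a minimal sketch that uses / and a throwaway directory from mktemp so it runs anywhere. On the cluster, pwd would print your own home or project path instead.

```bash
# Print the starting location
pwd

# Move to the filesystem root and confirm the new location
cd /
pwd    # prints /

# Move into a fresh temporary directory and print it
tmpdir=$(mktemp -d)
cd "$tmpdir"
pwd    # prints the temporary directory's path
```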
This nucleotide pairs with G/guanine in DNA.
What is C/cytosine?
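Because G always pairs with C and A with T, the complementary strand can be written by swapping each base for its partner. Here is a minimal bash sketch using tr, plus rev to get the reverse complement (the partner strand read 5' to 3'). The function names complement and revcomp are just illustrative choices, and the sketch assumes a plain ACGT sequence; other characters such as N pass through unchanged.

```bash
# Complement: swap each base for its pairing partner (A<->T, C<->G)
complement() {
  echo "$1" | tr 'ACGTacgt' 'TGCAtgca'
}

# Reverse complement: complement, then reverse so it reads 5' to 3'
revcomp() {
  complement "$1" | rev
}

complement ATGC   # prints TACG
revcomp ATGC      # prints GCAT
```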
This key on the keyboard can be used to autocomplete the names of files/directories.
What is TAB?
This file format has header lines that start with >, each followed by one or more lines of sequence.
What is a FASTA file?
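For example, a small FASTA file with two records might look like the sample below. Because every record begins with exactly one > header line, counting the lines that start with > counts the sequences. This is a minimal sketch; example.fa and its contents are made up for illustration.

```bash
# Work in a scratch directory so no real files are touched
cd "$(mktemp -d)"

# Write a tiny, made-up FASTA file: one '>' header per record,
# followed by one or more lines of sequence
cat > example.fa <<'EOF'
>seq1 sample A
ATGCGTACGTTAGC
CGTA
>seq2 sample B
GGCCTTAAGG
EOF

# Count records by counting header lines (lines starting with '>')
grep -c '^>' example.fa    # prints 2
```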
This software has both web-based and command-line versions, and allows you to quickly search databases for similarity to a given sequence using short seed alignments that get extended and scored.
What is BLAST?
This command is used both to move and rename files or directories.
What is mv?
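Whether mv renames or moves depends on the destination: a new name means rename, an existing directory means move. A minimal sketch with made-up file names:

```bash
# Work in a scratch directory so no real files are touched
cd "$(mktemp -d)"
touch reads.fq          # hypothetical file name
mkdir raw_data

# Rename: the destination is a new name in the same directory
mv reads.fq sample1.fq

# Move: the destination is an existing directory
mv sample1.fq raw_data/

ls raw_data             # prints sample1.fq
```

By default mv silently overwrites a destination file that already exists; mv -i asks before overwriting.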
This is transcribed from DNA.
What is RNA?
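As a rough sketch, the mRNA has the same sequence as the coding (sense) strand of the DNA, except that T is replaced by U. So, assuming the input is the coding strand written 5' to 3', transcription can be mimicked with a single tr. The function name transcribe is illustrative only. If you start from the template strand, you would take its reverse complement first.

```bash
# Assumes the input is the coding (sense) strand, written 5' to 3'
transcribe() {
  echo "$1" | tr 'Tt' 'Uu'
}

transcribe ATGGCCTAA   # prints AUGGCCUAA
```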
These are strings or integers that you include after a command to set different parameters, filenames, or output types, among other things.
What are options/flags/arguments?
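In the sketch below, -n is an option (flag) and 2 is the value it takes, while the file name is a positional argument. The commands are standard, but notes.txt is a made-up file.

```bash
# Work in a scratch directory so no real files are touched
cd "$(mktemp -d)"
printf 'line1\nline2\nline3\n' > notes.txt

# No options: head prints up to the first 10 lines by default
head notes.txt

# -n 2 is an option with a value; notes.txt is a positional argument
head -n 2 notes.txt        # prints line1 and line2

# Options change what a command reports: -l makes wc count lines only
wc -l notes.txt            # prints the line count, then the file name
```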
This file format contains sequence data along with per-base quality scores.
What is a FASTQ file?
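Each FASTQ record is four lines: an @ header, the sequence, a + separator, and a quality string with one character per base. This minimal sketch counts reads and decodes one quality character. It assumes the Phred+33 encoding used by current Illumina pipelines (older data may use a different offset), and the file and read names are invented.

```bash
# Work in a scratch directory so no real files are touched
cd "$(mktemp -d)"

# Two made-up reads; each record is exactly 4 lines
cat > reads.fq <<'EOF'
@read1
ACGTACGT
+
IIIIIII#
@read2
GGCCAATT
+
IIIII!!!
EOF

# Number of reads = number of lines / 4
nreads=$(( $(wc -l < reads.fq) / 4 ))
echo "$nreads"            # prints 2

# Print only the sequence lines (line 2 of every record)
awk 'NR % 4 == 2' reads.fq

# Decode a quality character under Phred+33: score = ASCII code - 33
ascii=$(printf '%d' "'I")
echo $(( ascii - 33 ))    # prints 40 ('I' is ASCII 73)
```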
This software detects adapters and checks sequence quality, trimming adapters and low-quality bases and filtering out low-quality reads.
What is fastp?
This command is used to print the number of lines, words, or characters in a file.
What is wc?
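A minimal sketch of the three counts on a made-up file. Redirecting the file into wc with < makes it print just the number, without the file name. Note that -c counts bytes, which includes the newline at the end of each line.

```bash
# Work in a scratch directory so no real files are touched
cd "$(mktemp -d)"
printf 'ACGT ACGT\nGGCC\n' > seqs.txt

wc -l < seqs.txt    # lines: 2
wc -w < seqs.txt    # words: 3
wc -c < seqs.txt    # bytes, including the two newlines: 15
```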
This nucleotide is found in RNA but not DNA, and pairs with A/adenine.
What is U/uracil?
These are the nodes on an HPC cluster that you connect to when you first log in. They are connected to the internet, but intensive jobs shouldn't be run on them.
What are login/head nodes?
This process, which uses DNA polymerase to amplify a specific target sequence defined by the annealing of primers, was invented by Kary Mullis, who shared the 1993 Nobel Prize in Chemistry with Michael Smith.
What is PCR?
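Because each cycle can at most double the number of target copies, the ideal yield after n cycles is the starting number of copies × 2^n. This is a minimal sketch of that arithmetic. It assumes 100% efficiency in every cycle, which real reactions fall short of before eventually plateauing, and the function name pcr_copies is illustrative.

```bash
# Ideal PCR yield: copies = start * 2^cycles (assumes perfect doubling)
pcr_copies() {
  local start=$1 cycles=$2
  echo $(( start * 2 ** cycles ))
}

pcr_copies 1 30     # prints 1073741824
pcr_copies 10 20    # prints 10485760
```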
This software lets you run packages and apps that are stored in containers, so you do not need to install them (and their dependencies) yourself.
What is Singularity?
This operator appends the output of whatever command comes before it to the end of a file.
What is >> ?
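The difference between > and >> is easiest to see side by side: > creates or overwrites the file, while >> adds to the end of it. A minimal sketch with a made-up log file:

```bash
# Work in a scratch directory so no real files are touched
cd "$(mktemp -d)"

echo "first run"  >  log.txt   # > creates or overwrites log.txt
echo "second run" >> log.txt   # >> appends a new line to the end
cat log.txt                    # prints both lines

echo "third run"  >  log.txt   # > again: previous contents are lost
cat log.txt                    # prints only: third run
```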
These are the regions of a eukaryotic gene that remain in the mature mRNA after splicing, and they contain the codons that get translated into protein.
What are exons?
This command shows all jobs in the scheduler's queue, along with their user, current state (e.g., running or pending), and how long they've been running.
What is squeue?
This technology uses chain-termination chemistry, incorporating fluorescently labelled dideoxynucleotide (ddNTP) terminators during DNA synthesis, followed by capillary (or gel) electrophoresis to separate the fragments by size. It outputs a chromatogram with a base call at each position of the sequence.
What is Sanger sequencing?
This is the name of a text editor available on the cluster that we have used to write scripts on MCC.
What is nano?
This command searches for a regular expression (i.e., a pattern) in a file (or in text piped to it) and prints matching lines (by default).
What is grep?
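A few common grep patterns on a small, made-up FASTA file. The sketch shows an anchored pattern, counting with -c, an extended regular expression with -E, and inverting the match with -v. GAATTC is the EcoRI restriction site.

```bash
# Work in a scratch directory so no real files are touched
cd "$(mktemp -d)"
cat > example.fa <<'EOF'
>seq1
ATGGAATTCCGT
>seq2
TTTTAAAAAGGC
>seq3
CCGGCCGG
EOF

# Print header lines (pattern anchored to the start of the line)
grep '^>' example.fa

# Count lines that contain the EcoRI site GAATTC
grep -c 'GAATTC' example.fa          # prints 1

# Extended regex: runs of 4 or more A's
grep -E 'A{4,}' example.fa           # prints TTTTAAAAAGGC

# -v inverts the match: print everything except headers
grep -v '^>' example.fa
```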
These are the proteins that DNA wraps around to form nucleosomes within chromatin.
What are histones?
MCC has 128 of these, each with its own set of processors/cores and memory; we submit jobs to them to process large datasets or run software.
What are nodes/compute nodes?
This sequencing platform/company dominates short-read next-generation sequencing and is commonly used for RNA-seq and genome resequencing.
What is Illumina?
This is the name of a job scheduling program that allocates resources for jobs, submits jobs to other parts of the cluster, and lets you monitor jobs as they run.
What is SLURM?
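A SLURM batch script is an ordinary shell script whose #SBATCH comment lines request resources. This is a minimal sketch. The job name, memory, time limit, and file names are placeholders, and MCC may also require options such as --partition or --account whose values are not given here. Because #SBATCH lines are just comments to the shell, the script also runs with plain bash, which is handy for testing it before submitting.

```bash
#!/bin/bash
# Name shown in squeue
#SBATCH --job-name=count_bases
# Cores and memory for this job
#SBATCH --cpus-per-task=1
#SBATCH --mem=1G
# Wall-clock limit (HH:MM:SS)
#SBATCH --time=00:10:00
# Log file; %j is replaced by the job ID
#SBATCH --output=count_bases_%j.out

# SLURM sets SLURM_JOB_ID inside a job; fall back when run with plain bash
echo "Job ${SLURM_JOB_ID:-not-under-slurm} running on ${HOSTNAME}"

# Made-up input so the script is self-contained
printf '>s1\nACGTACGT\n>s2\nGGCC\n' > example.fa

# Total bases = all non-header characters, excluding newlines
total_bases=$(grep -v '^>' example.fa | tr -d '\n' | wc -c)
total_bases=$(( total_bases ))   # strip any padding wc adds
echo "example.fa contains $total_bases bases"
```

Saved as count_bases.sh, it would be submitted with sbatch count_bases.sh and could then be monitored with squeue -u "$USER".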