Acronyms!
A-W-S
Two word answers
Amazon
Misc.
100

ETL

Extract, Transform, Load
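
As a rough illustration of the three stages, here is a minimal Python sketch. The sample records, the field names, and the in-memory SQLite target are all assumptions made up for this example, not part of any particular tool.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; a real pipeline would read from files, APIs, or databases.
RAW_CSV = "name,amount\nalice, 10\nbob, 25\n"

def extract(text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: tidy up names and convert amounts to integers."""
    return [(r["name"].strip().title(), int(r["amount"])) for r in rows]

def load(records, conn):
    """Load: write the cleaned records into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (name TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
print(total)
```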



100

AWS

Amazon Web Services

100

Databricks

Commercial platform that builds on open-source Apache Spark's capabilities and adds machine learning/data science capabilities

100

This service is referred to as Amazon's data warehouse



Amazon Redshift


100

This translates code from a language understandable by people (Java, Python) into something a computer can read (binary)

Compiling

200

EC2


Amazon Elastic Compute Cloud


(EC2 is a service managed by Amazon that allows a flexible pricing structure based on the end user's needs, e.g., rather than owning an entire server.)



200

This service coordinates data processing workflows

AWS Step Functions


200

Apache Spark


Spark takes instructions written in different coding languages and uses them to process data.

200

This service helps you prepare data for use, especially by automating data quality tasks (identifying inconsistencies, cataloging data, suggesting transformations, etc.)

AWS Glue


200

Jenkins

Helps build, test, and deploy by automating repetitive processes such as compiling and regression testing. It also notifies dev teams if something goes wrong in the deployment process.

300

VM and what it allows 

A virtual machine is a software-emulated computer with its own guest OS, running on top of a hypervisor. This allows multiple operating systems to run on one piece of hardware.

300

AWS DataSync is


A data transfer service that makes it easy to automate moving data between on-premises storage and AWS




300

This open-source tool helps by creating to-do lists, automating tasks, scheduling, and more

Apache Airflow


300

This big data platform runs popular frameworks like Hadoop, Spark, and Presto

Amazon EMR


300

This tool collects/extracts/ingests data from multiple sources in real time for Spark to digest/transform/load

Kafka


400

AWS Glue Crawler


Automatically discovers and catalogs metadata



400

This service provides real-time streaming capabilities, allowing data engineers to ingest, process, and analyze streaming data

Amazon Kinesis


400

This open-source tool helps by creating to-do lists, automating tasks, scheduling, and more

Apache Airflow


400

What is AWS Lambda?


A serverless computing platform that enables developers to run code without provisioning servers.
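
To make the idea concrete, here is a minimal sketch of what a Python function for Lambda can look like. The `lambda_handler(event, context)` shape follows the usual Python handler convention; the `name` field in the event and the greeting are hypothetical, chosen only for illustration.

```python
import json

def lambda_handler(event, context):
    # "name" is a made-up event field for this example.
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }

# Local call for illustration only; in AWS, the service invokes the handler
# and supplies the event and context, so there is no server to provision.
response = lambda_handler({"name": "data engineer"}, None)
print(response["body"])
```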

400

This tool uses code to manipulate infrastructure; it allows engineers to allocate infrastructure resources

Terraform

500

In this model, AWS manages the infrastructure so that developers don't have to maintain it themselves.

Serverless computing


500

This service is primarily for ML but can also do pre-processing/feature engineering in data science workflows. It builds models and can use pre-built algorithms or custom-written ones

Amazon SageMaker


500

This practice automatically builds (build: a combination of separate parts of code) and tests changes to the code

Continuous integration


500

This service simplifies the process of cleaning and preparing data. It can be used in conjunction with Glue or as a standalone tool

AWS Glue DataBrew


500

Data warehouse software built on Hadoop, this program runs across multiple pieces of hardware and can handle large sets of data.

Hive
