Acronyms
Definitions
Two words
The A-W-S
The Amazon
100

ETL

Extract, Transform, Load
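
For illustration only, a minimal sketch of the three ETL stages in Python; the file, table, and column names are all hypothetical:

```python
import csv
import sqlite3

# Extract: read raw rows from a source file (hypothetical file name)
with open("sales_raw.csv", newline="") as f:
    rows = list(csv.DictReader(f))

# Transform: clean fields and compute a derived value
for row in rows:
    row["region"] = row["region"].strip().upper()
    row["total"] = float(row["price"]) * int(row["quantity"])

# Load: write the cleaned rows into a destination table
con = sqlite3.connect("warehouse.db")
con.execute("CREATE TABLE IF NOT EXISTS sales (region TEXT, total REAL)")
con.executemany(
    "INSERT INTO sales (region, total) VALUES (?, ?)",
    [(r["region"], r["total"]) for r in rows],
)
con.commit()
con.close()
```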

100

Refers to groups or collections of data points that share similar characteristics or properties; this is also known as a 

Cluster 

100

Automatically builds (a build combines separate parts of the code) and tests changes to the code; it is known by this two-word answer 

Continuous integration


100

AWS

Amazon Web Services
100

This is referred to as Amazon's Data Warehouse



Amazon Redshift


200

AI

broader concept that involves creating machines or systems capable of performing tasks that typically require human intelligence, including reasoning, problem-solving, understanding natural language, and more.

200

Data that provides information about other data, describing various aspects such as the content, format, location, and characteristics of that data, is also known as...

Metadata

200

What is "Apache Airflow"?

an open-source tool that helps with workflow orchestration: defining task lists (DAGs), automating them, scheduling them, and more 
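
A hedged sketch of what an Airflow DAG can look like (import paths and the schedule argument follow recent Airflow 2.x; the DAG id, schedule, and task functions are made up for illustration):

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("pull data from the source")

def load():
    print("write data to the warehouse")

# The DAG is the "task list": tasks, their order, and a schedule.
with DAG(
    dag_id="daily_pipeline",           # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                 # run once per day
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)
    extract_task >> load_task          # extract runs before load
```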

200

 AWS DataSync


transfer large amounts of data between on-premises storage systems and AWS storage services quickly, securely, and efficiently

200

AWS Glue


service that helps you prepare data for use, especially by automating data quality tasks (identifying inconsistencies, cataloging data, suggesting transformations, etc.). It does not actually build the data pipelines, but aids in the transformation and data quality in preparation for its use.



300

ML

subset of Artificial Intelligence (AI) that focuses on systems learning from data to improve their performance on a specific task without being explicitly programmed.

300

this is a blueprint that organizes and defines how data is stored and accessed in a database 

Schema
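
As an illustration, a schema is just the set of table definitions; a minimal sketch with SQLite, where the table and column names are hypothetical:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# These CREATE TABLE statements are the schema: they define what is
# stored (columns and types) and how the tables relate (foreign key).
con.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(customer_id),
    total       REAL
);
""")
con.close()
```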

300

this is an open-source tool that can use multiple coding languages to digest and process data

Apache Spark
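
A minimal PySpark sketch showing Spark reading and processing data; the file path and column names are hypothetical:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("example").getOrCreate()

# Read a CSV into a distributed DataFrame (hypothetical path)
df = spark.read.csv("s3://my-bucket/events.csv", header=True, inferSchema=True)

# Process: filter and aggregate across the cluster
summary = (
    df.filter(F.col("status") == "ok")
      .groupBy("region")
      .agg(F.count("*").alias("events"))
)
summary.show()
spark.stop()
```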


300

primarily for ML, but it can also do pre-processing/feature engineering in data science workflows. It builds models, can use pre-built algorithms or custom-written algorithms, and helps you train your models on large datasets. Only prepared data can be used in this service, which is also known as 

AWS SageMaker


300

Lets you process and analyze large amounts of data using popular open-source tools like Apache Spark, Hadoop, and others, by offering a managed environment that simplifies the setup, scaling, and maintenance of clusters for big data processing and analytics.

Amazon EMR


400

What does VM mean/refer to?

A Virtual Machine is a guest OS (plus the apps running on it) that runs on a hypervisor. This allows multiple operating systems to run on one piece of hardware, and this virtualization is the basis for cloud computing

400

This describes a system or process that can handle increasing amounts of data or a growing number of users without sacrificing performance; it is known as being...

Scalable

400

serverless computing 

 enables developers to build applications faster by eliminating the need for them to manage infrastructure

400

Used to create serverless ETL processes in response to events and automate data-related tasks

 AWS Lambda
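
A minimal sketch of a Lambda handler reacting to an S3 event to kick off a data task; the event shape follows S3 notifications, everything else is illustrative:

```python
import json

def lambda_handler(event, context):
    # Triggered by an event, e.g. a new object landing in an S3 bucket
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        print(f"new object s3://{bucket}/{key}; start the ETL step here")
    return {"statusCode": 200, "body": json.dumps("ok")}
```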


400

 Amazon S3


Simple Storage Service: a scalable storage service that uses buckets to store structured and unstructured data. Useful for organization, versioning, access control, and lifecycle management
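
A minimal boto3 sketch that writes an object to a bucket and lists it back; the bucket and key names are hypothetical:

```python
import boto3

s3 = boto3.client("s3")

# Store an object in a bucket (hypothetical bucket and key names)
s3.put_object(
    Bucket="my-data-bucket",
    Key="raw/2024/sales.csv",
    Body=b"region,total\nEAST,100\n",
)

# List objects under a prefix and read one back
resp = s3.list_objects_v2(Bucket="my-data-bucket", Prefix="raw/2024/")
for obj in resp.get("Contents", []):
    body = s3.get_object(Bucket="my-data-bucket", Key=obj["Key"])["Body"].read()
    print(obj["Key"], len(body), "bytes")
```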

500

What is EC2

EC2 (Elastic Compute Cloud) is a service managed by Amazon that allows a flexible pricing structure based on the end user's needs (e.g., rather than owning an entire server and paying for upkeep/power usage, users pay only for the capacity they need).
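
A hedged boto3 sketch of launching an instance, where you choose (and pay for) only the capacity you need; the AMI id and instance type are placeholders:

```python
import boto3

ec2 = boto3.client("ec2")

# Launch one small instance; the instance type determines what you pay for
resp = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder AMI id
    InstanceType="t3.micro",          # pick only the capacity you need
    MinCount=1,
    MaxCount=1,
)
print(resp["Instances"][0]["InstanceId"])
```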

500

Allows multiple operating systems to run on a single physical computer at the same time by virtually separating and managing the computer's resources like CPU, memory, and storage; this is also known as?

Hypervisor

500

a platform, built by the creators of Spark, that builds on Spark's capabilities and adds machine learning/data science capabilities

Databricks 

500

simplifies the process of cleaning and preparing data; it can be used in conjunction with Glue or as a standalone service, and is known as 

AWS DataBrew


500

Automatically discovers and catalogs metadata



AWS Glue Crawler
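
A hedged boto3 sketch of creating and starting a Glue crawler, then reading the metadata it cataloged; the crawler, role, database, and path names are hypothetical:

```python
import boto3

glue = boto3.client("glue")

# Point a crawler at an S3 path so it can discover and catalog schemas
glue.create_crawler(
    Name="sales-crawler",
    Role="GlueCrawlerRole",                      # hypothetical IAM role
    DatabaseName="sales_catalog",
    Targets={"S3Targets": [{"Path": "s3://my-data-bucket/raw/"}]},
)
glue.start_crawler(Name="sales-crawler")

# After the crawler finishes, the discovered tables live in the Data Catalog
tables = glue.get_tables(DatabaseName="sales_catalog")
for t in tables["TableList"]:
    print(t["Name"], [c["Name"] for c in t["StorageDescriptor"]["Columns"]])
```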