Understanding Data
Data Cleaning and Analysis
Structured vs. Unstructured Data
Computational Thinking
Data Science Basics
100

Raw, unprocessed facts

What is data?

100

Data that contains errors, inconsistencies, missing values, or duplicates.

What is dirty data?

100

Data that is highly organized and easily searchable, typically stored in tables or databases.

What is structured data?

100

This concept of computational thinking involves focusing only on the important details of a problem while ignoring irrelevant information.

What is abstraction? 

100

The process of cleaning, transforming, and preparing raw data for analysis.

What is data pre-processing?

200

Data that has been processed and given meaning.

What is information?

200

Imputation (filling in missing values using mean, median, etc.) or removing incomplete records if necessary.

What are ways to handle missing data?
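The two options named in this clue can be sketched in a few lines of Python. This is only an illustration: the `scores` list and the helper names `impute` and `drop_missing` are made up for the example, and `None` is assumed to mark a missing value.

```python
from statistics import mean, median

# Hypothetical sample: test scores where None marks a missing value.
scores = [82, None, 90, 74, None, 88]

def impute(values, strategy=mean):
    """Replace None entries with the mean (or median) of the observed values."""
    observed = [v for v in values if v is not None]
    fill = strategy(observed)
    return [fill if v is None else v for v in values]

def drop_missing(values):
    """Remove incomplete entries instead of filling them in."""
    return [v for v in values if v is not None]

mean_filled = impute(scores)                     # gaps filled with the mean
median_filled = impute(scores, strategy=median)  # gaps filled with the median
complete_only = drop_missing(scores)             # gaps removed entirely
```

Imputation keeps every record but introduces estimated values, while removal keeps only real values at the cost of a smaller dataset.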

200

Examples of this type of data include emails, social media posts, and audio recordings, which lack a fixed format.

What is unstructured data?

200

In computational thinking, this concept allows computers to perform repetitive tasks efficiently without human intervention. 

What is automation?

200

This process ensures that the data is accurate, consistent, and free of errors before analysis.

What is data cleaning?

300

A database of student information with fields such as age, date of enrollment, and GPA is an example of this.

What is structured data?

300

This makes data difficult to process, requiring standardization (e.g., "NY" vs. "New York").

What is inconsistent formatting in data analysis?
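One possible way to standardize values like "NY" and "New York" is a lookup table of known variants. The sketch below is a minimal example; the `CANONICAL` mapping, the `standardize` helper, and the sample values are all hypothetical.

```python
# Hypothetical mapping from known variants (lowercased) to one canonical spelling.
CANONICAL = {
    "ny": "New York",
    "n.y.": "New York",
    "new york": "New York",
}

def standardize(value):
    """Trim whitespace, ignore case, and map known variants to one spelling."""
    key = value.strip().lower()
    # Values without a known variant are kept, with stray whitespace removed.
    return CANONICAL.get(key, value.strip())

raw = ["NY", " new york", "N.Y.", "Boston "]
cleaned = [standardize(v) for v in raw]
```

After this step, all three spellings of New York compare as equal, so grouping and counting by city behave as expected.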

300

This type of database is commonly used to store structured data, ensuring efficient querying and organization.

What is a relational database (SQL database)?

300

This process involves evaluating a solution to ensure it works as intended and identifying areas for improvement.

What is analysis?

300

Bar charts, pie charts, line graphs, or scatter plots are examples of this. 

What are data visualizations?

400

This type of data is harder to analyze since it does not have a predefined format, requiring additional processing (such as NLP).

What is unstructured data?

400

This term describes a relationship between two variables but does not imply that one causes the other.

What is correlation?
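As a rough illustration, the Pearson correlation coefficient can be computed by hand. The numbers below are invented for the example, and the `pearson` helper is a sketch, not a replacement for a statistics library.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Made-up monthly figures: ice cream sales and sunburn cases.
ice_cream = [10, 20, 30, 40, 50]
sunburns = [1, 3, 4, 6, 9]

r = pearson(ice_cream, sunburns)  # close to 1: a strong positive correlation
```

A value of `r` near 1 says the two series rise together. It does not say that ice cream causes sunburn; in this invented scenario both would plausibly follow a third factor, such as hot weather.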

400

Compared to structured data, unstructured data requires these techniques to analyze text, images, or videos effectively.

What are Natural Language Processing (NLP), image recognition, and machine learning?

400

When designing a self-driving car, engineers simplify real-world road conditions into key elements such as traffic signals, lane markings, and speed limits. This is an example of which computational thinking concept?

What is abstraction?

400

This type of variable contains countable values.

What is a discrete variable?

500

This type of file uses tabs to separate values.

What is a TSV (tab-separated values) file?
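A tab-separated file can be read with Python's standard `csv` module by setting the delimiter to a tab. The sketch below uses a small in-memory string with invented contents in place of a real file.

```python
import csv
import io

# Hypothetical TSV content; a real file would be opened with open(path, newline="").
tsv_text = "name\tage\tgpa\nAda\t20\t3.9\nLin\t22\t3.5\n"

# delimiter="\t" tells the reader that columns are separated by tabs, not commas.
reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
rows = list(reader)

# Every value is read as a string, so numeric columns need explicit conversion.
ages = [int(row["age"]) for row in rows]
```

The first line is treated as the header, so each row becomes a dictionary keyed by column name.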

500

This function allows you to connect Google Colab to Google Drive, giving access to stored files. 

What is drive.mount('/content/drive')?

500

One major challenge of working with unstructured data is that it often requires this additional step before it can be analyzed like structured data.

What is data wrangling (or preprocessing/munging)?

500

A company develops a chatbot to handle customer service inquiries. To ensure the chatbot responds correctly, they test different scenarios and refine the responses based on performance data. Which computational thinking concept is being applied?

What is analysis?

500

This is the first and most important step in any data science project, ensuring that the right problem is being solved.

What is defining the problem?