Raw, unprocessed facts.
What is data?
Data that contains errors, inconsistencies, missing values, or duplicates.
What is dirty data?
Data that is highly organized and easily searchable, typically stored in tables or databases.
What is structured data?
This concept of computational thinking involves focusing only on the important details of a problem while ignoring irrelevant information.
What is abstraction?
The process of cleaning, transforming, and preparing raw data for analysis.
What is data pre-processing?
Data that has been processed and given meaning.
What is information?
Imputation (filling in missing values using mean, median, etc.) or removing incomplete records if necessary.
What are ways to handle missing data?
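A minimal sketch of both options, using only the Python standard library. The GPA values and the helper names `impute` and `drop_missing` are made up for illustration; in practice a library such as pandas is often used for this.

```python
from statistics import mean, median

# Hypothetical GPA column; None marks a missing value.
gpas = [3.2, None, 3.8, 2.0, None, 3.5]

def impute(values, strategy=mean):
    """Replace None entries with a statistic computed from the observed values."""
    observed = [v for v in values if v is not None]
    fill = strategy(observed)
    return [fill if v is None else v for v in values]

def drop_missing(values):
    """Alternative: remove incomplete records instead of filling them."""
    return [v for v in values if v is not None]

mean_filled = impute(gpas, mean)      # gaps filled with the mean of observed values
median_filled = impute(gpas, median)  # gaps filled with the median instead
complete_only = drop_missing(gpas)    # incomplete records removed
```

The median is less sensitive to outliers (here the 2.0), so the two strategies can give noticeably different fill values on skewed data.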
Examples of this type of data include emails, social media posts, and audio recordings, which lack a fixed format.
What is unstructured data?
In computational thinking, this concept allows computers to perform repetitive tasks efficiently without human intervention.
What is automation?
This process ensures that the data is accurate, consistent, and free of errors before analysis.
What is data cleaning?
A database of student information with fields such as age, date of enrollment, and GPA is an example of this.
What is structured data?
This makes data difficult to process, requiring standardization (e.g., "NY" vs. "New York").
What is inconsistent formatting in data analysis?
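One way to standardize such values is a lookup table that maps every known variant to a single canonical form. The sketch below assumes a small, hand-written mapping; the table contents and the function name `standardize_state` are hypothetical.

```python
# Hypothetical lookup table mapping known variants to one canonical form.
CANONICAL_STATES = {
    "ny": "New York",
    "n.y.": "New York",
    "new york": "New York",
    "ca": "California",
    "calif.": "California",
    "california": "California",
}

def standardize_state(raw):
    """Return the canonical name, or the trimmed input if no rule matches."""
    key = raw.strip().lower()
    return CANONICAL_STATES.get(key, raw.strip())

records = ["NY", " new york ", "N.Y.", "CA", "Texas"]
cleaned = [standardize_state(r) for r in records]
```

Values with no matching rule (here "Texas") pass through unchanged apart from whitespace trimming, so they can be reviewed later rather than silently altered.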
This type of database is commonly used to store structured data, ensuring efficient querying and organization.
What is a relational database (SQL database)?
This process involves evaluating a solution to ensure it works as intended and identifying areas for improvement.
What is analysis?
Bar charts, pie charts, line graphs, or scatter plots are examples of this.
What are data visualizations?
This type of data is harder to analyze since it does not have a predefined format, requiring additional processing (such as NLP).
What is unstructured data?
This principle describes a relationship between two variables but does not imply that one causes the other.
What is correlation?
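As an illustration, the Pearson correlation coefficient can be computed by hand. The numbers below are invented for the example, and the function name `pearson_r` is an assumption, not part of the original material.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up weekly figures: both tend to rise in hot weather.
ice_cream_sales = [20, 35, 50, 65, 80]
sunburn_cases = [2, 4, 5, 7, 9]

r = pearson_r(ice_cream_sales, sunburn_cases)
```

A value of `r` close to 1 indicates a strong positive relationship, yet neither variable causes the other here; a third factor (sunny weather) plausibly drives both.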
Compared to structured data, unstructured data requires these techniques to analyze text, images, or videos effectively.
What are Natural Language Processing (NLP), image recognition, and machine learning?
When designing a self-driving car, engineers simplify real-world road conditions into key elements such as traffic signals, lane markings, and speed limits. This is an example of which computational thinking concept?
What is abstraction?
This type of variable contains countable values.
What is a discrete variable?
This type of file utilizes tabs to separate the variables.
What is TSV (or Tab-Separated Values)?
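A short sketch of reading tab-separated data with Python's built-in `csv` module. The sample content is hypothetical; with a real file, an opened file object would be passed in place of `io.StringIO`.

```python
import csv
import io

# Hypothetical TSV content; in practice this would come from a file on disk.
tsv_text = "name\tage\tgpa\nAda\t20\t3.9\nLinus\t22\t3.4\n"

# delimiter="\t" tells the reader that fields are separated by tabs, not commas.
reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
rows = list(reader)
```

Every value is read as a string, so numeric columns such as `age` and `gpa` still need converting before analysis.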
This function allows you to connect Google Colab to Google Drive, giving access to stored files.
What is drive.mount('/content/drive')?
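In a Colab notebook the call is usually written as below. The wrapper function `mount_drive` is an illustrative assumption added so the snippet also runs harmlessly outside Colab, where the `google.colab` package is not available.

```python
MOUNT_POINT = "/content/drive"

def mount_drive(mount_point=MOUNT_POINT):
    """Mount Google Drive when running in Colab; return the mount point or None."""
    try:
        from google.colab import drive  # only importable inside Colab
    except ImportError:
        return None  # not running in Colab, nothing to mount
    drive.mount(mount_point)  # asks for authorization in the notebook
    return mount_point

mounted_at = mount_drive()
```

Once mounted, files stored in Drive typically appear under `/content/drive/MyDrive`.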
One major challenge of working with unstructured data is that it often requires this additional step before it can be analyzed like structured data.
What is data wrangling (or preprocessing/munging)?
A company develops a chatbot to handle customer service inquiries. To ensure the chatbot responds correctly, they test different scenarios and refine the responses based on performance data. Which computational thinking concept is being applied?
What is analysis?
This is the first and most important step in any data science project, ensuring that the right problem is being solved.
What is defining the problem?