Define a query
What is the way you retrieve information from a database?
This is a tool for developing and executing a wide range of data processing patterns on very large datasets (e.g. performing the transformations described in ETL)
What is Cloud Dataflow?
Pub/Sub is ...
What is a service to help customers capture data and rapidly pass massive amounts of messages between other GCP big data tools and other software applications with world-class security?
Hadoop is ...
What is a set of tools and technologies which enables a cluster of computers to store and process large volumes of data?
Define ML
What is a branch of computer science that is focused on enabling computers to recognize patterns in data - without humans telling the computer how to recognize the patterns?
DOUBLE JEOPARDY!*
Typically, queries are written in this language
DJ* IF you can tell me what SQL stands for
What is SQL?
What is structured query language?
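To make the query and SQL cards concrete, here is a minimal sketch of retrieving information from a database with a SQL query. It uses Python's built-in sqlite3 module rather than BigQuery, and the `orders` table, its columns, and the `top_customers` helper are hypothetical names chosen for illustration.

```python
import sqlite3


def top_customers(rows, min_total):
    """Load rows into an in-memory SQLite table, then query it with SQL."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical table used only for this illustration.
        conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
        conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
        # The query: total spend per customer, keeping totals >= min_total.
        cursor = conn.execute(
            "SELECT customer, SUM(amount) AS total "
            "FROM orders "
            "GROUP BY customer "
            "HAVING SUM(amount) >= ? "
            "ORDER BY total DESC",
            (min_total,),
        )
        return cursor.fetchall()
    finally:
        conn.close()


sample_orders = [("ada", 30.0), ("bob", 5.0), ("ada", 20.0), ("cy", 12.5)]
result = top_customers(sample_orders, 10)
print(result)
```

The same SELECT / GROUP BY / ORDER BY shape carries over to most SQL dialects, although function names and data types can differ between systems.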
DOUBLE JEOPARDY!*
These five products (plus some) available on GCP can allow Dataflow to read or write to them
*DJ if you can tell me why it's ideal
What are Google Cloud Storage, BigQuery, Bigtable, Spanner, Firestore, etc.?
What is that it is ideal for building data pipelines that read from multiple data sources, process the data, and write the processed output to a final destination?
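The read, process, write shape described in this card can be sketched in plain Python. This is not the Dataflow or Apache Beam API; the function names, the in-memory lists standing in for sources, and the list used as a sink are all assumptions made for illustration.

```python
from itertools import chain


def read_sources(*sources):
    """Read: merge records from several sources into one stream."""
    return chain.from_iterable(sources)


def process(records):
    """Process: normalize each record and drop empty ones."""
    for record in records:
        cleaned = record.strip().lower()
        if cleaned:
            yield cleaned


def write(records, sink):
    """Write: append processed records to the final destination."""
    for record in records:
        sink.append(record)
    return sink


# Two hypothetical sources (stand-ins for, say, a bucket and a table).
source_a = ["Alice ", "BOB"]
source_b = ["", "  carol"]

output = write(process(read_sources(source_a, source_b)), [])
print(output)
```

Each stage is a generator, so records flow through one at a time instead of being fully loaded first, which is roughly the idea a managed pipeline service scales up.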
Pub/Sub stands for ...
What is publish/subscribe?
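As a rough illustration of the publish/subscribe pattern itself, the sketch below keeps everything in memory. It is not the Cloud Pub/Sub client library; the `Broker` class, its methods, and the topic name are hypothetical.

```python
from collections import defaultdict


class Broker:
    """A toy in-memory message broker (illustration only)."""

    def __init__(self):
        # Maps a topic name to the callbacks subscribed to it.
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        """Register a callback to receive every message on a topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        """Deliver a message to all subscribers; return how many got it."""
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)


broker = Broker()
dashboard_inbox = []
archive_inbox = []

# Two independent subscribers listen on the same hypothetical topic.
broker.subscribe("sensor-readings", dashboard_inbox.append)
broker.subscribe("sensor-readings", archive_inbox.append)

delivered = broker.publish("sensor-readings", {"device": "d1", "temp": 21.5})
undelivered = broker.publish("unused-topic", "nobody is listening")
print(delivered, undelivered)
```

The publisher never refers to its subscribers directly, which is the decoupling that lets messages pass between otherwise unrelated tools and applications.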
DOUBLE JEOPARDY!
Define Dataproc
What is a fully managed service for running Apache Spark and Apache Hadoop clusters in a simpler, more cost-efficient way?
DOUBLE JEOPARDY!*
Name the different dataset types
Explain what each dataset does
What are training, validation and testing?
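One common way to produce the three datasets is to shuffle the examples and cut them into slices, sketched below. In general usage the training set fits the model, the validation set guides tuning, and the test set is held back for a final check. The 70/15/15 split, the fixed seed, and the `split_dataset` name are assumptions for illustration, not values from the course material.

```python
import random


def split_dataset(examples, train_pct=70, val_pct=15, seed=0):
    """Shuffle examples and split them into train/validation/test lists."""
    shuffled = list(examples)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * train_pct // 100
    n_val = len(shuffled) * val_pct // 100
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    # Whatever remains after the first two slices becomes the test set.
    test = shuffled[n_train + n_val:]
    return train, validation, test


train, validation, test = split_dataset(range(100))
print(len(train), len(validation), len(test))
```

Keeping the three lists disjoint matters: an example that appears in both training and test data makes the final evaluation look better than it really is.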
This is how BigQuery works/the details about the product
What is Google's fully managed, petabyte-scale, low-cost analytics data warehouse? BigQuery is serverless: there is no infrastructure to manage and you don't need a database administrator, so you can focus on analyzing data to find meaningful insights, use familiar SQL, and take advantage of our pay-as-you-go model.
True or false: Dataflow cannot handle an unbounded or "infinite" dataset streaming in from a continuously updating source (like Pub/Sub).
What is false?
List the business and technical value props of Pub/Sub
DJ* IF you can tell the differences between them
Business: What are availability, throughput, and latency?
Technical: What are sources, sinks and transforms?
Why can customers run their big data jobs more efficiently?
What are per-second billing, quick spin-up of resources, and no more underutilized clusters?
This was developed by Google and has become the leading open source tool for building ML models
What is TensorFlow?
DOUBLE JEOPARDY!*
These are the five high-level value propositions
*DJ IF you can explain in your own words what each of these props means
What is speed, scale, and agility, enterprise ready, managed services & cutting edge technology, new data is available instantly & ad hoc queries?
DOUBLE JEOPARDY!*
Explain what batch and streaming data are
Streaming: What is endless incoming data?
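The contrast can be sketched in a few lines, assuming that batch data means a bounded dataset that is complete before processing starts, while streaming data keeps arriving. The function names and the use of `itertools.count` as a stand-in for an endless source are hypothetical.

```python
from itertools import count, islice


def batch_total(readings):
    """Batch: the whole dataset exists, so compute one final answer."""
    return sum(readings)


def streaming_totals(readings):
    """Streaming: emit an updated running total as each reading arrives."""
    total = 0
    for reading in readings:
        total += reading
        yield total


bounded = [3, 1, 4]          # a finite, historical dataset
unbounded = count(start=1)   # 1, 2, 3, ... never ends

batch_result = batch_total(bounded)
# An endless source can never be "finished", so only peek at the
# first four running totals here.
first_totals = list(islice(streaming_totals(unbounded), 4))
print(batch_result, first_totals)
```

Calling `sum` on the endless source would never return, which is why streaming systems report incremental or windowed results instead of a single final value.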
Define Cloud IoT
What is a set of fully managed and integrated services that allow customers to easily and securely connect, manage and ingest data from devices across the globe at a large scale, process and analyze/visualize that data in real time, and then act on it for greater operational efficiency?
What are MapReduce, Pig, Hive, and Spark?
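Of the tools named in this card, MapReduce is the easiest to illustrate in miniature. The sketch below runs the map, shuffle, and reduce phases of a word count on a single machine in plain Python; it is not the Hadoop API, and the function names and sample documents are made up for illustration.

```python
from collections import defaultdict


def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document."""
    for document in documents:
        for word in document.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine each key's values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


docs = ["big data big clusters", "data pipelines"]
counts = reduce_phase(shuffle_phase(map_phase(docs)))
print(counts)
```

On a real cluster the map and reduce steps run in parallel on many machines, and the shuffle moves data between them over the network.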
These are the six MLaaS offerings
What are TensorFlow, Speech API, Vision API, Natural Language API, Translation API and Jobs API?
DOUBLE JEOPARDY!*
Name the four technical value props of BigQuery
*DJ IF you can explain in your own words how we address each prop
What is data warehouse, centralize data for machine learning, big data and multiple data marts?
Why is Dataflow found in the process phase?
What is that Dataflow is a fully managed service for transforming and enriching data in stream (real-time) and batch (historical) modes with equal reliability and expressiveness?
DOUBLE JEOPARDY!*
List the business and technical value props of IoT
*DJ IF you can name how we address each prop
Business: What is reduce risk, optimize costs and grow?
Technical: What is securely connecting things, scaling big data and actionable insights and machine learning?
DOUBLE JEOPARDY!*
Name the business challenges and the technical challenges with Dataproc
Business: What is cost-effectiveness, spend and ease of use?
Technical: What is idle clusters, scaling inflexibility and high CPU cores and GPUs?
DOUBLE JEOPARDY!*
Name the four key products of ML
DJ* IF you can tell me what each of them does
What is Cloud AutoML, Cloud TPU, Cloud Machine Learning Engine and Dialogflow Enterprise Edition?
Cloud AutoML: Train custom ML models
Cloud TPU: Hardware optimized for ML
Cloud ML Engine: Large-scale ML service
Dialogflow: Create conversational experiences across devices and platforms