Data Doesn’t Magically Appear!
It Worked on My Laptop!
Trust, But Verify!
From Notebook to Production!
AI: Use Your Powers Wisely!
100

If you’re using a dataset but can’t trace where it came from or how it was created, you’re missing this key concept about a dataset’s origin and history.

What is data lineage?

100

When a teammate accidentally overwrites your code changes and you lose a week of work and want to yell at them, this basic practice prevents tragedies like that by tracking history and enabling easy recovery.

What is version control (e.g., Git)?

100

When a colleague emails a dataset that includes customer information without protection, this fundamental concern is being violated.

What is data privacy/unsecured data sharing (PII)?

100

A model that once performed well now gets user complaints about reduced accuracy because real-world data or patterns have changed over time. This phenomenon is at work.

What is model drift?

100

When you feed company policies and procedures into an LLM so it can look them up and answer more accurately, you’re using this technique.

What is retrieval-augmented generation (RAG)?
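To make the "look them up" step concrete, here is a toy sketch of the retrieval half of RAG. It ranks documents by simple word overlap with the question and pastes the best match into a prompt. The function names `retrieve` and `build_prompt` and the sample policy text are illustrative assumptions; a real system would typically use embeddings and a vector index rather than word overlap.

```python
def retrieve(question, documents, top_k=1):
    """Rank documents by how many words they share with the question."""
    q_words = set(question.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question, documents):
    """Assemble a prompt that grounds the model in the retrieved text."""
    context = "\n".join(retrieve(question, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Hypothetical policy snippets, for illustration only.
policies = [
    "Vacation requests require manager approval.",
    "Expense reports are due monthly.",
]
print(build_prompt("How do vacation requests work?", policies))
```

The prompt that comes out would then be sent to the LLM; that call is left out here because it depends on the provider.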

200

At SCE, production model data should generally come from these kinds of approved, governed, trusted, and auditable sources—not ad hoc extracts.

What are systems of record (e.g., governed data platforms like Snowflake/curated datasets, trusted upstream systems, and other documented, auditable sources)?

200

If your code works on your machine but fails for collaborators due to missing packages, including this file helps others recreate the same setup and protects your technical reputation.

What is an environment/dependency file?
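As a minimal sketch of what such a file holds, the snippet below produces pinned `name==version` lines, the format pip reads from a `requirements.txt`. The helper names `pinned_lines` and `installed_versions` and the example versions are assumptions for illustration.

```python
from importlib import metadata


def pinned_lines(versions):
    """Format a {package: version} mapping as sorted 'name==version' lines."""
    return [f"{name}=={version}" for name, version in sorted(versions.items())]


def installed_versions(names):
    """Look up the installed version of each named package, skipping missing ones."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            continue
    return found


# Hypothetical example versions, for illustration only.
example = {"pandas": "2.2.0", "numpy": "1.26.4"}
print("\n".join(pinned_lines(example)))
```

Writing those lines to a file and committing it alongside the code lets a collaborator install the same versions.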

200

If someone hard-codes credentials in GCP and an attacker later finds them and causes a security incident, this practice could have prevented it.

What is secrets management/secure credential storage (e.g., Secret Manager)?
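One way to picture the alternative to hard-coding: the code asks for a secret by name at run time and fails loudly if it is missing. The sketch below uses an environment variable as a stand-in for a managed secret store; the helper name `get_secret` and the variable name `DB_PASSWORD` are hypothetical.

```python
import os


def get_secret(name):
    """Read a secret from the environment instead of hard-coding it in source."""
    value = os.environ.get(name)
    if not value:
        # Fail loudly rather than fall back to a default baked into the code.
        raise RuntimeError(f"Secret {name!r} is not set.")
    return value


# Usage (assumes the deployment environment injects the value):
# password = get_secret("DB_PASSWORD")
```

With a service such as Secret Manager, the lookup would go through that service's client library instead, but the principle is the same: the credential never appears in the repository.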

200

A critical notebook-based model can only be run by one person, who is now on a well-deserved cruise with no reception. This capability is missing.

What is productionization/operationalization (or sufficient documentation)?

200

When an AI solution gives slightly different answers to the same question across repeated runs, that behavior is called this.

What is non-determinism?
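A common source of this behavior is that the system samples from a probability distribution over possible outputs. The toy sketch below is not an LLM; it only shows that weighted sampling can differ between runs, and that fixing a seed makes it repeatable. The function name `sample_answer` and the candidate answers are assumptions.

```python
import random


def sample_answer(candidates, weights, seed=None):
    """Pick one candidate at random, weighted; a fixed seed makes it repeatable."""
    rng = random.Random(seed)  # seed=None draws entropy from the system
    return rng.choices(candidates, weights=weights, k=1)[0]


answers = ["Yes.", "Yes, generally.", "In most cases, yes."]
weights = [0.5, 0.3, 0.2]

# Unseeded calls may differ from run to run; seeded calls always agree.
print(sample_answer(answers, weights))
print(sample_answer(answers, weights, seed=42) == sample_answer(answers, weights, seed=42))
```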

300

If your model output depends on a dataset that changes over time and you didn’t snapshot it, you’ve created this risk for reproducibility and auditability.

What is lack of reproducibility (or an inability to audit results)?
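A lightweight way to reduce this risk, sketched under the assumption that the data fits in memory as a list of records: fingerprint the exact rows a model run used and store the fingerprint with the run. The names `snapshot_fingerprint` and `record_snapshot` are hypothetical.

```python
import hashlib
import json


def snapshot_fingerprint(rows):
    """Return a SHA-256 hex digest of the rows, serialized with stable key order."""
    payload = json.dumps(rows, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def record_snapshot(rows, run_id):
    """Bundle the data fingerprint with a run identifier for later audit."""
    return {
        "run_id": run_id,
        "row_count": len(rows),
        "sha256": snapshot_fingerprint(rows),
    }
```

If the fingerprint recorded for a past run no longer matches the current data, the dataset has changed since that run. The fingerprint only detects change; reproducing the run still requires keeping a copy of the snapshot itself.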

300

You hand over your beautiful model for peer review. But the reviewer struggles to understand how your code works because there are no explanations. This basic practice improves your code’s readability.

What is code documentation/comments?

300

You grant system-wide access to a user who only needs limited functionality. This principle should guide access instead.

What is the principle of least privilege?
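In code, the principle often shows up as a deny-by-default permission check: each role lists only the actions it needs, and anything unlisted is refused. The role names, actions, and the helper `is_allowed` below are illustrative assumptions.

```python
# Hypothetical roles, each granted only the actions it needs.
ROLE_PERMISSIONS = {
    "viewer": {"read"},
    "editor": {"read", "write"},
}


def is_allowed(role, action):
    """Deny by default: unknown roles and unlisted actions are refused."""
    return action in ROLE_PERMISSIONS.get(role, set())


print(is_allowed("viewer", "read"))   # a viewer may read
print(is_allowed("viewer", "write"))  # but may not write
```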

300

At SCE, before deploying a model in IT-managed production, this rigorous review process is required.

What is a CRQ/change request?

300

THIS IS A DAILY “BOTTLE” QUESTION!!! An organization adopts AI tools quickly without clear guidelines, standards, oversight, or consistent practices, leading to inconsistent use and increased risk exposure. This broader issue is present.

What is a lack of AI governance?

400

When you’re exhausted from manually pulling data from a system every Monday at 5 am to feed your model, this is the better long-term solution.

What is an automated data pipeline (or scheduled data ingestion)?

400

Your team avoids touching a piece of code because no one understands it and changes often break things. This is commonly referred to as what type of code?

What is legacy code?

400

A team begins using a dataset without knowing whether it contains sensitive or regulated information. This important step was skipped.

What is data classification?

400

This “single source of truth” stores and versions ML models and manages promotion of artifacts from Staging to Production.

What is a model registry?

400

An AI system is given the ability to call APIs or trigger actions based on its outputs. This concept is being applied.

What is Agentic AI?

500

THIS IS A DAILY “BOTTLE” QUESTION!!! This Snowflake developer framework enables “push-down” execution of Python, Java, or Scala so MLOps teams can run complex data prep inside Snowflake’s compute engine instead of pulling massive datasets out.

What is Snowpark?

500

On GitHub, this core feature lets a developer signal a feature is complete and prompts team discussion and review before code is merged.

What is a pull request?

500

THIS IS A DAILY “BOTTLE” QUESTION!!! A solution is built and deployed, but only later reviewed for compliance and cybersecurity requirements, requiring significant rework. This approach should have been applied.

What is shift-left governance/early-stage governance/governance by design?

500

This end-to-end automated workflow turns raw data into a deployable model and is often structured as a DAG to manage dependencies across ingestion, validation, and training.

What is an ML pipeline?
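The DAG idea in the clue can be sketched with Python's standard-library `graphlib` (Python 3.9+): each step declares what it depends on, and a topological sort yields a valid run order. The step names come from the clue; the helper `run_order` is an assumption, and real orchestration tools add scheduling, retries, and logging on top of this.

```python
from graphlib import CycleError, TopologicalSorter

# Each step maps to the set of steps that must finish before it starts.
pipeline = {
    "ingestion": set(),
    "validation": {"ingestion"},
    "training": {"validation"},
}


def run_order(dag):
    """Return the steps in an order that respects every dependency."""
    return list(TopologicalSorter(dag).static_order())


print(run_order(pipeline))

# A circular dependency is not a DAG and is rejected.
try:
    run_order({"a": {"b"}, "b": {"a"}})
except CycleError:
    print("cycle detected")
```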

500

An AI system’s outputs vary significantly based on small changes in input phrasing, even when the underlying intent is the same. This is being observed.

What is prompt sensitivity?
