If you’re using a dataset but can’t trace where it came from or how it was created, you’re missing this key concept about a dataset’s origin and history.
What is data lineage?
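Here is a minimal Python sketch of what lineage tracking records. The `LineageRecord` class, its fields, and the sample table name are hypothetical, not any specific lineage tool's API. Each derived dataset notes its source, the transformation applied, its parent datasets, and a hash of its content.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class LineageRecord:
    """Hypothetical lineage entry: where a dataset came from and how it was made."""
    dataset: str
    source: str
    transformation: str
    content_sha256: str
    parents: list = field(default_factory=list)


def fingerprint(rows):
    """Hash rows in a stable form (sorted-key JSON) so identical data gives identical hashes."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


raw = [{"meter_id": 1, "kwh": 12.5}, {"meter_id": 2, "kwh": -1.0}]
clean = [r for r in raw if r["kwh"] >= 0]  # drop invalid negative readings

raw_rec = LineageRecord("raw_readings", "hypothetical_source_table", "extract", fingerprint(raw))
clean_rec = LineageRecord(
    "clean_readings",
    source="raw_readings",
    transformation="filter kwh >= 0",
    content_sha256=fingerprint(clean),
    parents=[raw_rec.dataset],
)
print(clean_rec.parents)  # ['raw_readings']
```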
When a teammate accidentally overwrites your code changes, you lose a week of work and want to yell at them. This basic practice prevents tragedies like that by tracking history and enabling easy recovery.
What is version control (e.g., Git)?
When a colleague emails a dataset containing unprotected customer information, this fundamental concern is being violated.
What is data privacy (e.g., protecting PII from unsecured sharing)?
A model that once performed well now gets user complaints about reduced accuracy because real-world data or patterns have changed over time.
What is model drift?
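One common way to check the data side of drift is the Population Stability Index (PSI). PSI compares a feature's distribution at training time with its distribution now. Below is a minimal sketch, assuming you choose the bin edges yourself. The sample values are made up.

```python
import math


def psi(expected, actual, edges):
    """Population Stability Index between a baseline sample and a recent sample.

    `edges` are sorted bin boundaries; values below the first edge fall in bin 0,
    values at or above the last edge fall in the final bin. A small epsilon
    avoids log(0) when a bin is empty.
    """
    eps = 1e-6

    def proportions(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            i = sum(v >= e for e in edges)  # index of the bin v falls into
            counts[i] += 1
        return [max(c / len(values), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


baseline = [1, 2, 3, 4, 5, 6, 7, 8]   # feature values at training time
recent = [5, 6, 7, 8, 9, 10, 11, 12]  # the same feature in production today
edges = [3, 5, 7]
print(round(psi(baseline, baseline, edges), 3))  # 0.0
print(psi(baseline, recent, edges) > 0.25)       # True
```

A frequently cited rule of thumb reads PSI below about 0.1 as little shift and above about 0.25 as significant shift. Thresholds should still be set per model. PSI only watches inputs. Falling accuracy can also come from changed relationships between inputs and outcomes (concept drift), and detecting that requires labeled outcomes.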
When you feed company policies and procedures into an LLM so it can look them up and answer more accurately, you’re using this technique.
What is retrieval-augmented generation (RAG)?
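The retrieve-then-generate idea fits in a few lines. This is a toy. Word overlap stands in for the embedding search a real RAG system would use, and the policy texts are invented. The resulting prompt would still need to be sent to an LLM, which is not shown.

```python
import re


def tokenize(text):
    """Lowercase word tokens; a crude stand-in for real embeddings."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, documents, k=1):
    """Rank documents by word overlap with the question and return the top k."""
    q = tokenize(question)
    ranked = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]


def build_prompt(question, documents):
    """Augment the question with retrieved context before it goes to an LLM."""
    context = "\n".join(retrieve(question, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


policies = [  # invented example policies
    "Expense reports must be submitted within 30 days.",
    "Laptops must use full-disk encryption.",
]
print(build_prompt("How many days do I have to submit expense reports?", policies))
```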
At SCE, production model data should generally come from these kinds of approved, governed, trusted, and auditable sources—not ad hoc extracts.
What are systems of record (e.g., governed data platforms like Snowflake or curated datasets, trusted upstream systems, and other documented, auditable sources)?
If your code works on your machine but fails for collaborators due to missing packages, including this file helps others recreate the same setup and protects your technical reputation.
What is an environment/dependency file?
If someone hard-codes credentials into code deployed on GCP, and an attacker later finds them and causes a security incident, this practice could have prevented it.
What is secrets management/secure credential storage (e.g., Secret Manager)?
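Here is a minimal sketch of the safer pattern: the code reads the credential at runtime instead of embedding it. `DB_PASSWORD` is a hypothetical variable name. In a GCP deployment, the value would typically come from Secret Manager rather than being typed into source code.

```python
import os


def get_db_password():
    """Read the credential from the runtime environment instead of source code.

    DB_PASSWORD is a hypothetical name; in production its value would be
    injected from a secrets manager, never committed to the repository.
    """
    password = os.environ.get("DB_PASSWORD")
    if password is None:
        # Fail loudly rather than fall back to a hard-coded default.
        raise RuntimeError("DB_PASSWORD is not set; refusing to use a hard-coded value")
    return password
```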
A critical notebook-based model can only be run by one person, who is now on a well-deserved cruise with no reception. This capability is missing.
What is productionization/operationalization (or sufficient documentation)?
When an AI solution gives slightly different answers to the same question across repeated runs, that behavior is called this.
What is non-determinism?
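Much of this variability comes from sampling. The model scores possible next tokens, and with a temperature above zero it picks among them randomly. Here is a toy sketch with invented scores; real LLMs do this over large vocabularies at every step.

```python
import math
import random


def sample_next_token(scores, temperature, rng):
    """Pick the next token from raw model scores (logits).

    temperature == 0 means greedy decoding: always return the top-scoring token.
    Otherwise sample from a softmax over scores / temperature.
    """
    if temperature == 0:
        return max(scores, key=scores.get)
    scaled = {tok: s / temperature for tok, s in scores.items()}
    top = max(scaled.values())
    # Subtracting the max before exp() avoids overflow; relative weights are unchanged.
    weights = [math.exp(v - top) for v in scaled.values()]
    return rng.choices(list(scaled), weights=weights, k=1)[0]


scores = {"yes": 2.0, "likely": 1.5, "no": 0.5}  # invented scores for one step

# Greedy decoding gives the same answer every time.
greedy = [sample_next_token(scores, 0, random.Random()) for _ in range(5)]
print(greedy)  # ['yes', 'yes', 'yes', 'yes', 'yes']

# Sampling repeats exactly only when the random generator starts from the same seed.
rng_a, rng_b = random.Random(42), random.Random(42)
run_a = [sample_next_token(scores, 1.0, rng_a) for _ in range(10)]
run_b = [sample_next_token(scores, 1.0, rng_b) for _ in range(10)]
print(run_a == run_b)  # True
```

Fixing the seed or using greedy decoding makes this toy repeatable. Hosted LLM services may not expose a seed, and even low-temperature settings are not always guaranteed to produce identical output.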
If your model output depends on a dataset that changes over time and you didn’t snapshot it, you’ve created this risk for reproducibility and auditability.
What is lack of reproducibility (or an inability to audit results)?
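Here is a minimal snapshot sketch. It saves the exact rows a run used in a file named by a SHA-256 hash of their content. Record that hash with the model output, and auditors can later confirm the data has not changed. The file naming and JSON format are assumptions, not a prescribed standard.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def snapshot(rows, out_dir):
    """Freeze the exact data a model run used, named by its content hash."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    digest = hashlib.sha256(payload).hexdigest()
    path = Path(out_dir) / f"training_data_{digest[:12]}.json"
    path.write_bytes(payload)
    return digest, path


def verify(path, expected_digest):
    """Re-hash a stored snapshot to confirm it is unchanged since the run."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest() == expected_digest


with tempfile.TemporaryDirectory() as tmp:
    digest, path = snapshot([{"meter_id": 1, "kwh": 12.5}], tmp)
    print(verify(path, digest))  # True
```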
You hand over your beautiful model for peer review. But the reviewer struggles to understand how your code works because there are no explanations. This basic practice improves your code’s readability.
What is code documentation/comments?
You grant system-wide access to a user who only needs limited functionality. This principle should guide access instead.
What is the principle of least privilege?
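Least privilege is often enforced as deny-by-default: a role gets only the permissions explicitly listed for it. Here is a sketch with hypothetical role and permission names.

```python
# Hypothetical roles: each grants only the actions that job actually needs.
ROLE_PERMISSIONS = {
    "analyst": {"read:curated_data"},
    "model_runner": {"read:curated_data", "run:scoring_job"},
    "admin": {"read:curated_data", "run:scoring_job", "write:curated_data", "manage:users"},
}


def is_allowed(role, action):
    """Deny by default: allow only actions explicitly granted to the role."""
    return action in ROLE_PERMISSIONS.get(role, set())


print(is_allowed("analyst", "read:curated_data"))   # True
print(is_allowed("analyst", "write:curated_data"))  # False
```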
At SCE, before deploying a model in IT-managed production, this rigorous review process is required.
What is a CRQ/change request?
THIS IS A DAILY “BOTTLE” QUESTION!!! An organization adopts AI tools quickly without clear guidelines, standards, oversight, or consistent practices, leading to inconsistent use and increased risk exposure. This broader issue is present.
What is the lack of AI Governance?
When you’re exhausted from manually pulling data from a system every Monday at 5 a.m. to feed your model, this is the better long-term solution.
What is an automated data pipeline (or scheduled data ingestion)?
Your team avoids touching a piece of code because no one understands it and changes often break things. This is commonly referred to as what type of code?
What is legacy code?
A team begins using a dataset without knowing whether it contains sensitive or regulated information. This important step was skipped.
What is data classification?
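Classification often starts with a scan for sensitive-looking values. Here is a minimal sketch using a few illustrative regular expressions. The patterns are assumptions that catch only obvious cases, and they are no substitute for the organization's classification policy and review.

```python
import re

# Illustrative patterns only; real classification relies on governed rules and human review.
PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "us_phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}


def classify_column(values):
    """Return the names of PII-like patterns found in a column's sample values."""
    found = set()
    for v in values:
        for name, pattern in PATTERNS.items():
            if pattern.search(str(v)):
                found.add(name)
    return found


sample = {"contact": ["jane.doe@example.com", "n/a"], "kwh": [12.5, 9.1]}
labels = {col: classify_column(vals) for col, vals in sample.items()}
print(labels)  # {'contact': {'email'}, 'kwh': set()}
```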
This “single source of truth” stores and versions ML models and manages promotion of artifacts from Staging to Production.
What is a model registry?
An AI system is given the ability to call APIs or trigger actions based on its outputs. This concept is being applied.
What is Agentic AI?
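A common safeguard when a model can trigger actions is an allowlist. The model proposes a tool call as structured output, and the application runs it only if the tool is approved. This sketch assumes the model returns JSON of the form `{"tool": ..., "args": ...}`. The tool and its stub response are hypothetical.

```python
import json


def get_outage_status(region):
    """Hypothetical tool; a real one would query an internal API."""
    return f"No active outages reported for {region} (stub data)"


# Only tools listed here can be triggered by model output.
TOOLS = {"get_outage_status": get_outage_status}


def execute_tool_call(model_output):
    """Parse a model's JSON tool request and run it only if the tool is allowlisted."""
    request = json.loads(model_output)
    tool = TOOLS.get(request.get("tool"))
    if tool is None:
        return f"Rejected: '{request.get('tool')}' is not an allowed tool"
    return tool(**request.get("args", {}))


print(execute_tool_call('{"tool": "get_outage_status", "args": {"region": "region_a"}}'))
print(execute_tool_call('{"tool": "delete_database", "args": {}}'))
```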
THIS IS A DAILY “BOTTLE” QUESTION!!! This Snowflake developer framework enables “push-down” execution of Python, Java, or Scala so MLOps teams can run complex data prep inside Snowflake’s compute engine instead of pulling massive datasets out.
What is Snowpark?
On GitHub, this core feature lets a developer signal that a feature is complete and prompts team discussion and review before the code is merged.
What is a pull request?
THIS IS A DAILY “BOTTLE” QUESTION!!! A solution is built and deployed, but it is reviewed for compliance and cybersecurity requirements only afterward, requiring significant rework. This approach should have been applied.
What is shift-left governance/early-stage governance/governance by design?
This end-to-end automated workflow turns raw data into a deployable model and is often structured as a DAG to manage dependencies across ingestion, validation, and training.
What is an ML pipeline?
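The DAG idea can be sketched with Python's standard-library `graphlib` (Python 3.9+). Each step lists the steps it depends on, and a topological sort yields a valid run order. The step names are hypothetical. Orchestrators such as Apache Airflow apply the same idea at production scale.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical pipeline: each step maps to the set of steps it depends on.
PIPELINE = {
    "ingest": set(),
    "validate": {"ingest"},
    "quality_report": {"validate"},
    "features": {"validate"},
    "train": {"features"},
    "evaluate": {"train"},
    "register": {"evaluate", "quality_report"},
}


def run_order(dag):
    """Return an execution order in which every step runs after its dependencies."""
    return list(TopologicalSorter(dag).static_order())


print(run_order(PIPELINE))
```

If two steps accidentally depend on each other, `static_order()` raises `CycleError` instead of running anything.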
An AI system’s outputs vary significantly based on small changes in input phrasing, even when the underlying intent is the same. This is being observed.
What is prompt sensitivity?