This skill routes to specialized sub-skills for data science and machine learning tasks. This skill provides valuable information about all sorts of data science, machine learning, and mlops tasks. It MUST be loaded in if any part of the user query relates to these topics❗❗❗
⚠️ CRITICAL: Before routing to any sub-skill, you MUST load the environment guide for your surface.
Your system prompt indicates which surface you are operating on. Load the matching guide:
| Surface | Condition | Guide to Load |
|---|---|---|
| Snowsight | You are operating inside the Snowflake Snowsight web interface | guides/snowsight-environment.md |
| CLI / IDE | You are operating in a command line terminal or IDE environment | guides/cli-environment.md |
The environment guide provides surface-specific instructions for session setup, package management, and code execution that apply to ALL sub-skills below. Sub-skills will reference these patterns rather than repeating them.
⚠️ CRITICAL: Route AUTOMATICALLY based on the user's request. Do NOT ask the user which sub-skill to use or how they want to deploy.
When a user asks to "train a model", "build a model" or inquires about a similar task:
ml-development/SKILL.md and start working⚠️ CRITICAL: When a user mentions a service name, check if it's a model inference service:
DESCRIBE SERVICE <DB>.<SCHEMA>.<SERVICE_NAME>managing_object_domain = 'Model' → Route to spcs-inference/SKILL.mdThis applies to ANY task involving the service (testing, REST API calls, latency profiling, benchmarking, debugging, management).
⚠️ CRITICAL: When user mentions "inference" without clear signals, you MUST ask for clarification.
There is a decision matrix located in the public docs https://docs.snowflake.com/en/developer-guide/snowflake-ml/inference/inference-overview.
Inference Disambiguation Workflow:
When user says something like "run inference on my model" or "inference" without batch/online signals:
I can help you run inference on your model. There are three approaches: 1. **Native Batch Inference (SQL)** - Embed inference in SQL pipelines - <add decision points from docs matrix here> 2. **Job-Based Batch (SPCS)** - Run large-scale inference jobs - <add decision points from docs matrix here> 3. **Real-Time Inference (SPCS)** - Deploy a REST endpoint - <add decision points from docs matrix here> Which approach fits your use case?
⚠️ STOP: Wait for user response before routing.
⚠️ CRITICAL: These two skills are commonly confused. Use this logic:
| User Intent | Key Signals | Route To |
|---|---|---|
| Run inference on a registered model | "model registry" + ("inference", "predictions", "scoring", "run()", "run_batch()") | batch-inference-jobs/SKILL.md |
| Run a Python script on Snowflake compute | "script", "submit", "file", "directory", training code | ml-jobs/SKILL.md |
Decision tree:
batch-inference-jobs/SKILL.md (covers both mv.run() and mv.run_batch())ml-jobs/SKILL.md (uses submit_file() or submit_directory())ml-pipeline-orchestration/SKILL.md| User Intent | Key Signals | Route To |
|---|---|---|
| Run a single ML job | "submit", "run script", "compute pool", one-off execution | ml-jobs/SKILL.md |
| Orchestrate multiple steps on a schedule | "pipeline", "DAG", "schedule", "automate", "task graph", multi-step workflow | ml-pipeline-orchestration/SKILL.md |
⚠️ CRITICAL: When user says "partitioned modeling", "partitioned model", "model per partition", or similar ambiguous phrases, clarify their stage in the workflow:
Partitioned Modeling Workflow:
[1] Train models per partition → [2] Register → [3] Run partitioned inference (MMT) (Registry) (@partitioned_inference_api)
Decision tree:
distributed-training/SKILL.md (Many Model Training section)model-registry/partitioned-inference/SKILL.mddistributed-training/SKILL.md first, then model-registry/partitioned-inference/SKILL.mdClarification prompt (when ambiguous):
I can help with partitioned modeling. Where are you in the workflow? 1. **Train models per partition** - Use Many Model Training (MMT) to train one model per partition (store, region, etc.) 2. **Run partitioned inference** - Already have trained models, need to run predictions per partition 3. **Full workflow** - Train per partition → Register → Run partitioned inference Which step do you need help with?
⚠️ STOP: Wait for user response before routing.
| User Says | Route To | Action |
|---|---|---|
| "analyze data", "train model", "build model", "feature engineering", "predict", "classify", "regression" | ml-development/SKILL.md | Load immediately, start training |
| "register model", "model registry", "log model", "pickle to snowflake", "save model to snowflake", "upload model", ".pkl file", ".ubj file" | model-registry/SKILL.md | Load immediately, start registration (Workflow A) |
| "deploy model", "deploy model for inference", "deploy for inference" | model-registry/SKILL.md | Load immediately, ask deployment target (Workflow B) |
| "create inference service", "inference endpoint", "serve model", "snowpark container services", "model endpoint", "deploy in container", "deploy model service", "real-time inference", "online inference" | spcs-inference/SKILL.md | Load immediately, create SPCS service |
| "batch inference", "bulk predictions", "run_batch", "run()", "offline scoring", "score dataset", "batch predictions", "inference on registered model", "run predictions on registry model", "score with registered model", "offline inference", "SQL inference", "dbt inference", "dynamic table inference" | batch-inference-jobs/SKILL.md | Load immediately, set up batch inference |
| "inference", "run inference" (ambiguous, no batch/online signals) | ASK USER | Use disambiguation workflow above to clarify batch vs online |
| "ml job", "ml jobs", "run on snowflake compute", "submit job", "submit script", "submit file", "remote execution", "GPU training", "run python script on snowflake" | ml-jobs/SKILL.md | Load immediately, set up job |
| "pipeline", "DAG", "task graph", "schedule training", "schedule inference", "orchestrate", "productionize", "automate retraining", "convert notebook to pipeline" | ml-pipeline-orchestration/SKILL.md | Load immediately, set up DAG |
| "experiment tracking", "track experiment", "log metrics", "log parameters", "autolog", "training callback", "XGBoost callback", "LightGBM callback" | experiment-tracking/SKILL.md | Load immediately, set up experiment tracking |
| "model monitor", "monitor model", "add monitoring", "enable monitoring", "ML observability", "track drift", "model performance", "monitor predictions", "observability" | model-monitor/SKILL.md | Load immediately, set up monitoring |
| "inference logs", "inference table", "captured inference", "autocapture data", "view inference history", "INFERENCE_TABLE", "inference requests", "inference responses", "view captured predictions" | inference-logs/SKILL.md | Load immediately, query inference data |
| "inference error", "mv.run fails", "service failing", "OOM", "debug inference", "inference not working" | debug-inference/SKILL.md | Load immediately, diagnose issue |
| "distributed training", "distributed XGBoost", "distributed LightGBM", "XGBEstimator", "LightGBMEstimator", "PyTorchDistributor", "multi-node training", "multi-GPU training", "train at scale", "DPF", "distributed partition function", "many model training", "MMT", "train per partition", "ManyModelTraining", "partition by", "hyperparameter tuning", "hyperparameter optimization", "HPO", "Tuner", "TunerConfig", "search space", "grid search", "random search", "bayesian optimization", "tune model", "tune hyperparameters", "num_trials", "search_alg" | distributed-training/SKILL.md | Load immediately, distributed training/processing/tuning |
| "partitioned inference", "@partitioned_api", "inference per partition", "CustomModel partition" | model-registry/partitioned-inference/SKILL.md | Load immediately, partitioned inference |
| "partitioned modeling", "partitioned model", "model per partition", "per-partition models" (ambiguous) | ASK USER | Use partitioned modeling disambiguation above |
Sub-skill path aliases (for routing resolution):
ml-job → ml-jobs/SKILL.md (singular form routes to plural directory)ml-jobs → ml-jobs/SKILL.mdmljob → ml-jobs/SKILL.mdmljobs → ml-jobs/SKILL.mdUser Request → Load Environment Guide → Detect Intent → Load appropriate sub-skill → Execute Examples: - "Train a classifier" → Load ml-development → Train locally → Done - "Deploy my model.pkl" → Load model-registry → Register to Snowflake → Done - "Train AND deploy" → Load ml-development → Train → Save model → Report artifacts → Ask about deployment → If yes, load model-registry WITH CONTEXT (file path, framework, schema)
Key principle: Complete ONE task at a time. Only ask about the next step after the current step is done.
⚠️ CRITICAL: When transitioning from ml-development to model-registry:
Information to preserve and pass along:
Why this matters:
How to do it:
Example handoff:
ml-development: "Model saved to /path/to/model.pkl (sklearn). Would you like to register it?" User: "Yes" [Load model-registry with context: path=/path/to/model.pkl, framework=sklearn, schema=[...]] model-registry: "I see you just trained a sklearn model. What should I call it in Snowflake?"
Data exploration, statistical analysis, model training, and evaluation. Covers the full ML development workflow from data loading to model evaluation.
Deploy serialized models to Snowflake Model Registry. Supports various model formats (.pkl, .ubj, .json, .pt, etc.) depending on framework. Routes to spcs-inference sub-skill for inference service creation. Includes partitioned-inference sub-skill for partition-aware model deployment.
Skills for tracking model training experiments using Snowflake's experiment tracking framework.
Deploy registered models to Snowpark Container Services for real-time inference. Handles compute pool selection, GPU/CPU configuration, num_workers, and service creation.
Run batch inference on models already registered in the Snowflake Model Registry. Covers two approaches:
mv.run()): Warehouse-based, integrates with SQL pipelinesmv.run_batch()): SPCS compute pools, for large-scale and unstructured dataTransform local Python scripts into Snowflake ML Jobs that run on Snowflake compute pools. Uses submit_file() or submit_directory(). Also includes compute pool reference (instance families, sizing).
Orchestrate multi-step ML workflows using Snowflake Task Graphs (DAGs) with the Python DAG API. Covers DAG creation, scheduling (Cron/timedelta), inter-task data passing, and notebook-to-pipeline conversion. Uses @remote for ML tasks on compute pools and warehouse tasks for orchestration.
Set up ML Observability for models in the Snowflake Model Registry. Track drift, performance metrics, and prediction statistics over time.
Consolidated skill covering all distributed ML training, processing, and tuning:
XGBEstimator, LightGBMEstimator, PyTorchDistributor for training one large model across nodes/GPUsget_model()Note: These APIs run server-side — either inside ML Jobs (submitted via CLI) or in Snowflake Notebooks with Container Runtime (Snowsight). For CLI submission, see ml-jobs.
Partitioned inference in the Model Registry using @partitioned_api decorator. Run inference with different submodels per data partition. Located at model-registry/partitioned-inference/SKILL.md.
Query and analyze captured inference data from model services with Auto-Capture enabled. View historical request/response data logged via INFERENCE_TABLE(). Useful for debugging unexpected predictions, building retraining datasets, and A/B testing model versions.
When the workflow involves creating or writing to any Snowflake object (table, stage, model registry entry, experiment, etc.), never silently pick a database/schema. Always confirm with the user first.
DATABASE.SCHEMA has already been used in this session, offer it as the default:
I'll need to create [object] in Snowflake. I see we've been working with `<DATABASE>.<SCHEMA>`. Should I use that, or would you prefer a different database/schema?
Which database and schema should I use for [object]? (format: DATABASE.SCHEMA)
USER$VINAY) are not supported for ML workflows. If the user picks a personal database, warn them:
Personal databases like `USER$<USERNAME>` don't support creating tables, model registry operations, or inference services. Please provide a standard database/schema instead.
Install using the Cortex CLI:
$cortex skill install machine-learningRequires Cortex CLI v2.0+. Install guide →
---
name: machine-learning
description: "**[REQUIRED]** For **ALL** data science and machine learning tasks. This skill should ALWAYS be loaded in even if only a portion of the workflow is related to machine learning. Use when: analyzing data, training models, deploying models to Snowflake, registering models, working with ML workflows, running ML jobs on Snowflake compute, model registry, model service, model inference, log model, deploy pickle file, experiment tracking, model monitoring, ML observability, tracking drift, model performance analysis, distributed training, XGBoost, LightGBM, PyTorch, DPF, distributed partition function, many model training, hyperparameter tuning, HPO, compute pools, train at scale, pipeline orchestration, DAG, task graph, schedule training. Routes to specialized sub-skills."
---
# Data Science & Machine Learning Skills
This skill routes to specialized sub-skills for data science and machine learning tasks.
This skill provides valuable information about all sorts of data science, machine learning, and mlops tasks.
It MUST be loaded in if any part of the user query relates to these topics❗❗❗
## Step 0: Load Environment Guide
**⚠️ CRITICAL: Before routing to any sub-skill, you MUST load the environment guide for your surface.**
Your system prompt indicates which surface you are operating on. Load the matching guide:
| Surface | Condition | Guide to Load |
|---------|-----------|---------------|
| **Snowsight** | You are operating inside the Snowflake Snowsight web interface | `guides/snowsight-environment.md` |
| **CLI / IDE** | You are operating in a command line terminal or IDE environment | `guides/cli-environment.md` |
The environment guide provides surface-specific instructions for **session setup, package management, and code execution** that apply to ALL sub-skills below. Sub-skills will reference these patterns rather than repeating them.
## Routing Behavior
**⚠️ CRITICAL: Route AUTOMATICALLY based on the user's request. Do NOT ask the user which sub-skill to use or how they want to deploy.**
When a user asks to "train a model", "build a model" or inquires about a similar task:
- **IMMEDIATELY** load `ml-development/SKILL.md` and start working
- Do NOT ask about deployment options upfront
- Do NOT ask "Local only vs Register in Snowflake vs End-to-end"
- Training and deployment are SEPARATE tasks - handle them sequentially if needed
## Intent Detection
### Dynamic Service Detection (Model Inference Services)
**⚠️ CRITICAL:** When a user mentions a **service name**, check if it's a model inference service:
1. Run `DESCRIBE SERVICE <DB>.<SCHEMA>.<SERVICE_NAME>`
2. If `managing_object_domain = 'Model'` → Route to `spcs-inference/SKILL.md`
This applies to ANY task involving the service (testing, REST API calls, latency profiling, benchmarking, debugging, management).
---
### Disambiguation: batch-inference vs spcs-inference (Online)
**⚠️ CRITICAL:** When user mentions "inference" without clear signals, you MUST ask for clarification.
There is a decision matrix located in the public docs `https://docs.snowflake.com/en/developer-guide/snowflake-ml/inference/inference-overview`.
**Inference Disambiguation Workflow:**
When user says something like "run inference on my model" or "inference" without batch/online signals:
```
I can help you run inference on your model. There are three approaches:
1. **Native Batch Inference (SQL)** - Embed inference in SQL pipelines
- <add decision points from docs matrix here>
2. **Job-Based Batch (SPCS)** - Run large-scale inference jobs
- <add decision points from docs matrix here>
3. **Real-Time Inference (SPCS)** - Deploy a REST endpoint
- <add decision points from docs matrix here>
Which approach fits your use case?
```
**⚠️ STOP**: Wait for user response before routing.
### Disambiguation: batch-inference vs ml-jobs
**⚠️ CRITICAL:** These two skills are commonly confused. Use this logic:
| User Intent | Key Signals | Route To |
|-------------|-------------|----------|
| Run inference on a **registered model** | "model registry" + ("inference", "predictions", "scoring", "run()", "run_batch()") | `batch-inference-jobs/SKILL.md` |
| Run a **Python script** on Snowflake compute | "script", "submit", "file", "directory", training code | `ml-jobs/SKILL.md` |
**Decision tree:**
1. Does the user want to run inference on an **existing model in the registry**?
- **YES** → `batch-inference-jobs/SKILL.md` (covers both `mv.run()` and `mv.run_batch()`)
- **NO** → Continue to step 2
2. Does the user want to run **custom Python code** (training, processing, or scripts) on Snowflake compute?
- **YES** → `ml-jobs/SKILL.md` (uses `submit_file()` or `submit_directory()`)
- **NO** → Continue to step 3
3. Does the user want to **orchestrate multiple steps** on a schedule (pipeline, DAG)?
- **YES** → `ml-pipeline-orchestration/SKILL.md`
- **NO** → Ask clarifying question
### Disambiguation: ml-jobs vs ml-pipeline-orchestration
| User Intent | Key Signals | Route To |
|-------------|-------------|----------|
| Run a **single ML job** | "submit", "run script", "compute pool", one-off execution | `ml-jobs/SKILL.md` |
| **Orchestrate multiple steps** on a schedule | "pipeline", "DAG", "schedule", "automate", "task graph", multi-step workflow | `ml-pipeline-orchestration/SKILL.md` |
### Disambiguation: Partitioned Modeling (Full Workflow)
**⚠️ CRITICAL:** When user says "partitioned modeling", "partitioned model", "model per partition", or similar ambiguous phrases, clarify their stage in the workflow:
**Partitioned Modeling Workflow:**
```
[1] Train models per partition → [2] Register → [3] Run partitioned inference
(MMT) (Registry) (@partitioned_inference_api)
```
**Decision tree:**
1. Does the user need to **train** models per partition?
- **YES** → Load `distributed-training/SKILL.md` (Many Model Training section)
- After training completes, ask if they want to proceed to inference
2. Does the user already have trained models and needs **inference**?
- **YES** → Load `model-registry/partitioned-inference/SKILL.md`
3. Does the user want the **full end-to-end workflow**?
- **YES** → Load `distributed-training/SKILL.md` first, then `model-registry/partitioned-inference/SKILL.md`
**Clarification prompt (when ambiguous):**
```
I can help with partitioned modeling. Where are you in the workflow?
1. **Train models per partition** - Use Many Model Training (MMT) to train one model per partition (store, region, etc.)
2. **Run partitioned inference** - Already have trained models, need to run predictions per partition
3. **Full workflow** - Train per partition → Register → Run partitioned inference
Which step do you need help with?
```
**⚠️ STOP**: Wait for user response before routing.
### Routing Table
| User Says | Route To | Action |
|-----------|----------|--------|
| "analyze data", "train model", "build model", "feature engineering", "predict", "classify", "regression" | `ml-development/SKILL.md` | Load immediately, start training |
| "register model", "model registry", "log model", "pickle to snowflake", "save model to snowflake", "upload model", ".pkl file", ".ubj file" | `model-registry/SKILL.md` | Load immediately, start registration (Workflow A) |
| "deploy model", "deploy model for inference", "deploy for inference" | `model-registry/SKILL.md` | Load immediately, ask deployment target (Workflow B) |
| "create inference service", "inference endpoint", "serve model", "snowpark container services", "model endpoint", "deploy in container", "deploy model service", "real-time inference", "online inference" | `spcs-inference/SKILL.md` | Load immediately, create SPCS service |
| "batch inference", "bulk predictions", "run_batch", "run()", "offline scoring", "score dataset", "batch predictions", "inference on registered model", "run predictions on registry model", "score with registered model", "offline inference", "SQL inference", "dbt inference", "dynamic table inference" | `batch-inference-jobs/SKILL.md` | Load immediately, set up batch inference |
| **"inference", "run inference"** (ambiguous, no batch/online signals) | **ASK USER** | Use disambiguation workflow above to clarify batch vs online |
| "ml job", "ml jobs", "run on snowflake compute", "submit job", "submit script", "submit file", "remote execution", "GPU training", "run python script on snowflake" | `ml-jobs/SKILL.md` | Load immediately, set up job |
| "pipeline", "DAG", "task graph", "schedule training", "schedule inference", "orchestrate", "productionize", "automate retraining", "convert notebook to pipeline" | `ml-pipeline-orchestration/SKILL.md` | Load immediately, set up DAG |
| "experiment tracking", "track experiment", "log metrics", "log parameters", "autolog", "training callback", "XGBoost callback", "LightGBM callback" | `experiment-tracking/SKILL.md` | Load immediately, set up experiment tracking |
| "model monitor", "monitor model", "add monitoring", "enable monitoring", "ML observability", "track drift", "model performance", "monitor predictions", "observability" | `model-monitor/SKILL.md` | Load immediately, set up monitoring |
| "inference logs", "inference table", "captured inference", "autocapture data", "view inference history", "INFERENCE_TABLE", "inference requests", "inference responses", "view captured predictions" | `inference-logs/SKILL.md` | Load immediately, query inference data |
| "inference error", "mv.run fails", "service failing", "OOM", "debug inference", "inference not working" | `debug-inference/SKILL.md` | Load immediately, diagnose issue |
| "distributed training", "distributed XGBoost", "distributed LightGBM", "XGBEstimator", "LightGBMEstimator", "PyTorchDistributor", "multi-node training", "multi-GPU training", "train at scale", "DPF", "distributed partition function", "many model training", "MMT", "train per partition", "ManyModelTraining", "partition by", "hyperparameter tuning", "hyperparameter optimization", "HPO", "Tuner", "TunerConfig", "search space", "grid search", "random search", "bayesian optimization", "tune model", "tune hyperparameters", "num_trials", "search_alg" | `distributed-training/SKILL.md` | Load immediately, distributed training/processing/tuning |
| "partitioned inference", "@partitioned_api", "inference per partition", "CustomModel partition" | `model-registry/partitioned-inference/SKILL.md` | Load immediately, partitioned inference |
| **"partitioned modeling", "partitioned model", "model per partition", "per-partition models"** (ambiguous) | **ASK USER** | Use partitioned modeling disambiguation above |
**Sub-skill path aliases** (for routing resolution):
- `ml-job` → `ml-jobs/SKILL.md` (singular form routes to plural directory)
- `ml-jobs` → `ml-jobs/SKILL.md`
- `mljob` → `ml-jobs/SKILL.md`
- `mljobs` → `ml-jobs/SKILL.md`
## Workflow
```markdown
User Request → Load Environment Guide → Detect Intent → Load appropriate sub-skill → Execute
Examples:
- "Train a classifier" → Load ml-development → Train locally → Done
- "Deploy my model.pkl" → Load model-registry → Register to Snowflake → Done
- "Train AND deploy" → Load ml-development → Train → Save model → Report artifacts → Ask about deployment → If yes, load model-registry WITH CONTEXT (file path, framework, schema)
```
**Key principle**: Complete ONE task at a time. Only ask about the next step after the current step is done.
## Context Preservation Between Skills
**⚠️ CRITICAL:** When transitioning from ml-development to model-registry:
**Information to preserve and pass along:**
- Model file path (absolute path to serialized model file)
- Framework used (sklearn, xgboost, lightgbm, pytorch, tensorflow, etc.)
- Sample input schema (columns and types from training data)
- Any other relevant training context
**Why this matters:**
- Avoids asking the user to repeat information they just provided
- Prevents accidental retraining of the model
- Prevents modification of the training script
- Improves user experience with seamless workflow
**How to do it:**
1. When ml-development saves a model, it reports all details
2. When loading model-registry, explicitly mention this context
3. Model-registry checks for this context before asking questions
4. Use the preserved context instead of asking user again
**Example handoff:**
```markdown
ml-development: "Model saved to /path/to/model.pkl (sklearn). Would you like to register it?"
User: "Yes"
[Load model-registry with context: path=/path/to/model.pkl, framework=sklearn, schema=[...]]
model-registry: "I see you just trained a sklearn model. What should I call it in Snowflake?"
```
## Sub-Skills
### ml-development
Data exploration, statistical analysis, model training, and evaluation. Covers the full ML development workflow from data loading to model evaluation.
### model-registry
Deploy serialized models to Snowflake Model Registry. Supports various model formats (`.pkl`, `.ubj`, `.json`, `.pt`, etc.) depending on framework. Routes to `spcs-inference` sub-skill for inference service creation. Includes `partitioned-inference` sub-skill for partition-aware model deployment.
### experiment-tracking
Skills for tracking model training experiments using Snowflake's experiment tracking framework.
### spcs-inference
Deploy registered models to Snowpark Container Services for real-time inference. Handles compute pool selection, GPU/CPU configuration, num_workers, and service creation.
### batch-inference-jobs
Run batch inference on models **already registered** in the Snowflake Model Registry. Covers **two approaches**:
- **Native SQL Batch** (`mv.run()`): Warehouse-based, integrates with SQL pipelines
- **Job-based Batch** (`mv.run_batch()`): SPCS compute pools, for large-scale and unstructured data
### ml-jobs
Transform local **Python scripts** into Snowflake ML Jobs that run on Snowflake compute pools. Uses `submit_file()` or `submit_directory()`. Also includes compute pool reference (instance families, sizing).
### ml-pipeline-orchestration
Orchestrate multi-step ML workflows using Snowflake Task Graphs (DAGs) with the Python DAG API. Covers DAG creation, scheduling (Cron/timedelta), inter-task data passing, and notebook-to-pipeline conversion. Uses `@remote` for ML tasks on compute pools and warehouse tasks for orchestration.
### model-monitor
Set up ML Observability for models in the Snowflake Model Registry. Track drift, performance metrics, and prediction statistics over time.
### distributed-training
**Consolidated skill** covering all distributed ML training, processing, and tuning:
- **Distributed Estimators**: `XGBEstimator`, `LightGBMEstimator`, `PyTorchDistributor` for training one large model across nodes/GPUs
- **Many Model Training (MMT)**: Train one model per partition with auto-serialization and `get_model()`
- **DPF (Distributed Partition Function)**: General-purpose distributed processing for custom workflows
- **Tuner API**: Distributed hyperparameter tuning (Ray Tune on SPCS) with RandomSearch, GridSearch, BayesOpt
> **Note**: These APIs run server-side — either inside ML Jobs (submitted via CLI) or in Snowflake Notebooks with Container Runtime (Snowsight). For CLI submission, see ml-jobs.
### partitioned-inference (under model-registry)
Partitioned inference in the Model Registry using `@partitioned_api` decorator. Run inference with different submodels per data partition. Located at `model-registry/partitioned-inference/SKILL.md`.
### inference-logs
Query and analyze captured inference data from model services with Auto-Capture enabled. View historical request/response data logged via `INFERENCE_TABLE()`. Useful for debugging unexpected predictions, building retraining datasets, and A/B testing model versions.
## Reminders & Common Mistakes
### ❌ Don't assume a database/schema — always ask
When the workflow involves creating or writing to any Snowflake object (table, stage, model registry entry, experiment, etc.), **never silently pick a database/schema**. Always confirm with the user first.
- If a `DATABASE.SCHEMA` has already been used in this session, offer it as the default:
```
I'll need to create [object] in Snowflake. I see we've been working with `<DATABASE>.<SCHEMA>`.
Should I use that, or would you prefer a different database/schema?
```
- If no database/schema has been used yet, ask explicitly:
```
Which database and schema should I use for [object]? (format: DATABASE.SCHEMA)
```
- **Carry the confirmed choice forward** — reuse it for subsequent objects in the session, but still confirm each time.
- **⚠️ Personal databases (e.g. `USER$VINAY`) are not supported** for ML workflows. If the user picks a personal database, warn them:
```
Personal databases like `USER$<USERNAME>` don't support creating tables, model registry operations, or inference services. Please provide a standard database/schema instead.
```
- **⚠️ STOP**: Wait for the user's response before proceeding with any object creation.