Cortex Skills
  • Skills Directory
  • Categories
  • Docs
Add SkillBrowse Skills
  1. Home
  2. Skills Directory
  3. Fine-Tuning
  4. lineage
IntermediateFine-Tuningtrendingfeatured

lineage

Analyze data lineage and dependencies in Snowflake.
Jump to Installation

Compatible with

Cortex

About this skill

Lineage & Impact Analysis

When to Use/Load

This skill activates when users need to:

  • Assess impact of changes to tables/columns (downstream analysis)
  • Trace upstream/downstream after a data issue is identified (use data_quality skill first to find the issue, then this skill for lineage)
  • Discover and verify trusted datasets (provenance)
  • Trace column-level dependencies and transformations

Trigger indicators: Questions about dependencies, lineage, trust, impact, or data provenance.

Cross-skill: If the user asks about data quality issues (e.g. "weird data", "missing emails", "wrong numbers", "quality issues", "failing metrics"), load and run the data_quality skill first to assess DMF results and column quality; then use this skill to trace upstream lineage. Do not use this skill alone to answer quality questions—it only traces where data flows, not whether values are correct or complete.

Purpose

Navigate the web of data dependencies to ensure reliability, transparency, and rapid recovery across the data ecosystem.

How It Works

  1. User provides DATABASE.SCHEMA.TABLE (e.g., "ANALYTICS_DB.REPORTING.REVENUE_SUMMARY")
  2. Agent identifies workflow based on trigger phrases and other context
  3. Agent reads workflow file from workflows/ directory
  4. Agent executes template with placeholders replaced
  5. Agent presents clean results formatted per workflow guidelines

Execution Approach: Lineage queries are read-only analysis operations. Execute immediately without confirmation since they don't modify data or objects. If executing a recursive SQL query, notify the user that the analysis may take a while due to the complexity of the query.


The 4 Workflows

1. Impact Analysis (Downstream)

File: workflows/impact-analysis.md Question: "If I change this, what breaks?" Triggers: "impact analysis", "what depends on this", "what will break", "downstream", "who uses this" Templates: impact-analysis.sql, impact-analysis-multi-level.sql Output: Downstream objects with risk tiers, usage frequency, affected users

Snowflake APIs Used:

  • SNOWFLAKE.CORE.GET_LINEAGE() - Primary: Object and data-movement lineage (VIEW LINEAGE, no account admin)
  • SNOWFLAKE.ACCOUNT_USAGE.OBJECT_DEPENDENCIES - Fallback: Object dependency graph only (account admin)
  • SNOWFLAKE.ACCOUNT_USAGE.ACCESS_HISTORY - Actual usage patterns
  • SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY - User attribution

2. Root Cause Analysis (Upstream)

File: workflows/root-cause-analysis.md Question: "Why is this number wrong?" / "Where does this data come from?" Triggers: "root cause", "why is this wrong", "trace upstream", "where does this come from", "debug" Templates: root-cause-analysis.sql, change-detection.sql Output: Upstream lineage, recent changes, divergence points

Note: When the user reports data quality issues (wrong values, missing emails, failing DMFs), use the data_quality skill first to identify what is wrong; then use this workflow to trace upstream to find where the bad data originated.

Snowflake APIs Used:

  • SNOWFLAKE.CORE.GET_LINEAGE() - Primary: Upstream lineage (object + data movement)
  • SNOWFLAKE.ACCOUNT_USAGE.OBJECT_DEPENDENCIES - Fallback: Upstream object dependencies
  • SNOWFLAKE.ACCOUNT_USAGE.ACCESS_HISTORY - Data flow patterns
  • SNOWFLAKE.ACCOUNT_USAGE.TABLES / COLUMNS - Schema change detection
  • SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY - Recent modifications

3. Data Discovery & Trust (Provenance)

File: workflows/data-discovery.md Question: "Where did this come from and is it the right tool for the job?" Triggers: "is this trustworthy", "provenance", "recommend dataset", "which table should I use", "verify source" Templates: data-discovery.sql, provenance-verification.sql Output: Full lineage path, usage statistics, trust indicators

Snowflake APIs Used:

  • SNOWFLAKE.CORE.GET_LINEAGE() - Primary: Full lineage path (object + data movement)
  • SNOWFLAKE.ACCOUNT_USAGE.OBJECT_DEPENDENCIES - Fallback: Full dependency chain
  • SNOWFLAKE.ACCOUNT_USAGE.ACCESS_HISTORY - Usage patterns
  • SNOWFLAKE.ACCOUNT_USAGE.TABLES - Object metadata
  • SNOWFLAKE.ACCOUNT_USAGE.TABLE_STORAGE_METRICS - Data freshness

4. Column-Level Lineage

File: workflows/column-lineage.md Question: "What uses this column?" / "Where does this column come from?" Triggers: "column lineage", "what uses [column]", "where does [column] come from", "trace column", "column impact", "column source" Templates: column-lineage-downstream.sql, column-lineage-upstream.sql, column-lineage-full.sql, column-change-detection.sql Output: Column-level dependencies, source columns, transformation paths

Snowflake APIs Used:

  • SNOWFLAKE.ACCOUNT_USAGE.ACCESS_HISTORY - Column-level access patterns (columns array in base_objects_accessed/objects_modified)
  • SNOWFLAKE.ACCOUNT_USAGE.COLUMNS - Column metadata and definitions
  • SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY - DDL changes affecting columns

Execution Rules

Step 0: Verify Session Context (once per session)

Run this once at the start of the first query. If you have already confirmed warehouse, database, and schema are set earlier in this conversation, skip this step.

Check the current session context:

SELECT
    CURRENT_USER()      AS current_user,
    CURRENT_ROLE()      AS current_role,
    CURRENT_DATABASE()  AS current_database,
    CURRENT_SCHEMA()    AS current_schema,
    CURRENT_WAREHOUSE() AS current_warehouse;

Fix any NULL or mismatched values before continuing:

FieldFix if NULL / wrong
current_warehouseUSE WAREHOUSE <name>;
current_databaseUSE DATABASE <database>;
current_schemaUSE SCHEMA <database>.<schema>;

⚠️ STOP if warehouse is NULL — SNOWFLAKE.ACCOUNT_USAGE queries require an active warehouse.

Core Execution Flow:

  1. Extract object identifiers from user message:
    • Table workflows: DATABASE.SCHEMA.TABLE
    • Column workflows: DATABASE.SCHEMA.TABLE.COLUMN
  2. Identify workflow from trigger phrases and the object identifier → read workflow file from workflows/
  3. Read SQL template specified in workflow file
  4. Build dynamic scoring from config/schema-patterns.yaml (for trust/risk placeholders)
  5. Replace placeholders with actual values
  6. Execute and format results per workflow guidelines

Placeholder Replacements:

  • <database>, <schema>, <table>, <column> → actual object names
  • /* SCHEMA_TRUST_SCORING:column */ → dynamic CASE statement from config
  • /* SCHEMA_TRUST_TIER:column */ → dynamic tier name CASE statement
  • /* SCHEMA_RISK_SCORING:column */ → dynamic risk CASE statement

Key Principles:

  • Use templates exactly as written - only replace placeholders
  • No confirmation prompts for read-only lineage queries
  • Handle errors gracefully - use fallback templates, provide clear messages
  • One workflow per request - don't chain multiple analyses automatically

Template Structure

All templates in templates/ directory use these placeholders:

  • <database> → Replace with actual database name
  • <schema> → Replace with actual schema name
  • <table> → Replace with actual table name
  • <column> → Replace with actual column name (for column-level lineage)
  • /* SCHEMA_TRUST_SCORING:column_name */ → Dynamic CASE statement returning score (integer)
  • /* SCHEMA_TRUST_TIER:column_name */ → Dynamic CASE statement returning tier name (string)
  • /* SCHEMA_RISK_SCORING:column_name */ → Dynamic CASE statement returning 'CRITICAL' or NULL

Example:

-- Template:
WHERE REFERENCED_DATABASE = '<database>' 
  AND REFERENCED_SCHEMA = '<schema>' 
  AND REFERENCED_OBJECT_NAME = '<table>'

-- After replacement:
WHERE REFERENCED_DATABASE = 'ANALYTICS_DB' 
  AND REFERENCED_SCHEMA = 'REPORTING' 
  AND REFERENCED_OBJECT_NAME = 'SALES_SUMMARY'

Dynamic Trust Scoring

Templates use dynamic placeholders for trust/risk scoring. See reference/dynamic-trust-scoring.md for complete documentation on:

  • Placeholder syntax (/* PLACEHOLDER_TYPE:column_name */)
  • Building CASE statements from config/schema-patterns.yaml
  • Why dynamic scoring enables customer customization

Error Handling & Fallback Strategy

Lineage source order (static dependencies):

  1. Primary: SNOWFLAKE.CORE.GET_LINEAGE() — object and data-movement lineage; requires only object-resolve + VIEW LINEAGE (granted to public). Prefer for all table/column lineage.
  2. Fallback 1: SNOWFLAKE.ACCOUNT_USAGE.OBJECT_DEPENDENCIES — object dependency only; requires account admin. Use only when the primary query fails (e.g. privilege error) or when you have strong reason to expect lineage but GET_LINEAGE returned empty (e.g. object is known to have dependents).
  3. Fallback 2: GET_DDL() to parse view definitions for references
  4. Fallback 3: INFORMATION_SCHEMA.OBJECT_DEPENDENCIES (current DB only)

Minimize steps: Run the primary template once. If it succeeds (with or without rows), present the result and stop—do not run the fallback. Use fallback only when the primary errors or when the object clearly should have lineage (e.g. a view that references other tables) but GET_LINEAGE returned 0 rows. Zero rows from GET_LINEAGE is often a valid answer (e.g. table has no downstream).

Why GET_LINEAGE first: OBJECT_DEPENDENCIES captures only object dependency (target data depends on source). GET_LINEAGE also captures data movement (e.g. COPY INTO, CTAS) and avoids account admin (VIEW LINEAGE is granted to public).

Fallback templates (use only when primary fails or expected lineage is missing):

  • impact-analysis-object-deps-fallback.sql, impact-analysis-multi-level-object-deps-fallback.sql, impact-analysis-users-object-deps-fallback.sql
  • root-cause-analysis-object-deps-fallback.sql, change-detection-object-deps-fallback.sql
  • provenance-verification-object-deps-fallback.sql

Further fallbacks (DDL / INFORMATION_SCHEMA):

  • impact-analysis-fallback.sql — DDL/INFORMATION_SCHEMA for downstream deps
  • root-cause-ddl-fallback.sql — DDL parsing for upstream lineage

If template fails:

  1. Run primary (GET_LINEAGE) once. On success (any row count), present results and stop. On failure (error) or empty when lineage is expected, try OBJECT_DEPENDENCIES fallback, then GET_DDL/INFORMATION_SCHEMA if needed.
  2. Check ACCESS_HISTORY availability: check-access-history.sql
  3. Check object existence: check-object-exists.sql
  4. If object doesn't exist: "Object not found. Check the name and try again."
  5. If no lineage data: "No lineage data available. Object may be new or unused."

Common Issues:

  • "No ACCESS_HISTORY data" → Data is older than 365 days or not accessed
  • "GET_LINEAGE returns empty" → Use fallback only if the object should have lineage (e.g. view with references); otherwise report "No lineage found."
  • "Insufficient privileges" → GET_LINEAGE needs VIEW LINEAGE (public); OBJECT_DEPENDENCIES needs account admin

Expected User Experience

Example 1: Impact Analysis

User: "What will break if I change RAW_DB.SALES.ORDERS?"

Agent (immediately):

Impact Analysis: RAW_DB.SALES.ORDERS

═══════════════════════════════════════════════════════════════
CRITICAL RISK (2 objects)
═══════════════════════════════════════════════════════════════
1. ANALYTICS_DB.REPORTING.DAILY_REVENUE (Dynamic Table)
   Risk: CRITICAL | Refresh: Every 15 min | Users: 12 in last 7 days
   → Feeds 3 downstream objects including executive dashboard

2. FINANCE_DB.REPORTS.AR_AGING (View)
   Risk: CRITICAL | Queries: 89/day | Users: 5 in last 7 days
   → Used for month-end close process

═══════════════════════════════════════════════════════════════
MODERATE RISK (1 object)
═══════════════════════════════════════════════════════════════
3. STAGING_DB.TRANSFORM.ORDERS_ENRICHED (Table)
   Risk: MODERATE | Last updated: 2024-01-15 | Users: 2 in last 7 days

Summary: 3 downstream dependencies | 2 CRITICAL | 1 MODERATE
Affected Users: 15 unique users in last 7 days

Example 2: Root Cause Analysis

User: "Why is ANALYTICS_DB.REPORTING.REVENUE wrong?"

Agent (immediately):

Root Cause Analysis: ANALYTICS_DB.REPORTING.REVENUE

═══════════════════════════════════════════════════════════════
UPSTREAM LINEAGE (3 levels)
═══════════════════════════════════════════════════════════════
Level 1: STAGING_DB.CLEAN.ORDERS_FACT
Level 2: RAW_DB.INGEST.ORDERS, RAW_DB.INGEST.PRODUCTS
Level 3: External Stage (S3://data-lake/orders/)

═══════════════════════════════════════════════════════════════
RECENT CHANGES DETECTED
═══════════════════════════════════════════════════════════════
⚠️  RAW_DB.INGEST.ORDERS - Schema changed 2 hours ago
    Change: Column DISCOUNT_PCT type changed NUMBER(5,2) → NUMBER(10,4)
    Changed by: ETL_SERVICE_USER

⚠️  STAGING_DB.CLEAN.ORDERS_FACT - Data modified 1 hour ago
    Rows affected: 15,234 rows updated
    Query: MERGE statement by TRANSFORM_TASK

Most Likely Cause: Schema change in RAW_DB.INGEST.ORDERS
Recommendation: Verify DISCOUNT_PCT handling in downstream transforms

Example 3: Data Discovery

User: "Which table should I use for customer revenue analysis?"

Agent (immediately):

Data Discovery: Customer Revenue Analysis

RECOMMENDED:
1. ANALYTICS_DB.CURATED.CUSTOMER_REVENUE_SUMMARY ⭐ Best Match
   Trust Score: 95% | Updated hourly | Users: 45/week
   Why: Pre-aggregated customer revenue with standard dimensions
   Provenance: RAW.ORDERS → STAGING.ORDERS_FACT → This table

Not Recommended:
- RAW_DB.ORDERS (Trust: 60%) - Raw data, requires transformation
- SANDBOX_DB.*.* - Development/test data

Example 4: Column-Level Lineage

User: "What uses the AMOUNT column in RAW_DB.SALES.ORDERS?"

Agent (immediately):

Column Impact Analysis: RAW_DB.SALES.ORDERS.AMOUNT

═══════════════════════════════════════════════════════════════
DOWNSTREAM COLUMNS (8 columns use this data)
═══════════════════════════════════════════════════════════════

CRITICAL IMPACT:
1. ANALYTICS_DB.REPORTING.REVENUE_SUMMARY.TOTAL_REVENUE
   Impact: CRITICAL | Queries: 89/day | Confidence: HIGH
   Transformation: SUM(AMOUNT) aggregation
   
2. FINANCE_DB.REPORTS.AR_AGING.OUTSTANDING_AMOUNT
   Impact: CRITICAL | Queries: 45/day | Confidence: HIGH
   Transformation: Direct reference with filters

HIGH IMPACT:
3. STAGING_DB.TRANSFORM.ORDERS_ENRICHED.NET_AMOUNT
   Impact: HIGH | Queries: 23/day | Confidence: HIGH
   Transformation: AMOUNT * (1 - DISCOUNT_PCT/100)

MODERATE IMPACT:
4. ANALYTICS_DB.MARTS.CUSTOMER_360.LIFETIME_VALUE
   Impact: MEDIUM | Queries: 12/day | Confidence: MEDIUM

Summary: 8 downstream columns | 2 CRITICAL | 1 HIGH | 5 MEDIUM
Recommendation: Coordinate with Finance team before changing

Example 5: Column Source Tracing

User: "Where does ANALYTICS_DB.REPORTS.REVENUE.TOTAL_SALES come from?"

Agent (immediately):

Column Source Analysis: ANALYTICS_DB.REPORTS.REVENUE.TOTAL_SALES

═══════════════════════════════════════════════════════════════
UPSTREAM SOURCES (traced 3 levels)
═══════════════════════════════════════════════════════════════

Level 1 (Direct Source):
  STAGING_DB.TRANSFORM.ORDERS_AGG.REVENUE_SUM
  Confidence: HIGH | Last seen: 2 hours ago
  Transformation: Renamed column

Level 2:
  RAW_DB.INGEST.ORDERS.AMOUNT
  Confidence: HIGH | Source tier: RAW
  Transformation: SUM() aggregation

Level 3 (Origin):
  @RAW_DB.STAGES.S3_ORDERS/orders.csv
  Confidence: MEDIUM | Source tier: EXTERNAL

Complete Path:
S3_ORDERS → ORDERS.AMOUNT → ORDERS_AGG.REVENUE_SUM → REVENUE.TOTAL_SALES

Summary Table

WorkflowDirectionPrimary GoalKey Stakeholder
Impact AnalysisDownstreamRisk MitigationData Engineers / Ops
Root CauseUpstreamTroubleshootingAnalysts / Analytics Engineers
Trust & DiscoveryFull PathData LiteracyBusiness Users / Platform Owners
Column LineageBothField-Level TracingData Engineers / Analysts

Workflow Selection Logic

User SaysWorkflowTemplate
"What will break if I change [table]?"Impact Analysisimpact-analysis.sql
"What depends on [table]?"Impact Analysisimpact-analysis.sql
"Why is this number wrong?"Root Cause Analysisroot-cause-analysis.sql
"Where does [table] come from?"Root Cause Analysisroot-cause-analysis.sql
"Is [table] trustworthy?"Data Discoverydata-discovery.sql
"Which table should I use for [topic]?"Data Discoverydata-discovery.sql
"What uses the [column] column?"Column Lineagecolumn-lineage-downstream.sql
"Where does [column] come from?"Column Lineagecolumn-lineage-upstream.sql
"Full lineage for [column]"Column Lineagecolumn-lineage-full.sql
"Has [column] changed?"Column Lineagecolumn-change-detection.sql

Snowflake APIs Reference

See reference/snowflake-apis.md for complete documentation on:

  • SNOWFLAKE.CORE.GET_LINEAGE() (primary) vs ACCOUNT_USAGE.OBJECT_DEPENDENCIES (fallback)
  • Object dependency vs data-movement lineage and privilege requirements
  • ACCOUNT_USAGE views and their latencies
  • Performance optimization tips

Success Criteria

  • User gets answer in one response
  • Risk tiers clearly communicated (Impact Analysis)
  • Recent changes highlighted (Root Cause)
  • Trust indicators provided (Discovery)
  • No SQL errors shown
  • No unnecessary questions
  • Clean, actionable results

Stopping Points

Lineage queries are read-only operations that don't modify data or schema. Execute immediately without waiting for confirmation.

⚠️ MANDATORY STOPPING POINTS

STOP and ask user if:

  1. Ambiguous object reference - Missing or unclear database/schema/table/column name

    • Example: User says "check lineage" without specifying which table
    • Action: Ask "Which table would you like to analyze?"
  2. User explicitly requests review - "Show me the query first" or "Let me review before running"

    • Action: Present the query and wait for confirmation
  3. Query returns no results - No lineage data found

    • Action: Explain possible reasons (new object, no access, insufficient privileges) and ask if they want to try a different approach

✅ NO STOPPING REQUIRED

Execute immediately without confirmation:

  • Any SELECT query on SNOWFLAKE.ACCOUNT_USAGE views
  • Any SELECT query on INFORMATION_SCHEMA views
  • Parsing DDL with GET_DDL() function
  • Reading configuration files from config/

Rationale: These are read-only operations that don't modify data, schema, or access controls.


Production Configuration

Schema Pattern Configuration (Extensible):

Trust and risk scoring patterns are defined in config/schema-patterns.yaml. This file is read dynamically at runtime, allowing easy customization without modifying SQL templates.

File: config/schema-patterns.yaml

trust_tiers:
  PRODUCTION:
    score: 100
    patterns:
      - "%ANALYTICS%"
      - "%CURATED%"
      # Add your production schema patterns here
      
  STAGING:
    score: 60
    patterns:
      - "%STAG%"
      # Add your staging schema patterns here
      
  RAW:
    score: 40
    patterns:
      - "%RAW%"
      # Add your raw data schema patterns here
      
  UNTRUSTED:
    score: 20
    patterns:
      - "%SANDBOX%"
      - "%TEST%"
      # Add your dev/test schema patterns here

default:
  score: 50
  tier: "UNKNOWN"

risk_critical_patterns:
  - "%FINANCE%"
  - "%REVENUE%"
  # Add schemas that are critical to flag

Customization: Edit config/schema-patterns.yaml using SQL LIKE syntax (% = wildcard, case-insensitive). Example: %ANALYTICS% matches ANALYTICS, PROD_ANALYTICS, ANALYTICS_V2.

Recursive Depth: Default 3 levels, configurable in templates (WHERE ul.level < 3).

Retention: ACCESS_HISTORY/QUERY_HISTORY: 365 days; GET_LINEAGE: current state; OBJECT_DEPENDENCIES: indefinite (latency on new objects).


Known Limitations

LimitationImpactWorkaround
GET_LINEAGE max 5 levelsDeep chains truncated at 5Use OBJECT_DEPENDENCIES fallback for deeper traversal
ACCOUNT_USAGE latency (45min-3hr)New objects missing from OBJECT_DEPENDENCIESUse GET_LINEAGE first (current state); then GET_DDL() fallback
Column lineage depends on ACCESS_HISTORYNot all queries expose column detailsConfidence scores indicate reliability
Single account scopeCross-account sharing not coveredQuery each account separately
View DDL parsingMay miss dynamic SQL referencesReview complex views manually
OBJECT_DEPENDENCIES = object dependency onlyData movement (e.g. CTAS, COPY) not in OBJECT_DEPENDENCIESUse GET_LINEAGE as primary
Column lineage 90-day lookbackOlder transformations not capturedExtend time range in templates

Installation

Install using the Cortex CLI:

bash
$cortex skill install lineage

Requires Cortex CLI v2.0+. Install guide →

Repository Sources

Loading repository files...

SKILL.md Preview

SKILL.md
---
name: lineage
description: "Analyze data lineage and dependencies in Snowflake. Use for: impact analysis, root cause debugging, data discovery, column-level tracing. Triggers: 'what depends on', 'what breaks', 'where does this come from', 'is this trustworthy', 'column lineage'. For quality issues (missing data, wrong values, DMF failures) use the data_quality skill first, then this skill to trace upstream."
---

# Lineage & Impact Analysis

## When to Use/Load

This skill activates when users need to:
- **Assess impact** of changes to tables/columns (downstream analysis)
- **Trace upstream/downstream** after a data issue is identified (use **data_quality** skill first to find the issue, then this skill for lineage)
- **Discover and verify** trusted datasets (provenance)
- **Trace column-level** dependencies and transformations

**Trigger indicators:** Questions about dependencies, lineage, trust, impact, or data provenance.

**Cross-skill:** If the user asks about **data quality issues** (e.g. "weird data", "missing emails", "wrong numbers", "quality issues", "failing metrics"), load and run the **data_quality** skill first to assess DMF results and column quality; then use this skill to trace upstream lineage. Do not use this skill alone to answer quality questions—it only traces where data flows, not whether values are correct or complete.

## Purpose
Navigate the web of data dependencies to ensure reliability, transparency, and rapid recovery across the data ecosystem.

## How It Works

1. **User provides DATABASE.SCHEMA.TABLE** (e.g., "ANALYTICS_DB.REPORTING.REVENUE_SUMMARY")
2. **Agent identifies workflow** based on trigger phrases and other context
3. **Agent reads workflow file** from `workflows/` directory
4. **Agent executes template** with placeholders replaced
5. **Agent presents clean results** formatted per workflow guidelines

**Execution Approach:** Lineage queries are read-only analysis operations. Execute immediately without confirmation since they don't modify data or objects. If executing a recursive SQL query, notify the user that the analysis may take a while due to the complexity of the query.

---

## The 4 Workflows

### 1. Impact Analysis (Downstream)
**File:** `workflows/impact-analysis.md`
**Question:** *"If I change this, what breaks?"*
**Triggers:** "impact analysis", "what depends on this", "what will break", "downstream", "who uses this"
**Templates:** `impact-analysis.sql`, `impact-analysis-multi-level.sql`
**Output:** Downstream objects with risk tiers, usage frequency, affected users

**Snowflake APIs Used:**
- `SNOWFLAKE.CORE.GET_LINEAGE()` - **Primary:** Object and data-movement lineage (VIEW LINEAGE, no account admin)
- `SNOWFLAKE.ACCOUNT_USAGE.OBJECT_DEPENDENCIES` - **Fallback:** Object dependency graph only (account admin)
- `SNOWFLAKE.ACCOUNT_USAGE.ACCESS_HISTORY` - Actual usage patterns
- `SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY` - User attribution

### 2. Root Cause Analysis (Upstream)
**File:** `workflows/root-cause-analysis.md`
**Question:** *"Why is this number wrong?"* / *"Where does this data come from?"*
**Triggers:** "root cause", "why is this wrong", "trace upstream", "where does this come from", "debug"
**Templates:** `root-cause-analysis.sql`, `change-detection.sql`
**Output:** Upstream lineage, recent changes, divergence points

**Note:** When the user reports **data quality issues** (wrong values, missing emails, failing DMFs), use the **data_quality** skill first to identify what is wrong; then use this workflow to trace upstream to find where the bad data originated.

**Snowflake APIs Used:**
- `SNOWFLAKE.CORE.GET_LINEAGE()` - **Primary:** Upstream lineage (object + data movement)
- `SNOWFLAKE.ACCOUNT_USAGE.OBJECT_DEPENDENCIES` - **Fallback:** Upstream object dependencies
- `SNOWFLAKE.ACCOUNT_USAGE.ACCESS_HISTORY` - Data flow patterns
- `SNOWFLAKE.ACCOUNT_USAGE.TABLES` / `COLUMNS` - Schema change detection
- `SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY` - Recent modifications

### 3. Data Discovery & Trust (Provenance)
**File:** `workflows/data-discovery.md`
**Question:** *"Where did this come from and is it the right tool for the job?"*
**Triggers:** "is this trustworthy", "provenance", "recommend dataset", "which table should I use", "verify source"
**Templates:** `data-discovery.sql`, `provenance-verification.sql`
**Output:** Full lineage path, usage statistics, trust indicators

**Snowflake APIs Used:**
- `SNOWFLAKE.CORE.GET_LINEAGE()` - **Primary:** Full lineage path (object + data movement)
- `SNOWFLAKE.ACCOUNT_USAGE.OBJECT_DEPENDENCIES` - **Fallback:** Full dependency chain
- `SNOWFLAKE.ACCOUNT_USAGE.ACCESS_HISTORY` - Usage patterns
- `SNOWFLAKE.ACCOUNT_USAGE.TABLES` - Object metadata
- `SNOWFLAKE.ACCOUNT_USAGE.TABLE_STORAGE_METRICS` - Data freshness

### 4. Column-Level Lineage
**File:** `workflows/column-lineage.md`
**Question:** *"What uses this column?" / "Where does this column come from?"*
**Triggers:** "column lineage", "what uses [column]", "where does [column] come from", "trace column", "column impact", "column source"
**Templates:** `column-lineage-downstream.sql`, `column-lineage-upstream.sql`, `column-lineage-full.sql`, `column-change-detection.sql`
**Output:** Column-level dependencies, source columns, transformation paths

**Snowflake APIs Used:**
- `SNOWFLAKE.ACCOUNT_USAGE.ACCESS_HISTORY` - Column-level access patterns (columns array in base_objects_accessed/objects_modified)
- `SNOWFLAKE.ACCOUNT_USAGE.COLUMNS` - Column metadata and definitions
- `SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY` - DDL changes affecting columns

---

## Execution Rules

### Step 0: Verify Session Context (once per session)

Run this **once at the start of the first query**. If you have already confirmed warehouse, database, and schema are set earlier in this conversation, skip this step.

Check the current session context:

```sql
SELECT
    CURRENT_USER()      AS current_user,
    CURRENT_ROLE()      AS current_role,
    CURRENT_DATABASE()  AS current_database,
    CURRENT_SCHEMA()    AS current_schema,
    CURRENT_WAREHOUSE() AS current_warehouse;
```

Fix any NULL or mismatched values before continuing:

| Field | Fix if NULL / wrong |
|-------|---------------------|
| `current_warehouse` | `USE WAREHOUSE <name>;` |
| `current_database`  | `USE DATABASE <database>;` |
| `current_schema`    | `USE SCHEMA <database>.<schema>;` |

**⚠️ STOP if warehouse is NULL** — `SNOWFLAKE.ACCOUNT_USAGE` queries require an active warehouse.

### Core Execution Flow:
1. **Extract object identifiers** from user message:
   - Table workflows: `DATABASE.SCHEMA.TABLE`
   - Column workflows: `DATABASE.SCHEMA.TABLE.COLUMN`
2. **Identify workflow** from trigger phrases and the object identifier → read workflow file from `workflows/`
3. **Read SQL template** specified in workflow file
4. **Build dynamic scoring** from `config/schema-patterns.yaml` (for trust/risk placeholders)
5. **Replace placeholders** with actual values
6. **Execute and format** results per workflow guidelines

### Placeholder Replacements:
- `<database>`, `<schema>`, `<table>`, `<column>` → actual object names
- `/* SCHEMA_TRUST_SCORING:column */` → dynamic CASE statement from config
- `/* SCHEMA_TRUST_TIER:column */` → dynamic tier name CASE statement  
- `/* SCHEMA_RISK_SCORING:column */` → dynamic risk CASE statement

### Key Principles:
- **Use templates exactly as written** - only replace placeholders
- **No confirmation prompts** for read-only lineage queries
- **Handle errors gracefully** - use fallback templates, provide clear messages
- **One workflow per request** - don't chain multiple analyses automatically

---

## Template Structure

All templates in `templates/` directory use these placeholders:
- `<database>` → Replace with actual database name
- `<schema>` → Replace with actual schema name
- `<table>` → Replace with actual table name
- `<column>` → Replace with actual column name (for column-level lineage)
- `/* SCHEMA_TRUST_SCORING:column_name */` → Dynamic CASE statement returning score (integer)
- `/* SCHEMA_TRUST_TIER:column_name */` → Dynamic CASE statement returning tier name (string)
- `/* SCHEMA_RISK_SCORING:column_name */` → Dynamic CASE statement returning 'CRITICAL' or NULL

**Example:**
```sql
-- Template:
WHERE REFERENCED_DATABASE = '<database>' 
  AND REFERENCED_SCHEMA = '<schema>' 
  AND REFERENCED_OBJECT_NAME = '<table>'

-- After replacement:
WHERE REFERENCED_DATABASE = 'ANALYTICS_DB' 
  AND REFERENCED_SCHEMA = 'REPORTING' 
  AND REFERENCED_OBJECT_NAME = 'SALES_SUMMARY'
```

---

## Dynamic Trust Scoring

Templates use dynamic placeholders for trust/risk scoring. See `reference/dynamic-trust-scoring.md` for complete documentation on:
- Placeholder syntax (`/* PLACEHOLDER_TYPE:column_name */`)
- Building CASE statements from `config/schema-patterns.yaml`
- Why dynamic scoring enables customer customization

---

## Error Handling & Fallback Strategy

**Lineage source order (static dependencies):**

1. **Primary:** `SNOWFLAKE.CORE.GET_LINEAGE()` — object and data-movement lineage; requires only object-resolve + VIEW LINEAGE (granted to public). Prefer for all table/column lineage.
2. **Fallback 1:** `SNOWFLAKE.ACCOUNT_USAGE.OBJECT_DEPENDENCIES` — object dependency only; requires account admin. Use **only** when the primary query **fails** (e.g. privilege error) or when you have strong reason to expect lineage but GET_LINEAGE returned empty (e.g. object is known to have dependents).
3. **Fallback 2:** `GET_DDL()` to parse view definitions for references
4. **Fallback 3:** `INFORMATION_SCHEMA.OBJECT_DEPENDENCIES` (current DB only)

**Minimize steps:** Run the primary template once. If it **succeeds** (with or without rows), present the result and **stop**—do not run the fallback. Use fallback only when the primary **errors** or when the object clearly should have lineage (e.g. a view that references other tables) but GET_LINEAGE returned 0 rows. Zero rows from GET_LINEAGE is often a valid answer (e.g. table has no downstream).

**Why GET_LINEAGE first:** OBJECT_DEPENDENCIES captures only *object dependency* (target data depends on source). GET_LINEAGE also captures *data movement* (e.g. COPY INTO, CTAS) and avoids account admin (VIEW LINEAGE is granted to public).

**Fallback templates (use only when primary fails or expected lineage is missing):**
- `impact-analysis-object-deps-fallback.sql`, `impact-analysis-multi-level-object-deps-fallback.sql`, `impact-analysis-users-object-deps-fallback.sql`
- `root-cause-analysis-object-deps-fallback.sql`, `change-detection-object-deps-fallback.sql`
- `provenance-verification-object-deps-fallback.sql`

**Further fallbacks (DDL / INFORMATION_SCHEMA):**
- `impact-analysis-fallback.sql` — DDL/INFORMATION_SCHEMA for downstream deps
- `root-cause-ddl-fallback.sql` — DDL parsing for upstream lineage

**If template fails:**
1. Run primary (GET_LINEAGE) once. On **success** (any row count), present results and stop. On **failure** (error) or **empty when lineage is expected**, try OBJECT_DEPENDENCIES fallback, then GET_DDL/INFORMATION_SCHEMA if needed.
2. Check ACCESS_HISTORY availability: `check-access-history.sql`
3. Check object existence: `check-object-exists.sql`
4. If object doesn't exist: "Object not found. Check the name and try again."
5. If no lineage data: "No lineage data available. Object may be new or unused."

**Common Issues:**
- "No ACCESS_HISTORY data" → Data is older than 365 days or not accessed
- "GET_LINEAGE returns empty" → Use fallback only if the object should have lineage (e.g. view with references); otherwise report "No lineage found."
- "Insufficient privileges" → GET_LINEAGE needs VIEW LINEAGE (public); OBJECT_DEPENDENCIES needs account admin

---

## Expected User Experience

### Example 1: Impact Analysis
**User:** "What will break if I change RAW_DB.SALES.ORDERS?"

**Agent (immediately):**
```
Impact Analysis: RAW_DB.SALES.ORDERS

═══════════════════════════════════════════════════════════════
CRITICAL RISK (2 objects)
═══════════════════════════════════════════════════════════════
1. ANALYTICS_DB.REPORTING.DAILY_REVENUE (Dynamic Table)
   Risk: CRITICAL | Refresh: Every 15 min | Users: 12 in last 7 days
   → Feeds 3 downstream objects including executive dashboard

2. FINANCE_DB.REPORTS.AR_AGING (View)
   Risk: CRITICAL | Queries: 89/day | Users: 5 in last 7 days
   → Used for month-end close process

═══════════════════════════════════════════════════════════════
MODERATE RISK (1 object)
═══════════════════════════════════════════════════════════════
3. STAGING_DB.TRANSFORM.ORDERS_ENRICHED (Table)
   Risk: MODERATE | Last updated: 2024-01-15 | Users: 2 in last 7 days

Summary: 3 downstream dependencies | 2 CRITICAL | 1 MODERATE
Affected Users: 15 unique users in last 7 days
```

### Example 2: Root Cause Analysis
**User:** "Why is ANALYTICS_DB.REPORTING.REVENUE wrong?"

**Agent (immediately):**
```
Root Cause Analysis: ANALYTICS_DB.REPORTING.REVENUE

═══════════════════════════════════════════════════════════════
UPSTREAM LINEAGE (3 levels)
═══════════════════════════════════════════════════════════════
Level 1: STAGING_DB.CLEAN.ORDERS_FACT
Level 2: RAW_DB.INGEST.ORDERS, RAW_DB.INGEST.PRODUCTS
Level 3: External Stage (S3://data-lake/orders/)

═══════════════════════════════════════════════════════════════
RECENT CHANGES DETECTED
═══════════════════════════════════════════════════════════════
⚠️  RAW_DB.INGEST.ORDERS - Schema changed 2 hours ago
    Change: Column DISCOUNT_PCT type changed NUMBER(5,2) → NUMBER(10,4)
    Changed by: ETL_SERVICE_USER

⚠️  STAGING_DB.CLEAN.ORDERS_FACT - Data modified 1 hour ago
    Rows affected: 15,234 rows updated
    Query: MERGE statement by TRANSFORM_TASK

Most Likely Cause: Schema change in RAW_DB.INGEST.ORDERS
Recommendation: Verify DISCOUNT_PCT handling in downstream transforms
```

### Example 3: Data Discovery
**User:** "Which table should I use for customer revenue analysis?"

**Agent (immediately):**
```
Data Discovery: Customer Revenue Analysis

RECOMMENDED:
1. ANALYTICS_DB.CURATED.CUSTOMER_REVENUE_SUMMARY ⭐ Best Match
   Trust Score: 95% | Updated hourly | Users: 45/week
   Why: Pre-aggregated customer revenue with standard dimensions
   Provenance: RAW.ORDERS → STAGING.ORDERS_FACT → This table

Not Recommended:
- RAW_DB.ORDERS (Trust: 60%) - Raw data, requires transformation
- SANDBOX_DB.*.* - Development/test data
```

### Example 4: Column-Level Lineage
**User:** "What uses the AMOUNT column in RAW_DB.SALES.ORDERS?"

**Agent (immediately):**
```
Column Impact Analysis: RAW_DB.SALES.ORDERS.AMOUNT

═══════════════════════════════════════════════════════════════
DOWNSTREAM COLUMNS (8 columns use this data)
═══════════════════════════════════════════════════════════════

CRITICAL IMPACT:
1. ANALYTICS_DB.REPORTING.REVENUE_SUMMARY.TOTAL_REVENUE
   Impact: CRITICAL | Queries: 89/day | Confidence: HIGH
   Transformation: SUM(AMOUNT) aggregation
   
2. FINANCE_DB.REPORTS.AR_AGING.OUTSTANDING_AMOUNT
   Impact: CRITICAL | Queries: 45/day | Confidence: HIGH
   Transformation: Direct reference with filters

HIGH IMPACT:
3. STAGING_DB.TRANSFORM.ORDERS_ENRICHED.NET_AMOUNT
   Impact: HIGH | Queries: 23/day | Confidence: HIGH
   Transformation: AMOUNT * (1 - DISCOUNT_PCT/100)

MODERATE IMPACT:
4. ANALYTICS_DB.MARTS.CUSTOMER_360.LIFETIME_VALUE
   Impact: MEDIUM | Queries: 12/day | Confidence: MEDIUM

Summary: 8 downstream columns | 2 CRITICAL | 1 HIGH | 5 MEDIUM
Recommendation: Coordinate with Finance team before changing
```

### Example 5: Column Source Tracing
**User:** "Where does ANALYTICS_DB.REPORTS.REVENUE.TOTAL_SALES come from?"

**Agent (immediately):**
```
Column Source Analysis: ANALYTICS_DB.REPORTS.REVENUE.TOTAL_SALES

═══════════════════════════════════════════════════════════════
UPSTREAM SOURCES (traced 3 levels)
═══════════════════════════════════════════════════════════════

Level 1 (Direct Source):
  STAGING_DB.TRANSFORM.ORDERS_AGG.REVENUE_SUM
  Confidence: HIGH | Last seen: 2 hours ago
  Transformation: Renamed column

Level 2:
  RAW_DB.INGEST.ORDERS.AMOUNT
  Confidence: HIGH | Source tier: RAW
  Transformation: SUM() aggregation

Level 3 (Origin):
  @RAW_DB.STAGES.S3_ORDERS/orders.csv
  Confidence: MEDIUM | Source tier: EXTERNAL

Complete Path:
S3_ORDERS → ORDERS.AMOUNT → ORDERS_AGG.REVENUE_SUM → REVENUE.TOTAL_SALES
```

---

## Summary Table

| Workflow | Direction | Primary Goal | Key Stakeholder |
|:---------|:----------|:-------------|:----------------|
| **Impact Analysis** | Downstream | Risk Mitigation | Data Engineers / Ops |
| **Root Cause** | Upstream | Troubleshooting | Analysts / Analytics Engineers |
| **Trust & Discovery** | Full Path | Data Literacy | Business Users / Platform Owners |
| **Column Lineage** | Both | Field-Level Tracing | Data Engineers / Analysts |

---

## Workflow Selection Logic

| User Says | Workflow | Template |
|-----------|----------|----------|
| "What will break if I change [table]?" | Impact Analysis | `impact-analysis.sql` |
| "What depends on [table]?" | Impact Analysis | `impact-analysis.sql` |
| "Why is this number wrong?" | Root Cause Analysis | `root-cause-analysis.sql` |
| "Where does [table] come from?" | Root Cause Analysis | `root-cause-analysis.sql` |
| "Is [table] trustworthy?" | Data Discovery | `data-discovery.sql` |
| "Which table should I use for [topic]?" | Data Discovery | `data-discovery.sql` |
| "What uses the [column] column?" | Column Lineage | `column-lineage-downstream.sql` |
| "Where does [column] come from?" | Column Lineage | `column-lineage-upstream.sql` |
| "Full lineage for [column]" | Column Lineage | `column-lineage-full.sql` |
| "Has [column] changed?" | Column Lineage | `column-change-detection.sql` |

---

## Snowflake APIs Reference

See `reference/snowflake-apis.md` for complete documentation on:
- `SNOWFLAKE.CORE.GET_LINEAGE()` (primary) vs `ACCOUNT_USAGE.OBJECT_DEPENDENCIES` (fallback)
- Object dependency vs data-movement lineage and privilege requirements
- ACCOUNT_USAGE views and their latencies
- Performance optimization tips

---

## Success Criteria

- User gets answer in **one response**
- Risk tiers clearly communicated (Impact Analysis)
- Recent changes highlighted (Root Cause)
- Trust indicators provided (Discovery)
- No SQL errors shown
- No unnecessary questions
- Clean, actionable results

---

## Stopping Points

Lineage queries are **read-only operations** that don't modify data or schema. Execute immediately without waiting for confirmation.

### ⚠️ MANDATORY STOPPING POINTS

**STOP and ask user if:**
1. **Ambiguous object reference** - Missing or unclear database/schema/table/column name
   - Example: User says "check lineage" without specifying which table
   - Action: Ask "Which table would you like to analyze?"

2. **User explicitly requests review** - "Show me the query first" or "Let me review before running"
   - Action: Present the query and wait for confirmation

3. **Query returns no results** - No lineage data found
   - Action: Explain possible reasons (new object, no access, insufficient privileges) and ask if they want to try a different approach

### ✅ NO STOPPING REQUIRED

**Execute immediately without confirmation:**
- Any SELECT query on `SNOWFLAKE.ACCOUNT_USAGE` views
- Any SELECT query on `INFORMATION_SCHEMA` views
- Parsing DDL with `GET_DDL()` function
- Reading configuration files from `config/`

**Rationale:** These are read-only operations that don't modify data, schema, or access controls.

---

## Production Configuration

**Schema Pattern Configuration (Extensible):**

Trust and risk scoring patterns are defined in `config/schema-patterns.yaml`. This file is read dynamically at runtime, allowing easy customization without modifying SQL templates.

**File:** `config/schema-patterns.yaml`

```yaml
trust_tiers:
  PRODUCTION:
    score: 100
    patterns:
      - "%ANALYTICS%"
      - "%CURATED%"
      # Add your production schema patterns here
      
  STAGING:
    score: 60
    patterns:
      - "%STAG%"
      # Add your staging schema patterns here
      
  RAW:
    score: 40
    patterns:
      - "%RAW%"
      # Add your raw data schema patterns here
      
  UNTRUSTED:
    score: 20
    patterns:
      - "%SANDBOX%"
      - "%TEST%"
      # Add your dev/test schema patterns here

default:
  score: 50
  tier: "UNKNOWN"

risk_critical_patterns:
  - "%FINANCE%"
  - "%REVENUE%"
  # Add schemas that are critical to flag
```

**Customization:** Edit `config/schema-patterns.yaml` using SQL LIKE syntax (% = wildcard, case-insensitive). Example: `%ANALYTICS%` matches ANALYTICS, PROD_ANALYTICS, ANALYTICS_V2.

**Recursive Depth:** Default 3 levels, configurable in templates (`WHERE ul.level < 3`).

**Retention:** ACCESS_HISTORY/QUERY_HISTORY: 365 days; GET_LINEAGE: current state; OBJECT_DEPENDENCIES: indefinite (latency on new objects).

---

## Known Limitations

| Limitation | Impact | Workaround |
|------------|--------|------------|
| GET_LINEAGE max 5 levels | Deep chains truncated at 5 | Use OBJECT_DEPENDENCIES fallback for deeper traversal |
| ACCOUNT_USAGE latency (45min-3hr) | New objects missing from OBJECT_DEPENDENCIES | Use GET_LINEAGE first (current state); then GET_DDL() fallback |
| Column lineage depends on ACCESS_HISTORY | Not all queries expose column details | Confidence scores indicate reliability |
| Single account scope | Cross-account sharing not covered | Query each account separately |
| View DDL parsing | May miss dynamic SQL references | Review complex views manually |
| OBJECT_DEPENDENCIES = object dependency only | Data movement (e.g. CTAS, COPY) not in OBJECT_DEPENDENCIES | Use GET_LINEAGE as primary |
| Column lineage 90-day lookback | Older transformations not captured | Extend time range in templates |

Tags

#lineage

Related Skills

Browse all
v1.0.0

dcm

cortex-community
trending
Use for ALL requests that mention: create, build, set up, debug, fix, troubleshoot, optimize, improve, evaluate, or analyze a DCM project. This is the REQUIRED entry point - even if the request seems simple. DO NOT attempt to create DCM projects manually or search for DCM documentation - always invoke this skill first. This skill guides users through creating, auditing, evaluating, and debugging workflows for DCM (Database Change Management) projects. Triggers: DCM, DCM project, Database Change Management, snow dcm, manifest.yml with DEFINE, infrastructure-as-code, three-tier role pattern, database roles, DEFINE TABLE, DEFINE SCHEMA.
#dcm
Cortex
821.1k
2 hours ago
v1.0.0

snowflake-postgres

cortex-community
[REQUIRED] Use for ALL requests involving Snowflake Postgres: create instance, list instances, suspend, resume, reset credentials, describe instance, import connection, health check, diagnostics, pg_lake, iceberg tables, data lake, storage integration. Triggers: 'postgres', 'pg', 'create instance', 'show instances', 'suspend', 'resume', 'reset credentials', 'rotate password', 'reset access', 'import connection', 'network policy', 'my IP', 'health check', 'diagnose', 'insights', 'pg_doctor', 'slow queries', 'cache hit', 'bloat', 'vacuum', 'dead rows', 'locks', 'blocking queries', 'blocked', 'disk usage', 'what's running', 'active queries', 'connection count', 'pg_lake', 'iceberg', 'data lake', 'storage integration', 'parquet', 'COPY to S3', 'export to S3', 'lake'.
#snowflake-postgres
Cortex
362.9k
2 hours ago
v1.0.0

organization-management

cortex-community
Snowflake organization management — accounts, org users, org insights, org spending, org security, globalorgadmin. ORGANIZATION_USAGE views, cross-account analytics, org-wide metrics. Use when the user asks about: 30 day summary of my organization, 30-day summary, 30 day summary, accounts in my organization, list accounts, how many accounts, account editions, account regions, account inventory, organization users, organization user groups, executive summary of my org, org overview, org spending, org cost, org security posture, org reliability, org auth posture, org hub, org usage views, trust center, MFA readiness, login failures, warehouse credits, storage trends, edition distribution, who has globalorgadmin, what is globalorgadmin, globalorgadmin role, orgadmin role, organization administrator, org admin, enable orgadmin, disable orgadmin, org admin permissions, account admins, ORGANIZATION_USAGE, org-level, cross-account, org-wide.
#organization-management
Cortex
5015.0k
2 hours ago
v1.0.0

snowflake-notebooks

cortex-community
Create and edit Workspace notebooks (.ipynb files) for Snowflake. Use when: creating workspace notebooks, editing notebooks, debugging notebook issues, converting code to notebooks, working with Snowpark, SQL cells, or data analysis in Snowflake. Triggers: notebook, .ipynb, snowflake notebook, workspace notebook, create notebook, edit notebook, jupyter, ipynb file, notebook cell, SQL cell, snowpark session.
#snowflake-notebooks
Cortex
4822.0k
2 hours ago

Details

Stars83
Installs4.3k
Authorcortex-community
Versionv1.0.0
Updated2 hours ago
LicenseMIT
View Repository
quick install
$ cortex skill install lineage