How to use prompts for dbt Copilot
Overview
Learn how to write effective prompts for dbt Copilot to generate accurate SQL, models, metrics, and macros. Each recipe is self-contained with its own realistic example.
dbt Copilot is an AI assistant that generates SQL, YAML, documentation, tests, semantic models, and macros based on your project's context. The quality of output depends on the clarity of your prompts.
This cookbook provides independent recipes for common prompting tasks. Jump to any section that matches your current need.
This cookbook covers the following topics:
- Prompt best practices
- Generate SQL queries
- Use what you already have
- Create semantic models and metrics
- Create reusable macros
- Troubleshoot errors and issues
- Conclusion
Prompt best practices
Writing effective prompts is about giving Copilot the right context and clear direction. Follow these principles:
- Provide rich context
- State the business question, not just the output
- Be clear and explicit about the result
- Break complex logic into smaller steps
Provide rich context
In your prompt, include table names, column types, and example values, and describe how the tables relate to each other.
Include the following:
- Table relationships (such as `orders` joins to `customers` on `customer_id`)
- Data types (such as `created_at` is a timestamp)
- Sample values (such as `status` is a string with values like "active" or "pending")
Example: Santi's neighborhood café
Let's say you run a neighborhood café and folks get a free drink after 10 visits:
Without rich context (vague):
I need a query using customers, subscriptions, and activity tables to see weekly regulars.
With rich context (specific):
Context: I run a café loyalty program where customers earn a free drink after 10 visits.
Tables and relationships:
- customers (customer_id INT, name STRING, email STRING, signup_date TIMESTAMP)
- subscriptions (subscription_id INT, customer_id INT, plan_type STRING, start_date DATE, end_date DATE)
- Joins to customers on customer_id
- plan_type values: "monthly", "annual", null for non-subscribers
- activity (activity_id INT, customer_id INT, visit_date DATE, visit_count INT)
- Joins to customers on customer_id
- visit_count tracks cumulative visits (resets after redemption)
Business question: Show me which customers visit weekly (3+ times per week for 4+ weeks)
and compare conversion rates: do high-frequency punch-card users convert to our
'beans of the month' subscription at a higher rate than casual visitors?
Why it works: The AI now knows exact data types, how tables relate, what values to expect, and the specific business logic (3+ visits/week defines "regulars").
State the business question, not just the output
Describe the decision or insight the query supports. Avoid purely technical prompts.
Instead of: "Count users"
Say: "Count active users per week to analyze engagement trends"
Example: The sneaker drop
Let's say you run an online sneaker shop and just launched a new feature: customers can view 3D previews of sneakers before buying.
We launched a 3D preview feature with our latest limited-edition sneaker drop.
Did customers who used the 3D preview convert to buyers at a higher rate than
those who only saw photos?
Show me weekly conversion rates: browsers who became buyers, segmented by whether
they used the 3D preview. If preview users convert 20%+ higher, we'll add 3D
to all products. If not, we'll improve the feature before expanding.
Why it works: You've described the feature, the behavior you're measuring, specific success criteria (20%+ lift), and the decision you'll make based on results.
Be clear and explicit about the result
Define the expected output clearly. Mention the expected columns in the final result and state whether results should be sorted, limited, or filtered.
What to specify:
- Expected column names and formats
- Sort order and any limits (for example, "top 10 products by revenue")
- Output format examples (for example, `conversion_rate` as a percentage)
Example: The fitness challenge
In this example, you run a fitness app with a two-week program called Kimiko's kettlebell challenge.
Each week, show the date, the count of active challengers, total workouts, and a simple
'engagement per person' (workouts divided by active challengers).
Then, by challenge start week, show 'trial starts,' 'upgrades to paid within 30 days,'
and the 'upgrade rate' as a percentage, plus average workouts for those who upgraded.
Why it works: The prompt names every output column, defines the derived metrics, and states the format ('upgrade rate' as a percentage), so the result is ready to present.
Break complex logic into smaller steps
For multi-part queries, write them as a sequence of instructions:
Example:
First, filter the dataset to active users in the last 90 days.
Then, calculate their average session duration.
Finally, join to subscription data and group by plan tier.
We recommend you avoid asking for everything at once. You can always iterate on your prompt to get better results.
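The three steps above can be sketched as one query built from CTEs. This is only a sketch under assumed table names (`sessions`, `subscriptions`) and Snowflake-style date functions:

```sql
-- Hypothetical tables: sessions(user_id, session_start, session_end),
-- subscriptions(user_id, plan_tier)
with recent_sessions as (
    -- Step 1: filter to active users in the last 90 days
    select *
    from sessions
    where session_start >= dateadd('day', -90, current_date)
),

avg_duration as (
    -- Step 2: average session duration per user
    select
        user_id,
        avg(datediff('second', session_start, session_end)) as avg_session_seconds
    from recent_sessions
    group by user_id
)

-- Step 3: join to subscription data and group by plan tier
select
    s.plan_tier,
    avg(d.avg_session_seconds) as avg_session_seconds
from avg_duration as d
inner join subscriptions as s on d.user_id = s.user_id
group by s.plan_tier
```

Each CTE maps to one instruction, which makes it easy to verify Copilot's output step by step.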
Generate SQL queries
Use case: Build a customers model for an e-commerce platform.
What to give Copilot:
Context:
We have customers, orders, and payments tables.
- customers (customer_id, first_name, last_name, email)
- orders (order_id, customer_id, order_date, status)
- payments (payment_id, order_id, amount, payment_method)
Relationships:
- orders.customer_id → customers.customer_id
- payments.order_id → orders.order_id
Sample values:
- status in ('completed', 'pending', 'cancelled')
- payment_method in ('credit_card', 'paypal', 'bank_transfer')
Business question:
Create a customer summary showing total orders and revenue per customer.
Output:
- customer_id
- first_name
- last_name
- first_order_date
- most_recent_order_date
- total_orders
- total_revenue
Filter: Only completed orders
Sort: total_revenue descending
Why it works: You're giving Copilot a clear map of how data connects, what values to expect, and what decision this supports.
Pro tip: Start simple, then iterate. If Copilot's first attempt isn't perfect, refine your prompt with more specific details.
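Given that prompt, Copilot's output should look close to the following sketch (verify the joins and aggregates against your own data before shipping):

```sql
select
    c.customer_id,
    c.first_name,
    c.last_name,
    min(o.order_date) as first_order_date,
    max(o.order_date) as most_recent_order_date,
    -- count distinct orders: joining payments can duplicate order rows
    count(distinct o.order_id) as total_orders,
    sum(p.amount) as total_revenue
from customers as c
inner join orders as o on c.customer_id = o.customer_id
inner join payments as p on o.order_id = p.order_id
where o.status = 'completed'
group by c.customer_id, c.first_name, c.last_name
order by total_revenue desc
```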
Use what you already have
You don't need to write everything from scratch. Pull in documentation, definitions, and sample data you already have—it helps Copilot understand your specific business context.
Define your business rules
Instead of just saying "active customer," explain the rule:
Active customer = at least one paid purchase in the last 90 days, excluding refunds
Net revenue = gross sales minus discounts and returns
Pull from: Metrics glossaries, KPI catalogs, product requirement docs, data dictionaries
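A precise rule translates almost mechanically into SQL. Here is a sketch of the "active customer" rule, assuming a hypothetical `purchases` table with an `is_refunded` flag and Snowflake-style `dateadd`:

```sql
-- Hypothetical table: purchases(customer_id, purchased_at, amount, is_refunded)
select distinct customer_id
from purchases
where purchased_at >= dateadd('day', -90, current_date)
  and not is_refunded  -- "excluding refunds"
```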
Show sample values
Give Copilot examples of what the data actually looks like, especially edge cases:
Order statuses:
- `customer_id: C-12, created_at: 2025-05-03T09:07:00Z, status: 'completed'`
- `customer_id: C-14, created_at: 2025-05-03T09:02:00Z, status: 'cancelled'`
- `customer_id: C-13, created_at: 2020-01-02T06:40:00Z, status: 'pending'`
Pull from: Data profiling reports, QA test datasets, BI dashboard filters
Start with a draft, refine later
Frame your model first, then iterate. Start with a clean outline that gets the basic structure right:
From stg_orders and dim_customers, draft a minimal model with order_id, customer_id,
order_date, net_revenue = gross - coalesce(discount, 0), and join to dim_customers
on customer_id. Filter to the last 30 days for preview only.
Pull from: Source-to-target mapping sheets (join keys and transformations), data dictionaries (primary and foreign keys)
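That draft prompt might yield a minimal model like this sketch (the column names `gross` and `discount` are assumed from the prompt):

```sql
-- Minimal draft model; the date filter keeps the preview small while iterating
select
    o.order_id,
    o.customer_id,
    o.order_date,
    o.gross - coalesce(o.discount, 0) as net_revenue
from {{ ref('stg_orders') }} as o
inner join {{ ref('dim_customers') }} as c
    on o.customer_id = c.customer_id
where o.order_date >= dateadd('day', -30, current_date)  -- preview only
```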
Create semantic models and metrics
Use case: Fast-track your semantic layer strategy with AI-generated YAML.
What to give Copilot:
Create a semantic model for order revenue tracking.
Base model: {{ ref('fct_orders') }}
Available columns:
- order_date (timestamp)
- order_amount (decimal)
- customer_id (integer)
- product_id (integer)
- region (string - 'north', 'south', 'east', 'west')
Requirements:
- Entity: customer
- Calculate: total_revenue (sum of order_amount) and order_count
- Dimensions: region, order_date as metric_time (support day, week, month)
- Metric: monthly_revenue (total revenue by month)
Return valid YAML with descriptions.
What Copilot generates:
- Valid semantic model YAML structure
- Properly defined entities, dimensions, and metrics
- Time dimension with multiple grains
- Metric definitions
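A sketch of the kind of YAML Copilot might produce. Field names follow the dbt Semantic Layer spec, but verify against your dbt version; in practice a semantic model usually also needs a primary entity (such as an order ID), which this fact table's column list doesn't include:

```yaml
semantic_models:
  - name: orders
    description: Order revenue tracking.
    model: ref('fct_orders')
    defaults:
      agg_time_dimension: metric_time
    entities:
      - name: customer
        type: foreign
        expr: customer_id
    dimensions:
      - name: metric_time
        type: time
        expr: order_date
        type_params:
          time_granularity: day   # coarser grains (week, month) available at query time
      - name: region
        type: categorical
    measures:
      - name: total_revenue
        description: Sum of order amounts.
        agg: sum
        expr: order_amount
      - name: order_count
        description: Count of orders.
        agg: count
        expr: customer_id   # count needs an expr; any non-null column works

metrics:
  - name: monthly_revenue
    label: Monthly Revenue
    description: Total revenue, typically queried at month grain.
    type: simple
    type_params:
      measure: total_revenue
```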
Pro tip: Use Copilot to reduce time spent writing boilerplate YAML. It leverages context from common metrics and dimensions across projects to ensure consistency.
Create reusable macros
In this section, we'll look at how to create reusable macros using Copilot.
Turn repetitive code into reusable logic
A junior analyst keeps copy-pasting CASE statements across models.
What to give Copilot:
Turn this CASE pattern into a reusable macro:
CASE
WHEN amount >= 1000 THEN 'high'
WHEN amount >= 500 THEN 'medium'
ELSE 'low'
END
Macro requirements:
- Name: categorize_amount
- Parameters: column name, high threshold (default 1000), medium threshold (default 500)
- Include docstring with usage example
- Handle null values by returning 'unknown'
Why it works: Clear input (the CASE statement), clear requirements, clear output expectations.
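The generated macro might look like this sketch. Note that the null check comes first so null amounts don't fall into 'low':

```jinja
{% macro categorize_amount(column_name, high_threshold=1000, medium_threshold=500) %}
    {#- Categorize a numeric column as 'high', 'medium', 'low', or 'unknown' (null).
        Usage: select {{ categorize_amount('amount') }} as amount_tier from payments -#}
    case
        when {{ column_name }} is null then 'unknown'
        when {{ column_name }} >= {{ high_threshold }} then 'high'
        when {{ column_name }} >= {{ medium_threshold }} then 'medium'
        else 'low'
    end
{% endmacro %}
```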
Lower the barrier to entry
Scenario: You need a macro but don't know Jinja syntax well.
What to ask Copilot:
I need a macro that calculates the number of days between two date columns,
excluding weekends.
Parameters:
- start_date_column (required)
- end_date_column (required)
Include a docstring explaining how to use it.
Outcome: Copilot generates proper Jinja syntax, handles parameters, and includes documentation. You learn Jinja patterns while getting working code.
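One approach Copilot might take is the classic "total days minus two per full week" approximation, sketched here with Snowflake-style `datediff`. Exact weekend handling varies by warehouse and week-start configuration, so treat this as a starting point rather than a finished implementation:

```jinja
{% macro weekday_diff(start_date_column, end_date_column) -%}
    {#- Approximate number of weekdays between two dates:
        total days minus two weekend days per week boundary crossed. -#}
    (
        datediff('day', {{ start_date_column }}, {{ end_date_column }})
        - 2 * datediff('week', {{ start_date_column }}, {{ end_date_column }})
    )
{%- endmacro %}
```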
Accelerate complex logic design
This is best for advanced users who are comfortable with Jinja.
What to ask Copilot:
I need a macro that builds a grouped aggregation with optional filters.
Parameters:
- relation (the model/table to query)
- group_by (list of columns to group by)
- metrics (list of columns to aggregate)
- where (optional filter condition)
Include defaults and guardrails for empty lists.
Add a docstring with parameter descriptions and usage example.
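A sketch of what that macro could look like. The guardrails here (an empty `metrics` list falls back to `count(*)`, and `group by` is skipped when no grouping columns are given) are one reasonable interpretation of the requirements:

```jinja
{% macro grouped_agg(relation, group_by=[], metrics=[], where=none) %}
    {#- Build a grouped aggregation: sums each metric column,
        grouped by the given columns, with an optional filter. -#}
    select
        {%- for col in group_by %}
        {{ col }},
        {%- endfor %}
        {%- if metrics | length == 0 %}
        count(*) as row_count
        {%- else %}
        {%- for m in metrics %}
        sum({{ m }}) as total_{{ m }}{{ "," if not loop.last }}
        {%- endfor %}
        {%- endif %}
    from {{ relation }}
    {%- if where %}
    where {{ where }}
    {%- endif %}
    {%- if group_by | length > 0 %}
    group by {{ group_by | join(', ') }}
    {%- endif %}
{% endmacro %}
```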
Troubleshoot errors and issues
Copilot acts as a fast, context-aware reviewer for failing SQL and macros. It reads errors, inspects your query structure, and suggests minimal fixes. Troubleshooting with Copilot gives you:
- Faster diagnosis by using plain-language translation of errors with likely root causes
- Safer fixes by biasing toward small, targeted changes
- Better learning by generating explanations you can paste into docs or PR descriptions
Troubleshoot errors
When something breaks, give Copilot the error message, your code, and what you expected to happen. Here are a couple of examples to show you how to use Copilot to troubleshoot errors.
Example: SQL error
Error: "SQL compilation error: Column 'product_name' must appear in GROUP BY"
Query:
SELECT
product_id,
product_name,
SUM(quantity) as total_quantity
FROM inventory
GROUP BY product_id
Warehouse: Snowflake
Expected: Group by product and show product name. What's wrong and how do I fix it?
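For this error, Copilot will typically explain that every non-aggregated column in the SELECT must appear in the GROUP BY, and suggest a minimal fix like:

```sql
select
    product_id,
    product_name,
    sum(quantity) as total_quantity
from inventory
group by product_id, product_name  -- add product_name to the grouping
```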
Example: Macro not working
This macro should calculate discount but returns wrong values:
{% macro calculate_discount(amount, rate) %}
{{ amount }} * {{ rate }}
{% endmacro %}
When I call {{ calculate_discount(100, 0.1) }} I expect 10 but get an error.
Show me the rendered SQL from target/compiled and explain what's wrong.
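A likely diagnosis Copilot could offer: the macro emits SQL text at compile time, and the newlines around `{{ amount }} * {{ rate }}` can land mid-expression in the compiled query. A sketch of a safer version using Jinja whitespace control and parentheses:

```jinja
{% macro calculate_discount(amount, rate) -%}
    ({{ amount }} * {{ rate }})
{%- endmacro %}
```

The macro still renders text, not a number; the value 10 only appears once the warehouse evaluates the compiled expression `(100 * 0.1)`, which you can confirm in `target/compiled`.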
Tip: Include your warehouse type (Snowflake, BigQuery, Databricks, and so on) because syntax can vary across data platforms.
Conclusion
You've now learned how to use dbt Copilot effectively. You can:
- Master SQL prompting by providing rich context and stating clear business questions
- Amplify your workflow by leveraging existing documentation and project context
- Generate Jinja macros to build more scalable and maintainable systems
- Troubleshoot your code to diagnose issues fast and apply safe, explainable fixes
Quick reference checklist
When writing prompts for dbt Copilot:
- ✅ Provide rich context: Table names, columns, data types, relationships, sample values
- ✅ State the business question: What decision or insight you're supporting, not just "write a query"
- ✅ Be clear and explicit: Expected columns, sort order, filters, and output format
- ✅ Break down complex logic: Write multi-part queries as a sequence of steps
For troubleshooting:
- ✅ Include complete error messages: Full warehouse error with line numbers
- ✅ Show the failing code: Both the dbt model and compiled SQL (from `target/compiled/`)
- ✅ Provide sample data: Representative rows that trigger the issue
- ✅ State your warehouse: Snowflake, BigQuery, Databricks, etc.
Next steps
Start with one task—automating documentation, generating a test, or refactoring a model—and build the habit from there. The more you use Copilot, the more you'll discover ways to accelerate your analytics engineering workflow.