Quick Start (Copy/Paste)
Copy the block below and paste it into your coding agent to install and verify setup automatically.
Install the SourceMedium BigQuery analyst skill:
1. Run: npx skills add source-medium/skills --skill sm-bigquery-analyst
2. Read the installed SKILL.md to understand the workflow
3. Run the setup verification commands to check my BigQuery access
My tenant ID is: [your-tenant-id]
My project is: sm-[your-tenant-id]
Install
npx skills add source-medium/skills --skill sm-bigquery-analyst
What this skill does
- Setup verification — validates gcloud/bq CLI, authentication, and table access
- Safe queries — SELECT-only, dry-run first, cost-guarded
- SQL receipts — every answer includes copy/paste SQL + verification command
- No fabrication — if access fails, returns exact error and access request template
Example Questions
After installing, ask your coding agent:
What was my revenue by channel last month?
Show me new customer acquisition by source over the past 30 days
What's my customer LTV by acquisition cohort? Use sm_order_line_type = 'all_orders'
SKILL.md
Copy everything below into `.claude/skills/sm-bigquery-analyst/SKILL.md` (or the equivalent location for your agent):
---
name: sm-bigquery-analyst
description: Query SourceMedium-hosted BigQuery safely. Emits SQL receipts. SELECT-only, cost-guarded. Use when users need help with BigQuery setup, access verification, or analytical questions against SourceMedium datasets.
compatibility: Requires gcloud CLI, bq CLI, and network access to BigQuery.
metadata:
  author: sourcemedium
  version: "1.0"
---
# SourceMedium BigQuery Analyst
Use this skill to help end users work with SourceMedium BigQuery data from setup to analysis.
## Workflow
1. **Verify environment** (run these before any analysis)
2. Confirm project and dataset/table visibility
3. Use docs-first guidance for definitions and table discovery
4. Answer analytical questions with reproducible SQL receipts
5. Call out assumptions and caveats explicitly
## Setup Verification
Run these commands in order before writing analysis SQL:
~~~bash
# 1. Check CLI tools are installed
gcloud --version && bq version
# 2. Check authenticated account
gcloud auth list
# 3. Check active project
gcloud config get-value project
# 4. Validate BigQuery API access (dry-run)
bq query --use_legacy_sql=false --dry_run 'SELECT 1 AS ok'
# 5. Test table access (your project is named sm-<tenant_id>)
# Example: if your tenant is "acme-corp", your project is sm-acme-corp
bq query --use_legacy_sql=false --dry_run "
SELECT 1
FROM \`sm-<tenant_id>.sm_transformed_v2.obt_orders\`
LIMIT 1
"
# 6. Confirm you can actually read data (not just dry-run)
bq query --use_legacy_sql=false "
SELECT 1
FROM \`sm-<tenant_id>.sm_transformed_v2.obt_orders\`
WHERE is_order_sm_valid = TRUE
LIMIT 1
"
~~~
If any step fails, direct the user to request access from their internal admin.
## Safety Rules
These are hard constraints. Do not bypass.
### Query Safety
1. **SELECT-only** — deny: INSERT, UPDATE, DELETE, MERGE, CREATE, DROP, EXPORT, COPY
2. **Dry-run first** when iterating on new queries:
~~~bash
bq query --use_legacy_sql=false --dry_run '<SQL>'
~~~
3. **Enforce cost limit** with maximum bytes billed:
~~~bash
bq query --use_legacy_sql=false --maximum_bytes_billed=1073741824 '<SQL>'
~~~
(1 GB = 1073741824 bytes. If the query fails because it would exceed this limit, tighten filters or ask the user for approval before raising the cap.)
4. **Always bound queries**:
- Add `LIMIT` clause (max 100 rows for exploratory)
- Use date/partition filters when querying partitioned tables
- Prefer `WHERE` filters on partition columns
### Data Safety
1. **Default to aggregates** — avoid outputting raw rows unless explicitly requested
2. **PII handling**:
- Do not output columns likely containing PII (email, phone, address, name) without explicit confirmation
- If PII is requested, confirm scope and purpose before proceeding
- Prefer anonymization. Example:
~~~sql
-- Hash PII instead of exposing raw values
SELECT TO_HEX(SHA256(LOWER(email))) AS email_hash, ...
~~~
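For illustration, a fuller sketch of the same pattern; the `email` column on `obt_customers` is assumed here and may be named differently in your schema:
~~~sql
-- Sketch only: `email` is an assumed column name; substitute the actual PII column
SELECT
  TO_HEX(SHA256(LOWER(email))) AS email_hash
FROM `sm-<tenant_id>.sm_transformed_v2.obt_customers`
LIMIT 100
~~~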
### Cost Guardrails
~~~sql
-- Good: bounded scan
SELECT ... FROM `sm-<tenant_id>.sm_transformed_v2.obt_orders`
WHERE DATE(order_processed_at_local_datetime) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
LIMIT 100
-- Bad: full table scan
SELECT ... FROM `sm-<tenant_id>.sm_transformed_v2.obt_orders` -- no filters
~~~
## Output Contract
For analytical questions, always return:
1. **Answer** — concise plain-English conclusion
2. **SQL (copy/paste)** — BigQuery Standard SQL used for the result
3. **Notes** — timeframe, metric definitions, grain, scope, timezone, attribution lens
4. **Verify** — `bq query --use_legacy_sql=false --dry_run '<SQL>'` command
5. **Bytes scanned** — if >1GB, note this and ask for approval before running
If access/setup fails, do not fabricate results. Return:
1. Exact failing step
2. Exact project/dataset that failed
3. Direct user to the BigQuery access request template
## Query Guardrails
1. Fully qualify tables as `` `sm-<tenant_id>.dataset.table` ``
2. For order analyses, default to `WHERE is_order_sm_valid = TRUE`
3. Use `sm_store_id` (not `smcid` — that name does not exist in customer tables)
4. Use `SAFE_DIVIDE` for ratio math
5. Handle DATE/TIMESTAMP typing explicitly (`DATE(ts_col)` when comparing to dates)
6. Use `order_net_revenue` for revenue metrics (not `order_gross_revenue` unless explicitly asked)
7. **Prefer `*_local_datetime` columns** when available for date-based reporting; otherwise be explicit about UTC vs local
8. **For enumerations** (channel, platform, status), discover values with `SELECT DISTINCT` first, then use exact match (see the sketch after this list). Reserve `LIKE`/`REGEXP` for free-text fields (`utm_campaign`, `product_title`, `page_path`)
9. **LTV tables (`rpt_cohort_ltv_*`)**: always filter `sm_order_line_type` to exactly ONE value
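A minimal discovery sketch for guardrail 8, using `sm_channel` on `obt_orders` (both referenced elsewhere in this skill):
~~~sql
-- Discover exact enumeration values before filtering on them
SELECT DISTINCT sm_channel
FROM `sm-<tenant_id>.sm_transformed_v2.obt_orders`
WHERE is_order_sm_valid = TRUE
  AND DATE(order_processed_at_local_datetime) >= DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY)
ORDER BY 1
~~~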
## Key Tables
| Table | Grain | Use case |
|-------|-------|----------|
| `obt_orders` | 1 row per order | Revenue, profitability, channel analysis |
| `obt_order_lines` | 1 row per line item | Product performance, margins, COGS |
| `obt_customers` | 1 row per customer | Acquisition, retention, subscription status |
| `rpt_ad_performance_daily` | 1 row per channel/date | Ad spend, impressions, clicks |
| `rpt_cohort_ltv_*` | 1 row per cohort x month | LTV analysis (filter sm_order_line_type!) |
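If a table's grain is unclear, a quick sanity check is to compare total rows with distinct keys over a bounded window; this sketch assumes `obt_orders` and its `sm_order_key` column (both used in the examples below):
~~~sql
-- Grain check: for obt_orders, row_count should equal distinct_orders
SELECT
  COUNT(*) AS row_count,
  COUNT(DISTINCT sm_order_key) AS distinct_orders
FROM `sm-<tenant_id>.sm_transformed_v2.obt_orders`
WHERE DATE(order_processed_at_local_datetime) >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
~~~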
## Project & Filtering
**Project naming:** Customer projects are named `sm-<tenant_id>`. Example: `sm-acme-corp`.
**Multi-store filtering:** If you have multiple stores, filter by `sm_store_id` to analyze a single store. Without this filter, all stores' data is combined.
**Sales channels:** Always filter or group by `sm_channel` for segmentation:
- `online_dtc` — Direct-to-consumer website
- `amazon` — Amazon marketplace
- `tiktok_shop` — TikTok Shop
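A minimal sketch combining both filters; the `sm_store_id` value (and its type) is a placeholder, so discover real values with `SELECT DISTINCT sm_store_id` first:
~~~sql
-- Single-store, single-channel daily revenue (store ID is a placeholder)
SELECT
  DATE(order_processed_at_local_datetime) AS order_date,
  SUM(order_net_revenue) AS revenue
FROM `sm-<tenant_id>.sm_transformed_v2.obt_orders`
WHERE is_order_sm_valid = TRUE
  AND sm_store_id = '<your-store-id>'
  AND sm_channel = 'online_dtc'
  AND DATE(order_processed_at_local_datetime) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
GROUP BY 1
ORDER BY 1 DESC
~~~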
## Example Queries
### Daily revenue by channel
~~~sql
SELECT
DATE(order_processed_at_local_datetime) AS order_date,
sm_channel,
COUNT(sm_order_key) AS order_count,
SUM(order_net_revenue) AS revenue
FROM `sm-<tenant_id>.sm_transformed_v2.obt_orders`
WHERE is_order_sm_valid = TRUE
AND DATE(order_processed_at_local_datetime) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
GROUP BY 1, 2
ORDER BY 1 DESC
~~~
### New customer acquisition
~~~sql
SELECT
DATE(order_processed_at_local_datetime) AS order_date,
sm_utm_source_medium,
COUNT(DISTINCT sm_customer_key) AS new_customers,
SUM(order_net_revenue) AS revenue,
SAFE_DIVIDE(SUM(order_net_revenue), COUNT(DISTINCT sm_customer_key)) AS avg_first_order_value
FROM `sm-<tenant_id>.sm_transformed_v2.obt_orders`
WHERE is_order_sm_valid = TRUE
AND order_sequence = '1st_order'
GROUP BY 1, 2
ORDER BY 1 DESC
~~~
### LTV cohort (CRITICAL: filter sm_order_line_type)
~~~sql
SELECT
cohort_month,
months_since_first_order,
AVG(SAFE_DIVIDE(cumulative_order_net_revenue, cohort_size)) AS avg_ltv
FROM `sm-<tenant_id>.sm_transformed_v2.rpt_cohort_ltv_by_first_valid_purchase_attribute_no_product_filters`
WHERE sm_order_line_type = 'all_orders'
AND acquisition_order_filter_dimension = 'source/medium'
AND months_since_first_order <= 12
GROUP BY 1, 2
ORDER BY 1, 2
~~~
### Discover available metrics
Query `sm_metadata.dim_semantic_metric_catalog` to find 180+ pre-defined metrics:
~~~sql
-- Find all revenue metrics
SELECT metric_name, metric_label, calculation
FROM `sm-<tenant_id>.sm_metadata.dim_semantic_metric_catalog`
WHERE metric_category = 'revenue'
ORDER BY metric_name;

-- Resolve abbreviated names (aov, mer, cac, roas)
SELECT metric_name, preferred_metric_name, metric_description
FROM `sm-<tenant_id>.sm_metadata.dim_semantic_metric_catalog`
WHERE metric_name IN ('aov', 'mer', 'cac', 'roas')
~~~
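To see which categories exist before filtering on one (a discovery sketch mirroring guardrail 8):
~~~sql
-- List available metric categories
SELECT DISTINCT metric_category
FROM `sm-<tenant_id>.sm_metadata.dim_semantic_metric_catalog`
ORDER BY 1
~~~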
No access yet?
If you cannot run queries due to permissions, see the BigQuery Access Request Template.
Related docs
BigQuery Essentials
Setup and first-query fundamentals.
SQL Query Library
SourceMedium SQL templates and patterns.
Table Docs
Schema-level documentation for core tables.
Metric Catalog
180+ pre-defined metrics with calculations.
Data Dictionary
Table availability, freshness, and column stats.
Multi-Touch Attribution
MTA models and experimental tables.

