# Quick Start (Copy/Paste)

Copy the block below and paste it into your coding agent to install and verify setup automatically:

~~~text
Install the SourceMedium BigQuery analyst skill:

1. Run: npx skills add source-medium/skills --skill sm-bigquery-analyst
2. Read the installed SKILL.md to understand the workflow
3. Run the setup verification commands to check my BigQuery access

My tenant ID is: [your-tenant-id]
My project is: sm-[your-tenant-id]
~~~

## Install

~~~bash
npx skills add source-medium/skills --skill sm-bigquery-analyst
~~~
## What this skill does
- Setup verification — validates gcloud/bq CLI, authentication, and table access
- Safe queries — SELECT-only, dry-run first, cost-guarded
- SQL receipts — every answer includes copy/paste SQL + verification command
- No fabrication — if access fails, returns exact error and access request template
## Example Questions

After installing, ask your coding agent:

- What was my revenue by channel last month?
- Show me new customer acquisition by source over the past 30 days
- What's my customer LTV by acquisition cohort? Use `sm_order_line_type = 'all_orders'`
## SKILL.md

Copy everything below into `.claude/skills/sm-bigquery-analyst/SKILL.md` (or the equivalent location for your agent):
---
name: sm-bigquery-analyst
description: Query SourceMedium-hosted BigQuery safely. Emits SQL receipts. SELECT-only, cost-guarded. Use when users need help with BigQuery setup, access verification, or analytical questions against SourceMedium datasets.
compatibility: Requires gcloud CLI, bq CLI, and network access to BigQuery.
metadata:
  author: sourcemedium
  version: "1.0"
---
# SourceMedium BigQuery Analyst
Use this skill to help end users work with SourceMedium BigQuery data from setup to analysis.
## Workflow
1. **Verify environment** (run the setup verification commands below before any analysis)
2. Confirm project and dataset/table visibility
3. Use docs-first guidance for definitions and table discovery
4. Answer analytical questions with reproducible SQL receipts
5. Call out assumptions and caveats explicitly
## Setup Verification
Run these commands in order before writing analysis SQL:
~~~bash
# 1. Check CLI tools are installed
gcloud --version && bq version
# 2. Check authenticated account
gcloud auth list
# 3. Check active project
gcloud config get-value project
# 4. Validate BigQuery API access (dry-run)
bq query --use_legacy_sql=false --dry_run 'SELECT 1 AS ok'
# 5. Test table access (your project is named sm-<tenant_id>)
# Example: if your tenant is "acme-corp", your project is sm-acme-corp
bq query --use_legacy_sql=false --dry_run "
SELECT 1
FROM \`sm-<tenant_id>.sm_transformed_v2.obt_orders\`
LIMIT 1
"
# 6. Confirm you can actually read data (not just dry-run)
bq query --use_legacy_sql=false "
SELECT 1
FROM \`sm-<tenant_id>.sm_transformed_v2.obt_orders\`
WHERE is_order_sm_valid = TRUE
LIMIT 1
"
~~~
If any step fails, direct the user to request access from their internal admin.
## Safety Rules
These are hard constraints. Do not bypass.
### Query Safety
1. **SELECT-only** — deny: INSERT, UPDATE, DELETE, MERGE, CREATE, DROP, EXPORT, COPY
2. **Dry-run first** when iterating on new queries:
~~~bash
bq query --use_legacy_sql=false --dry_run '<SQL>'
~~~
3. **Enforce cost limit** with maximum bytes billed:
~~~bash
bq query --use_legacy_sql=false --maximum_bytes_billed=1073741824 '<SQL>'
~~~
(1 GB = 1073741824 bytes. If a query fails because it exceeds the byte limit, tighten filters or ask the user for approval before raising it.)
4. **Always bound queries**:
- Add a `LIMIT` clause (max 100 rows for exploratory queries)
- Use date/partition filters when querying partitioned tables
- Prefer `WHERE` filters on partition columns
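The SELECT-only rule can be enforced with a quick pre-flight check before handing SQL to `bq`. A minimal sketch (a naive keyword match, not a SQL parser, so expect false positives on column names like `deleted_at`):

~~~bash
# Hypothetical pre-flight guard mirroring the deny list above.
# NOTE: substring matching only; it does not parse comments or string
# literals. Treat it as a first-pass check, not a full validator.
is_select_only() {
  sql_upper=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
  case "$sql_upper" in
    *INSERT*|*UPDATE*|*DELETE*|*MERGE*|*CREATE*|*DROP*|*EXPORT*|*COPY*)
      return 1 ;;  # denied: contains a write/DDL keyword
    *)
      return 0 ;;  # allowed: looks SELECT-only
  esac
}

is_select_only "SELECT 1 AS ok" && echo "allowed"
is_select_only "DROP TABLE t"   || echo "denied"
~~~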
### Data Safety
1. **Default to aggregates** — avoid outputting raw rows unless explicitly requested
2. **PII handling**:
- Do not output columns likely containing PII (email, phone, address, name) without explicit confirmation
- If PII is requested, confirm scope and purpose before proceeding
- Prefer anonymization. Example:
~~~sql
-- Hash PII instead of exposing raw values
SELECT TO_HEX(SHA256(LOWER(email))) AS email_hash, ...
~~~
### Cost Guardrails
~~~sql
-- Good: bounded scan
SELECT ... FROM `sm-<tenant_id>.sm_transformed_v2.obt_orders`
WHERE DATE(order_processed_at_local_datetime) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
LIMIT 100
-- Bad: full table scan
SELECT ... FROM `sm-<tenant_id>.sm_transformed_v2.obt_orders` -- no filters
~~~
## Output Contract
For analytical questions, always return:
1. **Answer** — concise plain-English conclusion
2. **SQL (copy/paste)** — BigQuery Standard SQL used for the result
3. **Notes** — timeframe, metric definitions, grain, scope, timezone, attribution lens
4. **Verify** — `bq query --use_legacy_sql=false --dry_run '<SQL>'` command
5. **Bytes scanned** — if >1GB, note this and ask for approval before running
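The bytes-scanned check in item 5 can be scripted against the dry-run output. A sketch, assuming the estimate appears in the message as "... will process N bytes of data." (the exact wording is an assumption about the `bq` CLI's output; verify against your version):

~~~bash
# Hypothetical helper: pull the byte estimate out of a dry-run message
# and compare it to the 1 GB budget used with --maximum_bytes_billed.
BUDGET_BYTES=1073741824  # 1 GB

check_scan_estimate() {
  bytes=$(printf '%s' "$1" | grep -oE '[0-9]+ bytes' | grep -oE '[0-9]+')
  if [ "${bytes:-0}" -gt "$BUDGET_BYTES" ]; then
    echo "over budget: $bytes bytes scanned, ask for approval"
  else
    echo "ok: ${bytes:-0} bytes scanned"
  fi
}

check_scan_estimate "running this query will process 2147483648 bytes of data."
check_scan_estimate "running this query will process 512 bytes of data."
~~~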
If access/setup fails, do not fabricate results. Return:
1. Exact failing step
2. Exact project/dataset that failed
3. Direct user to the BigQuery access request template
## Query Guardrails
1. Fully qualify tables as `` `sm-<tenant_id>.dataset.table` ``
2. For order analyses, default to `WHERE is_order_sm_valid = TRUE`
3. Use `sm_store_id` (not `smcid` — that name does not exist in customer tables)
4. Use `SAFE_DIVIDE` for ratio math
5. Handle DATE/TIMESTAMP typing explicitly (`DATE(ts_col)` when comparing to dates)
6. Use `order_net_revenue` for revenue metrics (not `order_gross_revenue` unless explicitly asked)
7. **Prefer `*_local_datetime` columns** when available for date-based reporting; otherwise be explicit about UTC vs local
8. **For enumerations** (channel, platform, status), discover values with `SELECT DISTINCT` first, then use exact match. Reserve `LIKE`/`REGEXP` for free-text fields (`utm_campaign`, `product_title`, `page_path`)
9. **LTV tables (`rpt_cohort_ltv_*`)**: always filter `sm_order_line_type` to exactly ONE value
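Guardrails 4, 7, and 8 combine in practice. A sketch using the column names from the examples later in this doc (the channel value is illustrative; discover your own first):

~~~sql
-- Step 1 (guardrail 8): discover the enum's actual values before filtering
SELECT DISTINCT sm_channel
FROM `sm-<tenant_id>.sm_transformed_v2.obt_orders`
WHERE is_order_sm_valid = TRUE;

-- Step 2: exact-match a discovered value; SAFE_DIVIDE guards the ratio
SELECT
  DATE(order_processed_at_local_datetime) AS order_date,
  SAFE_DIVIDE(SUM(order_net_revenue), COUNT(sm_order_key)) AS aov
FROM `sm-<tenant_id>.sm_transformed_v2.obt_orders`
WHERE is_order_sm_valid = TRUE
  AND sm_channel = 'online_dtc'
  AND DATE(order_processed_at_local_datetime) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
GROUP BY 1
~~~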
## Key Tables
| Table | Grain | Use case |
|-------|-------|----------|
| `obt_orders` | 1 row per order | Revenue, profitability, channel analysis |
| `obt_order_lines` | 1 row per line item | Product performance, margins, COGS |
| `obt_customers` | 1 row per customer | Acquisition, retention, subscription status |
| `rpt_ad_performance_daily` | 1 row per channel/date | Ad spend, impressions, clicks |
| `rpt_cohort_ltv_*` | 1 row per cohort x month | LTV analysis (filter sm_order_line_type!) |
## Project & Filtering
**Project naming:** Customer projects are named `sm-<tenant_id>`. Example: `sm-acme-corp`.
**Multi-store filtering:** If you have multiple stores, filter by `sm_store_id` to analyze a single store. Without this filter, all stores' data is combined.
**Sales channels:** Always filter or group by `sm_channel` for segmentation:
- `online_dtc` — Direct-to-consumer website
- `amazon` — Amazon marketplace
- `tiktok_shop` — TikTok Shop
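A minimal single-store sketch combining both filters (the store ID is a placeholder to fill in):

~~~sql
SELECT
  sm_channel,
  SUM(order_net_revenue) AS revenue
FROM `sm-<tenant_id>.sm_transformed_v2.obt_orders`
WHERE is_order_sm_valid = TRUE
  AND sm_store_id = '<your-store-id>'  -- placeholder: analyze one store only
  AND DATE(order_processed_at_local_datetime) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
GROUP BY 1
ORDER BY revenue DESC
~~~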
## Example Queries
### Daily revenue by channel
~~~sql
SELECT
DATE(order_processed_at_local_datetime) AS order_date,
sm_channel,
COUNT(sm_order_key) AS order_count,
SUM(order_net_revenue) AS revenue
FROM `sm-<tenant_id>.sm_transformed_v2.obt_orders`
WHERE is_order_sm_valid = TRUE
AND DATE(order_processed_at_local_datetime) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
GROUP BY 1, 2
ORDER BY 1 DESC
~~~
### New customer acquisition
~~~sql
SELECT
DATE(order_processed_at_local_datetime) AS order_date,
sm_utm_source_medium,
COUNT(DISTINCT sm_customer_key) AS new_customers,
SUM(order_net_revenue) AS revenue,
SAFE_DIVIDE(SUM(order_net_revenue), COUNT(DISTINCT sm_customer_key)) AS avg_first_order_value
FROM `sm-<tenant_id>.sm_transformed_v2.obt_orders`
WHERE is_order_sm_valid = TRUE
AND order_sequence = '1st_order'
GROUP BY 1, 2
ORDER BY 1 DESC
~~~
### LTV cohort (CRITICAL: filter sm_order_line_type)
~~~sql
SELECT
cohort_month,
months_since_first_order,
AVG(SAFE_DIVIDE(cumulative_order_net_revenue, cohort_size)) AS avg_ltv
FROM `sm-<tenant_id>.sm_transformed_v2.rpt_cohort_ltv_by_first_valid_purchase_attribute_no_product_filters`
WHERE sm_order_line_type = 'all_orders'
AND acquisition_order_filter_dimension = 'source/medium'
AND months_since_first_order <= 12
GROUP BY 1, 2
ORDER BY 1, 2
~~~
### Discover available metrics
Query `sm_metadata.dim_semantic_metric_catalog` to find 180+ pre-defined metrics:
~~~sql
-- Find all revenue metrics
SELECT metric_name, metric_label, calculation
FROM `sm-<tenant_id>.sm_metadata.dim_semantic_metric_catalog`
WHERE metric_category = 'revenue'
ORDER BY metric_name;

-- Resolve abbreviated names (aov, mer, cac, roas)
SELECT metric_name, preferred_metric_name, metric_description
FROM `sm-<tenant_id>.sm_metadata.dim_semantic_metric_catalog`
WHERE metric_name IN ('aov', 'mer', 'cac', 'roas')
~~~
## No access yet?

If you cannot run queries due to permissions, see the BigQuery Access Request Template.

## Related docs

- **BigQuery Essentials**: Setup and first-query fundamentals.
- **SQL Query Library**: SourceMedium SQL templates and patterns.
- **Table Docs**: Schema-level documentation for core tables.
- **Metric Catalog**: 180+ pre-defined metrics with calculations.
- **Data Dictionary**: Table availability, freshness, and column stats.
- **Multi-Touch Attribution**: MTA models and experimental tables.

