// Case study / Data engineering
Three raw scans to one. Fifteen CTEs to zero. ~50% cost reduction.
A second-pass audit of the GA4-to-BigQuery reporting layer built after the initial 38-hours/month BI enablement work. When the company signed an Anthropic contract, I used Claude with BigQuery MCP as an analysis collaborator to trace repeated scans, a 1.2 TB intermediate table, and an approximately $112/run cascading CTE chain. Redesigning the scan pattern cut scheduled-query costs by roughly half.
What happened
A repeated-scan pipeline became one shared computation.
A GA4-to-BigQuery analytics pipeline had grown into a
roughly $112/run CTE chain processing approximately
18 TB per execution. I redesigned the pipeline around a
single conditional-aggregation pass, retiring the 1.2 TB
GA4_Event_Params intermediate table and cutting costs by
roughly half.
The result was cheaper, faster, easier to maintain, and more consistent: channel-grouping logic defined once instead of repeated in slightly different forms, and two intermediate tables retired without information loss.
Situation
Each individual decision made sense. The system no longer did.
The first pass had already replaced the recurring manual reporting cycle with a BigQuery, Dataform, and Looker Studio reporting layer. The next question was cost and maintainability. I had been watching places where the pipeline could be simplified, and when the company signed an Anthropic contract, it created the opening to test a more collaborative audit process using Claude and BigQuery MCP.
The pipeline scanned events_* three separate times,
unnested every event type into a generic key-value table, then ran 15+
separate CTEs over that 1.2 TB table and full-outer-joined them back
together. Additional debt came from SELECT DISTINCT on
30-50 columns, duplicated channel-grouping logic, and a 429 GB
unpartitioned table that forced full scans downstream.
Task
Find the actual cost source and redesign around it.
The brief was to audit the whole data flow, identify where the costs were really coming from, and propose a migration plan that the team could execute without freezing the reporting layer. Cost reduction was the visible target. Maintainability and consistency were the real ones.
Method
Claude and BigQuery MCP turned the audit into a collaborative inspection process.
I used Claude with BigQuery MCP to inspect schema shape, query structure, scan patterns, and candidate simplifications while keeping the architectural decisions, validation plan, and migration sequence human-owned. The model surfaced the scan redundancy and CTE structure; I designed the replacement architecture, wrote and validated each SQL change, and sequenced the four-phase migration rollout.
Read the needed GA4 event rows once instead of repeating the same scan for each parameter extraction.
Use
MAX(IF(ep.key = 'x', ep.value.string_value, NULL))
to pivot 25+ event parameters simultaneously.
Move channel grouping into one definition instead of maintaining three slightly different CASE statements.
Add partitioning to large reporting tables and deliver a four-phase rollout with a 2–4 week validation window.
Migration plan
No freeze required.
The rollout was designed as four phases: schema preparation, historical backfill, scheduled-query deployment, and table cleanup. The validation window let the team compare outputs before retiring the older intermediate tables.
Outcome
The pipeline became cheaper and easier to reason about.
- Scheduled query cost halved — the pipeline had grown from negligible to ~$300/month after app and web data were added; the CTE redesign cut that by roughly half.
- Looker Studio scan cost halved separately — each user triggered a full-table scan at ~$30/scan across a 6-person team (~$180 at full load). Scoping the data window from all-time to two years brought that to ~$15/scan (~$90 at full load).
-
1.2 TB intermediate table retired, along with the
redundant
pdp_pageview_paramslayer. - 15+ independent CTE scans removed by replacing repeated extraction with one shared computation.
- Channel grouping unified into one canonical definition instead of duplicated CASE logic.
Design lesson
Optimise the system boundary, not the symptom.
Most of the waste lived between queries, not inside any single query. The durable fix was to redraw the boundary between raw event ingestion, parameter extraction, reporting tables, and dashboards so that the expensive modelling work happened once.