// Case study / Data engineering

Three raw scans to one. Fifteen CTEs to zero. ~50% cost reduction.

A second-pass audit of the GA4-to-BigQuery reporting layer built after the initial 38-hours/month BI enablement work. When the company signed an Anthropic contract, I used Claude with BigQuery MCP as an analysis collaborator to trace repeated scans, a 1.2 TB intermediate table, and an approximately $112/run cascading CTE chain. Redesigning the scan pattern cut scheduled-query costs by roughly half.

~50%pipeline cost reduction

1.2 TBintermediate table retired

$112approximate old run cost

15+independent CTE scans removed

What happened

A repeated-scan pipeline became one shared computation.

A GA4-to-BigQuery analytics pipeline had grown into a roughly $112/run CTE chain processing approximately 18 TB per execution. I redesigned the pipeline around a single conditional-aggregation pass, retiring the 1.2 TB GA4_Event_Params intermediate table and cutting costs by roughly half.

The result was cheaper, faster, easier to maintain, and more consistent: channel-grouping logic defined once instead of repeated in slightly different forms, and two intermediate tables retired without information loss.

Situation

A pipeline of individually reasonable decisions had stopped making sense as a system.

The first pass had already replaced the recurring manual reporting cycle with a BigQuery, Dataform, and Looker Studio reporting layer. The next question was cost and maintainability. I had been watching places where the pipeline could be simplified, and when the company signed an Anthropic contract, it created the opening to test a more collaborative audit process using Claude and BigQuery MCP.

The pipeline scanned events_* three separate times, unnested every event type into a generic key-value table, then ran 15+ separate CTEs over that 1.2 TB table and full-outer-joined them back together. Additional debt came from SELECT DISTINCT on 30-50 columns, duplicated channel-grouping logic, and a 429 GB unpartitioned table that forced full scans downstream.

Task

Find the actual cost source and redesign around it.

The brief was to audit the whole data flow, identify where the costs were really coming from, and propose a migration plan that the team could execute without freezing the reporting layer. Cost reduction was the visible target; the underlying goals were maintainability and consistency.

Method

Claude and BigQuery MCP turned the audit into a collaborative inspection process.

I used Claude with BigQuery MCP to inspect schema shape, query structure, scan patterns, and candidate simplifications while keeping the architectural decisions, validation plan, and migration sequence human-owned. The model surfaced the scan redundancy and CTE structure; I designed the replacement architecture, wrote and validated each SQL change, and sequenced the four-phase migration rollout.

fig. 01 / architecture shiftbefore → after / reduction as evidence

15+ independent CTEs1 grouped pass

1.2 TB repeated scanssingle shared scan

sequential chain6-step parallel DAG

baseline compute cost~50% reduction

1Single raw scan

Read the needed GA4 event rows once instead of repeating the same scan for each parameter extraction.

2Conditional aggregation

Use MAX(IF(ep.key = 'x', ep.value.string_value, NULL)) to pivot 25+ event parameters simultaneously.

3Canonical logic

Move channel grouping into one definition instead of maintaining three slightly different CASE statements.

4Partition + migrate

Add partitioning to large reporting tables and deliver a four-phase rollout with a 2–4 week validation window.

Migration plan

No freeze required.

The rollout was designed as four phases: schema preparation, historical backfill, scheduled-query deployment, and table cleanup. The validation window let the team compare outputs before retiring the older intermediate tables.

Outcome

The pipeline became cheaper and easier to reason about.

Scheduled query cost halved — the pipeline had grown from negligible to ~$300/month after app and web data were added; the CTE redesign cut that by roughly half.
Looker Studio scan cost halved separately — each user triggered a full-table scan at ~$30/scan across a 6-person team (~$180 at full load). Scoping the data window from all-time to two years brought that to ~$15/scan (~$90 at full load).
1.2 TB intermediate table retired, along with the redundant pdp_pageview_params layer.
15+ independent CTE scans removed by replacing repeated extraction with one shared computation.
Channel grouping unified into one canonical definition instead of duplicated CASE logic.

Design lesson

Optimise the system boundary, not the symptom.

Most of the waste lived between queries, not inside any single query. The durable fix was to redraw the boundary between raw event ingestion, parameter extraction, reporting tables, and dashboards so that the expensive modelling work happened once.

BigQuery BigQuery MCP Claude-assisted analysis DAG redesign Conditional aggregation Partitioning Cost engineering Migration planning

More casesFull case library.Sixteen active cases across seven categories: question discovery, instrumentation, experimentation, AI workflows, adoption, marketing analytics, and data engineering. Adjacent caseThe reporting layer this optimisation followed.BigQuery, Dataform, and Looker Studio reporting infrastructure that recovered 38 analyst hours per month before the cost pass. Get in touchContact and context.Email, LinkedIn, GitHub, the CV. Happy to walk through this case or any other in detail.