BigQuery Cost-Efficient ETL Using MERGE-Based UPSERT
Tools: Looker Studio, Google BigQuery, Google Cloud Functions, Google Cloud Scheduler
Tags: Optimization
Created: 2023
💡 "I utilized a datamart created by our data engineering team and leveraged the INFORMATION_SCHEMA.JOBS_BY_PROJECT view in BigQuery to develop a simple audit dashboard. This dashboard helps identify which datamarts or queries may require optimization based on usage patterns and performance metrics."
To optimize BigQuery costs in our ETL process, I implemented an UPSERT strategy using the MERGE statement. This approach lets us process only new or changed records, significantly reducing the amount of data handled each day. Instead of overwriting full tables or running expensive full refreshes, we now keep datasets up to date at a fraction of the cost.
Our original ETL jobs reloaded the full table, covering 2 to 3 months of history, on every run, even when only a few rows had changed. This led to unnecessary data processing, higher costs, and longer query times. Since BigQuery charges by the amount of data scanned, we needed a more efficient way to keep tables fresh without compromising performance or accuracy.
I transitioned the daily load process from full refreshes to incremental UPSERTs by applying the MERGE statement in SQL. The statement checks whether a record already exists in the target table based on a unique key and either updates it or inserts a new row. Because the source query only pulls the last day of data (an INTERVAL 1 DAY filter), the volume scanned, and therefore the cost, is drastically lower.
Sample logic:
MERGE target_table t
USING new_data n
  ON t.order_id = n.order_id
WHEN MATCHED THEN
  UPDATE SET t.status = n.status, t.updated_at = n.updated_at
WHEN NOT MATCHED THEN
  INSERT (order_id, status, updated_at)
  VALUES (n.order_id, n.status, n.updated_at);
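In the production job, new_data is not a full table; the source is restricted to the last day of changes so that only that slice is billed. Here is a sketch of how that daily source could look, assuming a staging table called staging_orders with an updated_at timestamp column (both names are illustrative):

-- Illustrative daily incremental MERGE: only the last day of changes is read from the source.
MERGE target_table t
USING (
  SELECT order_id, status, updated_at
  FROM staging_orders  -- hypothetical staging table, ideally partitioned on updated_at
  WHERE updated_at >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
) n
ON t.order_id = n.order_id
WHEN MATCHED THEN
  UPDATE SET t.status = n.status, t.updated_at = n.updated_at
WHEN NOT MATCHED THEN
  INSERT (order_id, status, updated_at)
  VALUES (n.order_id, n.status, n.updated_at);

If the target table is partitioned as well, adding a predicate on its partition column to the ON clause can further limit how much of the target is scanned.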
To show the impact, here’s a simulated comparison between my previous approach and the optimized one:
Cost Saving: ~Rp 1.93 million per month, a roughly 99% reduction in query cost, and that is the saving from just one datamart's ETL.
🔻 Reduced Query Cost: The volume of data scanned daily dropped by 80–90% in most pipelines.
⚡ Faster Loads: Job durations decreased significantly.
🧹 Cleaner Logic: Easier to maintain and scale with schema changes.
📈 Business Value: Dashboards and reports remain fresh and accurate—without burning budget.
By switching to incremental MERGE operations, we gained control over how much data is processed each day, keeping costs low while still ensuring the latest data is reflected in our dashboards and reports.