Organizations today are drowning in data, and BigQuery, while a powerful tool, can quickly become a cost monster. We faced this exact challenge while building an analytics system for a client. Their BigQuery data was exploding, causing their cloud bill to follow suit. We had a mission: slash costs without sacrificing performance.
This case study is your roadmap to BigQuery cost optimization. Join us on our journey as we:
Ready to tame your BigQuery beast? Dive in!
At the start of 2021, we initiated the setup of DostEducation's analytics platform on Looker Studio.
DostEducation’s Tech Stack for Analytics:
As the user base expanded, infrastructure costs started increasing exponentially. We decided to conduct a comprehensive audit to identify optimization opportunities without compromising product performance.
BigQuery, a significant contributor to infrastructure costs, presented challenges due to its charging model. Per the official pricing documentation, it charges based on:
To address this, we implemented a multi-faceted approach:
Note: This exercise took a few rounds of discussions with the business team (covered in more detail below).
Firstly, we developed a budget alert system on the Google Cloud Platform (GCP). It sends alerts to a Slack channel when spending reaches predefined thresholds, using Pub/Sub events to capture information from the GCP billing dashboard so that warnings arrive before any unexpected expenses pile up.
Key features include:
With this in place, we were covered against billing surprises and ready to take proactive steps to optimize the system further. More details can be found in this blog – Efficient GCP Budget Monitoring through Slack Alerts
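The core of that alerting flow can be sketched as follows. This is a minimal illustration, not our production code: the payload fields (`budgetDisplayName`, `costAmount`, `budgetAmount`, `currencyCode`) follow the JSON format Cloud Billing publishes to a budget's Pub/Sub topic, the threshold values are hypothetical, and actually posting the message to Slack (e.g. via an incoming webhook) is left out.

```python
import base64
import json

# Hypothetical spending thresholds (fractions of the budget) that trigger alerts.
THRESHOLDS = (0.5, 0.8, 1.0)

def build_slack_alert(pubsub_message):
    """Parse a GCP budget notification delivered via Pub/Sub and return a
    Slack message string when a spending threshold is crossed, else None.

    `pubsub_message["data"]` is the base64-encoded JSON body that Cloud
    Billing publishes to the budget's Pub/Sub topic.
    """
    payload = json.loads(base64.b64decode(pubsub_message["data"]).decode("utf-8"))
    cost = payload["costAmount"]
    budget = payload["budgetAmount"]
    ratio = cost / budget
    if not any(ratio >= t for t in THRESHOLDS):
        return None  # below every threshold: stay quiet
    return (":warning: Budget '{}' is at {:.0%}: {:.2f} of {:.2f} {} spent"
            .format(payload["budgetDisplayName"], ratio, cost, budget,
                    payload.get("currencyCode", "USD")))
```

A Cloud Function subscribed to the budget's topic would call this on every message and forward any non-None result to the Slack webhook.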
As a next step, we set up a monitoring dashboard for BigQuery cost analysis based on the activity logs BigQuery provides. The implementation had its challenges; the details and solutions are covered in our blog post: Saving Thousands of Dollars by BigQuery Cost Monitoring Dashboard
With the new BigQuery cost dashboard, we were able to track metrics such as:
With this, we had a comprehensive understanding of resource and cost distribution, enabling us to analyze patterns, develop an optimization plan, and make informed decisions. Below are two reports showing BigQuery cost broken down per user and per query.
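As an illustration of the per-user report, the aggregation can be sketched like this. The job rows mimic what a query over BigQuery's INFORMATION_SCHEMA.JOBS_BY_PROJECT view (fields `user_email`, `total_bytes_billed`) would return; the $5-per-TiB on-demand rate is an assumption, so check the current pricing for your region.

```python
ON_DEMAND_USD_PER_TIB = 5.0  # assumed on-demand analysis rate; verify current pricing

def cost_per_user(jobs):
    """Aggregate estimated on-demand query cost per user.

    `jobs` is an iterable of dicts with 'user_email' and 'total_bytes_billed',
    e.g. rows fetched from INFORMATION_SCHEMA.JOBS_BY_PROJECT.
    Returns {user_email: usd}, sorted by cost, highest first.
    """
    totals = {}
    for job in jobs:
        tib = job.get("total_bytes_billed", 0) / 2**40  # bytes -> TiB
        user = job["user_email"]
        totals[user] = totals.get(user, 0.0) + tib * ON_DEMAND_USD_PER_TIB
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))
```

Sorting highest-first surfaces the heaviest spenders (usually scheduled service accounts) at the top of the report, which is exactly where the optimization conversation starts.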
During our pattern analysis, we identified the following optimization opportunities:
We observed recurring queries dedicated to removing duplicate data from BigQuery. Initially running hourly, these queries aimed to purge duplicate entries from the entire BigQuery data source table to ensure analytics on non-duplicate data. However, as the dataset grew over time, the associated costs surged significantly.
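One way to cut that cost, sketched below, is to scope the dedup to recently ingested data instead of rescanning the whole table every hour. The table and column names (`events`, `event_id`, `ingested_at`) are hypothetical; the function only builds the SQL string, which a scheduled query would then run.

```python
def scoped_dedup_sql(table="project.dataset.events", key_col="event_id",
                     ts_col="ingested_at", lookback_days=1):
    """Build a dedup SELECT restricted to recent data.

    Keeps the newest row per key via ROW_NUMBER(), but only over the last
    `lookback_days` of data, so the hourly job scans a small, bounded slice
    instead of the ever-growing full table.
    """
    return f"""
SELECT * EXCEPT(rn)
FROM (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY {key_col}
                            ORDER BY {ts_col} DESC) AS rn
  FROM `{table}`
  WHERE DATE({ts_col}) >= DATE_SUB(CURRENT_DATE(), INTERVAL {lookback_days} DAY)
)
WHERE rn = 1
""".strip()
```

On a table partitioned by `DATE(ingested_at)`, the WHERE clause prunes old partitions, so the bytes scanned by the hourly job (and therefore its cost) stay roughly flat as the table grows.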
We executed multiple scheduled queries for hourly or daily data updates, intending to get a subset of data in a desired format for subsequent analytics. Challenges surfaced, leading to the realization that:
The strategy was simple: analyse the patterns, share and discuss them with the business team, and come up with a concrete solution.
The focus was on:
Optimising the data sources
Aligning the data-source refresh rates with business needs
In specific cases like impact analysis, user engagement analysis, and growth analysis, where data utilization spanned a full year, discussions with the business team resulted in strategic changes such as:
Analysing a limited window of data so that less data is processed at once
We started the analytics on top of the unified data we had; over time, however, the amount of data increased significantly, and so did the processing resources required to perform the same set of analytics.
Reduced the amount of data processed – the SysOps dashboard only needs data for the last three months, so we limited the dashboard's data to that window.
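The three-month limit can be applied directly in the dashboard's source query, as in this sketch. The table and column names (`interactions`, `event_date`) are hypothetical; on a table partitioned by `event_date` the filter prunes every older partition, and setting `require_partition_filter = true` on the table (via `ALTER TABLE ... SET OPTIONS`) makes such a filter mandatory, so no query can accidentally scan the full history.

```python
def windowed_dashboard_sql(table="project.dataset.interactions",
                           date_col="event_date", months=3):
    """Build a dashboard source query limited to a recent window.

    On a table partitioned by `date_col`, the WHERE clause prunes every
    partition older than the window, so scanned bytes stop growing with
    total table size.
    """
    return f"""
SELECT *
FROM `{table}`
WHERE {date_col} >= DATE_SUB(CURRENT_DATE(), INTERVAL {months} MONTH)
""".strip()
```

The same pattern applies to any dashboard whose business question only covers a bounded window: the refresh cost becomes proportional to the window, not to the table's lifetime.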
We started optimizing in May 2023. The total BigQuery cost was $1,395.13, which came down to $425.33 in November 2023.
Cost Trend from Dec 2022 – June 2023
From $800 in December 2022, the cost surged to $1,669 by June 2023, correlating with the growing volume of data. Based on these patterns, we anticipated a December 2023 cost of approximately $1,750 – $1,800, more than a twofold increase over the year.
With all the optimisation activities we performed, the December 2023 cost instead came down to about $400. Compared to the initially anticipated cost, the difference is substantial: savings of $1,350 to $1,400 a month, and that gap will keep widening over time as the data grows.
Throughout the optimization process, we encountered several challenges, including:
Our journey revealed invaluable lessons:
The successful reduction in BigQuery costs underscores our commitment to delivering efficient and cost-effective solutions. By addressing challenges head-on and implementing strategic optimizations, we’ve achieved a balance between performance and cost savings. This case study stands as a testament to our dedication to client success and continuous improvement.
Cost Trends between January 2023 – Dec 2023
Looking ahead, we recommend: