Unlocking Savings: How Monitoring Can Optimize Infrastructure Spending
DevOps Practices for Cost Optimization

Introduction:

Organizations today are drowning in data, and BigQuery, while a powerful tool, can quickly become a cost monster. We faced this exact challenge while building an analytics system for a client. Their BigQuery data was exploding, causing their cloud bill to follow suit. We had a mission: slash costs without sacrificing performance.

 

This case study is your roadmap to BigQuery cost optimization. Join us on our journey as we:

  • Expose the hidden cost culprits lurking in your BigQuery setup.
  • Unleash a multi-pronged attack with strategies that truly work.
  • Showcase the results: a 70% reduction in BigQuery costs without a performance dip.
  • Share the challenges we faced and the lessons we learned along the way. This is your chance to take control of your cloud costs and let your data insights shine, not your bill.

 

Ready to tame your BigQuery beast? Dive in!

 

Background:

At the start of 2021, we initiated the setup of DostEducation's analytics platform on Looker Studio.

DostEducation’s Tech Stack for Analytics:

  1. BigQuery as the data warehouse – storing large volumes of data captured for engagement, behavioural, and impact analysis.
  2. Looker Studio – for data visualization (creating graphical representations for analytics).
  3. Data sources – targeted BigQuery queries that supply the desired subset of data for specific types of analytics.

 

Objective:

As the user base expanded, infrastructure costs began to climb steeply. We decided to conduct a comprehensive audit to identify optimization opportunities without compromising product performance.

 

Identified Challenges:

BigQuery, a significant contributor to our infrastructure costs, presented challenges because of its pricing model. As per the official documentation, it charges for:

  1. The amount of data stored (relatively low in our case)
  2. The amount of data processed (the significant component in our case)

 

Plan for Optimizing BigQuery:

To address this, we implemented a multi-faceted approach:

  1. Create a budget alert service on GCP that posts alerts to Slack, so sudden increases in GCP cost can be tackled proactively.
  2. Create a detailed BigQuery cost dashboard for granular insights.
  3. Analyze cost breakdowns per query and per user.
  4. Monitor and optimize scheduled queries for efficient data-source updates.

 

Note: This exercise required a few rounds of discussion with the business team (more details are mentioned below).

 

Implementation:

Firstly, we developed a budget alert system on the Google Cloud Platform (GCP). This system sends alerts to a Slack channel when we reach predefined spending thresholds. It uses Pub/Sub events to capture information from the GCP billing dashboard, ensuring timely alerts that prevent unexpected expenses (a sketch of such a handler follows the feature list below).

 

Key features include:

  1. The ability to set a targeted budget, indicating the expected amount we are comfortable spending on GCP.
  2. Support for multiple thresholds, allowing us to receive alerts at various stages. For instance, we’ve implemented thresholds at 25%, 50%, 70%, 90%, and 100%.
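To make the mechanism concrete, here is a minimal sketch of a Pub/Sub-triggered Cloud Function that forwards budget notifications to Slack. The function name, environment variable, and webhook handling are illustrative assumptions based on the standard GCP budget notification payload, not the exact code of our production service.

```python
import base64
import json
import os
import urllib.request

# Assumption: the Slack incoming-webhook URL is supplied via an environment variable.
SLACK_WEBHOOK_URL = os.environ["SLACK_WEBHOOK_URL"]


def notify_slack(event, context):
    """Pub/Sub-triggered entry point: forwards GCP budget alerts to a Slack channel."""
    payload = json.loads(base64.b64decode(event["data"]).decode("utf-8"))

    threshold = payload.get("alertThresholdExceeded")  # e.g. 0.25, 0.5, 0.7, 0.9, 1.0
    if threshold is None:
        return  # billing updates arrive frequently; only act when a threshold is crossed

    text = (
        f":warning: Budget '{payload['budgetDisplayName']}' crossed {threshold:.0%}: "
        f"${payload['costAmount']:.2f} of ${payload['budgetAmount']:.2f} spent."
    )
    request = urllib.request.Request(
        SLACK_WEBHOOK_URL,
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(request)
```

The budget itself (the target amount and the 25%, 50%, 70%, 90%, and 100% thresholds) is configured in the GCP Billing console or via the Budgets API; the function above only reacts to the notifications that configuration publishes.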

 

With this in place, we were covered against surprises and ready to take proactive steps to optimize the system further. More details can be found in this blog – Efficient GCP Budget Monitoring through Slack Alerts

 

As a next step, we set up a monitoring dashboard for BigQuery cost analysis, built on the activity logs BigQuery provides. The implementation came with its own challenges; the detailed solutions are available in our blog post: Saving Thousands of Dollars by BigQuery Cost Monitoring Dashboard
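As background on the data feed, one way to land BigQuery activity logs in a dataset that a dashboard can query is a Cloud Logging sink with a BigQuery destination. The sketch below is illustrative only – the sink name, project, dataset, and filter are assumptions rather than our exact configuration.

```python
from google.cloud import logging_v2

client = logging_v2.Client()

# Assumption: a dataset named `bq_audit_logs` already exists to receive the exported logs.
sink = client.sink(
    "bigquery-activity-sink",
    filter_='protoPayload.metadata."@type"="type.googleapis.com/google.cloud.audit.BigQueryAuditMetadata"',
    destination="bigquery.googleapis.com/projects/my-project/datasets/bq_audit_logs",
)

if not sink.exists():
    # The sink's writer identity must then be granted BigQuery Data Editor on the dataset.
    sink.create()
```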

 

Data Collection:

With the new BigQuery cost dashboard, we were able to track metrics such as the following (a sample aggregation query is shown after the list):

  1. Resource utilization
  2. Top 10 queries by cost
  3. User-level cost breakdowns
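As an example of the kind of aggregation behind those metrics, the sketch below estimates cost per user from BigQuery's INFORMATION_SCHEMA.JOBS_BY_PROJECT view; a similar query grouped by query text and limited to ten rows gives the top-10-queries-by-cost report. The region qualifier, look-back window, and on-demand price per TiB are assumptions to adjust for your project.

```python
from google.cloud import bigquery

client = bigquery.Client()
PRICE_PER_TIB = 6.25  # assumed on-demand analysis price; check the rate for your region/edition

PER_USER_SQL = """
SELECT
  user_email,
  COUNT(*)                                          AS query_count,
  SUM(IFNULL(total_bytes_billed, 0)) / POW(1024, 4) AS tib_billed
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE job_type = 'QUERY'
  AND creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
GROUP BY user_email
ORDER BY tib_billed DESC
"""

for row in client.query(PER_USER_SQL).result():
    print(f"{row.user_email}: {row.query_count} queries, "
          f"~${row.tib_billed * PRICE_PER_TIB:.2f} over the last 30 days")
```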

 

With this, we had a comprehensive understanding of resource and cost distribution, enabling us to analyze patterns and develop an optimization plan. This data empowered us to make informed decisions. Below are the two reports showing the distribution of BigQuery cost per user and per query.

Identification of Underutilized Resources:

During our pattern analysis, we identified the following optimization opportunities:

 

  1. Elimination of Duplicate Queries:

We observed recurring queries dedicated to removing duplicate data from BigQuery. Initially running hourly, these queries purged duplicate entries from the entire data source table so that analytics always ran on de-duplicated data. However, as the dataset grew over time, the cost of scanning the entire table every hour surged significantly.
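One pattern that keeps deduplication cost roughly flat as a table grows is to deduplicate only the partition that received new rows instead of rewriting the whole table on every run. The sketch below illustrates the idea; the project, table, and column names (analytics.events, event_date, event_id, ingested_at) are hypothetical, and it assumes the table is date-partitioned.

```python
from datetime import date
from google.cloud import bigquery

client = bigquery.Client()
today = date.today().strftime("%Y%m%d")

# Overwrite only today's partition ("table$YYYYMMDD") with its de-duplicated rows,
# instead of rewriting the entire table every hour.
destination = bigquery.TableReference.from_string(f"my-project.analytics.events${today}")
job_config = bigquery.QueryJobConfig(
    destination=destination,
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)

sql = """
SELECT * EXCEPT(row_num)
FROM (
  SELECT
    *,
    ROW_NUMBER() OVER (PARTITION BY event_id ORDER BY ingested_at DESC) AS row_num
  FROM `my-project.analytics.events`
  WHERE event_date = CURRENT_DATE()   -- partition filter: only today's data is scanned
)
WHERE row_num = 1
"""
client.query(sql, job_config=job_config).result()
```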

 

  2. Optimization of Scheduled Data Source Updates:

We ran multiple scheduled queries for hourly or daily data updates, each producing a subset of data in the format required for subsequent analytics. Reviewing them surfaced a few realizations:

  1. Some analytics tasks could be accomplished with data from the previous day, eliminating the need for multiple daily data source refreshes.
  2. Certain analytical dashboards, focusing on weekly user behaviour monitoring, opened opportunities to extend the refresh rate to a weekly basis from a daily refresh.
  3. In specific cases, where data utilization spanned one year or six months, discussions with the business team resulted in a strategic decision to limit dashboard data to the last three months.

 

Optimization Strategies:


The strategy was simple: analyze the usage patterns, share and discuss them with the business team, and come up with a concrete solution for each case.

 

The focus was on:

  1. Optimizing the data sources
  2. Aligning the data-source refresh schedule with business needs
  3. Analyzing a limited data window to process less data at once

 

Optimizing the data sources

  • Materialize large result sets to reduce processing time and cost.
  • Reduce the amount of data processed to only what each dashboard needs.
  • Prune partitioned queries – use BigQuery partitioned tables so queries scan only the relevant date ranges (see the sketch after this list).
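To make the first and third bullets concrete, here is a sketch that materializes a pre-aggregated, date-partitioned summary table for dashboards to query instead of the raw events. The table and column names are hypothetical; the point is that downstream queries filter on the partitioning column and therefore scan only the relevant partitions.

```python
from google.cloud import bigquery

client = bigquery.Client()

# Materialize a daily summary once, partitioned by date, so that Looker Studio
# data sources read the small aggregate instead of the large raw table.
sql = """
CREATE OR REPLACE TABLE `analytics.daily_engagement`
PARTITION BY activity_date
AS
SELECT
  DATE(event_timestamp)                   AS activity_date,
  user_id,
  COUNT(*)                                AS total_events,
  COUNTIF(event_type = 'call_completed')  AS completed_calls
FROM `analytics.call_events`
GROUP BY activity_date, user_id
"""
client.query(sql).result()
```

Because the summary is partitioned by activity_date, any dashboard query that carries a date filter is pruned to the matching partitions, so the bytes billed stay small even as the raw table grows.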

 

Aligning the data-source refresh rate with business needs

In specific cases such as impact analysis, user engagement analysis, and growth analysis, where data utilization spanned up to a year, discussions with the business team resulted in strategic changes such as the following (a sketch of scripting the schedule change appears after the list):

  1. Reduced the number of times data was refreshed:
    1. For impact and growth analytics, the team reviewed the dashboards weekly, so we switched those data sources from daily to weekly refreshes – eliminating 6-7 processing runs per week.
    2. For SysOps data, a daily view was sufficient, so we reduced the refresh rate from hourly to daily.
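Scheduled queries can be re-timed from the BigQuery console, but the same change can also be scripted through the BigQuery Data Transfer Service API. The sketch below shows a daily refresh being moved to a weekly one; the config resource name and schedule strings are placeholders, not values from our setup.

```python
from google.cloud import bigquery_datatransfer
from google.protobuf import field_mask_pb2

client = bigquery_datatransfer.DataTransferServiceClient()

# Placeholder resource name; real names can be listed via client.list_transfer_configs().
config_name = "projects/123456/locations/us/transferConfigs/EXAMPLE_CONFIG_ID"

transfer_config = bigquery_datatransfer.TransferConfig(
    name=config_name,
    schedule="every monday 06:00",  # previously "every 24 hours" (daily refresh)
)

# Only the schedule field is updated; everything else on the config stays untouched.
client.update_transfer_config(
    transfer_config=transfer_config,
    update_mask=field_mask_pb2.FieldMask(paths=["schedule"]),
)
```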

 

Analyzing a limited data window to process less data at once

We started the analytics on top of all the unified data we had; over time, however, the amount of data increased significantly, and so did the processing required to perform the same set of analytics.

 

Reduced the amount of data processed – the SysOps dashboard only needs data for the last three months, so we limited its data source to that three-month window.
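A dry run is a handy way to check how much a window like this actually saves before changing the data source. The sketch below reuses the hypothetical partitioned summary table from earlier and estimates the bytes a three-month refresh would process.

```python
from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical SysOps data-source query: only the last three months of the
# date-partitioned summary table, so partition pruning caps the bytes scanned.
sql = """
SELECT *
FROM `analytics.daily_engagement`
WHERE activity_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 3 MONTH)
"""

dry_run = client.query(sql, job_config=bigquery.QueryJobConfig(dry_run=True))
print(f"A refresh would process ~{dry_run.total_bytes_processed / 1e9:.2f} GB")
```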

 

Return on Investment:

We started optimizing in May 2023, when the total monthly BigQuery cost was $1,395.13; by November 2023 it had come down to $425.33.

 

Cost Trend from Dec 2022 – June 2023

From $800 in December 2022, the monthly cost surged to $1,669 by June 2023, correlating with the growing volume of data. Observing these patterns, we anticipated a December 2023 cost of approximately $1,750 – $1,800, more than a twofold increase over the year.

 

With all the optimization activities we performed, the actual December 2023 cost came down to around $400. Compared with the initially anticipated cost, the difference is substantial – savings of $1,350 to $1,400 a month – and the gap will only widen as data volumes grow.

 

 

Challenges Faced:

Throughout the optimization process, we encountered several challenges, including:

  1. Data Volume Scalability: Managing vast amounts of data in BigQuery posed scalability challenges.
  2. Integration Complexity: Integrating the monitoring tool into our existing infrastructure presented technical hurdles.
  3. Balancing Act: Striking the right balance between cost reduction and maintaining optimal performance required careful navigation.

 

Lessons Learned:

Our journey revealed invaluable lessons:

  1. Proactive Monitoring is Key: Early and proactive monitoring is essential to identify potential cost concerns before they escalate.
  2. Collaboration is Crucial: Close collaboration with the business team is indispensable for aligning technical optimizations with business requirements.
  3. Regular Reviews and Audits: Regularly reviewing and adjusting data processing needs based on changing business demands is critical for sustained optimization.

 

Conclusion:

The successful reduction in BigQuery costs underscores our commitment to delivering efficient and cost-effective solutions. By addressing challenges head-on and implementing strategic optimizations, we have achieved a harmonious balance between performance and cost savings. This case study stands as a testament to our dedication to client success and continuous improvement.

 

Cost Trend from January 2023 – December 2023

 

Future Recommendations:

Looking ahead, we recommend:

  1. Continuous Monitoring: Implementing continuous monitoring practices to adapt to evolving usage patterns and ensure ongoing cost-effectiveness.
  2. Automated Optimization: Exploring automation tools to streamline the identification and resolution of potential inefficiencies in real time.
  3. Regular Training: Providing regular training sessions so the team stays current on the latest optimization strategies and tools.
  4. Periodic Infrastructure Cost Audits: Conducting periodic audits to surface additional optimization opportunities in time.

 
