We recently learned that our client was paying a lot for GCP, so we ran a GCP audit to find out where most of the cost was coming from. The audit showed that we were spending thousands of dollars every month on BigQuery alone. BigQuery had many scheduled queries running that were necessary to keep the data sources for analytics up to date. We therefore decided to build a monitoring dashboard in Looker Studio (previously Data Studio) to track the cost of each query running on BigQuery. We monitored the queries, optimized the ones that were costing us the most, and reduced our BigQuery cost by 50%-60%.
If you also use BigQuery to store data, build reports, or anything else, you have probably wondered how much you are spending every time a query runs. So let's get started with creating a monitoring dashboard in Google Looker Studio.
First of all, we need to route all the logs generated by BigQuery into a dataset, so let's start by creating one.
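If you prefer SQL over the console, you can also create the dataset with a DDL statement. This is only a minimal sketch: the dataset name sink_logs matches the table path used later in this post, and the US location is just an example you should replace with your own region.

-- Create the dataset that will receive the BigQuery audit logs
CREATE SCHEMA IF NOT EXISTS sink_logs
OPTIONS (location = 'US');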
Our dataset is now ready, but how do the logs generated by BigQuery actually get into it? For that, we need to create a sink. In GCP, a "sink" is the mechanism used to route, collect, and manage logs or metrics from the various services and resources in your environment.
To create a sink:
resource.type="bigquery_project"
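If you prefer the command line, a sink with the same filter can be created with gcloud. This is only a sketch: the sink name bigquery-cost-sink is a hypothetical example, and <YOUR_PROJECT_ID> and the sink_logs dataset should be replaced with your own values.

# Route BigQuery audit logs into the sink_logs dataset as partitioned tables
gcloud logging sinks create bigquery-cost-sink \
  bigquery.googleapis.com/projects/<YOUR_PROJECT_ID>/datasets/sink_logs \
  --log-filter='resource.type="bigquery_project"' \
  --use-partitioned-tables

When creating the sink this way, you may also need to grant the sink's writer identity the BigQuery Data Editor role on the dataset before logs start appearing.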
Logs will now start flowing into a partitioned table inside the dataset we created. All set!
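To confirm that log entries are arriving, you can run a quick count against the data-access audit table that the sink writes to (the same table queried later in this post). This is just a sanity check, assuming your sink writes to the sink_logs dataset as above:

-- Count today's audit log entries delivered by the sink
SELECT COUNT(*) AS log_entries
FROM `<Your GCP Project id>.sink_logs.cloudaudit_googleapis_com_data_access`
WHERE DATE(timestamp) = CURRENT_DATE();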
Now we are ready to create a chart in Looker Studio (Data Studio). First, we need to create a data source for it, using the following custom query:
SELECT
  protopayload_auditlog.authenticationInfo.principalEmail AS user,
  CAST(7.5 * (CAST(JSON_EXTRACT_SCALAR(protopayload_auditlog.metadataJson, '$.jobChange.job.jobStats.queryStats.totalProcessedBytes') AS INT64) / POWER(2, 40)) AS NUMERIC) AS queryCostInUSD,
  CAST(JSON_EXTRACT_SCALAR(protopayload_auditlog.metadataJson, '$.jobChange.job.jobStats.queryStats.totalProcessedBytes') AS INT64) AS totalProcessedBytes,
  CAST(JSON_EXTRACT_SCALAR(protopayload_auditlog.metadataJson, '$.jobChange.job.jobStats.queryStats.totalBilledBytes') AS INT64) AS totalBilledBytes,
  CAST(CAST(JSON_EXTRACT_SCALAR(protopayload_auditlog.metadataJson, '$.jobChange.job.jobStats.queryStats.totalProcessedBytes') AS INT64) / 1000000000000 AS NUMERIC) AS tb,
  JSON_EXTRACT_SCALAR(protopayload_auditlog.metadataJson, '$.jobChange.job.jobConfig.queryConfig.query') AS query,
  REGEXP_EXTRACT(JSON_EXTRACT_SCALAR(protopayload_auditlog.metadataJson, '$.jobChange.job.jobConfig.queryConfig.query'), r'-- tag: (.*?) tag ends/') AS report
FROM
  `<Your GCP Project id>.sink_logs.cloudaudit_googleapis_com_data_access`
WHERE
  JSON_EXTRACT_SCALAR(protopayload_auditlog.metadataJson, '$.jobChange.job.jobStats.queryStats.totalProcessedBytes') IS NOT NULL
  AND DATE(timestamp) >= PARSE_DATE('%Y%m%d', @DS_START_DATE)
  AND DATE(timestamp) <= PARSE_DATE('%Y%m%d', @DS_END_DATE)
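A couple of notes on the columns above. queryCostInUSD simply multiplies the TiB processed (totalProcessedBytes divided by 2^40) by a rate of 7.5 USD per TiB; adjust that rate to the on-demand price of your BigQuery region. The report column relies on a tagging convention in the query text itself: for the regular expression to match, each scheduled query needs a comment of the form shown below, where daily_sales_report is just a hypothetical report name.

-- tag: daily_sales_report tag ends/
SELECT CURRENT_DATE() AS run_date;

Adding such a tag to every scheduled query lets the dashboard group costs by report instead of by raw query text.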
Now the data source is ready to use. You can rename it from the data source page. Then go to the Looker Studio report, click Insert, and select a chart.
The chart will show the cost incurred by each query running on BigQuery and will look like this:
You can also create a chart to show the cost incurred by each user if multiple users are working on the same project. For that, follow the same steps as above, but select "user" in the dimension field. This chart will then show the cost per user; we are using a doughnut chart to represent it:
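If you would rather compute the per-user totals in SQL instead of letting Looker Studio aggregate them, a minimal sketch over the same audit table, reusing the cost formula from the data source query, looks like this:

-- Total estimated query cost per user, using the same 7.5 USD/TiB rate as above
SELECT
  protopayload_auditlog.authenticationInfo.principalEmail AS user,
  SUM(CAST(7.5 * (CAST(JSON_EXTRACT_SCALAR(protopayload_auditlog.metadataJson, '$.jobChange.job.jobStats.queryStats.totalProcessedBytes') AS INT64) / POWER(2, 40)) AS NUMERIC)) AS totalCostInUSD
FROM
  `<Your GCP Project id>.sink_logs.cloudaudit_googleapis_com_data_access`
WHERE
  JSON_EXTRACT_SCALAR(protopayload_auditlog.metadataJson, '$.jobChange.job.jobStats.queryStats.totalProcessedBytes') IS NOT NULL
GROUP BY user
ORDER BY totalCostInUSD DESC;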
We can also insert a table listing the queries along with their costs.
So here is the final BigQuery dashboard, where you can see the details of every query running on BigQuery along with its cost.
At the end of June, the client informed us that their GCP bill was roughly double what it usually was. We conducted a GCP audit and found that BigQuery accounted for most of the cost, so we proposed this monitoring dashboard for BigQuery. We built the dashboard in July, and after it went live we identified the top four queries that were costing us the most. We started optimizing those queries, and by the end of July the GCP cost had dropped by 25-30% compared to the previous month. Encouraged by this, we continued optimizing further queries, and by the end of August the cost was down 50-60% from the initial bill.
After optimizing the queries, we could also see how much cost each user was incurring. Since a few different teams were working with BigQuery, we asked them to optimize the queries they owned as well. This gave our client complete insight into their queries and users and helped them cut a significant amount of recurring monthly expense.
When you use BigQuery without knowing how much each query costs, your monthly bill can catch you by surprise. A cost monitoring dashboard is a great tool for making timely decisions about optimizing queries and removing unnecessary resource usage. The results speak volumes: we achieved a 60% reduction in our monthly BigQuery bill. The ability to identify, analyze, and optimize individual queries has not only saved significant financial resources but has also streamlined operations, paving the way for a more collaborative and agile work environment. The outcomes extend beyond mere cost savings: the real-time decision support provided by the dashboard empowers stakeholders to address cost-related issues promptly, while improved resource efficiency, data processing performance, and scalability planning benefit the whole organization.
As we conclude this journey, it’s clear that a well-designed BigQuery cost monitoring dashboard is not just a tool for financial oversight; it positions organizations to make informed, data-driven decisions, optimize resource utilization, and achieve a higher return on their cloud infrastructure investments.
Stay tuned for the complete case study on cost audit and optimization with the help of a monitoring system.