Query Optimizer
With Alvin's Query Optimizer it's possible to save up to 30% of your BI query spend automatically, with the flip of a switch!
The Query Optimizer is currently available for Looker and BigQuery, with more BI tools and data warehouses coming soon.
Introduction
While BigQuery and Looker are a great combination for BI, Looker generates very dense and complex SQL from Explores, filters, and other sources of configuration such as user attributes.
The generated SQL is not optimized to run in your data warehouse, leading to poor query performance and needless busting of the BigQuery cache, even when the underlying data hasn't changed.
Once the generated SQL is available, many of these anti-patterns can be fixed on the fly.
Alvin's Query Optimizer is a proxy that intercepts and rewrites your Looker queries, delivering the same consistent data faster and cheaper, with zero impact on the results and very little effort on the user's end (a simple configuration change in Looker).
What is the Query Optimizer and how does it work?
The Query Optimizer is a layer that sits between BigQuery and Looker (or any other client connecting to the BigQuery API, such as the Python or Java client libraries). This can be tested and validated by changing the API host, e.g., in the BigQuery Python client.
The proxy implements all functionality of the BigQuery API; in fact, it passes most API calls through untouched and only interacts with SQL payloads.
At the moment, the proxy is focused on optimizing analytical workloads (i.e., SELECT statements), but it is also on the roadmap to optimize data pipelines (dbt, Airflow, etc.).
Security and Performance
Security: In its current form, the proxy does nothing more than rewrite the SQL text on behalf of the user and forward the payload to the BigQuery API with the same credentials as the user. The proxy does not require or store any credentials to operate. Any user calling the proxy API appears to the BigQuery API as exactly the same user, with exactly the same permissions, as if they were calling the BigQuery API directly. No data from BigQuery results is ever exposed beyond the proxy boundary. Beyond this, the proxy adheres to Alvin's robust security procedures, which follow from Alvin's internal policies as well as its SOC 2 Type 2 certification.
Reliability: The Query Optimizer is built from the ground up to handle large request volumes quickly. In addition, every rewrite rule has associated tests, which run on every commit and prior to any deployment. If any error occurs while processing or rewriting a SQL payload, the proxy always falls back to passing on the original, untouched API request and payload, ensuring no disruption of service.
Accuracy: Lastly, it's important to note that the proxy does not use LLMs/GenAI or any other potentially non-deterministic approach, nor does it send SQL outside the proxy. It is purely based on generating an AST and replacing nodes/expressions with functionally equivalent expressions that return the same value. The proxy works in a binary way: either it rewrites the query into an optimized but functionally equivalent form, or it does not. It will never return a query that yields a different result. For every query rewrite, the Query Optimizer also maintains a log of which rules/transformations were applied to the SQL.
Rollout Approach
In this section we will describe a few different scenarios for rolling out the Query Optimizer, depending on the current setup/configuration of the Looker/BigQuery environment. The proxy URL will be given to you by an Alvin representative - reach out at hello@alvin.ai if you are interested in trying it out.
Full rollout: For simple BI environments, we recommend simply "flipping the switch" and routing all traffic via the Query Optimizer. This is easily done in the Looker connection setup by setting the rootUrl jdbc parameter; see Appendix A for more details. All users in the Looker instance will then immediately have their queries passed through and optimized by the Query Optimizer.
Gradual rollout to users/groups: Some BigQuery/Looker environments are more complex, and a more fine-tuned approach is desired. A complex environment might involve heavy direct querying via SQL Runner or Explores, or more real-time data sources. This gradual rollout approach is described in Appendix B. Here the proxy can be enabled on an individual user/group basis, allowing you to quickly validate the performance and results of the proxy before rolling it out to the entire company.
Note that it is not possible to change the connection of an existing dashboard. To test on a single dashboard, you would need to duplicate the dashboard, along with any models and explores it references, and ensure that all models use the Query Optimizer as the connection.
Appendix A: Full rollout
Simply set the rootUrl jdbc parameter as described in the documentation.
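Assuming a proxy endpoint of https://bq-proxy.example.com (the real URL comes from your Alvin representative), the Additional JDBC parameters field of the Looker connection would contain something like:

```
rootUrl=https://bq-proxy.example.com
```

No other connection settings need to change; the proxy forwards requests to BigQuery with the connection's existing credentials.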
Appendix B: Gradual rollout to users/groups
Looking at the documentation, it is possible to have a user attribute define properties and parameters of a connection, and this is exactly what we will do with the rootUrl parameter, similarly to above. As the screenshots below show, the connection's jdbc parameters are set up using liquid templating syntax instead of