Maximizing Speed: A Deep Dive into Looker Reporting Performance Tuning

Abhishek Adivarekar
10 min read · Nov 13, 2023


Intro

Welcome to a journey of unparalleled speed and efficiency in Looker reporting! In the dynamic realm of data analysis, time is of the essence. In this comprehensive blog, we will navigate through the challenges encountered in a real-world scenario, delve into strategic optimization techniques, and unveil the remarkable results achieved in the quest for Looker reporting excellence.

Our story unfolds in the face of a significant challenge: a sluggish dashboard load time that once lingered at an exasperating 2 minutes and 20 seconds. Brace yourself for the twist: a series of targeted optimizations propels that load time down to a mere 5 seconds.

Challenge: Reporting at a Snail’s Pace

Our story begins with an OLTP database feeding Looker dashboards, but the dashboard load time was less than ideal — languishing at a staggering 2 minutes and 20 seconds. This prompted a deep dive into the intricacies of MySQL performance tuning to make reporting a breeze.

Solution — Optimization Techniques

A. SQL Query Optimization

  1. Avoid functions in query predicates
  2. Indexing

B. Looker-Specific Optimizations (LookML)

  1. Avoid temporary tables
  2. Generate dynamic SQL
  3. Perform dynamic aggregation
  4. Apply filter if selected
  5. Avoid multiple case statements
  6. Avoid symmetric aggregation (many-to-many joins)

We will now take a deep dive into each optimization technique.

A. SQL Query Optimization

1. Avoid functions in query predicates

  • Using functions in query predicates negatively impacts performance, because a function typically adds extra processing overhead to each row and slows down the overall execution of the query.
  • Here are some reasons why you should avoid using functions in query predicates:
    1. Functions make predicates non-sargable (Search ARGument ABLE), so the optimizer cannot perform an index seek
    2. Functions prevent the use of indexes on the wrapped column
  • Here are some techniques for improving the performance of SQL queries:
    1. Index the columns used in JOIN, WHERE, ORDER BY, and GROUP BY clauses
    2. Avoid a leading wildcard (%) in LIKE predicates
    3. Avoid unnecessary columns in the SELECT clause
    4. Use INNER JOIN instead of OUTER JOIN where possible
  • Example:
Column: started_at
Data type: DATETIME

The WHERE clause below will not hit the index, because the function must be evaluated for every row:

WHERE date(started_at) = '2023-10-02'

Prefer a sargable range over the raw column:

WHERE started_at >= '2023-10-02 00:00:00'
  AND started_at < '2023-10-03 00:00:00'

This covers the same calendar day while allowing the query to hit the index efficiently, contributing to improved performance.
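
You can confirm whether a predicate will use the index with MySQL's EXPLAIN. Below is a minimal sketch, assuming a hypothetical trips table and an index on started_at:

-- Hypothetical index, for illustration only
CREATE INDEX idx_trips_started_at ON trips (started_at);

-- Non-sargable: the function is evaluated for every row, so the index is skipped
EXPLAIN SELECT * FROM trips WHERE DATE(started_at) = '2023-10-02';

-- Sargable: a range on the raw column can use idx_trips_started_at
EXPLAIN SELECT * FROM trips
WHERE started_at >= '2023-10-02 00:00:00'
  AND started_at < '2023-10-03 00:00:00';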

2. Indexing

What is Indexing in SQL?

In SQL, an index is a database object that provides quick lookup and retrieval of data in a table. It works much like an index in a book: rather than scanning the entire book, you can quickly find a specific topic by referring to the index. Similarly, database indexes speed up data retrieval operations by providing a faster way to locate and access rows in a table.

Determining Indexing Strategies: Key Considerations

These are the factors to consider when deciding which columns to index for optimal database performance.

1. Columns in WHERE Clauses:

  • Identify columns frequently used in the WHERE clause of your queries.

2. Columns in JOIN Conditions:

  • Columns involved in JOIN operations are good candidates for indexing.

3. Columns Used in ORDER BY and GROUP BY:

  • Columns used in sorting (ORDER BY) or grouping (GROUP BY) operations may benefit from indexing.
  • Indexes can facilitate faster sorting and grouping of data.

4. Columns in Aggregate Functions:

  • If you frequently perform aggregate functions (e.g., SUM, COUNT, AVG) on a particular column, consider indexing that column.

5. Columns in Subqueries:

  • If you have subqueries that filter data based on a specific column, consider indexing that column to speed up subquery execution.

6. Columns with High Cardinality:

  • Columns with high cardinality (many distinct values) are good candidates for indexing.
  • Indexing low-cardinality columns might not be as effective since there are fewer unique values.

7. Columns Used in WHERE Clauses with Range Conditions:

  • Columns used in range conditions (e.g., BETWEEN, >, <) can benefit from indexing.
  • This is particularly useful for date or numerical columns.

8. Frequently Accessed Columns:

  • Identify columns that are frequently accessed in SELECT, UPDATE, DELETE, or JOIN operations.
  • Indexing such columns can improve overall query performance.

9. Columns in UNIQUE Constraints:

  • Columns involved in unique constraints or primary keys are automatically indexed.

  • If your application frequently queries based on these columns, additional indexing may still be beneficial.
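
To make these considerations concrete, here is a minimal SQL sketch; the orders table, its columns, and the index names are hypothetical:

-- Single-column index for a column used in WHERE clauses and JOINs
CREATE INDEX idx_orders_customer_id ON orders (customer_id);

-- Composite index: equality column (status) first, range column (created_at) second
CREATE INDEX idx_orders_status_created_at ON orders (status, created_at);

-- EXPLAIN shows whether the optimizer actually picks one of these indexes
EXPLAIN SELECT customer_id, COUNT(*)
FROM orders
WHERE status = 'complete'
  AND created_at >= '2023-10-01'
GROUP BY customer_id;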

Now let's take a deep dive into the Looker-specific optimizations.

Looker-Specific Optimizations (LookML)

1. Avoid temporary tables

  • Temporary tables, a common tool in MySQL, proved to be a double-edged sword in our AWS Aurora MySQL environment. While designed to hold data temporarily, their disk-based storage mechanism inadvertently led to sluggish dashboard execution times, prompting a reevaluation of our optimization strategies.
  • The crux of the issue lay in how Looker generated SQL for Aurora MySQL: using the derived_table parameter in a view definition triggered the automatic creation of temporary tables, and their disk-centric nature became a hindrance to overall dashboard performance.
  • By strategically leveraging the sql_table_name parameter instead, and steering clear of automatic temporary table creation, we achieved a marked improvement in efficiency and responsiveness.

The Breakthrough: Embracing `sql_table_name`

To circumvent the challenges posed by temporary tables, we embraced a game-changing alternative: sql_table_name. Instead of relying on the automatic generation of temporary tables with derived_table, we opted for a more controlled approach. Placing the SQL query directly within parentheses in the sql_table_name parameter allowed us to sidestep the pitfalls of temporary tables and regain control over dashboard performance.

Example Implementation:

view: my_dashboard {
  sql_table_name: (
      SELECT
        column1,
        column2,
        ...
      FROM your_table
      WHERE your_condition
    ) ;;

  dimension: column1 {
    type: string
    sql: ${TABLE}.column1 ;;
  }

  dimension: column2 {
    type: string
    sql: ${TABLE}.column2 ;;
  }

  ...
}
  • This strategic optimization, powered by sql_table_name, significantly improved dashboard responsiveness and eliminated the performance bottlenecks associated with temporary table creation.

2. Unleashing Dynamic SQL: A Looker Optimization Marvel

In the realm of Looker SQL optimization, the strategic use of dynamic SQL stands out as a powerful technique, revolutionizing how derived views generate SQL based on selected fields in an explore. Let’s delve into a scenario to understand the transformative impact of this technique.

The Scenario: Dynamic SQL Unleashed

Consider a scenario where a derived table’s SQL is dynamically generated based on the fields selected in the explore. Here’s an example:

SELECT a, b, c, NULL AS d FROM Table1
UNION
SELECT a, b, c, d FROM Table2

In the traditional setup, both branches of the UNION are always present, potentially leading to unnecessary data retrieval and impacting query time. To address this, the implementation of a dynamic SQL approach becomes the hero of optimization.

The Optimization Technique: Dynamic SQL with Liquid If-Else Logic

SELECT a, b, c, NULL AS d FROM Table1
{% if union_view.d._in_query %}
UNION
SELECT a, b, c, d FROM Table2
{% endif %}

With this technique, the SQL dynamically adapts based on the fields selected in the query. The {% if union_view.d._in_query %} clause, which uses Looker's Liquid _in_query variable (union_view stands in for your view's name), acts as a switch, ensuring that the UNION branch is executed only if the field d is selected in the query. This not only streamlines the SQL generation process but also significantly reduces query time by hitting just one table when the UNION can be avoided.
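
For context, here is a minimal sketch of how this fragment could sit inside a LookML view, reusing the sql_table_name approach from technique 1; union_view, the tables, and the fields are placeholders:

view: union_view {
  sql_table_name: (
      SELECT a, b, c, NULL AS d FROM Table1
      {% if union_view.d._in_query %}
      UNION
      SELECT a, b, c, d FROM Table2
      {% endif %}
    ) ;;

  dimension: a {
    type: string
    sql: ${TABLE}.a ;;
  }

  dimension: d {
    type: string
    sql: ${TABLE}.d ;;
  }
}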

The Impact: Streamlined Queries, Reduced Load Times

The implementation of dynamic SQL in Looker offers a dual advantage.

  1. Optimizes the SQL generation process, tailoring it to the specific fields requested.
  2. Minimizes unnecessary data retrieval, reducing the load on the database and accelerating query execution times.

As you navigate the intricacies of Looker optimization, consider harnessing the power of dynamic SQL to unlock new levels of query performance and data exploration agility.

3. Crafting Efficiency with Dynamic Aggregation

  • The technique of dynamic aggregation emerges as a strategic approach to tailor data pre-aggregation based on the user’s selected time period in the Explore.
  • This transformative optimization ensures that aggregation is conducted at a granularity that aligns with the user's chosen time duration, contributing to a streamlined and more responsive data analysis experience.

The Optimization Technique: Dynamic Aggregation Based on Time Period

The LookML snippet below showcases the implementation of dynamic aggregation, specifically focusing on the started_at timestamp field. The aggregation granularity adjusts dynamically based on the user's time period selection:

{% if explore.time_duration_picker._parameter_value == "'Today'" or explore.time_duration_picker._parameter_value == "'Yesterday'" %}
  DATE_FORMAT(
    FROM_UNIXTIME(UNIX_TIMESTAMP(started_at) - (UNIX_TIMESTAMP(started_at) % (60 * 30))),
    '%Y-%m-%d %H:%i:%s'
  ) AS started_at_minute30,
{% else %}
  DATE(started_at) AS started_at_minute30,
{% endif %}
  • In essence, if the user selects ‘Today’ or ‘Yesterday,’ the aggregation occurs at 30-minute intervals within the day.
  • Otherwise, if a broader time duration is chosen, the aggregation reverts to a day-level granularity.
  • This dynamic adjustment is aimed at reducing the number of rows involved in joins and calculations, contributing to a more optimized query execution.

The Impact: Streamlining Joins and Calculations

  • By dynamically altering the aggregation granularity based on the user’s time period selection, this Looker LookML optimization directly addresses the challenge of data volume in joins and calculations.
  • The result is a more efficient query execution process, where unnecessary granularity is avoided, leading to reduced computational overhead and faster analysis.

4. Precision in Action: Apply filter if selected

  • In the pursuit of Looker LookML SQL optimization, the application of filters with surgical precision emerges as a key technique.
  • Let’s explore the intricacies of this approach, where filters are applied dynamically based on user selections, ensuring that only relevant conditions find their way into the generated SQL WHERE clause.
  • The essence of this optimization technique lies in its conditional nature.
  • When a user selects a filter, it becomes the catalyst for inclusion in the SQL WHERE clause.
  • Taking the example of a parameter as a filter on a dashboard, this technique ensures that the filter is only applied if a user explicitly selects a value. Here’s how it unfolds in LookML:
explore: explore_name {
  label: "Explore Name"
  always_filter: {
    filters: [name_of_parameter_as_filter: "Select"]
  }
  sql_always_where:
    {% if explore_name.name_of_parameter_as_filter._parameter_value != "'Select'" %}
      ${explore_name.select_name_of_dimension_or_measure}
    {% else %}
      1 = 1 -- neutral fallback so the generated WHERE clause stays valid SQL
    {% endif %} ;;
}

The Breakdown: Conditional Logic in Action

  1. always_filter: Specifies the filters available in the explore, along with their default values.
  2. Conditional logic in sql_always_where:
  • Suppose the parameter filter has the values "Select", "Apple", "Mango", "Orange", and "Avocado".
  • If the user selects any value except "Select", the condition evaluates to true, and ${explore_name.select_name_of_dimension_or_measure} is added to the WHERE clause.
  • If no value is selected and the default remains "Select", the condition evaluates to false and the WHERE clause is left unaltered.

The Impact: Precision, Performance, and User Experience

By applying filters conditionally based on user selections, this optimization technique achieves several key outcomes:

  • Precision: Filters are applied precisely when needed, avoiding unnecessary constraints on the SQL query.
  • Performance: The generated SQL is tailored to the user’s specific selections, reducing query complexity and potentially improving performance.
  • User Experience: Users experience a more intuitive and responsive dashboard, with filters seamlessly adapting to their interactions.

5. Unraveling Efficiency: The Pitfalls of Nested Case Statements in LookML SQL

The Challenge: Redundant Case Statements

The quandary arises when a dimension, say Dimension A, contains multiple case statements in its SQL. When this dimension is used as a filter in a measure or referenced by another dimension (Dimension B), the case statements proliferate. For instance, if Dimension A has 5 case statements and Dimension B has 4 case statements, the generated SQL can balloon to a total of 20 case statements (4 × 5), with each case statement in B replicated for every case statement in A.

The Culprit: Nested Dimensions and Filters in Measures

Nested dimensions, particularly those laden with case statements, can lead to SQL generation headaches. Each layer of nested dimensions compounds the complexity, resulting in redundant case statements and a convoluted LookML structure.
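
As a sketch of the anti-pattern (all names hypothetical): when dimension_b references ${dimension_a}, Looker substitutes dimension_a's entire CASE expression into every branch of dimension_b, which is exactly how 4 branches referencing 5 branches balloon into 20 case statements:

view: example_view {
  # Dimension A: five-branch CASE (abbreviated)
  dimension: dimension_a {
    type: string
    sql:
      CASE
        WHEN condition_A_1 THEN 'Category A1'
        ...
        ELSE 'Other Category'
      END ;;
  }

  # Anti-pattern: each ${dimension_a} reference below expands into the
  # full CASE expression above, multiplying case statements in the SQL
  dimension: dimension_b {
    type: string
    sql:
      CASE
        WHEN ${dimension_a} = 'Category A1' THEN 'Type B1'
        ...
        ELSE 'Other Type'
      END ;;
  }
}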

The Solution: Unified Conditional Logic in a Single Dimension

To overcome the challenges posed by nested dimensions, the key lies in breaking free from hierarchical dependencies. The strategy involves consolidating conditional logic into a single dimension, liberating the LookML structure from unnecessary layers of complexity.

The Implementation: A Unified Dimensional Approach

dimension: unified_dimension {
  type: string
  sql:
    -- MySQL: use CONCAT, since || is logical OR unless PIPES_AS_CONCAT is enabled
    CONCAT(
      CASE
        WHEN condition_A_1 THEN 'Category A1'
        WHEN condition_A_2 THEN 'Category A2'
        ...
        ELSE 'Other Category'
      END,
      ' - ',
      CASE
        WHEN condition_B_1 THEN 'Type B1'
        WHEN condition_B_2 THEN 'Type B2'
        ...
        ELSE 'Other Type'
      END,
      ' - ',
      CASE
        WHEN condition_C_1 THEN 'Label C1'
        WHEN condition_C_2 THEN 'Label C2'
        ...
        ELSE 'Other Label'
      END
    ) ;;
}

In this unified dimension, conditional logic from Dimension A, Dimension B, Dimension C, and so on is consolidated. The resulting structure is cleaner, with a single dimension capturing the essence of various conditions.

The Impact: Simplified LookML, Enhanced Performance

  • By adopting a unified approach to conditional logic, the LookML structure is streamlined, reducing redundancy and eliminating the need for nested dimensions.
  • This not only enhances the clarity and maintainability of the Looker model but also contributes to improved query performance.
  • Embrace this strategy to elevate the harmony of your LookML structure.

6. Unraveling the Complexities of Symmetric Aggregation (avoid many-to-many joins in Looker explores)

The Challenge: Symmetric Aggregation and Its Pitfalls

Symmetric aggregation, especially within the context of many-to-many joins, introduces a layer of complexity that can significantly impede query performance. The intricate nature of these joins can lead to extended query run times, making it imperative to adopt strategies that mitigate these challenges.

The Solution: Avoid Many-to-Many Joins

To circumvent the issues associated with symmetric aggregation, the first line of defense involves steering clear of many-to-many joins. While these joins can be powerful, they often come at the cost of increased query complexity and prolonged execution times.

Additional Strategies: Primary Keys and Careful Modeling

  1. Check for Defined Primary Keys:
  • A crucial step in addressing persistent issues is to inspect the views involved and ensure that primary keys are appropriately defined; Looker relies on them to optimize the query execution process.
  • By carefully structuring views and relationships, you can optimize the LookML model and avoid the pitfalls associated with symmetric aggregation, as sketched below.
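
Here is a minimal sketch of this check (view, column, and join names are hypothetical): declare a truly unique primary_key in each view, and prefer many_to_one relationships in joins so Looker does not have to fall back on expensive symmetric aggregates:

view: orders {
  sql_table_name: orders ;;

  # A unique key lets Looker compute aggregates correctly under fan-out
  dimension: order_id {
    primary_key: yes
    type: number
    sql: ${TABLE}.order_id ;;
  }
}

explore: orders {
  # Declaring many_to_one (rather than many_to_many) keeps the join simple
  join: customers {
    relationship: many_to_one
    sql_on: ${orders.customer_id} = ${customers.customer_id} ;;
  }
}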

Dashboard Mastery Unveiled: That’s a Wrap on Looker Reporting Excellence, Folks!

As we conclude this transformative exploration into the realm of Looker reporting, the journey has been nothing short of a revelation. From the initial challenges of a 2-minute-and-20-second dashboard load time to the exhilarating achievement of a swift 5-second performance, the metamorphosis is a testament to the power of strategic optimization techniques.

In the dynamic landscape of data analysis, where every second counts, we’ve not only overcome hurdles but redefined the standards of efficiency. The results speak volumes, echoing a harmonious balance between speed and precision.

As you continue your own data exploration and reporting endeavors, remember that the quest for excellence is an ongoing journey. Leverage the insights gained here, experiment with optimization strategies, and continually refine your approach to elevate your Looker reporting experience.

Thank you for joining us on this odyssey. May your dashboards load swiftly, your insights be impactful, and your data-driven decisions be ever precise. Here’s to a future of unparalleled speed and efficiency in your Looker reporting endeavors!


Abhishek Adivarekar

Certified Data Engineer: GCP, Databricks, Snowflake. Expert in designing robust analytics, warehouse optimization, impactful dashboards