Data & AI

Demystifying GA4 data: How to flatten nested structures in BigQuery


Ciara Adkins

Senior Analytics Architect

Google Analytics 4 (GA4) is an industry-shaking update to Universal Analytics. In addition to GA4’s extensive list of new features, exporting GA4 data to BigQuery is free. However, analysts tasked with doing anything with the GA4 BigQuery export are in for a heartbreaking surprise: all the event data is nested (*big sad*).

But have no fear! I have gone through and done most of the hard work for you. So now, all you have to do is download a few files, follow along, and stick around for the ride.

The full query is at the end of this article. I recommend opening the file in a new window or screen next to your current one and referencing it while reading the article.

Caveat: This article assumes you are intermediately familiar with SQL in BigQuery. I will not be teaching the basics of BigQuery integration. Now that we have selected for all users WHERE level >= “intermediate,” let’s get into it!

Choosing variables

Before you start writing any SQL, you should have a plan for the variables you want to include in your analysis. Since jumping directly into BigQuery integration without a plan will most likely leave you even more flustered and confused than when you started, I suggest creating a Variable Mapping Reference. The columns should include items such as “Desired Data,” which are the data points you need, and “Source Column Name,” which maps the actual columns in the raw GA4 BigQuery export to your desired data points.

Here is what I came up with:

image-8ad1949b-f00f-4e9d-b4c6-b82518e6591fHere’s a link to my template if you want to reuse it.

‍Alright! Now that you have your Variable Mapping Reference completed, you are one step closer to that beautiful flattened data you’ve been dreaming of.

Let’s write some SQL.

Flattening the nested data structure

Understanding table structure

Let’s start by understanding the raw GA4 data export structure in GCP. Below is a simplified illustration of the structure of the export if you were to run a query like the following:

image-f89882a2-9501-47a3-a8d8-ecb5491294b6Figure 1: The raw GA4 export data structure

‍As you can see above, the event_params are a nested array. As mentioned before, this nested structure isn’t ideal if you want to do any analysis.

‍ Structuring the query

First, let’s write some pseudo-code to demonstrate the structure of our query. This query flattens the nested data structure of a GA4 data export using a combination of the UNNEST(), COALESCE(), and MAX() functions and a Common Table Expression (CTE) with two parts:

Now that we’ve planned our structure, let’s jump into some SQL.

In the following section, I will specifically walk through the trickier parts of the query.

‍ 1. Unpack event parameters: using UNNEST()

The first part of the query, alpha, uses UNNEST() to unnest the event_params array column of your GA4 data export table.

This action creates a new row for each element in the event_params array, with columns for your chosen dimensions like event_date, user_pseudo_id, and event_name as shown in the illustration below:

image-4762991a-5bf3-463d-b801-13d8f1edc131Figure 2: The data structure after unnesting event_params

2. Compress event values: using COALESCE()

Next, we want to consolidate our event_params values so that every event parameter key only has one value. You can use the COALESCE() function to compress the four event value columns into a single event value column. For any given event_params key value, there will only be one non-null event value in the string_value, int_value, double_value, or float_value columns. The COALESCE() function allows us to extract the non-null values from the value columns, and we concurrently CAST all values to strings such that all values in our resulting column event_value will be the same data type.

image-889be265-04f4-4f4d-802c-d03531841c1cFigure 3: The data structure after consolidating our event parameter values using COALESCE()

3. Pivot and flatten: using MAX() and GROUP BY

The second part of our query, beta, flattens the data by grouping the alpha table by the common columns and pivoting the event key and event_value columns into new columns using the MAX() function. This action creates a single row for each unique combination of your columns (in this instance, event_date, user_pseudo_id, event_name, ga_session_id, campaign_source, campaign_id, and campaign_content).

image-8115a45d-c364-420a-bb3b-389d0331fe65Figure 4: The data structure after pivoting and flattening the event parameters

4. Add session ID: using CONCAT()

Finally, the main query selects the desired columns (event_date, user_pseudo_id, event_name, ga_session_id, source, campaign, content, session_id, event_number) from the flattened beta table. The CONCAT() function concatenates ga_session_id and user_pseudo_id to create a unique_session_id, and ROW_NUMBER() is used to generate an event number within each ga_session_id partitioned and ordered by event_timestamp.

image-2525619c-2e3d-4fa0-bae4-08f0e9175e29Figure 5: The data structure after adding our additional columns

Quality assurance

After your query produces a flattened output, ensure it pulls the data as intended. For this result, I suggest creating high-level reports in the GA4 UI and comparing them to the summarized data in your query. Here are the performance metrics and quality assurance format that I came up with:

image-9271de39-e74a-4f03-9191-b829ffe89230Here's a link to my template if you want to reuse it.

‍Input your summarized data into a Google Sheet or Excel Spreadsheet and compare. If the variance looks too large, double-check your query to ensure no issues in your code. However, some data points will likely have a higher variance than others (like “Sessions,” which is nicely explained by Tanelytics in the article here).

‍ Final thoughts

If you’ve made it this far, you (hopefully) now have some flattened data! Now you’re well on your way to the fun part: analysis.

The query I’ve walked through should enable you to recreate reports from the GA4 UI and empower you to create new, meaningful reports that would be far more complex for the UI alone.

So get creative! Push the boundaries! Happy analyzing!

References and additional resources

‍Full sample query:

Be the first to know

Get curated content delivered right to your inbox. No more searching. No more scrolling.

Subscribe now