Case Study: Unifying Google Analytics Accounts in BigQuery

South Shore Analytics
5 min read · Aug 30, 2023


Introduction

“I just want to see all my clients’ data in one place!” — Digital Agency Managers, everywhere, who have 10 Google Analytics tabs open.


Agencies often manage Google Analytics accounts for many clients at once, but unifying reporting across those accounts in a scalable way can be challenging. Below, I’ll walk through how we recently helped a digital agency connect all of its clients’ Google Analytics accounts to a single BigQuery project and set up an automated process that consolidates the data into one unified table for downstream analysis and visualization. With this approach, we were ultimately able to build an agency-level dashboard that can be filtered via URL parameters, so the agency can share it with each client alongside other data sources such as Google Ads, Facebook Ads, Bing Ads, and more.

The Challenge

The digital agency we worked with wanted to A) see their clients’ GA4 data all in one place and B) allow each of their clients to see their GA4 data alongside data from other data sources (e.g. Facebook Ads, Google Ads, Google My Business data, etc.) in a consolidated dashboard.

The problem was that each client had a separate GA4 property, and certain tools we experimented with, such as Fivetran, did not let us query GA4 event data at the level of detail we wanted. So we were faced with bringing the GA4 data into a single BigQuery project ourselves, querying it exactly how we wanted, and combining it after the fact so that we could visualize the data and join it with other data sources as needed.

Our Approach

  1. We first connected each client GA4 property that the digital agency manages to BigQuery using GA4’s native BigQuery links. This is a fairly straightforward process that you can read more about here.
  2. We built a data model to pull the key information we needed out of the BigQuery events table. Detailed code (with some redacted items) can be found below. The goal of this query is essentially to replicate the User Acquisition report from Google Analytics directly.
SELECT
  date,
  IFNULL(session_source, first_user_source) AS session_source,
  IFNULL(session_medium, first_user_medium) AS session_medium,
  IFNULL(
    CONCAT(session_source, ' / ', session_medium),
    CONCAT(first_user_source, ' / ', first_user_medium)
  ) AS session_source_medium,
  COUNT(DISTINCT session_id) AS sessions,
  COUNT(DISTINCT user_pseudo_id) AS total_users,
  COUNT(DISTINCT CASE WHEN is_new_user = 1 THEN user_pseudo_id END) AS new_users,
  SUM(CAST(engaged_sessions AS INT64)) AS engaged_sessions,
  -- Bounce rate in GA4 is the complement of engagement rate:
  -- the share of sessions that were not engaged.
  1 - SAFE_DIVIDE(
    SUM(CAST(engaged_sessions AS INT64)),
    COUNT(DISTINCT session_id)
  ) AS bounce_rate,
  SAFE_DIVIDE(
    SUM(CAST(engaged_sessions AS INT64)),
    COUNT(DISTINCT session_id)
  ) AS engagement_rate,
  SAFE_DIVIDE(
    SUM(CAST(engagement_time_seconds AS INT64)),
    SUM(CAST(engaged_sessions AS INT64))
  ) AS average_engagement_time,
  SUM(CAST(engagement_time_seconds AS INT64)) AS engagement_time_seconds,
  SUM(purchases) AS purchases,
  SUM(scrolls) AS scrolls,
  SUM(form_fills) AS form_fills,
  SUM(revenue) AS revenue
FROM (
  -- One row per session: attribute each session to a source/medium and
  -- roll up engagement and conversion events.
  SELECT
    event_date AS date,
    -- Session source: use the 'source' event parameter when present;
    -- fall back to the user's first-touch source when the entrance
    -- page_view carries no source.
    MAX(
      CASE
        WHEN (
          SELECT value.int_value
          FROM UNNEST(event_params)
          WHERE event_name = 'page_view' AND key = 'entrances'
        ) = 1
        AND (
          SELECT value.string_value
          FROM UNNEST(event_params)
          WHERE event_name = 'page_view' AND key = 'source'
        ) IS NULL THEN traffic_source.source
        ELSE (
          SELECT value.string_value
          FROM UNNEST(event_params)
          WHERE key = 'source'
        )
      END
    ) AS session_source,
    traffic_source.source AS first_user_source,
    -- Session medium: same fallback logic as session source.
    MAX(
      CASE
        WHEN (
          SELECT value.int_value
          FROM UNNEST(event_params)
          WHERE event_name = 'page_view' AND key = 'entrances'
        ) = 1
        AND (
          SELECT value.string_value
          FROM UNNEST(event_params)
          WHERE event_name = 'page_view' AND key = 'source'
        ) IS NULL THEN traffic_source.medium
        ELSE (
          SELECT value.string_value
          FROM UNNEST(event_params)
          WHERE key = 'medium'
        )
      END
    ) AS session_medium,
    traffic_source.medium AS first_user_medium,
    -- A GA4 session is uniquely identified by the combination of
    -- user_pseudo_id and the ga_session_id event parameter.
    CONCAT(
      user_pseudo_id,
      CAST(
        (
          SELECT value.int_value
          FROM UNNEST(event_params)
          WHERE key = 'ga_session_id'
        ) AS STRING
      )
    ) AS session_id,
    user_pseudo_id,
    MAX(IF(event_name IN ('first_visit', 'first_open'), 1, 0)) AS is_new_user,
    MAX(
      (
        SELECT value.string_value
        FROM UNNEST(event_params)
        WHERE key = 'session_engaged'
      )
    ) AS engaged_sessions,
    SUM(
      (
        SELECT value.int_value
        FROM UNNEST(event_params)
        WHERE key = 'engagement_time_msec'
      )
    ) / 1000 AS engagement_time_seconds,
    COUNTIF(event_name = 'purchase') AS purchases,
    COUNTIF(event_name = 'form_fill') AS form_fills,
    SUM(ecommerce.purchase_revenue) AS revenue,
    COUNTIF(event_name = 'scroll') AS scrolls
  FROM `project.dataset.table` -- insert your project, dataset, and table names here
  GROUP BY
    date,
    first_user_source,
    first_user_medium,
    session_id,
    user_pseudo_id
)
GROUP BY
  session_source_medium,
  session_source,
  session_medium,
  date
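The rate metrics in the query above are simple ratios of per-day totals. Here is a rough Python sketch of that arithmetic, using made-up session counts; `safe_divide` mirrors BigQuery's `SAFE_DIVIDE`, which returns NULL instead of erroring on division by zero:

```python
# Sketch of the metric arithmetic from the query above, with made-up totals.
def safe_divide(numerator, denominator):
    """Mirror BigQuery's SAFE_DIVIDE: return None instead of raising on /0."""
    return None if denominator == 0 else numerator / denominator

sessions = 200                  # COUNT(DISTINCT session_id)
engaged_sessions = 150          # SUM(CAST(engaged_sessions AS INT64))
engagement_time_seconds = 9000  # SUM(engagement_time_msec) / 1000

engagement_rate = safe_divide(engaged_sessions, sessions)                         # 0.75
bounce_rate = 1 - engagement_rate                                                 # 0.25
average_engagement_time = safe_divide(engagement_time_seconds, engaged_sessions)  # 60.0

print(engagement_rate, bounce_rate, average_engagement_time)
```

Note that GA4's bounce rate is defined as the complement of engagement rate, which is why the two always sum to 1 in the modeled output.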

After modeling the data, we used a SQL script to iterate through each of the GA4 datasets created in BigQuery, query them, and combine the results into a consolidated table that could be referenced for downstream reporting and visualization. This script took some research and troubleshooting, and it was the inspiration for sharing this process with others who may face a similar issue. It also let us expose each client’s GA4 property ID as a column in the consolidated table, making it easy to join to a client mapping table and, eventually, to link GA4 data to other data sources as well.
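The client ID column comes straight from the dataset name: GA4's BigQuery export creates one dataset per property, named `analytics_<property_id>`, so stripping the prefix yields the ID. A quick sketch with hypothetical dataset names:

```python
# Each GA4 export dataset is named analytics_<property_id>; stripping the
# prefix yields the property ID used as client_id. Dataset names below are
# hypothetical examples.
datasets = ["analytics_123456789", "analytics_987654321"]
client_ids = [name.replace("analytics_", "") for name in datasets]
print(client_ids)  # ['123456789', '987654321']
```

This is exactly what the `REPLACE('%s', 'analytics_', '')` expression in the script below does on the BigQuery side.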

Finally, we set up the SQL script (below) as a scheduled query that runs each day, so our metrics are refreshed overnight in the consolidated table we planned to use for client reporting.

-- Declare variables
DECLARE dataset_names ARRAY<STRING>;
DECLARE dataset_name STRING;
DECLARE table_name STRING;
DECLARE i INT64 DEFAULT 0;

-- Note that you will need to first create the destination table
-- before this script will run successfully.

-- Clear the destination table so each run fully rebuilds it.
TRUNCATE TABLE `your-project-here.dataset.table`;

SET table_name = 'events_*';

-- Collect every GA4 export dataset; they are all named analytics_<property_id>.
SET dataset_names = (
  SELECT ARRAY_AGG(schema_name)
  FROM `your-project-here`.INFORMATION_SCHEMA.SCHEMATA
  WHERE schema_name LIKE 'analytics_%'
);

-- Loop through the array of dataset names
WHILE i < ARRAY_LENGTH(dataset_names) DO
  -- Get the current dataset name
  SET dataset_name = dataset_names[OFFSET(i)];

  EXECUTE IMMEDIATE FORMAT("""
    INSERT INTO `your-project-here.dataset.table_name` (
      SELECT
        REPLACE('%s', 'analytics_', '') AS client_id
        /* ^^ Example of how the combination of FORMAT and %s
           can be used to inject the value of dataset_name.
           The rest of the modeled SELECT (redacted) goes here,
           querying the current dataset's events_* tables. */
    )
  """, dataset_name);

  -- Increment the index
  SET i = i + 1;
END WHILE;
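To make the `EXECUTE IMMEDIATE` pattern concrete, here is a rough Python analogue of the loop: it renders one INSERT statement per dataset, substituting the dataset name the way `FORMAT` and `%s` do in the script. The dataset names and the SQL template (including the `FROM` clause) are illustrative placeholders, not the redacted production query:

```python
# Python analogue of the BigQuery script's loop: build one INSERT statement
# per GA4 export dataset, injecting the dataset name where the script uses
# FORMAT's %s placeholder. All names below are illustrative.
template = (
    "INSERT INTO `your-project-here.dataset.table_name` "
    "SELECT REPLACE('{ds}', 'analytics_', '') AS client_id, t.* "
    "FROM `your-project-here.{ds}.events_*` AS t"
)

dataset_names = ["analytics_111111111", "analytics_222222222"]
statements = [template.format(ds=ds) for ds in dataset_names]

for stmt in statements:
    # In BigQuery, each rendered statement would be run via EXECUTE IMMEDIATE.
    print(stmt)
```

Because the destination table is truncated at the top of the script, each nightly run rebuilds the consolidated table from scratch rather than appending duplicates.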

Results

After getting all of the digital agency’s clients’ GA4 data into one consolidated BigQuery table, it was a piece of cake to join that data with other key client information and other key data sources such as Facebook Ads, Bing Ads, Google My Business, and more. We were able to create a templated Looker Studio dashboard for each of their clients with a parameterized URL that could be embedded in a client portal for ease of viewing all metrics in one place, but that’s a story for another blog post! If you ever need help with consolidating your GA4 data or other key data sources for your business or your clients, please don’t hesitate to reach out to James and Nick at southshore.llc!

About Us

South Shore Analytics (SSA) is an Analytics Consulting Firm, co-founded by James Burke and Nick Lisauskas, both highly skilled professionals with more than a decade of invaluable experience in the field of Analytics. Their shared passion for data-driven insights and business optimization led them to establish SSA, aiming to provide top-notch services to various businesses, irrespective of their size or stage of development.
