2023-03-09 | Article | Insights
In this article we will focus on querying the user journey paths on our website or app. This query can provide many insights, such as identifying common paths or patterns, understanding the behaviour of different user segments, and optimising user journeys. We can also gain a good understanding of the role of individual channels within the user journey to inform both budget allocation on channels and the user experience, helping to optimise marketing and increase conversions in the future.
In the following query, we get the sequence of channels and timestamps per user. This can be extended to sequences of devices or filtered for paths that lead to a conversion goal. To get the timestamps of each touchpoint we will get the event timestamp of the session_start event in the first subquery. For getting the source and medium combination in the second subquery we will unnest the event parameters to capture the source and medium of the individual sessions (We have talked about how to query source and medium for individual sessions in this article ). These two tables are then joined together on the user id to bring timestamps of session starts and source/medium combinations in one table. In the last part of the query, we will build our sequences of timestamps and source/medium using an aggregation function called ‘STRING_AGG’ which takes all timestamps and source/medium combinations from a user and concatenates them into a single string sorted by ascending timestamp. See the transformation example below:
WITH
#---get the session start timestamp
t1 AS (
SELECT
DISTINCT user_pseudo_id,
(
SELECT
value.int_value
FROM
UNNEST(event_params)
WHERE
KEY = 'ga_session_id') AS session_id,
event_timestamp,
FROM
-- Replace the following table WITH your own data
`bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*`
WHERE
event_name='session_start' ),
#---get the source/medium combination of each session
t2 AS(
SELECT
DISTINCT user_pseudo_id AS user_pseudo_id_m,
(
SELECT
value.int_value
FROM
UNNEST(event_params)
WHERE
KEY = 'ga_session_id') AS session_id_m,
CONCAT( MAX((
SELECT
value.string_value
FROM
UNNEST(event_params)
WHERE
KEY = 'source')), ' / ', MAX((
SELECT
value.string_value
FROM
UNNEST(event_params)
WHERE
KEY = 'medium'))) AS source_medium,
FROM
-- Replace the following table WITH your own data
`bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*`
GROUP BY
1,
2),
#--- join session start timestamps and the channels of each session
t3 AS(
SELECT
t1.*,
t2.source_medium
FROM
t1
LEFT JOIN
t2
ON
t1.user_pseudo_id=t2.user_pseudo_id_m
AND t1.session_id=t2.session_id_m
ORDER BY
1,
2,
3)
#--- aggregate per user the timestamps and touchpoints in order
SELECT
user_pseudo_id,
STRING_AGG(CAST(event_timestamp AS STRING), ' > ' ORDER BY event_timestamp) AS timestamp,
STRING_AGG(source_medium, ' > 'ORDER BY event_timestamp) AS touchpoints,
FROM
t3
GROUP BY 1
ORDER BY
1,
2
This report can be extended for various use cases. Here are some tips to get inspired:
1. Product category path analysis: This query can be adjusted to a product category level to learn about their user journey. For example, the number of touchpoints and the length of time from first visit to purchase. By learning about the individual journeys of the product categories, we can create more effective marketing activities at the product category level to optimise our budget planning.
2. Conversion path analysis: We can break down this report by different conversion goals such as add to cart and purchase. By analysing the specific paths that users take on their way to purchase, we can understand the common touchpoints and interactions that lead to this goal and adjust our marketing efforts accordingly, potentially delivering more conversions and revenue.
Happy querying!