2023-02-15 | Article | Insights
Content-based websites are often struggling with having a deep understanding of the behaviour and interest of their users. User segmentation is a helpful analysis to gain insights about the different user groups on content-based websites. There are a variety of ML algorithms such as K-means that can be used for user segmentation. But it is also possible to start simply and quickly to segment our users by some basic measurements, for instance by the number of pages they visit, the time they spend on the site, and the frequency of their visits. By doing so, we can gain insights into how different types of users interact with our content which can help us improve user experience and increase engagement.
In this article, we will go through a query that produces a basic table for segmentation by visit frequency and engagement of our user groups daily. First, the date and device type are queried (since this report is daily and we would like to distinguish between devices). For getting the frequency of visits per day, we use the COUNT clause and the window function to get the count of distinct session ids partitioned by user id, device type, and date. With the same logic, we would get the pageviews with the count of event name page_view. We now would need the time on the website in seconds (that is why we divide the engagement_time_msec by 1000). We get this information with a SUM clause and a window function. In the last query, we will get the sum of page views and engagement time per date, device, and frequency of visits.
WITH base AS(
SELECT
DISTINCT
event_date,
device.category as device_type,
user_pseudo_id,
COUNT (DISTINCT (SELECT value.int_value FROM UNNEST(event_params) WHERE KEY = 'ga_session_id'))
OVER (PARTITION BY user_pseudo_id,event_date,device.category) AS number_of_sessions,
COUNTIF(event_name = 'page_view')
OVER (PARTITION BY user_pseudo_id,event_date,device.category) AS number_of_pageviews,
SUM(((SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'engagement_time_msec'))/1000)
OVER (PARTITION BY user_pseudo_id,event_date,device.category) AS timeOnSite_sec
FROM
-- Replace the following table with your own data
`bigquery_project_id.dataset_id.events_*`)
SELECT
event_date,
device_type AS device,
number_of_sessions,
SUM(number_of_pageviews) AS sum_of_page_views,
SUM(timeOnSite_sec) AS sum_of_timeOnSite_sec
FROM base
GROUP BY
1,2,3
ORDER BY
1,2,3,4,5
Feel free to extend and adjust this query to your business-specific needs. Happy querying!