Attribution is a messy conversation in today’s world, you add to the mix of Google attribution it gets far more complex. The ‘shining light’ is Google Analytics Big Query export of raw unsampled data which is extremely powerful. One of the main dimensions in the Big Query export is the different traffic source available providing a vast sum of insights. The challenge comes in being able to understand the data structure to generate the right insights.
The goal is to better understand how Google Analytics export data to Big Query can provide valuable insights by understanding the nuances of Attribution to help drive better insights.
Understanding data scopes
The first thing to understand is the Google Analytics raw data export to Big Query has multiple scopes to understand attribution data:
- User-level data: How a user first found the website
- Session-level data: Which channels and campaigns drove traffic to the website at a session level
- Event-level data: The source and medium that drove traffic to those individual pages. If multiples sources drive the same user to the website during the session
The distinction between the different scopes is crucial to better understand the data you are working with.
- User-level data: The first session the user had on the website. Known as first click attribution
- Session-level data: Known as last click attribution
- Event-level data: Provides the granularity if a user visits the website with multiple sources during a session
The traffic source data available in Big Query
In the below screenshot it shows there are 5 different traffic source data available:
- 2 x event scopes
- 2 x session scopes
- 1 x user scope
- collected_traffic_source and event_parmas provides a view of the traffic sources that visited the website. The traffic sources collected are available on the same events.
-
- If a user visits the website multiple times during a session from multiple traffic sources, then collected_traffic_source would be best to provide this view. The bonus with collected_traffic_source is that it’s raw with no session or event scoping applied and no attribution applied.
-
- traffic_source is information about the users first visit to the website
- session_traffic_source_last_click and session_traffic_source_last_click.cross_channel_campaign is based on last non-direct attribution. Whatever the first source is for the session it will then be set for the whole session
Big Query SQL code and example outputs from the 5 traffic sources
Below I have shared sample BQ code for the 5 different traffic sources available. This will help you better understand how the different traffic sources and attribution are applied.
1 – event_parmas
Scope: event
The traffic sources are set for page_view, session_start and first_visit events.
SELECT
TIMESTAMP_MICROS(event_timestamp) as event_timestamp,
user_pseudo_id,
event_name,
lower((select value.string_value from unnest(event_params) where key = 'source')) as all_event_source,
lower((select value.string_value from unnest(event_params) where key = 'medium')) as all_event_medium,
lower((select value.string_value from unnest(event_params) where key = 'campaign')) as all_event_campaign,
from `enter.tablename_123456.events_*`
GROUP BY ALL
2 – traffic_source
Scope: user (user_pesudo_id is set once for each user)
The traffic sources are set for all events.
SELECT
TIMESTAMP_MICROS(event_timestamp) as event_timestamp,
user_pseudo_id,
event_name,
traffic_source.source as first_user_source,
traffic_source.medium as first_user_medium,
traffic_source.name as first_user_campaign,
from `enter.tablename_123456.events_*`
GROUP BY ALL
3 – collected_traffic_source
Scope: event
The traffic sources are set for page_view, session_start and first_visit events.
SELECT
TIMESTAMP_MICROS(event_timestamp) as event_timestamp,
user_pseudo_id,
event_name,
collected_traffic_source.manual_source as first_event_source,
collected_traffic_source.manual_medium as first_event_medium,
collected_traffic_source.manual_campaign_name as first_event_campaign,
from `enter.tablename_123456.events_*`
GROUP BY ALL
4 – session_traffic_source_last_click.manual_campaign
Scope: session
The traffic sources are set for all events.
SELECT
TIMESTAMP_MICROS(event_timestamp) as event_timestamp,
user_pseudo_id,
event_name,
session_traffic_source_last_click.manual_campaign.source as first_session_source,
session_traffic_source_last_click.manual_campaign.medium as first_session_medium,
session_traffic_source_last_click.manual_campaign.campaign_name as first_session_campaign,
from `enter.tablename_123456.events_*`
GROUP BY ALL
5 – session_traffic_source_last_click.cross_channel_campaign
The traffic sources are set for all events.
SELECT
TIMESTAMP_MICROS(event_timestamp) as event_timestamp,
user_pseudo_id,
event_name,
session_traffic_source_last_click.cross_channel_campaign.source as first_session_source,
session_traffic_source_last_click.cross_channel_campaign.medium as first_session_medium,
session_traffic_source_last_click.cross_channel_campaign.source as first_session_campaign,
from `enter.tablename_123456.events_*`
GROUP BY ALL
Which traffic source should be used
Depending on the question that’s being asked there are different uses cases for each traffic sources available within big query.
- If the requirement is to understand for on-going reporting
- Which traffic sources are driving traffic and purchases, then session_traffic_source would be the best one
- To provide additional insights traffic_source will provide insights on the first source/medium that’s driving traffic and purchases
- To better understand if multiple traffic sources are driving traffic in a session, then collected_traffic_source would be the best the one
- event_parmas can be used but it needs to be unested (the data needs to be flatten ) incurring more costs to query
Scenarios
Testing 5 different scenarios to help understand how traffic source data is attributed in big query.
In this scenario looking at session_traffic_source_last_click.manual_campaign not session_traffic_source_last_click.cross_channel_campaign
SELECT
TIMESTAMP_MICROS(event_timestamp) as event_timestamp,
user_pseudo_id,
event_name,
lower((select value.string_value from unnest(event_params) where key = 'source')) as all_event_source,
lower((select value.string_value from unnest(event_params) where key = 'medium')) as all_event_medium,
lower((select value.string_value from unnest(event_params) where key = 'campaign')) as all_event_campaign,
traffic_source.source as first_user_source,
traffic_source.medium as first_user_medium,
traffic_source.name as first_user_campaign,
collected_traffic_source.manual_source as first_event_source,
collected_traffic_source.manual_medium as first_event_medium,
collected_traffic_source.manual_campaign_name as first_event_campaign,
session_traffic_source_last_click.manual_campaign.source as first_session_source,
session_traffic_source_last_click.manual_campaign.medium as first_session_medium,
session_traffic_source_last_click.manual_campaign.campaign_name as first_session_campaign,
from `enter.tablename_123456.events_*`
GROUP BY ALL
Caveat: All these scenarios are looking at one specific user who had multiple sessions
Scenario 1
There were 5 sessions across 8 days. The mix of traffic coming from paid media + email.
Date | Session No | Source/Medium |
17/01 | 1 | facebook/organicsocial |
18/01 | 2 | criteo/display |
19/01 | 3 | google/cpc |
21/01 | 4 | instagram/paidsocial |
24/01 | 5 | klayvio/email |
(filtered to event: session_start)
- The first source/medium for a user was facebook/organicsocial, when using traffic_source it will always report on facebook/organicsocial
- Using session_traffic_source will provide all the 5 different sources that drove traffic to the website
- If a purchase was made on 24th
- Using traffic_source it would report the purchase against facebook/organicsocial
- Using session_traffic_source it would report the purchase against klaviyo/email
Scenario 2
There were 4 sessions across 6 days. The mix of traffic coming from paid media + organic + direct.
Date | Session No | Source/Medium |
01/02 | 1 | instagram/paidsocial |
02/02 | 2 | google/organic |
03/02 | 3 | direct/none |
06/02 | 4 | google/cpc |
(filtered to event: session_start)
- The first source/medium for a user was instagram/paidsocial, when using traffic_source it will always report on instagram/paidsocial
- The third visit on 03/02 came from direct but
- Using traffic_source it will report as instagram/paidsocial
- Using session_traffic_source it will report as google/organic which was the 2nd visit which is based on last non direct attribution
Scenario 3
There were 3 sessions across 3 days. The mix of traffic coming from paid media + organic + direct.
Date | Session No | Source/Medium |
16/02 | 1 | google/organic |
17/02 | 2 | google/cpc |
18/02 | 3 | direct/none |
(filtered to event: purchase)
- The first source/medium for a user was google/organic, when using traffic_source it will always report on google/organic
- The third visit on 18/02 came from direct but when using the purchase event
- Using traffic_source will always report as google/organic
- Using session_traffic_source will report as google/cpc which was the 2nd visit which is based on last non direct attribution
Scenario 4
There were 4 sessions within a 30 min window. The mix of traffic coming from paid media + email.
Date | Session No | Source/Medium |
23/02
|
1 | facebook/organicsocial |
2 | criteo/display | |
3 | Instagram/paidsocial | |
4 | klayvio/email |
(filtered to showing all events)
- The first source/medium for a user was facebook/organicsocial. Using traffic_source and session_traffic_source it will report the session and purchase as facebook/organicsocial
(filtered to showing all events)
- For collected_traffic_source (and event_params) it shows the different traffic sources that came to the website within the same session
- It is possible in this scenario to count as 4 sessions when filtering to page_view
Scenario 5
Google paid search – google / cpc has been reporting as google / organic when auto tagging is turned on within Google Adwords. This is a common bug that been going on for a long while now.
When using session_traffic_source_last_click.manual_campaign it does not recognise gclid parameter in the URL and BQ is unable to report on the right values. Using session_traffic_source_last_click.manual_campaign it does report on the right values
SELECT
TIMESTAMP_MICROS(event_timestamp) as event_timestamp,
user_pseudo_id,
event_name,
session_traffic_source_last_click.manual_campaign.source as firstclick_session_source,
session_traffic_source_last_click.manual_campaign.medium as firstclick_session_medium,
session_traffic_source_last_click.manual_campaign.campaign_name as firstclick_session_campaign,
session_traffic_source_last_click.cross_channel_campaign.source as firstcross_session_source,
session_traffic_source_last_click.cross_channel_campaign.medium as firstcross_session_medium,
session_traffic_source_last_click.cross_channel_campaign.source as firstcross_session_campaign,
from `enter.tablename_123456.events_*`
GROUP BY ALL
The Google Analytics Big Query export provides huge amount of flexibility with how traffic sources data is collected. With the right question and right data there will be a load of valuable insights.