
We live in a world where we have access to terabytes of data, so we need to invest time in discussing what data is needed and how it can be used. Data collection needs a lot of love. It’s not the sexiest conversation we will have, but I do think it has reached the point where it is a critical conversation that brands and their partner agencies need to be having. A long-term view must be taken when investing in data collection.

If we believe data is a growth driver for brands, data collection needs to be at the heart of that growth journey. We need to understand what data is going to help us deliver business KPIs and goals. The common theory is that less is more, especially with data, which I partly agree with. There are many use cases in the data and measurement world, from reporting, insights, forecasting and benchmarking to modelling. We need to think logically about which data sources are needed and how to make the data available to everyone in a seamless and consistent manner.

Having a seamless and consistent approach is one of the most important components of data collection. Without it at the heart of data collection, everything will crumble. Consistent means ensuring that all teams have access to the same data, rather than having multiple teams source the same data at different times and produce different insights. Seamless means ensuring that all data is easily accessible at any given time, rather than having hundreds of Excel files stored in different folders on a server where teams can’t retrieve them.

Data Sources

When thinking about which data sources are needed, it helps to split them into 3 main categories:

Owned data (better known as first-party data) – This will range from financials and CRM data to analytics and app data.

Free data – This could range from share of search, market cap and social mentions to mobility data, health data and tracking trends around changes in the wider climate, such as lockdowns.

Paid data – This could range from working with YouGov or Kantar to better understand your audience via quant or qual research, to enriching customer data via Acxiom, LiveRamp or Experian.

The value of owned and paid data is clear to see, but the big, under-rated win is free data: there is so much free data available that can be invaluable. Grouping the data sources into these 3 buckets should really help us think about what data we currently have, how it is bucketed and what additional data we need.

Data Collection Canvas

It’s great to have access to loads of data, but what is generally missed is the role and value of each data source, and how different data works together to generate insights and build the narrative. There should be a process to ensure that the right data is being utilised before it enters the data ecosystem.


I have created a data collection canvas to better understand each data collection request. It covers the following fields:

Data Request: What data is required, e.g., share of search for xyz brands.

Use cases of the data: How will the data be used, e.g., to help calculate market share, evaluate campaigns and monitor the performance of brand competitors.

Source of Data: Where is this data being sourced from, e.g., Google Trends.

Category of Data: One of 3 options: Owned, Free or Paid.

Relationship with other data sources: Can this data source be used with other data sources to build a better narrative? If so, which ones?

High or Low priority Data: How valuable is this data, from high to low priority?

Data Requirements: What are the specific data requirements, so that they can be planned in when building the pipeline? E.g., for share of search: x search term + search interest value at a weekly level, split by x country. In addition, it would require backfilling the last 5 years of data with the same data requirements.

Key Dimensions + Metrics: Define the key dimensions and metrics that can be used to build insights from the selected data source, and also when mapping it to other data sources.

Doing this in advance, before the data collection process starts, should provide a helicopter view of how key dimensions and metrics can be mapped to other data sources.

Frequency of data required: How often is the data needed, e.g., daily, weekly, bi-weekly or monthly?
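
For teams that want to capture the canvas in a form a pipeline can read, the fields above could be sketched as a small Python record. This is only an illustration under my own assumptions: the class name DataCollectionCanvas, the field names, the missing_fields helper and the example values (including the mapped sources) are hypothetical and not part of any existing tool.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Category(Enum):
    OWNED = "owned"
    FREE = "free"
    PAID = "paid"


class Priority(Enum):
    HIGH = "high"
    LOW = "low"


class Frequency(Enum):
    DAILY = "daily"
    WEEKLY = "weekly"
    BI_WEEKLY = "bi-weekly"
    MONTHLY = "monthly"


@dataclass
class DataCollectionCanvas:
    """One filled-in canvas per data collection request (hypothetical schema)."""
    data_request: str
    use_cases: List[str]
    source: str
    category: Category
    priority: Priority
    requirements: str
    dimensions: List[str]
    metrics: List[str]
    frequency: Frequency
    # Other data sources this one can be combined with; may be empty.
    related_sources: List[str] = field(default_factory=list)

    def missing_fields(self) -> List[str]:
        """Return the names of text or list fields that were left empty."""
        required = {
            "data_request": self.data_request,
            "use_cases": self.use_cases,
            "source": self.source,
            "requirements": self.requirements,
            "dimensions": self.dimensions,
            "metrics": self.metrics,
        }
        return [name for name, value in required.items() if not value]


# Example request, using the share of search example from the canvas.
share_of_search = DataCollectionCanvas(
    data_request="Share of search for xyz brands",
    use_cases=["market share", "campaign evaluation", "competitor monitoring"],
    source="Google Trends",
    category=Category.FREE,
    priority=Priority.HIGH,
    requirements="Weekly search interest per search term, split by country, "
                 "backfilled for the last 5 years",
    dimensions=["week", "country", "search_term"],
    metrics=["search_interest"],
    frequency=Frequency.WEEKLY,
    related_sources=["CRM data", "Financials"],  # illustrative only
)

print(share_of_search.missing_fields())
```

A request that comes back with a non-empty list from missing_fields can be sent back to the requester before any pipeline work is planned.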

Data Centric Approach

Standardising the definitions of dimensions and metrics will make data trustworthy and consistent. This approach will also create consistency around counting methodologies and build a single source of truth for all defined dimensions and metrics. A business question is normally framed in terms of dimensions and metrics. Users of the data need to think about the business question they are looking to answer and which dimensions and metrics they need to make better business decisions.
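
As a rough sketch of what standardised definitions could look like in practice, the snippet below keeps one shared list of dimensions and metrics and renames each source's raw columns to those standard names. Everything here is an assumption for illustration: the names STANDARD_FIELDS, SOURCE_MAPPINGS and standardise, and the raw column names for each source, are hypothetical rather than taken from any real export format.

```python
# Hypothetical single source of truth: standard field name -> field type.
STANDARD_FIELDS = {
    "week_start": "dimension",
    "country": "dimension",
    "brand": "dimension",
    "search_interest": "metric",
    "revenue": "metric",
}

# Hypothetical mapping of each source's raw column names to standard names.
SOURCE_MAPPINGS = {
    "google_trends": {
        "date": "week_start",
        "geo": "country",
        "keyword": "brand",
        "value": "search_interest",
    },
    "financials": {
        "week_commencing": "week_start",
        "market": "country",
        "brand_name": "brand",
        "net_revenue": "revenue",
    },
}


def standardise(source, row):
    """Rename one raw row's columns to the standard dimension and metric names."""
    mapping = SOURCE_MAPPINGS[source]
    # Refuse columns nobody has defined, so undefined fields never slip in.
    unknown = [column for column in row if column not in mapping]
    if unknown:
        raise ValueError(f"Undefined columns for {source}: {unknown}")
    return {mapping[column]: value for column, value in row.items()}


trends_row = standardise(
    "google_trends",
    {"date": "2021-01-04", "geo": "GB", "keyword": "xyz", "value": 42},
)
finance_row = standardise(
    "financials",
    {"week_commencing": "2021-01-04", "market": "GB",
     "brand_name": "xyz", "net_revenue": 1000.0},
)

# Both rows now share the same dimension names, so they can be joined on them.
shared_dimensions = [
    name for name in trends_row
    if name in finance_row and STANDARD_FIELDS[name] == "dimension"
]
print(shared_dimensions)
```

Once every source is expressed in the same dimensions, the mapping to other data sources described in the canvas becomes a join on shared names rather than a one-off reconciliation.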

This is where a cloud solution becomes a critical component in supporting a data-centric approach to your data collection. It becomes a control centre for all your data.

Data and measurement should all be based around the cloud, given the advanced sophistication it offers in machine learning. Putting data collection at the heart of the strategy is one step forward for brands looking to differentiate themselves.