
Data - Selecting the ingredients for an MMM

In the last post, I made a simple marketing mix model to estimate the value of marketing on Dorothy, Rose, Blanche & Sophia’s cheesecake business. I’m trying to answer the Golden Girls’ question: Is our marketing worth it?
With that model, I showed that the Girls’ cheesecake marketing has a sales-cost ratio (SCR) of 1.9. Every $1 in marketing spend leads to $1.90 in cheesecake sales. Not bad!
However, that model has some flaws that I’ll address over the next few posts. In this one, I’ll start with the data.
As a general rule, any potential sales driver that might have a significant and measurable impact on sales in a given period should be included in a marketing mix model.
Non-marketing data
The first model accounted for baseline, a trend, and seasonality in addition to marketing. There are other potential data sources to consider. For example:
- Distribution: Where can people buy cheesecakes, and has that changed over the 3 years of sales?
- In-stock levels: Are people able to buy cheesecakes when and where they want to?
- External factors: Has global interest in cheesecake changed outside of the Girls’ control? Perhaps the Great British Bake-off has made cheesecakes popular during part of the sales period.
- Promotions: Have there been price discounts during any weeks?
Expanding on that last one: including price promotions in sales models can be challenging, so it might be tempting to simplify an MMM by excluding them. But it’s important to include them.
Zoom in on a few months of sales and notice that sales are higher but flat toward the end of the year. The Girls ran a TV campaign in November, but sales that month were no different from the months before and after. An MMM fit to this data alone wouldn’t assign much of a coefficient to the TV investment. Did the marketing fail?
Now include all the data. The Girls also discounted their cheesecakes during September, October, and December. Including the promotions data in the model explains the sales bump across all 4 months: promotions lifted sales in three of them, and the TV campaign did the same job in November. That suggests the marketing investment was a great replacement for promotions (and cheaper)!
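To see why leaving out promotions misleads the model, here’s a minimal sketch in Python. Everything in it is made up for illustration: the 24 weeks of data, the baseline of 100, the assumed lift of 50 for both promotions and TV, and the small `ols` helper, which stands in for whatever regression tool you’d actually use.

```python
def ols(columns, y):
    """Least-squares coefficients for y ~ intercept + columns (normal equations)."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented system [X'X | X'y]
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        A[c] = [v / A[c][c] for v in A[c]]
        for r in range(k):
            if r != c:
                A[r] = [vr - A[r][c] * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]

# Hypothetical weekly flags: 4 quiet weeks, promo in Sep and Oct,
# TV in Nov, promo in Dec, then 4 more quiet weeks.
promo = [0] * 4 + [1] * 8 + [0] * 4 + [1] * 4 + [0] * 4
tv = [0] * 12 + [1] * 4 + [0] * 8

# Assumed truth: baseline 100, promotions add 50, TV adds 50.
sales = [100 + 50 * p + 50 * t for p, t in zip(promo, tv)]

tv_only = ols([tv], sales)        # [intercept, tv]
full = ols([promo, tv], sales)    # [intercept, promo, tv]

print("TV coefficient without promotions:", round(tv_only[1], 2))
print("TV coefficient with promotions:   ", round(full[2], 2))
```

Without the promotions series, the discounted weeks get absorbed into the baseline and the TV coefficient comes out at 20 rather than the 50 built into the data. With promotions included, the model recovers the 50.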

Marketing
Then there’s marketing data. Depending on the complexity of a business’s marketing strategy, there might be tens (or thousands) of marketing series to consider including in a model. How do you pick them? There are a few best practices to consider when including marketing data in your models.
Separate line items for marketing with different tactics
The first MMM I built used total marketing investment as an input. This is the sum of the Girls’ purchases of TV ads, search engine marketing, social media, and any other spend. Including data at this aggregate level assumes the same return across all marketing channels and campaigns.
Instead, include marketing tactics as separate inputs to the model. At a minimum, separate offline (TV, print, billboards) from digital marketing. Consider the business questions you need to answer with the model and ensure you are including data that can fully and correctly answer them.

The chart above is a copy of the one in the last post, comparing cheesecake sales and marketing spend. Now the marketing spend is split into the 3 tactics the Girls used. The three tactics have different flighting[1]: there is always some activity on social media, while campaigns on TV and search are less frequent. These should be included separately in an MMM.
Consistency of hierarchical levels
Different marketing types will have different data quality. With TV and print ads, you’ll be lucky to have general reach or circulation data. With digital ads, you’ll know every individual click, impression, and conversion. This is a ton of valuable data that will need to be aggregated before it is included in an MMM.[2]
Including an unbalanced level of detail in an MMM can lead to unstable regressions and biased results. There are ways to address this later, but to start, include comparable levels of detail across all marketing tactic data.
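As a rough sketch of that aggregation step, the snippet below rolls a made-up event-level ad log up to weekly impressions per channel, which is roughly the grain you’d expect from TV data. The log layout, the channel names, and the `week_start` helper are all assumptions for illustration, not the Girls’ actual data.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical event-level ad log: (day, channel, impressions served)
events = [
    (date(2024, 1, 1), "search", 120),
    (date(2024, 1, 3), "social", 300),
    (date(2024, 1, 3), "search", 80),
    (date(2024, 1, 9), "social", 250),
    (date(2024, 1, 10), "search", 100),
]

def week_start(day):
    """Return the Monday of the week containing `day`."""
    return day - timedelta(days=day.weekday())

# Sum impressions to one row per (week, channel)
weekly = defaultdict(int)
for day, channel, impressions in events:
    weekly[(week_start(day), channel)] += impressions

for (week, channel), total in sorted(weekly.items()):
    print(week, channel, total)
```

Every channel ends up at the same weekly grain, so no single tactic enters the model with far more detail than the others.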
Use the metric that most relates to the shopper experience
MMM measures how effective advertising is for shoppers, so the data going into an MMM should be as close as possible to the shopper experience. Working with dollars is convenient, but dollars only describe the relationship between the business and the marketing firm. Instead, use impressions (or the equivalent for each channel) to capture data that best reflects how shoppers are receiving the marketing.
Like most things, marketing prices change over time for many reasons, so the same dollar can buy a different number of impressions from one period to the next.
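A quick bit of arithmetic shows why this matters. The budget and the two CPM values (cost per 1,000 impressions) below are invented for illustration, and `impressions_bought` is just a helper name I made up.

```python
def impressions_bought(spend, cpm):
    """Impressions purchased for `spend` dollars at a given CPM (cost per 1,000 impressions)."""
    return spend / cpm * 1000

budget = 1000  # same spend in both periods (hypothetical)

early = impressions_bought(budget, cpm=5.0)  # 200,000 impressions
late = impressions_bought(budget, cpm=8.0)   # 125,000 impressions

print(f"Early period: {early:,.0f} impressions")
print(f"Later period: {late:,.0f} impressions")
```

A model fed spend would treat these two periods as identical, even though shoppers saw far fewer ads in the second one.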
Revisiting the MMM
Now I’ll update last week’s model with a few changes:
- Splitting marketing into 3 channels: TV, social media, search
- Using impressions / TV ratings instead of spend
Now that we are using impressions to model revenue, the model coefficients on marketing can no longer be read directly as sales-cost ratios. Instead, the model coefficients are combined back with the input data to calculate the incremental contribution of each input to total cheesecake sales.
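Here’s a minimal sketch of that decomposition, assuming a simple linear, additive model where each coefficient is dollars of sales per impression (or rating point). The coefficients and weekly inputs are hypothetical placeholders, not the fitted values from my model.

```python
# Hypothetical fitted coefficients: dollars of sales per unit of input
coefficients = {"tv": 1.5, "social": 0.01, "search": 0.02}

# Hypothetical weekly inputs, in the same units the model was fit on
inputs = {
    "tv": [0, 40, 60, 0],               # rating points
    "social": [500, 700, 600, 800],     # impressions
    "search": [0, 1000, 0, 2000],       # impressions
}

def incremental_sales(channel):
    """Sum of coefficient x input over all weeks (linear additive assumption)."""
    return sum(coefficients[channel] * x for x in inputs[channel])

for channel in coefficients:
    print(f"{channel}: incremental sales of ${incremental_sales(channel):,.2f}")
```

Dividing each channel’s incremental sales by what that channel cost gives its sales-cost ratio.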
Marketing sales-cost ratios: Golden Girls cheesecake sales over 3 years

| Marketing tactic | Incremental sales | Cost | SCR |
|---|---|---|---|
| Search | $6,092 | $2,100 | 2.9 |
| Social | $7,678 | $6,199 | 1.2 |
| TV | $7,648 | $3,852 | 2.0 |
With this new model, it’s clear that the marketing tactics have very different sales-cost ratios and performance. Search engine marketing has the highest SCR, at 2.9 on average over 3 years of sales, and social media has the lowest, at 1.2: every dollar invested in social media returns $1.20 in sales, a gain of 20 cents.
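The ratios are easy to check from the incremental sales and cost figures in the table. This snippet only redoes that arithmetic; `scr` is a helper name I made up, and the “gain” is simply sales beyond the marketing dollar itself.

```python
# (incremental sales, cost) per tactic, copied from the table above
tactics = {
    "Search": (6092, 2100),
    "Social": (7678, 6199),
    "TV": (7648, 3852),
}

def scr(incremental_sales, cost):
    """Sales-cost ratio: incremental sales per dollar of marketing cost."""
    return incremental_sales / cost

for name, (sales, cost) in tactics.items():
    ratio = scr(sales, cost)
    print(f"{name}: SCR {ratio:.1f}, gain per $1 invested ${ratio - 1:.2f}")
```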
Does this mean the Girls should pull out of TV and social media marketing and put all their budget into search marketing? No! But that’s for another post.
Follow me on BlueSky at @ryantimpe!
Footnotes
[1] Flighting refers to the timing and pattern of marketing decisions. Flighting decisions are part of campaign planning and marketing strategies.
[2] Granular detail associated with digital marketing is very important for other types of marketing data, such as multi-touch attribution and customer lifetime value. MMMs instead need a wide breadth of data aggregated across individuals.