All the metrics are bull

And everyone knows it. Now what?

Feb 26, 2021

At an ad industry event several years ago, the topic of viewability came up. When asked why marketers should buy anything that isn’t 100% viewable, one of the panelists on stage replied sans irony: “Because it performs well.”

The two industry bodies that defined viewability – the Interactive Advertising Bureau (IAB) and the Media Rating Council (MRC) – apply the following definition: For an ad to be considered viewable, at least 50% of its pixels must be in view for at least one continuous second for display ads and at least two continuous seconds for video ads. That doesn’t seem like a terribly high bar for a human to meet as they go about their regular, everyday internet business.
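To make the threshold concrete, here is a minimal sketch of the rule as a check on a single impression. The `Impression` record, its field names, and the helper functions are hypothetical, and the sketch assumes a measurement vendor has already reduced each impression to one "fraction in view" figure and one "seconds in view" figure, which is a simplification of how real measurement works.

```python
from dataclasses import dataclass

# Thresholds taken from the IAB/MRC definition quoted above.
MIN_FRACTION_IN_VIEW = 0.5
MIN_SECONDS_IN_VIEW = {"display": 1.0, "video": 2.0}


@dataclass
class Impression:
    """Hypothetical record for one served ad (field names are assumptions)."""
    ad_type: str             # "display" or "video"
    fraction_in_view: float  # share of the ad that was on screen, 0.0 to 1.0
    seconds_in_view: float   # how long that share stayed on screen


def is_viewable(imp: Impression) -> bool:
    """Return True if the impression meets both the area and time thresholds."""
    return (
        imp.fraction_in_view >= MIN_FRACTION_IN_VIEW
        and imp.seconds_in_view >= MIN_SECONDS_IN_VIEW[imp.ad_type]
    )


def viewability_rate(impressions: list[Impression]) -> float:
    """Share of impressions that count as viewable (0.0 for an empty list)."""
    if not impressions:
        return 0.0
    return sum(is_viewable(imp) for imp in impressions) / len(impressions)


# Made-up sample: only the first and last impressions clear the bar.
sample = [
    Impression("display", 0.5, 1.0),  # exactly at the display threshold
    Impression("display", 0.4, 5.0),  # on screen a long time, but too little of it
    Impression("video", 0.9, 1.5),    # mostly in view, but under two seconds
    Impression("video", 1.0, 2.0),    # exactly at the video threshold
]
print(viewability_rate(sample))
```

Note how little it takes to pass: half the ad, for one second.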

Yet, for our panelist, ads that didn’t meet this threshold were somehow performing well. Is it possible that ads that weren’t in view were so impactful that humans still responded to them in a manner that advertisers desired? Did brands know that this was how their money was being spent?

It's fairly routine to hear someone arguing in favor of inventory that no one can prove was ever seen by humans yet performs well based on campaign effectiveness metrics. From a buyer's perspective, as long as you can point to some indicator of success (preferably a quantifiable one) you have little incentive to rock the boat. If you can demonstrate that 20% of your advertising went through a questionable supply chain yet you met or exceeded all the goals you've set out to meet, for example, then your CFO is likely to just chop off that 20% from next year's budget, seeing how you clearly don’t need it. It's really as close to a victimless crime as you can get: No one doubts the existence of the crime, but no one thinks they're the victim either.

We’ve been fudging the numbers for a long time. Facebook alone has a lengthy dossier of seemingly annual measurement scandals. In 2016 it came to light that Facebook’s average video views were wildly inflated. Some estimates from Publicis Media pegged the inflation at somewhere between 60% and 80%. This measurement error emerged in the heyday of the “pivot to video” trend: With traffic surging on social platforms and video CPMs lucratively high, Facebook’s reported numbers drove many publishers to invest in building video assets, hoping to reliably monetize this apparent thirst for video content. Except the numbers on which these business decisions were made turned out to be bogus. Oops.

Later that year Facebook issued corrections to several other key advertising metrics affecting multiple products ranging from the Like and Share buttons (its primary vehicle for capturing user engagement), through live video reaction counts, to audience reach estimates for all ads. It would be hard to imagine a comparable measurement scandal in, say, linear TV. Yet in spite of the platform’s rather lackadaisical approach to ensuring that key metrics are verified and valid, Facebook’s advertisers didn’t really make a big stink. Larger ones waited for restitution, makegoods, credits, and similar “Aw shucks, our bad” gestures but stuck to the “Hey, this is still working for us” interpretation of metrics to save face (and preserve their budgets). Smaller mom-and-pop advertisers with fewer resources, or whose primary or only advertising platform is Facebook, likely didn’t notice; and if they did, they didn’t have much of an option to take their ad dollars elsewhere.

In 2017, Facebook announced more clarifications to several relevant ad metrics and opened itself up to the first of several third-party audits to be performed by the Media Rating Council. Documents related to a class action lawsuit filed in 2018 revealed that Facebook was long aware of its measurement discrepancies. In 2017, COO Sheryl Sandberg admitted that she had known about issues with Facebook’s Potential Reach metric “for years” in a previously redacted internal email that became public in early 2021. Marketers use Potential Reach, a free planning tool, to estimate the number of people who can be reached by their Facebook campaigns, and the results can (and do) impact their decisions on whether and how much to spend with the platform. The discrepancy was caused by fake and duplicate Facebook accounts. Facebook resisted internal suggestions for fixing the problem because doing so would decrease potential reach by 10%. In one damning email uncovered during the lawsuit, a Facebook employee asked, “How long can we get away with the reach overestimation?”

As the late Leonard Cohen sang:

Everybody knows that the dice are loaded

Everybody rolls with their fingers crossed

(…)

That's how it goes

Everybody knows

The key premise of digital advertising is addressability and accountability: In theory, if we’re able to measure each consumer interaction – from initial ad view through to purchase – then we can focus our spend better. As more players enter the field (e.g., retail/commerce media, AVOD services, etc.) and various types of automated fraud become easier and cheaper to proliferate (from botnets to phone farms with actual humans tapping on ads), how can we compare inventory across different properties? Without independent, third-party verification, marketers are forced to trust the platforms. But if every walled garden comes up with its own measurement, how do buyers vet what’s a good buy? And even assuming no nefarious intent, an undetected measurement error can lead to misinformed decisions and a lot of waste. Yet it appears that marketers see little incentive to change buying behavior and drive ecosystem-wide changes to measurement.

In the before-times, we had ratings and panels and samples. Even though these were flawed, there at least was some consensus on what the currency that we’re all transacting on should be. The digital media analytics we use today, which typically reflect measures of volume, can be traced to early efforts in the 1930s (!) to measure an audience's exposure to radio broadcasts for advertising purposes. The early model of tracking who was listening, for how long, and how often, became an agreed-upon standard for audience ratings. Nielsen continued to use exposure as the core metric for television audience ratings, which it dominated beginning in the 1950s. When news media moved online, measurement mainly focused on the same audience size-related metrics that ruled older mass media forms – but without the consensus. Performance metrics entered the fray with the rise of platforms, but brand advertising largely remained audience- and exposure-driven. Many metrics, such as unique visitors or clicks, may appear simplistic, but there are often huge discrepancies in how they are defined and tallied. Even companies that use the same methodologies can deliver quite different results.

Third-party measurement has had its share of recent scandals, too. Comscore was embroiled in an accounting scandal that led to the Securities and Exchange Commission (SEC) charging the company with inflating revenue and making false statements about key company performance metrics under former CEO Serge Matta. It’s probably not a great sign when a measurement company is entangled in a high-profile accounting scandal of its own. This brings up another somewhat poetic reference: Who watches the watchmen?

While the watchmen figure this out, let’s step back to the whiteboard and think through what a good measurement solution would address today. We see an opportunity for some combination of the following three pillars:

Supply chain transparency: Programmatic seems to have made basic what-did-I-buy-and-where-did-it-run types of inventory verification increasingly complex. Last year’s ISBA report and similar research in other regions of the world point to a flawed advertising supply chain. One of the early promises of blockchain in advertising introduced the concept of tracking and reconciling where individual ad creative shows up – this is a useful mental model to embrace even if the implementations haven’t yet delivered on the promise. The IAB has taken on several supply chain accreditation projects – most notably via the Trustworthy Accountability Group (TAG) – but no single solution seems to have emerged that doesn’t elicit much grumbling from industry leaders and questions about accuracy, pay-to-play accreditation, and similar concerns.
Fraud mitigation: As more trading moves to automated channels, a regular third-party auditing body could help spot fraudulent activity sooner and perhaps speed up mitigation. This is currently the domain of ad bot prevention and brand safety vendors; perhaps they have a bigger opportunity to expand into more holistic media measurement, too.
True value of inventory: Wide discrepancies in what constitutes even a basic segment (e.g., age/gender) are quite common, and they are exacerbated as segments get more specific (e.g., with interest and intent attributes). A GRP (gross rating point) would normalize that across different networks and types of content; what is the equivalent of NBC Thursday night prime time, the most premium of premium ad placements in linear TV before the cord-cutting era? Does that even exist? Comparisons between different inventory sources and types are increasingly challenging. If your agency or data scientists can evaluate buys on Roku vs. Walmart, you’re among the lucky. Having a third-party reference that could assign respective value – especially across different walled gardens – would certainly make the decision process easier. Right now, all buyers need to figure this out for themselves, often with the help of sizable data science teams and significant investments in data infrastructure.
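For readers less familiar with the GRP mentioned in the last pillar: it is conventionally computed as reach (as a percentage of the target population) multiplied by average frequency, which works out to impressions divided by population, times 100. The sketch below uses made-up numbers and hypothetical function names to show why it works as a common yardstick: two very different buys land on the same scale, but only if everyone agrees on the population and impression counts going in.

```python
def grps_from_impressions(impressions: int, population: int) -> float:
    """Gross rating points: impressions as a percentage of the target population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 100.0 * impressions / population


def grps_from_reach(reach_pct: float, avg_frequency: float) -> float:
    """Equivalent form: percent of the population reached times average exposures."""
    return reach_pct * avg_frequency


# Illustrative numbers only: a target population of 1,000,000 people.
population = 1_000_000

# Buy A: 2,000,000 impressions delivered against that population.
buy_a = grps_from_impressions(2_000_000, population)

# Buy B: reaches 40% of the population an average of 5 times each.
buy_b = grps_from_reach(40.0, 5.0)

print(buy_a, buy_b)  # both buys carry the same gross weight
```

The arithmetic is trivial; the hard part is that each walled garden supplies its own inputs, so the outputs stop being comparable.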

Media measurement across eras to date may have been flawed and needed a revamp, but at least there was a third party that we could point to as officially having the numbers or currency and who we could all yell at if things didn’t quite seem above board. With an ever-increasing number of measurement scandals, it feels timely for a new solution to emerge before we’ve lost trust in any kind of audience numbers completely. We’ve seen some interesting approaches in individual channels (e.g., interpreting demand across streaming content) and in general (e.g., using ML to auto-adjust media mix) but nothing yet that resembles an industry standard the way Nielsen’s TV ratings did for many years.

One question:

Who is building the Nielsen or Comscore for the next generation? Given the diversity of our media consumption, can we expect reliable third-party auditors to emerge that can cover everything from mobile apps to terrestrial radio? Is that even a realistic ask?


Thanks for reading,

Ana & Maja


Who we are: Sparrow Advisers

We’re a results-oriented management consultancy bringing deep operational expertise to the strategic and tactical objectives of companies in and around the ad tech and mar tech space.

Our unique perspective, rooted deeply in AdTech, MarTech, SaaS, media, entertainment, commerce, software, technology, and services, allows us to accelerate your business from strategy to day-to-day execution.

Sparrow Advisers was founded in 2015 by Ana and Maja Milicevic, principals and industry veterans who combined their product, strategy, sales, marketing, and company-scaling chops to build the type of consultancy they wish had existed when they were in operational roles at industry-leading adtech, martech, and software companies. Now a global team, Sparrow Advisers helps solve the most pressing commercial challenges and connects all the necessary dots across people, process, and technology to simplify paths to revenue from strategic vision down to execution. We believe that expertise with fast-changing, emerging technologies at the crossroads of media, technology, creativity, innovation, and commerce is a differentiator and that every company should have access to wise Sherpas who’ve solved complex cross-sectional problems before.
