Stop Measuring Things with Means: Why Your Averages Are Lying to You
The Alien's Report: When Averages Go Absurdly Wrong
An alien returns from studying Earth and presents his data-driven findings about humans:
"Humans have, on average, one testicle, one breast, and 5-inch long hair. They experience a menstrual cycle roughly every second month."
His boss nods approvingly at this "useful information" and proceeds to build a human model based on these averages.
This absurd example perfectly illustrates how decision-making with averages can go catastrophically wrong. Averages compress entire distributions into single numbers, losing so much information that the result isn't just useless—it's actively misleading.
The Hidden Cost of Average Thinking
When we report metrics like:
- "Average customer satisfaction is 3.8/5"
- "Mean app startup time is 2.1 seconds"
- "Average revenue per user is $47"
We're hiding critical information:
- How many customers are actually satisfied?
- What percentage of users have acceptable load times?
- Are we dependent on a few whales or broadly successful?
The mean tells us nothing about the actual user experience distribution.
Why Data Scientists Live in the Clouds
As data professionals, we've developed a dangerous comfort with abstraction. We view problems from 30,000 feet, making sweeping statements like:
"Reducing app size led to a 500ms reduction in median startup time."
But what does this actually mean for users?
- A better median says nothing about the slower half: the 50% of users above it may have seen no improvement at all
- We could hit this goal while 40% of users still have unacceptable experiences
- A small group with premium devices could pull the headline number down while most users see no change
We're so comfortable operating on populations that we forget individuals don't experience averages—they experience specific, concrete realities.
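To make this failure mode concrete, here's a minimal simulation (all numbers are synthetic, invented for illustration, not real telemetry) in which the median startup time improves by almost half a second while the 95th percentile regresses badly, because one user in five actually got slower:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Invented "before" startup times in seconds.
before = rng.lognormal(mean=0.9, sigma=0.5, size=n)

# Invented "after" an optimization: 80% of users get faster,
# but 20% (say, older devices) get much slower.
faster = rng.lognormal(mean=0.6, sigma=0.4, size=n)
slower = rng.lognormal(mean=1.6, sigma=0.6, size=n)
after = np.where(rng.random(n) < 0.8, faster, slower)

for label, times in [("before", before), ("after", after)]:
    print(f"{label:>6}: median={np.median(times):.2f}s  "
          f"p95={np.percentile(times, 95):.2f}s  "
          f"under 2.5s: {np.mean(times < 2.5):.0%}")
```

The median drops from roughly 2.5s to 2.0s, a headline-worthy win, while the 95th percentile climbs from about 5.6s to nearly 8s. A median-only report would celebrate exactly the scenario described in the bullets above.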
The Threshold Revolution: Measuring What Matters
Instead of asking "What's the average?" start asking "What percentage of users are having a good experience?"
This shift fundamentally changes how you think about metrics and improvements. Here's the framework:
Step 1: Define "Good Enough"
Use your human intuition and domain expertise:
- Bad approach: "Average load time is 2.1 seconds"
- Better approach: "85% of users experience load times under 2.5 seconds"
- Why it works: You can feel what 2.5 seconds means. You know if it's acceptable.
- Bad approach: "Average NPS is 42"
- Better approach: "67% of customers are promoters (9-10 score)"
- Why it works: Focuses on creating actual advocates, not improving an abstract number.
- Bad approach: "Average revenue per user is $47"
- Better approach: "73% of users generate profitable unit economics (>$30)"
- Why it works: Reveals sustainability and dependency risks immediately.
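Once the threshold is fixed, computing the "better approach" number is a one-liner. A minimal sketch with made-up load times, using the 2.5-second line from the example above:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical per-user load times in seconds (synthetic data).
load_times = rng.lognormal(mean=0.7, sigma=0.6, size=10_000)

print(f"Average load time: {load_times.mean():.2f}s")           # the abstraction
print(f"Under 2.5s: {(load_times < 2.5).mean():.0%} of users")  # the experience
```

Same data, two very different stories: the first line invites arguing about decimals, the second tells you how many people are actually having a good experience.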
Step 2: Set Goals That Connect to Reality
Compare these two goal statements:
❌ Traditional: "Increase average session duration from 8.3 to 10.5 minutes"
✅ Threshold-based: "Increase the percentage of engaged users (>5 minute sessions) from 45% to 65%"
The second immediately tells you:
- Most users aren't engaged currently
- Success means converting the unengaged
- You can't game it by making already-engaged users stay longer
The Mathematical Advantage You Didn't Expect
Threshold-based metrics actually simplify your analytics:
Why Proportions Beat Distributions
No more distribution headaches:
- Everything becomes 0 or 1 (met threshold or didn't)
- No outliers distorting your metrics
- No need to choose between mean, median, or mode
Statistical tests become trivial:
- Simple proportion tests always apply
- Confidence intervals are straightforward
- Variance is predictable from the proportion itself (for a 0/1 outcome it is just p(1-p); see the sketch after this list)
Clear segmentation opportunities:
- Easy to identify who's below threshold
- Natural groups for targeted improvements
- Can't improve metric without helping those who need it
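Here's what that simplicity looks like in code: a self-contained sketch of a Wilson confidence interval and a standard two-proportion z-test, implemented by hand so nothing is hidden. The counts are invented for the example:

```python
from math import sqrt
from scipy.stats import norm

def wilson_ci(successes: int, n: int, confidence: float = 0.95):
    """Wilson score interval for a proportion; stays sensible near 0 and 1."""
    z = norm.ppf(1 - (1 - confidence) / 2)
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    margin = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - margin, center + margin

def two_proportion_pvalue(x1: int, n1: int, x2: int, n2: int) -> float:
    """Two-sided p-value for H0: both groups share the same true proportion."""
    p_pool = (x1 + x2) / (n1 + n2)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    return 2 * norm.sf(abs(z))

# Hypothetical A/B test: 680/1000 control sessions met the threshold
# versus 745/1000 in the treatment group (counts are made up).
print(wilson_ci(680, 1000))
print(two_proportion_pvalue(680, 1000, 745, 1000))
```

No distributional assumptions about the underlying metric are needed; once each user is a 0 or a 1, the binomial machinery does all the work.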
Real-World Application: Education's Wake-Up Call
Consider measuring education levels in a region:
The Averaging Trap
A region shows average education level of 12 years (high school graduation).
Sounds good? Here's what it's hiding:
- 50% dropped out of high school (11 years)
- 50% dropped out after freshman year of college (13 years)
- Not a single person actually has 12 years of schooling; the "average resident" doesn't exist
The Threshold Solution
Instead, measure:
- High school graduation rate: 50% (vs goal of 90%)
- Bachelor's degree completion: 0% (vs goal of 40%)
Now you can see the real problems and allocate resources appropriately. The thresholds (diploma, degree) represent meaningful life milestones, not arbitrary points on a continuum.
Strategic Advantages of Threshold Thinking
1. Clarity in Prioritization
When you know 30% of users have unacceptable experiences, you know exactly who to focus on. With averages, you might waste time optimizing for users who are already satisfied.
2. Much Harder to Game
Teams can improve averages by:
- Cherry-picking easy wins
- Focusing on already-good segments
- Excluding "outliers" from measurement
- Making small improvements across the board
Threshold metrics, by contrast:
- Require actually fixing problems for struggling users
- Count everyone: no one below the threshold can be excluded
- Only move when users cross a meaningful line
- Force attention onto those who need help most
3. Natural Storytelling
Which story resonates more with leadership?
Option A: "We improved mean response time by 23.7%"
Option B: "We increased the percentage of customers getting sub-second responses from 34% to 78%. That's 10,000 more customers per day having the snappy experience they expect."
Implementation Guide: Making the Switch
Week 1: Audit Your Current Metrics
✅ List all metrics currently tracked as averages
✅ Identify which hide important distributions
✅ Flag metrics where outliers distort the picture
✅ Note where you've seen average improvements without real impact
Week 2: Define Meaningful Thresholds
✅ Use domain expertise to set "good enough" lines
✅ Validate with user research where possible
✅ Test thresholds with historical data
✅ Ensure thresholds connect to business outcomes
Week 3: Parallel Tracking
✅ Run both average and threshold metrics side-by-side
✅ Document divergences and what they reveal
✅ Build dashboards showing distribution insights
✅ Share findings with stakeholders
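As a sketch of what the side-by-side tracking might produce, here's a toy comparison table. The weekly numbers are invented, deliberately chosen so the two views disagree:

```python
import pandas as pd

# Hypothetical weekly metrics: the mean keeps "improving"
# while the threshold metric quietly slips.
df = pd.DataFrame({
    "week": ["W1", "W2", "W3", "W4"],
    "mean_load_s": [2.10, 2.05, 1.98, 1.95],
    "pct_under_2_5s": [0.64, 0.63, 0.61, 0.60],
})

df["mean_improved"] = df["mean_load_s"].diff() < 0
df["threshold_improved"] = df["pct_under_2_5s"].diff() > 0
df["diverged"] = df["mean_improved"] != df["threshold_improved"]
print(df)
```

Every diverging week is a documentation opportunity: something, often the tail of the distribution, is moving in a direction the average cannot see.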
Week 4: Transition Communications
✅ Create compelling before/after comparisons
✅ Train team on interpreting new metrics
✅ Update OKRs and goals to threshold-based
✅ Celebrate early wins from clearer insights
Common Objections and How to Handle Them
"But we've always used averages!"
Response: "And we've always struggled to connect metrics to actual user experience. Here's what we've been missing..."
Show them:
- Specific examples where averages hid problems
- User segments suffering while averages looked fine
- How threshold metrics would have caught issues earlier
- The simplicity of explaining threshold-based goals
"This seems more complicated"
Actually simpler because:
- One threshold vs choosing mean/median/mode
- Binary outcome vs complex distributions
- Clear action items vs abstract improvements
- Straightforward statistics vs distribution assumptions
"How do we choose the right threshold?"
Three approaches that work:
- Competitive benchmarking: What do best-in-class achieve?
- User research: What do users actually consider acceptable?
- Business impact: Where does the metric affect outcomes?
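For the third approach, one lightweight tactic is to sweep candidate thresholds over historical data and look for the point where the business outcome separates most sharply. A sketch on fully synthetic data (both the load times and the conversion relationship are invented):

```python
import numpy as np

rng = np.random.default_rng(7)
# Synthetic sessions: load time in seconds plus a conversion flag.
load = rng.lognormal(mean=0.7, sigma=0.5, size=5_000)
p_convert = np.clip(0.6 - 0.15 * (load - 2.0), 0.05, 0.9)
converted = rng.random(5_000) < p_convert

for t in [1.5, 2.0, 2.5, 3.0, 4.0]:
    below, above = converted[load < t], converted[load >= t]
    print(f"threshold {t:.1f}s: conversion {below.mean():.0%} below "
          f"vs {above.mean():.0%} above")
```

The threshold where the gap between the two groups is widest is a strong candidate for the line you actually care about.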
Industry Success Stories
E-commerce Giant: Cart Abandonment
Before: Average cart value and average time to purchase
After: % of carts completing checkout within 5 minutes
Result: Identified mobile users as key problem, reduced abandonment by 31%
SaaS Platform: User Activation
Before: Average time to first value
After: % of users reaching "aha moment" within 48 hours
Result: Focused onboarding improvements, increased paid conversions by 47%
Mobile App: Performance
Before: Mean API response time
After: % of API calls completing under 200ms
Result: Found geographic disparities, added regional caching, improved retention 22%
The Framework: THRESHOLD
T - Target a specific user experience
H - Hypothesize what "good" looks like
R - Research to validate the threshold
E - Establish percentage-based goals
S - Segment to find improvement opportunities
H - Help those below the threshold
O - Optimize until majority succeed
L - Lock in gains with monitoring
D - Document impact on business outcomes
Your Metrics Transformation Checklist
For each current average-based metric, ask:
☐ Distribution Check: Is the data roughly symmetric and unimodal? If it's skewed or multimodal, the mean misleads.
☐ Outlier Impact: Can a few extreme values distort this metric?
☐ Experience Mapping: Does the average represent any actual user's experience?
☐ Threshold Clarity: Is there a clear "good enough" line we care about?
☐ Action Bias: Will improving this metric require helping those who need it most?
☐ Story Power: Can stakeholders intuitively understand what success means?
Advanced Threshold Techniques
Multi-Threshold Tracking
Instead of one line, track multiple meaningful boundaries:
Performance Tiers:
- Delight: % of responses under 1 second
- Satisfactory: % under 2.5 seconds
- Frustration: % over 5 seconds
This gives richer insight than any single average could provide.
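The tier percentages cost no more to compute than a single threshold. A minimal sketch with made-up latencies:

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical response times in seconds (synthetic).
latency = rng.lognormal(mean=0.5, sigma=0.8, size=50_000)

print(f"Delight (<1s):        {np.mean(latency < 1.0):.0%}")
print(f"Satisfactory (<2.5s): {np.mean(latency < 2.5):.0%}")
print(f"Frustration (>5s):    {np.mean(latency > 5.0):.0%}")
```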
Threshold Velocity
Track how fast you're moving users across thresholds:
Monthly velocity = (users above threshold this month - users above threshold last month) / total users
This shows momentum and helps predict when you'll hit goals.
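In code, the velocity calculation is a single expression; the counts below are hypothetical:

```python
def threshold_velocity(above_now: int, above_last: int, total_users: int) -> float:
    """Share of the whole user base that crossed the threshold this month
    (negative if users slipped back below it)."""
    return (above_now - above_last) / total_users

# Hypothetical counts: 6,200 users above threshold now vs 5,400 last month.
print(f"{threshold_velocity(6_200, 5_400, 20_000):+.1%} per month")  # +4.0%
```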
Cohort Thresholds
Apply thresholds to user segments:
New users: 80% should activate within 7 days
Power users: 95% should have sub-second experiences
Mobile users: 70% should complete tasks without errors
Different standards for different contexts.
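Per-segment rates fall out of a single group-by. A sketch with a toy table (the segment names and targets mirror the examples above):

```python
import pandas as pd

# Hypothetical per-user data.
users = pd.DataFrame({
    "segment": ["new", "new", "new", "power", "power", "mobile", "mobile"],
    "met_threshold": [True, False, True, True, True, False, True],
})
targets = {"new": 0.80, "power": 0.95, "mobile": 0.70}

actual = users.groupby("segment")["met_threshold"].mean()
for segment, rate in actual.items():
    print(f"{segment}: {rate:.0%} met threshold (target {targets[segment]:.0%})")
```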
The Bottom Line: Means vs Reality
Averages are comfortable abstractions that let us pretend we understand populations. But users don't experience averages—they experience specific realities that fall above or below meaningful thresholds.
Stop asking: "What's the average?"
Start asking: "What percentage are succeeding?"
This shift will:
- Reveal hidden problems averages obscure
- Focus improvements on those who need them
- Simplify statistical analysis and testing
- Create clearer communication with stakeholders
- Drive real impact instead of metric manipulation
Key Takeaways
🎯 Averages hide distributions that contain the real insights about user experience
📊 Thresholds create clarity by defining concrete success criteria everyone understands
🔬 Simpler statistics with proportions eliminate complex distribution assumptions
🚀 Gaming gets much harder when you can only improve by helping those below the threshold
💡 Intuitive communication because percentages and concrete thresholds resonate with everyone
Ready to transform your metrics from misleading averages to meaningful thresholds? Discover how Ara Platforms helps teams measure what actually matters for their users.