This short blog extra is a follow-up to my Quick Measure example demonstrating agility in Power BI. If you haven't read that one, I recommend checking it out first.
I mentioned in that post that I'd used the new "AvgDriveMinutes (bins)" field to filter out some outlier buckets -- my column chart actually has three more buckets to the right, for instances where teams averaged 4.5- to 5.5-minute drives in a game, but this happens rarely: only 26 times out of more than 9,000 opportunities. (Remember, there are two AvgDriveMinutes per game, one for each team.)
Actionability is key to a report's value, and outliers are often distractions that reduce actionability -- or worse, lead to poor decisions. In this dataset, teams that averaged 5 minutes of drive time won 100% of their games -- but I don't want to use that to set my team's goal, as the 5-minute average drive has occurred only 5 times out of 9,000+ opportunities. Likewise, in the single instance where a team averaged 5.5 minutes, they lost.
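As a rough illustration of this scope decision, here's a minimal sketch in pandas -- assuming a hypothetical DataFrame with one row per team-game opportunity and a precomputed average-drive-minutes column (the column and sample values are made up; the real data lives in the Power BI model):

```python
import pandas as pd

# Hypothetical sample data: most opportunities fall well below 4.5 minutes.
df = pd.DataFrame({
    "team": ["A", "B", "C", "D", "E", "F"],
    "avg_drive_minutes": [2.1, 2.8, 3.3, 3.9, 4.7, 5.5],
})

# Bin into half-minute buckets, mirroring the "AvgDriveMinutes (bins)" field.
df["bucket"] = (df["avg_drive_minutes"] // 0.5) * 0.5

# Scope decision: drop the sparse upper buckets (4.5 minutes and up),
# which cover only 26 of 9,000+ opportunities in the real dataset.
in_scope = df[df["bucket"] < 4.5]
print(len(in_scope))  # 4 rows remain; the two outlier rows are excluded
```

The point isn't the mechanics -- it's that the cut line (4.5 minutes) is an explicit, documented decision rather than data quietly going missing.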
So, I made a scope decision and reduced distractions. Good stuff. But I do need to let my consumers know something's been filtered.
Why? Because it's quite possible (and even likely) that one of my consumers knows the actual number of games/opportunities in the dataset. When that person points out that data appears to be missing*, the dreaded "I don't know if I can trust this data" statement pops up and runs rampant. Vague comments about data integrity issues are cited whenever the scorecard or report shows less than stellar results, and the distraction caused by the lack of confidence is much greater than the distraction those three extra buckets would have created.
The answer is simple: proactive transparency. Let the consumer know that results have been filtered, and why. If anyone takes issue with that logic, it can be addressed -- but then you're having a targeted conversation about a specific decision, not fighting a vague excuse that won't go away.
I've included below a full screenshot of the report (so far) with my disclaimer right at the top. Rather than a text box, I chose to use an inactive button, just because it was a quick way to get the "information" indicator.
One last thing about this scope decision. While I'm confident that filtering out those upper buckets was the right thing to do, keep in mind that I applied the filter to the entire page. The "Total Opportunities" box is filtered as well, so it syncs with the chart above.
This means that I've actually made an implicit, secondary scope decision: this page of this report must thematically focus ONLY on Average Drive Time. Why? Because the page filter is based on Average Drive Time.
What if I added a visualization showing Average Yards Gained? That visual would lack the 26 games filtered out by the Average Drive Time page filter. That's probably not appropriate. Sure, I could manipulate the new visual to include all the games -- but then I need to be sure that my "Total Victories" and "Total Opportunities" boxes are clearly tied only to the Average Drive Time visual.
The lesson here is that the report designer must think like a consumer (and get usable input from consumers) when designing the user experience. For better or for worse, user perception impacts the "success" of your reporting far more than technical accuracy. Otherwise, all these reports would be printed on alternating green bar paper with perforated edges.
* Usually right in the middle of a quarterly business review, with every stakeholder present.