This Is Why You Should Use PowerBI

Okay, so there are really about a thousand reasons why you should use PowerBI. I'm going to suggest one of my favorites, along with an example that, once again, makes use of some splendid NFL data from Armchair Analysis. If you haven't read my other BI blogs using this data*, here's a quick summary: every game from the 2000 through 2016 seasons, play by play. I loaded them into SQL Azure for some modeling and analysis, and now I'm using PowerBI for visualizations.

But wait! I'm not using PowerBI only for visualizations. Though I only intended to whip up some quick analysis aids for my stakeholders**, I ended up doing some data modeling, cleanup, and validation at the same time. And that's one major reason you should use PowerBI: a single tool supports end-to-end business intelligence, from data acquisition through modeling, analysis, and presentation.

This completeness is especially significant given how specialized data roles are becoming. It's increasingly common today for different people or teams to own data warehousing, automated reporting, and data science. It's also likely that non-IT departments have their own analytical data consumers, each responsible for specific business subjects.

Figure 1: Very hard to read.

Quite often, analysts and data scientists need changes made at a lower level of the data spectrum, but they don't have access to the underlying model (and often lack the skills to change it). Or they lack the time: a proper data management process likely means that a change takes hours or days, and an analyst working at the speed of business might want to experiment now. I'm about to show you an example of how PowerBI helps here.***

Recently I decided to look at the relationship between an NFL team's average drive time, its average points scored per game, and its likelihood of winning the game. One of the first charts I built (Figure 1) plots average drive time on the X axis against average points scored on the Y axis. The problem is that the AvgDriveMinutes field is precise to fractions of a minute, and there are 4,522 games in this dataset. With two teams per game, that's 9,044 values, all represented by those tiny lines.****
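
If the arithmetic behind that count isn't obvious: every game produces one AvgDriveMinutes value per team, so 4,522 games yield 9,044 values. Here's a minimal Python sketch of that aggregation, assuming a hypothetical drive-level table with game_id, team, and drive_minutes columns (illustrative names, not the actual Armchair Analysis schema) and a handful of made-up rows.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical drive-level rows: (game_id, team, drive_minutes).
# Column names and values are made up for illustration only.
drives = [
    (1, "DAL", 2.75), (1, "DAL", 1.50), (1, "NYG", 3.25), (1, "NYG", 0.75),
    (2, "NE", 4.00), (2, "NE", 2.50), (2, "MIA", 1.25), (2, "MIA", 2.75),
]

def avg_drive_minutes(rows):
    """Return {(game_id, team): average drive minutes} for each team in each game."""
    by_team_game = defaultdict(list)
    for game_id, team, minutes in rows:
        by_team_game[(game_id, team)].append(minutes)
    return {key: mean(values) for key, values in by_team_game.items()}

averages = avg_drive_minutes(drives)
games = {game_id for game_id, _team in averages}

# One value per team per game, so twice as many values as games.
print(len(games), len(averages))
```

Run against the full dataset, the same grouping would produce the 9,044 team-game averages plotted in Figure 1.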

Figure 2: My data has been grouped into buckets. Or bins.

The answer here is to group those 9,044 values into buckets, of course. But if I'm in a large shop with isolated data roles, I probably have to submit a change request and wait for someone else to add these buckets to the underlying data. 

Instead, it took me about fifteen seconds to create buckets (bins) directly in the PowerBI data model. Check out Figure 2.
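
PowerBI creates the bin column for you, so no code is needed. To show what the grouping does, here's a minimal Python sketch of equal-width binning. It assumes a bin size of 0.5 minutes (inferred from the 4.5, 5.0, and 5.5 minute buckets mentioned in this post) and labels each value with the lower edge of its bin; it illustrates the idea and is not PowerBI's internal implementation.

```python
import math

# Assumed bin width in minutes, inferred from the 4.5 / 5.0 / 5.5 bucket labels.
BIN_SIZE = 0.5

def to_bin(minutes: float, size: float = BIN_SIZE) -> float:
    """Label a value with the lower edge of its equal-width bin."""
    return math.floor(minutes / size) * size

# Made-up AvgDriveMinutes values, not real data.
samples = [1.62, 2.49, 2.5, 3.99, 4.7]
print([to_bin(m) for m in samples])
```

A bin size of 0.5 is exactly representable in binary floating point, so dividing by it is exact. With a size like 0.1, a value sitting right on a bin edge can land in the neighboring bin because of rounding, which is worth keeping in mind if you ever reproduce the buckets outside PowerBI.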

Figure 3: Field added.

PowerBI also did something in the background: rather than simply grouping AvgDriveMinutes in the report, it added a new calculated field to my data model. You can see "AvgDriveMinutes (bins)" in the field list (Figure 3).

What's significant about this? The "AvgDriveMinutes (bins)" field can now be used elsewhere in my report. You might have noticed that I'm already using it to limit what the main chart displays: only 26 data points fall into the 4.5, 5.0, and 5.5 minute buckets, so I chose to filter those out. I'm using that field as a page filter; I'll demonstrate why in a future blog.

For the "final" product, have a look at Figure 4. I've added a line showing the winning percentage for teams whose average drive time falls in each bucket. There's an interesting insight here: as teams move from 1.5-minute to 4.0-minute average drives, average points vary by only a single touchdown (with extra point). However, there's clearly a change in winning percentage for teams that average at least 2.5 minutes per drive.
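
Figure 4 is built entirely in PowerBI. For anyone who wants to check the logic elsewhere, here's a minimal Python sketch of what the chart summarizes per bucket: game count, average points, and winning percentage, with the sparse 4.5, 5.0, and 5.5 buckets excluded. The rows are made-up toy values, not results from the dataset, and counting ties as "not won" is an assumption of this sketch.

```python
from collections import defaultdict

# Hypothetical team-game rows: (drive_bin, points_scored, won). Toy values only.
# Ties count as "not won"; that's an assumption of this sketch.
team_games = [
    (2.0, 17, False), (2.0, 24, True), (2.5, 20, True),
    (2.5, 27, True), (2.5, 13, False), (3.0, 23, True),
    (5.0, 10, False),
]

# Sparse buckets that the post filters out with a page filter.
EXCLUDED_BINS = {4.5, 5.0, 5.5}

def summarize_by_bin(rows, excluded=EXCLUDED_BINS):
    """Return {bin: (games, avg_points, win_pct)} for every bin not excluded."""
    grouped = defaultdict(list)
    for drive_bin, points, won in rows:
        if drive_bin not in excluded:
            grouped[drive_bin].append((points, won))
    summary = {}
    for drive_bin, games in sorted(grouped.items()):
        n = len(games)
        avg_points = sum(p for p, _ in games) / n
        win_pct = 100 * sum(1 for _, w in games if w) / n
        summary[drive_bin] = (n, avg_points, win_pct)
    return summary

for drive_bin, (n, avg_pts, win_pct) in summarize_by_bin(team_games).items():
    print(f"{drive_bin:.1f}  games={n}  avg_points={avg_pts:.1f}  win%={win_pct:.0f}")
```

Excluding buckets by label mirrors the page filter described above; a minimum-count threshold would be an alternative if the sparse buckets change as data is added.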

In my next blog I'll add an entire table to my data model on the fly, show some slicers, and we'll have a look at how this relationship changes during the playoffs.

Figure 4: Your team needs to average 2.5 minutes or more in drive time for an edge.


* If you haven't read them, go back and read them. Geez.

** The stakeholder list consists mainly of myself and Rob Garden, because I find this stuff quite interesting, and Rob actually calls and chats with me about it on occasion. Rob, tell your boss I said you're not being paid nearly enough.

*** Serious footnote for a change. Circumventing data management is certainly not a "best practice." In a later blog I'll address solid data practices versus agile business.

**** Did you find the error in Figure 1? Obviously, no one averages 60 points per football game. At least, not since early 1990s Texas A&M. When I took the screenshot for Figure 1, I hadn't yet tweaked the aggregation for average points scored.