This week's job search activities produced a rather funny conversation: "I see you've got some expertise with SSIS, is that right?"
Indeed. I love me* some SQL integration work.
"Do you also do any ETL work?"
Blink. Is this is a trick question?
To be fair, the very nice young person who was interviewing me was not a database person. We can't all be that lucky, obviously. And that reminded me -- with PowerBI, you can transform data like a pro (ish) without being a database person, either.
Ready for more NFL data? I'm hosting my full ArmchairAnalysis data in SQL Azure, but just bringing selective bits into PowerBI, pretty much as I discover that I need it. When I started working on drive time analysis, I looked at table [DRIVE.] Awesome table -- detailed information about every drive from every game between 2000 and 2016.
In fact, there are almost 106,000 rows in that table, one for every drive. PowerBI pretty much sniffs contemptuously at such a paltry amount of data, so performance isn't really an issue here. I wanted numerous aggregates, though, spread over a bunch of visualizations, so I decided to transform the data in my PowerBI data model.
Now, I could have gone to the actual data management team and requested creation of an aggregated view. But this is speculative work; I don't know what my final view will need to look like, and I don't know if it'll even be needed long-term. I'd rather engage the full governance process after some experimentation.
After pressing the magic "Edit Queries" button in PowerBI, I connect to my database, select the [DRIVE] table, and start editing a new query.
First, a new column. Drive times in this database are broken into separate integer fields for minutes and seconds. I need a calculated column that combines the two, then translates to fractional minutes. Check out Figure 1. Anyone familiar with moving data around already can create this column in a heartbeat, and the interface/formula should be pretty easy for a non-data person to pick up, especially with a bit of help from web research.
Next, I want to actually group my 100,000+ rows into just one row per game, per team, with a number of aggregated fields. Again, easy stuff. The Group By editor allows me to first choose the columns that I want to group by (the game ID and each team's name) then specify columns to aggregate. Have a look at Figure 2 for this.
And that's it. Transformation on the fly, without an SSIS process, coding, governance process, or changes to the underlying data layer. But again, double-edged sword*** -- this is a terrific way to do some quick modeling and investigation work, followed by proper governance and productionalization. Enjoy the agility, but don't sacrifice long-term stability.
* I'll bet you didn't know we have reflexive verbs in English, huh? In Texas we do. That's right. In this blog you not only get PowerBI and technology information, you learn to speak civilized languages, like Texian.** You're welcome.
** I'll bet you didn't know that "Texian" is actually a word, either. Most spell checkers flag it as incorrect, but that's because they were programmed by Yankees and other people of dubious origin. Texian is an old word referring, of course, to anyone residing in Texas. It originates around the time that the Texas settler community was a blend of two incoming cultures: the Spanish, migrating up from Mexico, and other Europeans (such as Germans and Czechs) moving in from the U.S. Still don't believe me? Call Larry McMurtry or Randolph Campbell. They'll set you straight.
*** Or feral rabbit. Do not anger the feral rabbit.