Phishing Scam Example

Just a quick blog entry here, giving people a place to download the PDF version of the phishing email that I posted on LinkedIn recently. Just right-click on the image below, then choose "Save Target As" to save the PDF to your local drive.


Business Intelligence 101: Infographics

Another appropriate title for this blog would be, "Buying Myself More Time." I'm eager to share the results of my NFL 1-2 Punch analysis with you, but the latest discovery does need in-depth validation. A large number of NFL fans take the 1-2 scoring strategy as gospel, and I'd like to be quite certain of my results before I contradict them.

That certainty reminds me of a conversation with my friend Roland, a few years back. I was rattling off a list of Swedish rock bands, and included Golden Earring. Keep in mind, I love music trivia. I'm very confident in my music trivia knowledge. And for the better part of 30 years*, I was certain that Golden Earring was from Sweden. Roland, who's Dutch, informed me that no, Golden Earring is from The Netherlands. It took me a while to believe him, despite his rather obvious authority on the matter, simply because I had been so certain they were Swedish.**

Back to the subject at hand: infographics. I used the term "infographic" last week with a non-IT, non-corporate friend and was asked, "What exactly is that?"

Simply put, an infographic is a visual representation of data or information. Seems straightforward. Aren't you just trying to impress me by using a fancier word than "chart?"

Fig. 1: So, who was part of the band in 1997? 

Not exactly. Consider this differentiation: when you look at the visual, no matter what you call it, do you get a sense of overwhelming detail, or do you immediately get a feel for the overall picture?*** A good infographic should produce the latter. More detail on infographic functions in a moment.

I stumbled across a great example while checking some items for yesterday's blog on band compositions. Wikipedia has some excellent infographics showing membership in popular bands over time. Check out the two screen captures: the first gives a simple textual list of people who have been part of Styx. The second is a graphical timeline.

Fig. 2: Much easier to read, and actually provides some insight beyond simple membership.

Want to know who was in the band in 1978, 1997, or 2003? You'll probably answer those questions far more quickly using the infographic. It also provides some immediate insights that you don't get easily from the text version. Apparently the band was inactive from 1984 through late 1990, and again from 1992 to 1995. And it prompts an interesting question: what was that brief stint in 1995?****

Those are the things a good infographic should or can do: first, give you a good feel for the overall picture. Second, provide a visual that helps commit the general picture to memory. Third, make it easy to answer basic questions. Fourth, call out significant insights, and finally, prompt questions that might be worth following up. And a good infographic should do these things more effectively than text or grid delivery.

If you're still with me, how about a little homework? Send me a link with an interesting infographic. If I receive a few of these, I'll follow up with another post to share people's favorites.

* 30 years is nothing compared to the astounding 48+ year consistency of the band. George Kooymans and Rinus Gerritsen founded the band in 1961 and have stayed together the entire intervening time. The two other current members, Barry Hay and Cesar Zuiderwijk, have been with them since the late 60s or early 70s. Freaking amazing. I see you guys aren't planning on any Texas visits, probably because of my transgression with the whole "they're Swedish" thing, so maybe I need to visit Oss in July.

** I was right about ABBA, Europe, and Ace of Base, at least.

*** In my opinion, this is one of the top flaws with business intelligence delivery. Report consumers love big scorecards with dozens of KPIs, but the report author should always consider time to action. When a consumer sits down to look at this report, how long does he or she need to determine the action to be taken?

**** This was a brief reunion, re-recording songs for a greatest hits album. They planned to tour as well, but unfortunately, John Panozzo was in bad health and passed away. 

It's A Valid Question, Mr. Hunter!

Today's blog topic was planned to be analysis of the NFL's 1-2 Punch theory, but I'm invoking my Second Rule of BI: check your work. Stakeholder resistance to a result is directly proportionate to how strongly that result contradicts the stakeholder's expected result.* So, when the outcome is dramatically different from "common knowledge," it pays to spend some time double-checking. Not to mention prepping a presentation, because you're going to have to go in-depth to reassure your audience.

So, I'm going to postpone the results of the 1-2 Punch analysis and post an article I started last fall, before life was derailed for a few months. This'll give me some time** to explore the NFL analysis more in-depth, so I don't present something that truly is flawed.

Alan Hunter is one of my favorite people to listen to, whether it's on SiriusXM or Twitter. In case you're not familiar, Hunter was one of the original MTV VJs, and is now a host on two of my favorite Sirius channels, 80's on 8 and Classic Rewind. He's a prolific Twitterer or Tweeter or whatever the hell we call it, and seems to be a genuinely nice guy, and the world could use a few more of those.

He's also very responsive to his fans on Twitter.*** Last fall, when I was giving some serious consideration to buying a ticket for the 2018 80's on 8 Cruise, I tweeted a question: would Fee Waybill be on the cruise? I think Mr. Hunter's response was giving me a small dose of good-natured sarcasm. (Like I said, genuinely nice guy.)

Still, it elicited some good-natured indignation from me. I know Fee Waybill is the lead singer for The Tubes.**** It's still a valid question. This is 2018, after all. If you buy tickets for an "80's band" concert, you'd better check the lineup closely.

Case in point: a few years ago I saw Yes at WinStar Casino...and walked out thoroughly disappointed. The performance was excellent -- but they played almost none of my favorite songs. No "Leave It." No "It Can Happen." No "Love Will Find a Way." Why? Because there are two different Yes configurations today. If you're seeing Geoff Downes, Steve Howe, and Jon Davison, you're going to get a totally different set list than if you see Jon Anderson, Trevor Rabin, and Rick Wakeman. 

Or how about Fleetwood Mac? I remember being really excited a few years ago to see Fleetwood Mac tickets on pre-sale (a year ahead of time) until I read that Stevie Nicks wasn't in the lineup. I'm sure it was still an awesome concert, but to me, just not the same.

The Cars, Styx, A Flock of Seagulls, Journey...fair to say, whenever you see any 80's band today, you should ask, "Who exactly IS Styx today?" Or in the case of Van Halen, "So, who's singing this month?"

That in mind, I can't wait to see Jeff Lynne's ELO in August. Yes, this is a very different lineup from ELO Part II. (But at least in this case each version of the group performs the original ELO catalog, as far as I know. If I don't hear Jeff Lynne singing "Can't Get It Out Of My Head," I want my money back.)

Mr. Hunter, maybe I'll run into you at the United Center for the Bon Jovi concert in April? But I won't ask you if Jon Bon Jovi is going to be there.  :)

* Irony: when the analysis supports the pre-supposition, everyone's happy to say, "Looks good. It's just like we thought!" Contradict the assumptions, though, and the first argument is that either the data or the analysis is wrong. Or both.

** I don't actually have time for this. I've got job applications to fill out, a screenplay to finish writing, and an iOS development tutorial that I really want to complete. But this NFL data is so intriguing...

*** I find this particularly cool. Too many celebrities forget that without fans, they're not really celebrities.

**** If you didn't know this, don't be embarrassed. It just means you haven't spent enough time on music trivia with me. To impress and enlighten your other friends, just reference Waybill's excellent solo contribution to the St. Elmo's Fire soundtrack, "Saved My Life."

Business Models versus Data Models

This morning I got back to one of my high-priority questions regarding my NFL data analysis -- if statistically there's a significant advantage to scoring first, why do teams almost universally defer to the second half after winning the coin toss?

The most prevalent theory seems to be that there's a bigger advantage in a 1-2 punch: scoring at the end of the first half, then scoring again immediately after receiving the opening kick of the second half. Good news -- with my handy data set, we can put that theory to the test.

Time to augment the dataset!* I added a number of fields, tracking which team was on offense for the opening drive, which team was the offense for the final drive of the first half, and which team was on offense for the first drive of the second half. Next, how many points were scored in the H1 final drive, and how many points were scored in the H2 opening drive. This gives me all the data I need to test the hypothesis. 
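For the curious, here's a minimal sketch of how those added fields could be derived for a single game. The `Drive` record, its field names, and the team codes "AAA" and "BBB" are all hypothetical stand-ins for illustration, not the actual schema of my dataset.

```python
from dataclasses import dataclass

@dataclass
class Drive:
    # Hypothetical drive record; the real [Drive] table may look different.
    game_id: int
    half: int       # 1 or 2
    seq: int        # order of the drive within the game
    offense: str    # team on offense
    points: int     # points scored on this drive

def half_boundary_fields(drives):
    """Derive the added fields for one game's list of drives."""
    ordered = sorted(drives, key=lambda d: d.seq)
    h1 = [d for d in ordered if d.half == 1]
    h2 = [d for d in ordered if d.half == 2]
    return {
        "opening_offense": h1[0].offense,      # first drive of the game
        "h1_final_offense": h1[-1].offense,    # last drive of the first half
        "h2_opening_offense": h2[0].offense,   # first drive of the second half
        "h1_final_points": h1[-1].points,
        "h2_opening_points": h2[0].points,
    }

# Made-up game: BBB scores to close the first half, then again to open the second.
game = [
    Drive(1, 1, 1, "AAA", 0),
    Drive(1, 1, 2, "BBB", 3),
    Drive(1, 1, 3, "AAA", 0),
    Drive(1, 1, 4, "BBB", 7),
    Drive(1, 2, 5, "BBB", 3),
    Drive(1, 2, 6, "AAA", 0),
]
fields = half_boundary_fields(game)
print(fields)
```

In this made-up game, the same team owns the final drive of the first half and the opening drive of the second, and scores on both. That's the 1-2 punch pattern the new fields are meant to expose.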

But wait! I compiled these new fields for ten games, just a small set for a quick check of the data logic. Good thing, because game #3 in my dataset showed the bane of the business intelligence professional: an anomaly that seems to contradict the business model.

Specifically, in Week 1 of the 2000 season, the Eagles somehow were on offense for the first drive of EACH half. According to the business model, this can't happen. The team that kicks off first in the first half will receive the first kickoff of the second half.**

So, what's going on here? Is my business model incorrect? Or is it a data issue?*** Fortunately, this dataset has the mother of all sports data: play by play information for all 4,523 games represented. It's a virtual ton of data. Analyst heaven.

And it provides the answer. Those crafty Eagles actually kicked and recovered an onside kick to start the game. Thus, in the dataset's [Drive] table, Philly is on offense for the first drive of the game. They still get to receive the ball first in the second half, and since Dallas did not try the same maneuver, Philly ended up on offense for both opening kicks in the game. Go figure.

Is this an error in the business model or the data model? Neither, in my opinion. The business rule is that each team must make one of the opening kicks. The data definition of a "drive" is a series of events demarcated by one team's possession of the ball. The kickoff is a specific business process event, and that event does not constitute a "drive."

To prevent confusion, our documentation (our data dictionary, information model definition, white papers, etc.) should include a thorough explanation of this important point. Analysts need to understand it so they handle the scenario correctly in calculated fields. Report consumers need to understand it so they don't make misguided decisions. Remember that documentation blog that I posted last week? (It's the one you skipped.)

But wait...this isn't the only scenario where the team receiving the opening kick doesn't also own the first drive, according to the [Drive] table. In the 2010 season, Miami kicked off to Oakland, and Jacoby Ford returned it 101 yards for a touchdown. In the data model, no actual drive occurred -- only the kickoff event. The first drive of the game occurred after Oakland kicked, and Miami started an actual drive from their own 1-yard line. They did go on to win, despite taking the more common route to scoring.
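The two scenarios can be captured in a few lines. This is only a sketch: the function and its parameters (`recovered_by`, `return_td`) are hypothetical names, and it assumes that after a return touchdown, the follow-up kickoff is itself an ordinary one.

```python
def first_drive_offense(kicking_team, receiving_team, recovered_by, return_td):
    """Which team owns the first recorded drive after the opening kickoff?

    All parameter names are illustrative, not fields from the real dataset.
    """
    if recovered_by == kicking_team:
        # Onside kick (or similar) recovered by the kicking team.
        return kicking_team
    if return_td:
        # The kickoff event itself produced the score, so no drive is logged.
        # The scoring team then kicks off, and (assuming that kick is
        # ordinary) the original kicking team starts the first drive.
        return kicking_team
    # The ordinary case: the receiving team starts the first drive.
    return receiving_team

# The two games described above, plus an ordinary opening kickoff.
print(first_drive_offense("PHI", "DAL", recovered_by="PHI", return_td=False))
print(first_drive_offense("MIA", "OAK", recovered_by="OAK", return_td=True))
print(first_drive_offense("DAL", "PHI", recovered_by="PHI", return_td=False))
```

Interesting side effect: in both oddball cases, the first drive lands with the team that kicked off, which is exactly why the [Drive] table alone can't tell you who received the opening kick.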

All right, I've discovered something crucial about my dataset. I've explained it to my stakeholders. But...the job isn't done. Before I can move on to my question about the 1-2 Punch, I must decide how to handle these scenarios during analysis. First step -- determine how often they occur.

Turns out there are 159 instances of the same team seemingly owning both "opening" drives. That's 3.5% of my total games, not an insignificant number. If this were a professional project, I'd have a governance process defined for an official decision on how to account for the scenario in reporting. Since it's my personal project, though, I only have to agree with myself. Best stakeholder ever.

The solution: add a flag to conveniently filter these games out of the analysis when I (finally) get to that question of the 1-2 Punch. But since today's blog was hijacked by my seeming anomaly, that work will have to wait until tomorrow. Such is BI.
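A rough sketch of that flag, assuming each game is a simple record with hypothetical `opening_offense` and `h2_opening_offense` fields (the names and sample rows are made up for illustration). The last line just re-checks the arithmetic on the 159 games out of 4,523.

```python
def flag_same_opening_offense(games):
    """Add a boolean flag to each game record (a dict), in place."""
    for g in games:
        g["same_opening_offense"] = (
            g["opening_offense"] == g["h2_opening_offense"]
        )
    return games

def share_flagged(flagged_count, total_count):
    """Percentage of games flagged, rounded to one decimal place."""
    return round(100 * flagged_count / total_count, 1)

# Made-up sample rows; game 2 is the anomaly.
games = flag_same_opening_offense([
    {"game_id": 1, "opening_offense": "AAA", "h2_opening_offense": "BBB"},
    {"game_id": 2, "opening_offense": "AAA", "h2_opening_offense": "AAA"},
    {"game_id": 3, "opening_offense": "BBB", "h2_opening_offense": "AAA"},
])

# Filter the flagged games out before running the 1-2 Punch analysis.
clean = [g for g in games if not g["same_opening_offense"]]
print([g["game_id"] for g in clean])   # [1, 3]
print(share_flagged(159, 4523))        # 3.5
```

Keeping the flag as a column, rather than deleting the rows, means the excluded games stay available if I later decide to analyze them separately.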

* Data analysts get so excited about adding calculated fields to the dataset. I think that's our version of creating the CGI fire from the dragons in Game of Thrones.

** Unless the kickoff rules in 2000 were as unclear as the current day catch rules are, that is...

*** Dear BI Analysts: you may as well assume it's a data issue, until you prove otherwise. Your stakeholders certainly will. Low customer satisfaction? Bad resource utilization? Blew through the budget in one quarter? Prove it isn't a data error, then we'll talk.****

**** Keep in mind, as irritating as that attitude can be, that should be the attitude of the BI professional. You need to be 100% confident in the data before you tell your stakeholder that the data's fine.

Game of Thrones Teaches Business Intelligence!

While waiting impatiently for the next season of The Man in the High Castle, I've been re-watching Game of Thrones. Given that HBO has put off the final season by a year, and George R. R. Martin is now tentatively scheduled to deliver the final books sometime in the year 2030, I'm picking up a few of the tidbits that I missed the first time around.

A great example came up sometime in Season Six. Strife in the Greyjoy household! Uncle Euron comes home, murders his brother, and assumes kingship of the Iron Islands. Theon and Yarra, his nephew and niece, steal the best ships from the Iron Fleet and sail away, presumably with a significant number of the island warriors.


Uncle Euron is undeterred, however. He holds forth with a rousing, inspirational speech, which ends with, "Build me 1,000 ships, and I'll <something something something about conquering the world.>"

Um, say again? Build 1,000 ships? 

Yep. Euron commands all hands on deck. Every man felling trees, cutting lumber, and banging ships together. Every woman stitching sails and weaving rope. Apparently it's just that easy. Time is a bit murky in the GoT world, but it's certainly only weeks or months before Euron is sailing with a massive armada that decimates Yarra's fleet.*

It occurred to me that I've actually seen this episode many times, and not just on HBO. Operations, and BI in particular, are often the subject of the "Let's just dive in and do it in no time!" mentality. We've got a business problem that we didn't anticipate. Someone thought of a major new initiative while showering. Or worse, someone attended a workshop, saw some cool data visualizations that a vendor spent months building in the app they're trying to sell, and wants something similar in the ecosystem -- tomorrow.

"Build me 1,000 KPIs, and I'll give you the Seven Kingdoms!"

Here's the problem: building a ship isn't that easy. There are plenty of skilled trades involved in shipbuilding, and even with motivation provided by leadership's "can do" attitude**, the application of massive manpower rarely makes up for the correct skill and experience. One hundred foot soldiers can't significantly speed up the proper shaping of a keel, and if the keel is faulty, the ship is ineffective.

Likewise, a dozen vendors who know nothing of your business and data can't slap a quality BI infrastructure together overnight. So, how about pouring on the labor from the opposite direction? Open up the BI platform to those who have the business knowledge, if not the technical skill. After all, your account managers, team managers, and sales folk can just pick up some data skills with an hour on, right?***

There's a big difference between using a tool daily and having the skills necessary to build the tool. Most sailors spend a great deal of time onboard a ship, one might expect, and they're probably aware that it's supposed to float. Put them in charge of building one, and it'll probably have as auspicious a career as the CSS Neuse.****

Of course, you may be thinking that this is a terrible analogy because Euron Greyjoy's team DID finish 1,000 ships in virtually no time, smashed the Iron Fleet, and captured a bunch of key enemy leaders. Don't forget -- Game of Thrones is fiction. And fantasy. Reality would have made that subplot a bit anti-climactic, when Euron's armada set forth and promptly sank.

Strong leadership hires strong specialists, then paves the way for those specialists to do a quality job. Cut corners and throw wrongly skilled labor at your projects, and you may as well wait for the dragons to fly in and save the day.

* I'm not apologizing for "spoiling" an episode that's been out for over a year.

** Or the motivation provided by Uncle Greyjoy's "do it or die" attitude.

*** I also don't believe that 1,000 monkeys on 1,000 typewriters will recreate the works of Shakespeare. However, they probably could come up with the NFL's rules on what constitutes a catch.

**** It's an interesting story. Look it up; you can thank me later.

Don't Forget the Documentation

Apparently it's going to be business intelligence week on my blog, since this entry on my fifth rule of business intelligence makes three in a row* on the subject. This time, I'm going to give you the rule right up front: documentation must be a requirement of BI development, not an option.

This might seem like a no-brainer -- if you've never worked in an actual IT or operations department.

In a decidedly non-scientific method of evaluation,** I'm going to hazard a guess that documentation is considered a "nice to have" in about 99% of BI (and general IT) operations. More specifically, documentation is an activity that leadership and management teams typically refuse to budget time for, yet lament the lack of when things aren't so clear later. The strongest adherence to a culture of good documentation tends to be found in project management, but there's far more to documentation than just the project charter, Gantt charts, and status reports.

On the developer-facing side you've got data source, acquisition, and transformation information. Development standards, style guides, platform strategy and history, data governance, retention policies, and relationship models. Facing the end user, you've got business rules references, metric and KPI guides, subject overviews, and access policies. And that's just a short list -- there are far more subjects that need quality documentation in order for your data to become a usable information asset.

Predictability is one of the major focal points as a BI operation matures. Every stakeholder wants to know when his or her request*** will be ready. The BI team and vendor resources grow more rigorous about organizing development into sprints, performing VROMs, and predicting the number of hours for each task. And that prediction rarely includes thorough, high-quality documentation.

After all, time is money, and it's bad enough the world has to wait 20 person-hours for that next customer satisfaction report. Two more hours for governance and documentation?**** If we just forgo those activities on the next ten projects, we could accomplish an additional project in the "time saved!" We'll come back and document everything when we have some "breathing room."

Breathing room, of course, tends to occur on the 20th of Never.

Even in the one-man show of my NFL analysis I keep a rudimentary set of documentation. A field name that seems quite descriptive today can be ambiguous in a very short time of non-use. The couple of minutes I spend tracking notes in OneNote or Excel are time better spent than an hour of reverse-engineering my code later, or worse, sharing incorrect analysis because I forgot a definition.

Rudimentary, sure, but sufficient for preventing mistakes and saving time in future development.

Ever wonder why the FDA requires ingredients to be listed on food packaging? It's so you can understand what's in your food, and avoid making bad decisions. Sure, people still make plenty of bad dietary decisions, but with better information, they have a better chance of making a good decision. No one ever looks at a Cheeto and says, "Wow, a bag of these will help me lose weight!"

In BI, lack of documentation or low-quality documentation precipitates significant mistakes. A developer can create a new measure with an incorrect calculation. Stakeholders can waste time debating results because their definitions of KPIs differ. Incorrect information can be issued publicly, or to external customers.

The solution is simple. Change the culture of your organization such that proper documentation is part of the development process. Budget the time and resources to include documentation activities. Don't allow the process to be put off until that non-extant breathing room appears. And once that high-quality meta-information is available, try reading the side of the package every once in a while.

* That's right, a perfect 5 for 7!

** I.e., my gut feel after 20 years in this discipline.

*** Keep in mind that each stakeholder's current request is always the most crucial, make-or-break-our-business request in the history of the company. And Earth itself.

**** Next you're going to be demanding bathroom breaks!