Data Day Texas 2017 – A Few Thoughts

Earlier this month I had the opportunity to attend Data Day Texas, and thought that it would be worthwhile to jot down a few thoughts. For those that aren’t aware of Data Day Texas, think of it as a gathering of nerdy IT people and Data Scientists. It was an interesting weekend with a wide range of topics that encompassed everything from machine learning algorithms to more approachable subjects like data dashboarding.

Graphs Are Here

There’s a reason that the keynote by Emil Eifrem was named “The Year of the Graph”. Looking at the popularity trend on db-engines.com, you can see a large gain in the

neo4j-popularity

Neo4j Popularity

popularity of Neo4j. Leading naturally to the question, so what?

I think the major winning point for graph databases, other than performance on certain types of data analytics, is that graph databases are defined with relationships between data. This is in opposition to the approach of the traditional RDBMS which requires explicitly defining the tables in the schema, with relationships as a non-required afterthought in most cases. This means that while constructing the database, what you are doing is explicitly defining the node (core piece of data) and the edge (relationship between nodes). This means that you are enforcing the relationships between data, as opposed to the structure of the data itself. This creates another level of abstraction between the end user and the data, which should make the data in the database more approachable. Oh, and if you haven’t guessed, graph databases are schema-less which is a plus in many cases.

Issues Are Similiar Across Companies/Technology

In particular, there were two talks that hit this point home. The first was given by Chris LaCava from Expero Inc. in which he discussed visualization techniques with graph databases. The second was the discussion of how Stitch Fix sets up their environment for data scientists to work by Stefan Krawczyk.

What’s the root of this? People want to use the tools that work and that they like. Chris LaCava discussed how to do visualization on graph databases. While graph databases can

dashboarding-design-process

Look familiar? From Chris’ presentation on graph database dashboarding

meet some cool use cases as far as data sets and real time analytics go, what was discussed was a  straight forward and common sense approach to dashboarding. Anyone familiar with Business Intelligence and dashboarding should roughly be following the above, or near to it.

Stefan‘s talk was all about using Docker to enable data scientists to use the tools that they want to use. The solution to the complaint that many of us in the industry have when we are locked in with a specific tool-set. The differentiation here was that Stitch Fix has done containerization at scale. This solves that problem by allowing each of their data scientists to run and operate on their own environment, with whatever tool-set they favor to deliver business value.

The Story is What Makes Things Interesting

The final point, which I’ve written about before, is that the story is what makes things interesting. The specific story presented at Data Day? The Panama Papers and how Neo4j was used to discover the unlikely connection that led to the downfall of a Prime Minister. That this was the best marketing tool that I have ever seen in regards to a database.
Having a database GUI that allows for easy exploration of the data natively? That’s a game changer.

This slideshow requires JavaScript.

Looking at the above, you can see a traditional RDBMS GUI (SQL Server Management Studio) versus Neo4j’s GUI. There’s a reason why people don’t pull up SQL Server Management Studio tools to tell a story. Having a database platform that can automatically tell a story about the data is an awesome approach.

 

Why Storytelling is Required

Storytelling and marketing is something that seems to be undervalued by technical individuals in the information technology field. The reason why I’m talking about this? Recently at SXSW in Austin, Contently hosted a talk where Shane Snow discussed the power of storytelling. While the audience attendance was a definitely skewed towards the marketing industry, the concepts that were presented can be applied to any idea or presentation that technical people are trying to sell to customers, managers, or co-workers.

Story Continuation

Shane, in his talk, brought up some interesting statistics that prove a powerful point. People tend to gravitate towards stories that build on existing lore and story lines. The area that was pointed to as proving his point? Movies. Shane mentioned a metric that can be used to demonstrate this. Movie revenue. The question is, does Shane’s theory prove true?Spiderman Movie Layout

If you look at the above, grabbed from The Numbers, it clearly shows a relative trend of decreasing sales revenue for Spider-Man movies. On close examination though, the biggest drop in revenue (~15%) when comparing a movie to its predecessor occurred between Spider-Man 3 and The Amazing Spider-Man…When the continuous story line from the first  three Spider-Man movies was broken.

Jurrassic park - Revenue

But…doing some spot checking, also reveals the opposite to be true. Looking at Jurassic Park’s history of revenues, it appears that sequels where the story line is broken can make just as much (or much more). In order to prove out this theory, it appears that more analysis would be needed to prove this point objectively…Maybe this is an oddity with re-boots of classic series?

Regardless, in SOME cases, when movies break from a continuous narrative there appears to be increased risk of people abandoning interest in the movie/idea.

Familiarity

Additionally, Shane mentioned the power of familiarity. When traveling abroad and being around unfamiliar scents, sounds, and tastes, people tend to gravitate towards the known. The perfect example that many can relate to? Beer. Heineken is sold in over 170 countries. When someone is given the choice between a familiar brand that may even be disliked and an unfamiliar brand, people generally choose the known brand that has familiarity. Thinking about the odd concoctions one might encounter when travelling abroad, what would you rather have?

Complexity of Content

The last point that was presented? The easier that content is to read and understand, the more popular it will be. Mark Twain has books that come up at around a 5th grade reading level according to Scholastic’s system. Even the more modern classics, depending on your point of view, come up at around the same reading level. The lesson? It may make us feel good communicating with big words, but it is not the most effective way to communicate.

So What?

At the end of the day, while these ideas are interesting, what can we learn? Everyone is trying to sell stories every day. In technology/knowledge work, it’s a new design or approach to solve a problem. These strategies can be used to communicate an idea effectively and gain the support of others when combined with logical arguments. Establish a narrative that creates a vision and compelling continuous story line. Do it in a way that anyone could understand, from developers to directors, technical to non-technical. Establish a brand, identifier, or name that people can familiarize themselves with. If technical people peddled ideas that have been implemented half as well as they implement them, it would be to everyone’s advantage. Playing to people’s logic works usually…playing to logic and human nature? Couldn’t hurt.