From Nothing to Something (The Beginning)

As part of a personal project that I’m managing (and actively working on), I’ve decided to do a little write-up on my approach, what I’m learning, and the technical things I’ve encountered. This is as much for my own memory as it is in the hope that I can help others avoid the technical pitfalls I have run into.

The Product:

I’ve always been someone extremely interested in data, especially data that no one else is looking at. So, what is the logical place to go? The most accessible data is the data that is already out there for the grabbing. So…scraping.

What has no one else scraped, or at least scraped and aggregated AND displayed well? Game prices across different platforms. There are aggregators for all different kinds of products (ammo, outdoor gear, etc.), but no one seems to have implemented one for games well, although some have tried.

With that goal in mind, we are building a product that lets people track game prices and favorite games, so that they no longer have to follow news on multiple sites and check multiple web marketplaces for the best prices. This means we will be scraping Reddit, Twitter, and other news/social media sites, in addition to game marketplaces like Steam and Sony’s PlayStation Store.

What I Hope to Gain:

At the end of the day, maybe we strike gold by building the coolest website and app that ever existed and people love it. More realistically, I want to build a platform to which I can add data as needed for my own wants/needs. I want to become an expert with certain libraries and frameworks, and get to a point where I’m not just a Business Intelligence and ETL developer but can develop all over the stack as needed with ease.

Also, I want to gain experience in setting up a highly performant, extensible ETL platform, and to end up with an app on a marketplace and at least one download. All of this will be done on a shoestring budget. I can then use that platform to pivot and build any sort of data-centric application for whatever purpose I want.

The Steps:

So, with all this being said, there are three main topics I will be writing about on a broad level.

  1. Writing scrapers with Python’s Scrapy library, which run around the clock
  2. Writing ETLs to a PostgreSQL database with near real-time availability, using a budget AWS instance
  3. Serving up the data to end users using an open source tool
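
To make the goal behind these steps a bit more concrete, here is a rough sketch of the kind of transform an ETL step might do once prices have been scraped: collapse records from several marketplaces down to the cheapest offer per game. Everything in it is an assumption for illustration (the `PriceRecord` shape, the `normalize_title` rule, the store names, and the prices are all made up), and it uses only the standard library rather than Scrapy or PostgreSQL.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class PriceRecord:
    """One scraped price for one game on one marketplace (hypothetical shape)."""
    title: str
    marketplace: str
    price: Decimal


def normalize_title(title: str) -> str:
    """Lowercase and collapse whitespace so the same game matches across stores."""
    return " ".join(title.lower().split())


def best_prices(records):
    """Return a dict mapping normalized title -> cheapest PriceRecord seen."""
    best = {}
    for rec in records:
        key = normalize_title(rec.title)
        if key not in best or rec.price < best[key].price:
            best[key] = rec
    return best


# Made-up sample data: placeholder store names and prices, not real listings.
sample = [
    PriceRecord("Example Game", "store_a", Decimal("59.99")),
    PriceRecord("example  game", "store_b", Decimal("44.99")),
    PriceRecord("Other Game", "store_a", Decimal("19.99")),
]

result = best_prices(sample)
for key, rec in result.items():
    print(f"{key}: {rec.price} at {rec.marketplace}")
```

Prices are kept as `Decimal` rather than `float` so that comparisons between stores are not thrown off by floating-point rounding. Real title matching across marketplaces will almost certainly need something smarter than lowercasing.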

More updates in the coming days!

Why Storytelling is Required

Storytelling and marketing seem to be undervalued by technical people in the information technology field. Why am I talking about this? Recently at SXSW in Austin, Contently hosted a talk where Shane Snow discussed the power of storytelling. While the audience definitely skewed towards the marketing industry, the concepts presented can be applied to any idea or presentation that technical people are trying to sell to customers, managers, or co-workers.

Story Continuation

Shane, in his talk, brought up some interesting statistics to make a powerful point: people tend to gravitate towards stories that build on existing lore and story lines. The area he pointed to as evidence? Movies. The metric he used to demonstrate it? Movie revenue. The question is, does Shane’s theory hold true?

[Figure: Spider-Man Movie Layout]

If you look at the above, grabbed from The Numbers, it shows a general trend of decreasing revenue for Spider-Man movies. On close examination though, the biggest drop in revenue (~15%) when comparing a movie to its predecessor occurred between Spider-Man 3 and The Amazing Spider-Man…when the continuous story line from the first three Spider-Man movies was broken.

[Figure: Jurassic Park - Revenue]

But some spot checking also reveals that the opposite can be true. Looking at Jurassic Park’s history of revenues, it appears that sequels where the story line is broken can make just as much (or much more). More analysis would be needed to test this theory objectively…maybe this is an oddity with reboots of classic series?

Regardless, in SOME cases, when movies break from a continuous narrative there appears to be increased risk of people abandoning interest in the movie/idea.

Familiarity

Additionally, Shane mentioned the power of familiarity. When traveling abroad and surrounded by unfamiliar scents, sounds, and tastes, people tend to gravitate towards the known. The perfect example that many can relate to? Beer. Heineken is sold in over 170 countries. Given the choice between a familiar brand, even one they may dislike, and an unfamiliar brand, people generally choose the one they know. Thinking about the odd concoctions one might encounter when traveling abroad, what would you rather have?

Complexity of Content

The last point that was presented? The easier content is to read and understand, the more popular it will be. Mark Twain has books that come in at around a 5th grade reading level according to Scholastic’s system. Even the more modern classics, depending on your point of view, come in at around the same reading level. The lesson? It may make us feel good to communicate with big words, but it is not the most effective way to communicate.
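
If you want to sanity-check your own writing, here is a minimal sketch of a reading-level estimate. To be clear, this is not Scholastic’s system, which I can’t reproduce here. It uses the public Flesch-Kincaid grade-level formula as a stand-in, the syllable counter is a crude vowel-group heuristic, and the function names are my own.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate: count vowel groups, discount a silent trailing 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade: 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)


plain = "The cat sat on the mat. It was a good day."
dense = "Organizational stakeholders necessitate comprehensive architectural documentation."

print(round(flesch_kincaid_grade(plain), 1))
print(round(flesch_kincaid_grade(dense), 1))
```

Short sentences and short words push the score down, long ones push it up. Treat the number as a rough signal rather than a verdict, since the syllable heuristic will miscount plenty of English words.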

So What?

At the end of the day, while these ideas are interesting, what can we learn? Everyone is trying to sell stories every day. In technology/knowledge work, the story is a new design or approach to solving a problem. These strategies can be used to communicate an idea effectively and gain the support of others when combined with logical arguments. Establish a narrative that creates a vision and a compelling, continuous story line. Do it in a way that anyone could understand, from developers to directors, technical to non-technical. Establish a brand, identifier, or name that people can become familiar with. If technical people peddled their ideas half as well as they implement them, it would be to everyone’s advantage. Playing to people’s logic usually works…playing to logic and human nature? Couldn’t hurt.