The Challenge

The Reddit DataIsBeautiful group holds friendly monthly data visualization competitions. Every month, the moderator presents new data sets and different challenges for participants to tackle.

The August 2018 competition challenged participants to visualize United States (US) Transportation Security Administration (TSA) data.

The Data

I chose an exploratory analysis, asking the questions that immediately came to mind as I looked through the data.

The TSA data sets are available on the US Department of Homeland Security website.

The first several years of data sets are in Excel-friendly formats, but the last two years of data (2016-2017) proved to be a challenge because they were provided in PDF format. After exploring the options available in both R and Python, I chose the Tabula package to extract the data from the PDFs into tables. I’ll continue to call it Tabula instead of tabula-py because I used the term “tabula” in the code.

Tabula

I used Tabula with my IDE of choice, PyCharm, and opened up a Jupyter notebook because at the time I felt it was easier to work with.

With the Tabula package, I converted the PDF into a pandas data frame, then converted the data frame into a CSV file. I ran into a couple of issues trying to do this.

An image of an Excel spreadsheet

An image of an Excel spreadsheet
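
For readers who want to try the PDF-to-CSV step themselves, here is a minimal sketch of how it might look with tabula-py. It is not the exact code from my notebook: the file names and the helper functions `combine_tables` and `pdf_to_csv` are hypothetical, and it assumes that tabula-py, pandas, and a Java runtime (which tabula-py relies on) are installed.

```python
from pathlib import Path

import pandas as pd


def combine_tables(tables):
    """Stack the extracted tables into one data frame and drop fully empty rows."""
    frames = [t for t in tables if not t.empty]
    if not frames:
        return pd.DataFrame()
    combined = pd.concat(frames, ignore_index=True)
    return combined.dropna(how="all").reset_index(drop=True)


def pdf_to_csv(pdf_path, csv_path):
    """Extract every table found in a PDF and save the result as a single CSV."""
    # Imported here so the helper above can be used without tabula-py installed.
    import tabula

    # pages="all" scans the whole document; with multiple_tables=True,
    # read_pdf returns a list of pandas DataFrames.
    tables = tabula.read_pdf(str(pdf_path), pages="all", multiple_tables=True)
    combined = combine_tables(tables)
    combined.to_csv(csv_path, index=False)
    return combined


if __name__ == "__main__":
    # Hypothetical file names, used only for illustration.
    source = Path("tsa_data_2016.pdf")
    if source.exists():
        df = pdf_to_csv(source, "tsa_data_2016.csv")
        print(df.head())
```

Extracted tables often still need cleanup afterwards (repeated header rows, columns split in the wrong place), so it is worth opening the resulting CSV before loading it into a visualization tool.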

Data Visualization

I chose TIBCO Spotfire as my data visualization tool. After importing all the files into Spotfire, I used my own curiosity and questions to walk through some visualizations. Putting everything in place proved challenging, and in the end there may be a little too much in the visual. I tried to choose a color scheme that looked nice against the heat map. Blue, which I initially considered to represent the sky, felt a little too dull. I am biased towards orange and yellow, and that is the direction I ended up going.

See my August 2018 submission below. All the entries can be seen on the Reddit group page.

An image of an Excel spreadsheet