Getting back into NumPy, Day 2
Short of time today, so just a small step forward. Yesterday I laid the groundwork for the upcoming time series analysis. I imported the data from a CSV file into pandas and made a simple plot of the mean number of keystrokes for every minute of the day.
Today’s goal is to make a heat map. It should show the number of keystrokes for every minute of the day, with one row for every day since I started logging. This is inspired by the visualizations listed in the GitHub repository for Minute.
Quite early I ran into a problem. Since I’m going to use imshow to show the heatmap, I need a matrix. The matrix needs to have n rows, where n is the number of days, and 1440 columns, one for every minute of the day. But, well, my first log entry isn’t at 00:00:00 on the first day, and my last entry isn’t at 23:59:00 on the last day.
This is where the upsampling procedure I introduced yesterday pays off. I just need to add an entry for 00:00:00 on the first day and one for 23:59:00 on the last day, and let pandas do the rest. The code for this is as follows:
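A minimal sketch of that padding step. The names are assumptions, not the original code: `keystrokes` stands for a pandas Series of keystroke counts indexed by timestamp, and `pad_to_full_days` is a hypothetical helper. It adds zero-valued entries at both boundaries and then resamples to one-minute bins, so every missing minute ends up as 0.

```python
import pandas as pd


def pad_to_full_days(keystrokes: pd.Series) -> pd.Series:
    """Return a per-minute series that covers whole days only.

    Assumes `keystrokes` has a DatetimeIndex (an assumption about the log format).
    """
    # 00:00:00 on the first day and 23:59:00 on the last day.
    first = keystrokes.index.min().normalize()
    last = keystrokes.index.max().normalize() + pd.Timedelta(hours=23, minutes=59)

    # Zero-valued boundary entries; skip any that are already in the log.
    boundaries = pd.Series(0, index=pd.DatetimeIndex([first, last]))
    boundaries = boundaries[~boundaries.index.isin(keystrokes.index)]

    padded = pd.concat([keystrokes, boundaries]).sort_index()

    # One bin per minute; minutes without log entries sum to 0.
    return padded.resample("min").sum()
```

For a log that starts at 10:05 on one day and ends at 14:30 on the next, the result has 2 × 1440 = 2880 entries, from 00:00 on the first day to 23:59 on the second.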
Now every day has a complete row of 1440 entries.
I borrowed the heat map code from the article Time Series Data Visualization with Python on Machinelearningmastery.com. It’s straightforward and it uses the groupby functionality. With a small helper function, we can first group the entries by the number of days since the first logging date and then stack them into a matrix that we give as input to imshow. Switching to a nicer colormap gives nicer colors.
The full code for this is as follows (the function datetime_to_minute_of_day I introduced yesterday):
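What follows is a hedged reconstruction of the approach described above, not the original listing. The helper names (`days_since_start`, `build_heatmap_matrix`, `plot_heatmap`) and the `viridis` colormap are assumptions. It expects a per-minute Series that already covers whole days, so the sketch does not need `datetime_to_minute_of_day`: within each day the entries are already in minute order.

```python
import numpy as np
import pandas as pd


def days_since_start(index: pd.DatetimeIndex) -> np.ndarray:
    """Hypothetical helper: whole days elapsed since the first logging date."""
    return np.asarray((index.normalize() - index[0].normalize()).days)


def build_heatmap_matrix(per_minute: pd.Series) -> np.ndarray:
    """Stack the series into one row per day and one column per minute.

    Assumes every day is complete (1440 entries), otherwise vstack fails.
    """
    day_numbers = days_since_start(per_minute.index)
    rows = [group.to_numpy() for _, group in per_minute.groupby(day_numbers)]
    return np.vstack(rows)


def plot_heatmap(matrix: np.ndarray, cmap: str = "viridis"):
    """Draw the matrix with imshow; the colormap choice is an assumption."""
    # Imported here so the matrix helpers work without matplotlib installed.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    # aspect="auto" stretches the few day rows across the 1440 minute columns.
    image = ax.imshow(matrix, aspect="auto", cmap=cmap, interpolation="nearest")
    ax.set_xlabel("Minute of day")
    ax.set_ylabel("Days since first log entry")
    fig.colorbar(image, ax=ax, label="Keystrokes per minute")
    return fig
```

Calling `plot_heatmap(build_heatmap_matrix(per_minute))` followed by `plt.show()` would display the figure; swapping `cmap` is the easiest way to try other color schemes.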
This yields a nice heatmap, showing how little I actually use my laptop! :-D