Automated Reporting With a Raspberry Pi Part 1
In this post, I'm going to talk about how we went about automating our reporting using a Raspberry Pi, Python, and MySQL. This is a big topic, so in future posts I will drill down to the details of installing MySQL, using SQLAlchemy as an ORM, writing Python scripts to automate data collection, and connecting Excel to a remote MySQL database.
When I joined QueryClick a couple of years ago, I did an assessment of our client reporting. Refreshingly, there had been a decent stab at doing things the ‘right’ way:
- Importing from external ‘raw data’ workbooks
- Standardised templates for consistency of metrics and presentation
- API integration with Excel Add-ins such as Next Analytics and the excellent SEOTools for Excel by Niels Bosma
As a first step towards the broader vision of completely automated reporting, I introduced a couple of small improvements:
- Using pivot tables wherever possible
- Using SQL to query data from external data sources
- A small .NET application, codenamed ‘scoop’, that ran Google Analytics API queries and stored the results in different workbook locations
Fundamentally, however, the goal was a scalable ETL solution. And for that we needed to think bigger…
Except in fact, we thought smaller. Very small. Raspberry Pi small…
Why use a Raspberry Pi?
Creating an enterprise reporting system is a big job, and one that, ultimately, is not best suited for such limited hardware. However, I wanted the project to follow some core principles:
- KISS (keep it simple stupid)
- Low cost of entry
- Generate interest. There is nothing like jumping on the Pi-wagon for generating a bit of excitement for geekery
- Teach myself and other developers to be conservative with hardware resources
After using this solution for several months, and saving a significant amount of reporting time, we now have plans to scale this to new hardware (to allow for GUIs and multithreading). However, it has worked amazingly well, and the solution would fit any organisation looking to embark on the initial stages of business automation.
How it works
As there is only 512MB of RAM on a Raspberry Pi, we chose to strip the solution back to its bare bones:
- A MySQL database for storing data - hosted on the Pi
- A collection of Python ETL (Extract, Transform, Load) scripts used to gather data from the web and import it into the MySQL database hosted on the Pi
- Excel workbooks that connect to the Pi MySQL database via the MySQL ODBC connector
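The ETL scripts themselves can be quite simple. Here is a minimal sketch of the extract/transform/load pattern the scripts follow; the sample data and table are hypothetical, and sqlite3 stands in for the Pi's MySQL database so the example is self-contained:

```python
import sqlite3

def extract():
    # Stand-in for an API call; in practice this would be a request to
    # a web service such as the Google Analytics API
    return [
        {"date": "2014-01-01", "source": "google", "visits": "120"},
        {"date": "2014-01-01", "source": "bing", "visits": "15"},
    ]

def transform(rows):
    # Coerce types and flatten the records into tuples ready for insertion
    return [(r["date"], r["source"], int(r["visits"])) for r in rows]

def load(rows, conn):
    # In our setup this insert targets MySQL on the Pi; sqlite3 is used
    # here purely to keep the sketch runnable anywhere
    conn.execute(
        "CREATE TABLE IF NOT EXISTS visits (date TEXT, source TEXT, visits INTEGER)"
    )
    conn.executemany("INSERT INTO visits VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract()), conn)
```

Keeping each stage as a separate function makes it easy to swap in a new data source while reusing the transform and load steps.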
A core aspect of what we did was to think of the Raspberry Pi as a ‘Swiss Army knife’ of report automation: a collection of tools that could help automate extraction of data from various sources.
We packaged re-usable scripts into modules to reduce redundancy, and used a single Python virtualenv (more on this later) to ensure that any tricky library installations were done only once. There are drawbacks to the single virtualenv approach (compatibility), but they were far outweighed by the benefit of letting various teams, relatively inexperienced in Python, contribute code without their efforts snagging on installation problems.
We then began looking at where the most time was spent, and where the easy wins were - often the quickest wins came when we knew we could get the data out easily, such as plugging into the Google Analytics API.
We also looked at everything in an agile way: at the beginning of any data connection project, the goal was always to simply save some time. For example, getting data from the Google Analytics API and dumping it into CSV files that could replace existing raw data workbooks was a great time saver. Importantly, it immediately fit into the current workflow - minimising disruption and bringing the benefits of the solution to end users straight away.
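A CSV dump of this kind needs very little code. This sketch, with hypothetical rows and column names standing in for an API response, shows the shape of it using Python's standard csv module:

```python
import csv

# Hypothetical rows as they might come back from an analytics API query
rows = [
    ("2014-01-01", "organic", 340),
    ("2014-01-02", "organic", 287),
]

# Write a raw-data CSV that an Excel workbook can import directly
with open("raw_data.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["date", "channel", "visits"])  # header row
    writer.writerows(rows)
```

Because the output file mirrors the layout of the old raw-data workbooks, the existing Excel imports keep working unchanged.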
Over time, we would extend and improve the scripts - integrating SQLAlchemy, for example, was hugely advantageous (again, more on this in a subsequent post).
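To give a flavour of what the SQLAlchemy integration looks like, here is a minimal sketch of an ORM-mapped table. The model and connection URL are hypothetical; an in-memory sqlite database stands in for the Pi's MySQL instance so the example runs anywhere:

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Visit(Base):
    # Hypothetical table mapping; real models follow the same pattern
    __tablename__ = "visits"
    id = Column(Integer, primary_key=True)
    source = Column(String(50))
    visits = Column(Integer)

# In production this URL would point at MySQL on the Pi, e.g. something
# like "mysql+mysqldb://user:pass@pi-host/reporting" (hypothetical);
# sqlite in memory keeps the sketch self-contained
engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Visit(source="google", visits=120))
    session.commit()
    total = session.query(Visit).count()
```

The win is that the ETL scripts work with Python objects rather than hand-written SQL, and switching the target database is a one-line change to the engine URL.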
Before this project began, you could hear the collective teeth grinding from the account managers when ‘reporting week’ came around.
Using the approach above we have reduced the overall reporting time by around 50%, freeing time for account managers to analyse the reports and create strategy, instead of simply collating them.
Now, although we have more clients, and more reports, the vast majority of this work is delivered by just one Technical Analyst, in less than a week.
Furthermore, by focusing on the ETL stuff (extracting data and storing it in a data warehouse) we have created a flexible foundation for reporting that can evolve easily and quickly to allow for other tools and data to be integrated in an agile way.
Keep an eye out for future posts on automated reporting using a Raspberry Pi, where I'll discuss installation of MySQL, Python and Virtualenv, SQLAlchemy and bringing it together with Excel templates that use ODBC.