
Automated Reporting With A Raspberry Pi: Part 1

18th July 2014 → QueryClick

In this post, I'm going to talk about how we went about automating our reporting using a Raspberry Pi, Python, and MySQL. This is a big topic, so in future posts I will drill down to the details of installing MySQL, using SQLAlchemy as an ORM, writing Python scripts to automate data collection, and connecting Excel to a remote MySQL database.

The Background

When I joined QueryClick I did an assessment of our client reporting. Refreshingly, things were already being done the ‘right’ way:

  • Importing from external ‘raw data’ workbooks

  • Standardised templates for consistency of metrics and presentation

  • API integration with Excel add-ins such as Next Analytics and the excellent SEOTools for Excel by Niels Bosma

As a first step towards the broader vision of completely automated reporting, I introduced a couple of small improvements:

  1. Using pivot tables wherever possible

  2. Using SQL to query data from external data sources

  3. A small .NET application, codenamed ‘scoop’, that ran Google Analytics API queries and stored the results in different workbook locations

Fundamentally, however, the goal was a scalable ETL solution. And for that we needed to think bigger…

Except in fact, we thought smaller. Very small. Raspberry Pi small…

Why Use A Raspberry Pi?

Creating an enterprise reporting system is a big job, and one that, ultimately, is not best suited for such limited hardware. However, I wanted the project to follow some core principles:

  • Low cost of entry

  • Generate interest. There is nothing like jumping on the Pi-wagon for generating a bit of excitement for geekery

  • Teach myself and other developers to be conservative with hardware resources

After using this solution for several months, and saving a significant amount of reporting time, we now have plans to scale this to new hardware (to allow for GUIs and multithreading). However, it has worked amazingly well, and the solution would suit any organisation looking to embark on the initial stages of business automation.

How It Works

As there is only 512MB of RAM on a Raspberry Pi, we chose to strip the solution back to its bare bones:

  1. A MySQL database for storing data - hosted on the Pi

  2. A collection of Python ETL (Extract, Transform, Load) scripts used to gather data from the web and import it into the MySQL database hosted on the Pi

  3. Excel workbooks that connect to the Pi MySQL database via the MySQL ODBC connector
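The ETL scripts in step 2 follow a simple extract/transform/load shape. Here's a minimal sketch of that pattern - the table and field names are hypothetical, not our actual schema:

```python
# Hypothetical sketch of one of the Pi's ETL scripts.
# transform() is pure, so it can be tested without a database;
# load() accepts any DB-API connection (MySQLdb on the Pi itself).

def transform(raw_rows):
    """Normalise raw API rows into (date, sessions) tuples."""
    return [(row["date"], int(row["sessions"])) for row in raw_rows]

def load(conn, rows):
    """Insert transformed rows into the Pi-hosted MySQL database."""
    cursor = conn.cursor()
    cursor.executemany(
        "INSERT INTO daily_sessions (day, sessions) VALUES (%s, %s)", rows
    )
    conn.commit()

sample = [{"date": "2014-07-01", "sessions": "1234"}]
rows = transform(sample)
```

Keeping the transform step free of I/O made the scripts easy to test and reuse across modules.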

Approach

A core aspect of what we did was to think of the Raspberry Pi as a 'Swiss Army knife' of report automation: a collection of tools that could help automate the extraction of data from various sources.

We packaged re-usable scripts into modules to reduce redundancy, and used a single Python virtualenv (more on this later) to ensure that any tricky library installations were done only once. There are drawbacks to the single-virtualenv approach (chiefly library compatibility), but the benefit of letting teams relatively inexperienced in Python contribute code without snagging on installation problems far outweighed them.

We then began looking at where the most time was spent, and where the easy wins were - often the quickest wins came where we knew we could get the data out easily, such as plugging into the Google Analytics API.

We also looked at everything in an agile way: at the start of any data connection project, the goal was always simply to save some time. For example, getting data from the Google Analytics API and dumping it into CSV files that could replace existing raw-data workbooks was a great time saver. Importantly, it fitted immediately into the current workflow - minimising disruption and bringing the benefits of the solution to end users straight away.
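That CSV-dumping step can be as simple as rendering the API response with Python's csv module - a sketch, with illustrative function and field names:

```python
import csv
import io

def rows_to_csv(header, rows):
    """Render API rows as CSV text, ready to replace a raw-data workbook."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()

csv_text = rows_to_csv(["date", "sessions"], [["2014-07-01", 1234]])
```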

Over time, we would extend and improve the scripts - integrating SQLAlchemy, for example, was hugely advantageous (again, more on this in a subsequent post).
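To give a flavour of what SQLAlchemy brings, here is a minimal ORM mapping. It uses an in-memory SQLite engine as a stand-in for the Pi's MySQL connection URL, and the table and column names are invented for illustration:

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class DailySessions(Base):
    """One row of analytics sessions per client per day (illustrative schema)."""
    __tablename__ = "daily_sessions"
    id = Column(Integer, primary_key=True)
    client = Column(String(64))
    day = Column(String(10))
    sessions = Column(Integer)

# On the Pi this URL would point at MySQL instead of in-memory SQLite,
# e.g. "mysql://user:password@localhost/reporting"
engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)

Session = sessionmaker(bind=engine)
session = Session()
session.add(DailySessions(client="example-client", day="2014-07-01", sessions=1234))
session.commit()
stored = session.query(DailySessions).filter_by(client="example-client").one()
```

The win is that ETL scripts insert and query plain Python objects, and the same code works against SQLite locally and MySQL on the Pi.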

The Results

Before this project began, you could hear the collective teeth grinding from the account managers when ‘reporting week’ came around.

Using the approach above we have reduced the overall reporting time by around 50%, freeing time for account managers to analyse the reports and create strategy, instead of simply collating them.

Now, although we have more clients, and more reports, the vast majority of this work is delivered by just one Technical Analyst, in less than a week.

Furthermore, by focusing on the ETL work (extracting data and storing it in a data warehouse) we have created a flexible foundation for reporting that can evolve easily and quickly as other tools and data are integrated.

Next Steps

Keep an eye out for future posts on automated reporting using a Raspberry Pi, where I'll discuss installation of MySQL, Python and Virtualenv, SQLAlchemy and bringing it together with Excel templates that use ODBC.
