Ember’s Digital Transformation: A Case Study of improving the data model and processes in the Data Team
Introduction by Subak
At Subak our mission is to help not-for-profit startups to level up their capabilities and increase their impact. Ember is a think tank using data to ensure a safe climate powered by clean electricity and we were proud to support them as one of our first member organisations. Ember gathers vast quantities of electricity data from around the world to create statistics and reports, with the aim of landing widely read coverage and changing the debate on the transition from fossil fuels to clean energy.
As Ember grew rapidly over the last few years, so did the responsibilities of the Data Team: handling large amounts of key data; transforming and cleaning it; as well as presenting it and turning it into useful content. Yet as Ember grew, the processes and tools the organisation used to manage this struggled to keep up. Subak encouraged Ember to bring in an experienced hire to lead on a digital transformation for the Data Team and the organisation as a whole, with the aim of increasing efficiency and saving time. This was crucial given Ember’s aim of upping its annual Global Electricity Review to a monthly data release cadence.
Ember contracted Jeremy Fletcher to lead this effort, bringing decades of experience as a technical lead and data solutions architect. This blog is a deep-dive into how he tackled data challenges at Ember, the tools he introduced, and ultimately the results he helped the team to achieve. We hope that his account can give some insight for others facing similar challenges.
Setting the scene
This was a very different engagement to any others I had taken on. Firstly, having worked on projects across many different sectors, this was my first foray into the not-for-profit world. With increasing concern and alarm over our climate emergency, working for a climate think-tank was an opportunity I couldn’t let pass. And although this slightly alien world had a Data team, it also had a Coal Mine Methane team, clearly something I hadn’t come across anywhere else! Secondly, the remit itself within the Data Team was very open ended – “we know we could do with some help, we’re just not sure what that looks like”, or words to that effect. With Ember being a small organisation, the 3-person Data Team wears many hats, not only dealing with the data processing, but also acting as data analysts, writing content for reports, and producing visualisations both within the reports and on the website.
Assessment
Stage 1 of the engagement was to understand the current data landscape. Given that Ember currently publishes two main reports each year – the European Electricity Review and the Global Electricity Review – these were the primary focus of the assessment. As a team, we worked through the flow of the existing processes, capturing the current steps diagrammatically. The team were clearly already doing some really positive things, like using GitHub as a code repository, but there were also many areas that flagged up as requiring some attention. For example, there were a number of steps within the existing processes that required manual intervention; all data was stored in CSV files within Dropbox; and there were inconsistencies in the way both the code and the data were designed and managed.
The output of the assessment was a document outlining the current state of play with a series of proposals to address some of the shortcomings. One of the challenges was that the initial short contract left a very limited time to implement any of the proposals.
As an aside, although every sector and company has its own data challenges, there were issues in electricity data that I hadn’t come across before in my 20+ years working on data projects. One of the most notable, and somewhat confusing, is that the monthly data does not roll up to the annual data. Similarly, the electricity generation “facts” – for example a country’s Coal Generation for Dec 2021 – can and do get re-stated, updated or corrected multiple times. Data is largely provisional and is expected to change, so the published data should never be accepted as 100% accurate (unlike financial data, for example!). And finally, availability of data is also sporadic, ad hoc, and sometimes rather challenging: pulling out Kazakh electricity generation data from a PDF document written in Kazakh springs to mind!
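To make the restatement problem concrete, here is a minimal Python sketch of one way to handle it: store every published version of a fact and resolve to the most recent one. The `GenerationFact` structure, the `latest_facts` helper and the figures are all hypothetical and purely illustrative; this is not Ember's actual data model.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class GenerationFact:
    # Hypothetical structure, for illustration only.
    country: str
    fuel: str
    month: str        # e.g. "2021-12"
    twh: float        # made-up values, not real data
    published: date   # when this version of the figure was released


def latest_facts(facts):
    """Keep only the most recently published version of each fact."""
    latest = {}
    for fact in facts:
        key = (fact.country, fact.fuel, fact.month)
        if key not in latest or fact.published > latest[key].published:
            latest[key] = fact
    return latest


facts = [
    GenerationFact("Australia", "Coal", "2021-12", 10.0, date(2022, 1, 15)),
    GenerationFact("Australia", "Coal", "2021-12", 9.6, date(2022, 3, 1)),  # restated
    GenerationFact("Australia", "Gas", "2021-12", 3.0, date(2022, 1, 15)),
]
current = latest_facts(facts)
```

Keeping the earlier versions rather than overwriting them means a published figure can always be traced back to the data that was available at the time.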
Implementing the recommendations
Having started to work through some of the proposals, the team were really enthusiastic at the direction of travel, and an extension to the contract was agreed on to further transform the data landscape at Ember. Even so, the focus needed to be on a manageable evolution rather than a full-scale revolution. There were two overarching themes:
Working processes
Jira
The introduction of Jira gave far greater clarity to what was going on within the team. A loose agile approach was also adopted to help with prioritisation and planning, focusing on two weeks of work at a time. This was a simple change but was instrumental in hitting initial deadlines for the European Electricity Review.
GitHub
Although GitHub was already being used, there were some limitations to the extent of its use. The changes introduced were:
Use Git for all code without fail
Implement a branching strategy, creating a new feature branch for each Jira ticket worked on
Implement a peer review process – only merge the feature branch into the main branch after having pull requests approved
This was a small change in practice, introducing more comprehensive configuration management, but it had a massive impact, ensuring that every piece of code had at least two pairs of eyes on it. This helps with overall quality, but also with consistency and shared knowledge of the solution.
Data solution design
Data – Model, Design, Storage
This involved designing a consistent and structured data model and implementing a layered approach to the data processing, utilising a MySQL database within DigitalOcean in the cloud. DigitalOcean was adopted to minimise the risk and level of change, as it was already being used to support the Ember website.
To assist the analysts using the data (which also included the Data team), user-friendly “Mart” tables were created: easy-to-use structures with lots of enrichment, e.g. flags for all the various combinations of fuels (e.g. fossil, clean, thermal) and country regions (e.g. EU, G20 etc.). One example was to design a “Country Highlights” table that will identify any countries that have hit a new record in a particular month, so for example it will immediately flag Australia Coal having its lowest share of electricity generation in Dec 2021. This will prove invaluable in supporting the Comms team!
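As a rough illustration of the “Country Highlights” idea, the Python sketch below scans a monthly share series and flags any month that sets a new record low or high for a country and fuel. The function name, the row layout and the numbers are assumptions made for this example; the real table lives in the database and its design is not shown here.

```python
from collections import defaultdict


def country_highlights(rows):
    """Flag months where a country/fuel share hits a new record low or high.

    rows: iterable of (country, fuel, month, share_pct), month as "YYYY-MM".
    """
    # Group each country/fuel series in chronological order.
    history = defaultdict(list)
    for country, fuel, month, share in sorted(rows, key=lambda r: r[2]):
        history[(country, fuel)].append((month, share))

    highlights = []
    for (country, fuel), series in history.items():
        # The first month only seeds the running records.
        lowest = highest = series[0][1]
        for month, share in series[1:]:
            if share < lowest:
                lowest = share
                highlights.append((country, fuel, month, "record_low", share))
            elif share > highest:
                highest = share
                highlights.append((country, fuel, month, "record_high", share))
    return highlights


# Made-up shares, not real data.
rows = [
    ("Australia", "Coal", "2021-10", 55.0),
    ("Australia", "Coal", "2021-11", 56.5),
    ("Australia", "Coal", "2021-12", 52.0),
]
highlights = country_highlights(rows)
```

With these sample rows, December 2021 is flagged as a record low for Australia Coal, which is the kind of fact a Comms team could pick up straight away.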
Process and source code design
This started with re-designing the process flow and then implementing it by following some basic software design principles, such as separation of concerns (for example, ensuring that a function does one and only one task), along with two of the recurring keywords of the project: consistency and clarity. The new structure also had to eliminate manual intervention, so a single “pipeline” script was created to run the whole process, picking up any new source data that has been made available.
Unfortunately, the sourcing of electricity data is still often manual (remember the Kazakh example!), but as this process will be run only once a month, the sourcing effort is not onerous. The main manual steps were from within the process, and these have now been eliminated.
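The shape of such a pipeline can be sketched in a few lines of Python. This is a simplified illustration under assumed names (`find_new_sources`, `load`, `clean`, `run_pipeline`) and an assumed CSV layout, not the actual Ember script: each function does one task, and the single entry point runs them in order for any source file it has not seen before.

```python
import csv
import tempfile
from pathlib import Path


def find_new_sources(source_dir, processed):
    """Return source files that have not been processed yet."""
    return sorted(p for p in Path(source_dir).glob("*.csv") if p.name not in processed)


def load(path):
    """Read one source file into a list of row dicts."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def clean(rows):
    """Normalise types and drop rows with no generation value."""
    return [
        {"country": r["country"].strip(), "fuel": r["fuel"].strip(), "twh": float(r["twh"])}
        for r in rows
        if r["twh"].strip()
    ]


def run_pipeline(source_dir, processed):
    """Run every step, in order, for each new source file."""
    output = []
    for path in find_new_sources(source_dir, processed):
        output.extend(clean(load(path)))
        processed.add(path.name)
    return output


# Demonstration with a temporary folder and made-up values.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "2021-12.csv").write_text("country,fuel,twh\nAustralia,Coal,10.0\nAustralia,Gas,\n")
    processed = set()
    first_run = run_pipeline(tmp, processed)
    second_run = run_pipeline(tmp, processed)  # nothing new to pick up
```

Because the pipeline tracks what it has already processed, re-running it is safe: the second run finds nothing new and does no work.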
Conclusion & Subak’s reflections
The objective was to take an initial but very important first step towards an automated and scalable data solution that was transparent and maintainable. The changes to the working processes were as crucial as the complete rewrite of the end-to-end data process itself. Ultimately the whole European process was redesigned and developed effectively from scratch in time for the European Electricity Review which was released in February. The Global Electricity Review was released in April also on the back of a fully redeveloped process. One of the main benefits was to minimise the manual effort and end up with a repeatable consistent process. Re-running it used to be a major headache; it is now very easy.
The team has been hugely impressive, taking on the changes and running with them enthusiastically; it has been a genuine pleasure and I have learnt a lot. I would have loved to have done more but Ember now has a solution that will enable regular monitoring of global progress against energy transition commitments. It will also act as a foundation for more data sources to be brought in to enhance the overall offering. Finally, it will act as an enabler to free up a lot of time for the team to support the company in many other crucial ways, as Ember help to combat the climate emergency.
- Jeremy Fletcher, Ember
At Subak we were delighted to see such a positive result from this exercise and we hope the learnings and actions detailed in Jeremy’s write-up can help other data teams in not-for-profit startups, or indeed anyone who faces similar challenges. Ultimately this will help Ember deliver impactful analysis and content to power the transition from fossil fuels to clean energy - faster and more effectively.
At Subak, we’re always exploring ways we can empower and accelerate our Member Organisations as well as our individual Climate Fellows. If you want to get involved, don’t hesitate to get in touch.