How to Measure the Energy Consumption of Bugs

A pragmatic approach to understanding the energetic impact of a bug in our mobile app.

This article was originally published at InfoQ after my talk at the Agile Testing Days 2022.

Key Takeaways
Motivation
Reviewing Bugs for Their Energetic Impact
Pragmatic Measuring of Energy Consumption of Bugs
System Perspective
Conclusion

We in the software development industry have a responsibility for sustainability and energy consumption
Energy consumption is a first-class quality attribute we need to take into account in software development
Research on how to develop energy-efficient software is not new, but there is no readily applicable approach for practicing it yet
The energy consumption of bugs that occur in mobile applications can be measured by implementing a long-running test which provokes the energy-consuming behavior and measuring the battery drain afterwards
The energy consumed can be calculated from the measured battery drain and the battery capacity given in the specification of the mobile device

The idea of reducing energy consumption or carbon dioxide emissions in IT is typically tackled from a hardware perspective first. On closer inspection, it becomes clear that it also has a software dimension, as software runs on hardware.

I’m convinced that we as software engineers have a big responsibility towards nature, our environment and sustainability. I believe we have to treat energy consumption and carbon dioxide emissions as first-class qualities that we need to reflect on during software development. Going beyond green electricity or green data centers and genuinely improving (= reducing) these qualities allows us to make a difference for the nature we leave behind, especially given the climate crisis we are confronted with.

Additionally, the current situation of absurd wars also shows that dependencies on electricity providers should be reduced. Therefore, software engineers should accept their responsibility to take energy consumption and carbon dioxide emissions into account when developing software.

This article sheds some light on how energy consumption and carbon dioxide emissions can be taken into account in software development. A very important aspect in this regard is detecting energetic shortcomings or bottlenecks caused by bugs. Therefore, I start by elaborating on how to take this perspective into account when reviewing bugs. Afterwards, I introduce our pragmatic approach, which builds on existing research, for measuring the impact of bugs on energy consumption and carbon dioxide emissions. Finally, I take a step back and give some advice regarding measurement of the environmental impact of our systems and how to develop sustainable software.

In general, everyone involved in software development should always strive to find bugs. The best bug is the one we find before our customers do. In this article, I refer to the person who finds and reviews bugs as a QA engineer because they are usually the ones to find them.

QA engineers need to pay attention to details. In order to be able to understand the energetic and environmental impact, we are confronted with the following two questions:

Does a bug affect energy consumption?
How can the impact be measured or made visible?

There is no generic answer or recipe which a QA engineer can follow. It’s neither easy nor straightforward to answer those questions. I only want to give a bit of guidance on which aspects need to be considered when taking such a perspective in reviewing bugs.

It is very important to always keep the underlying architecture and the communication with all connected services in mind. At first sight, it may often seem that a bug does not affect energy consumption. This impression can quickly change when the broader context of the feature where it occurs is taken into account. A QA engineer needs to understand (in collaboration with the developers) how communication between the services is implemented, when it takes place, where it is initiated, and where the services and features run. In practice this means that QA engineers who want to measure the energetic impact of their product in more detail must understand not only the customers’ perspective (as usual), but also many implementation details from different perspectives. Where do particular services run? On which infrastructure? Which libraries are used? How can the implementation of the product be modified in order to measure energy consumption (I will explain why this is needed in the section Pragmatic Measuring of Energy Consumption of Bugs)?

Improvement of energy consumption is not something that can be activated by just pushing a button. It is a very complex topic that extends through all layers of an application. Sometimes it’s not even obvious if a bug has an effect on the energy consumption of the product.

Now that I have given some insights into why and how to detect bugs which negatively affect energy consumption or carbon dioxide emissions, this section presents our pragmatic approach for measuring the impact of an identified bug.

Not every bug has an impact on energy consumption or carbon dioxide emissions. And sometimes the impact cannot even be measured. Energy consumption in software development can be approached from many different angles. I claim that every company which develops products for mobile devices has a big lever for approaching this topic because every mobile user directly experiences “energy bugs” through a draining battery. This is also the case for Staffbase, and I’m very glad that we take this problem seriously.

Research in the field of energy consumption for mobile applications is not new, and already covers a lot. For example, my former colleague Claas Wilke wrote his PhD thesis on the topic “Energy-Aware Development and Labeling for Mobile Applications” eight years ago. In his work, he developed a methodology for measuring the energy consumption of mobile Android applications based on energy profiles of different scenarios. With this work it is possible to compare the consumption of different mobile applications of the same kind (e.g. email applications or browsers) and label them according to their energy consumption (low, medium, high). Unfortunately, Claas’ solution was very elaborate and costly: sophisticated power meters were needed, and the mobile devices had to be modified so that power could be measured. Back then, there was no one-size-fits-all solution that could be applied easily and safely in practice across different contexts. Therefore, for measuring the energy consumption of bugs I follow a naive, pragmatic approach that is very much tailored to the bug and/or the context in which it occurs.

Let’s have a look at an old bug we had a few years ago in our product. The problem was that our mobile application requested images in their original sizes from the media server even though it doesn’t make sense to render big images on a small device’s screen. This bug had two effects:

A bigger amount of data had to be transferred to a device over the internet
Processing (rendering) an image on a device required more resources (processing cycles) from the processing units (CPU or GPU)

It is clear that the higher usage of network and processing units drains the battery of the mobile device more. This energy bug could be solved by explicitly considering the display resolution and requesting an image from the media server in an appropriate size in order to reduce the data to be processed. To make it clear again: the bug is the missing consideration of the resolution, and the fix introduces it. Thus, we know the before and after state of the product related to the described bug. In contrast to the aforementioned thesis from Claas Wilke, it was not possible for us to compare the energy consumption of our mobile application with that of other similar applications because no one had measured them before. We simply had no data at our disposal which allowed us to understand the energy consumption of our application in a meaningful way. But what we could measure was the battery drain of two of our app versions, one containing the bug and one containing the bug fix. Based on this idea, we came up with the following solution for measuring the energy consumption of a bug.
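The idea behind the fix can be sketched in a few lines. This is only an illustration under assumptions, not our production code: the function name `pick_image_width` and the notion that the media server offers a fixed set of pre-scaled widths are hypothetical and chosen for the example.

```python
def pick_image_width(display_width_px, available_widths):
    """Return the smallest available width that still fills the display.

    Falls back to the largest available width if none is wide enough.
    Both parameters are hypothetical; a real media server may expose
    its image sizes differently.
    """
    candidates = sorted(available_widths)
    if not candidates:
        raise ValueError("available_widths must not be empty")
    for width in candidates:
        if width >= display_width_px:
            return width
    return candidates[-1]


# The iPhone 12 used in our test has a display that is 1170 px wide in portrait mode.
print(pick_image_width(1170, [320, 640, 1280, 2560, 4096]))
```

With the assumed list of widths, the sketch selects 1280 px instead of the 4096 px original, which is the kind of reduction in transferred and rendered data the fix aims for.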

Our pragmatic approach for measuring the energetic impact of a bug looks as follows. Basically we need four ingredients:

The last app version which contained the bug
The first app version which contains the bug fix
A long-running native mobile test which provokes the faulty behavior
A mobile device

The implemented test which provokes the behavior needs to be long-running so that we are able to monitor the battery drain. The longer it runs, the more meaningful and representative the observations. Our long-running test for the described example constantly opens and leaves a page in our mobile application which contains a few images. In the erroneous version of our application, every page access triggers requests to the media server which retrieve the images in their original sizes. In the fixed version of our application, the images are retrieved in a size that takes the resolution of the device into account.

Then we need to execute this long-running test twice: once with the bug version of our application, and once with the fixed version of our application. Each time, it needs to be run on the mobile device under the same conditions in order to make sure that the results are comparable. The following list shows all the conditions we ensured:

Battery fully charged to 100%
Display brightness set to 50%
Same ambient light for every test run
No other apps running
No screen lock
Usage of same device: iPhone 12
- iOS 16.0
- 6.1-inch (155 mm) display with Super Retina XDR OLED
- 2532 x 1170 resolution
- Pixel density of about 460 ppi
- Battery of 10.78 Wh (2,815 mAh)

The design of the long-running test is as follows:

We provide a page containing six images in the application
We provide an empty page in the application
The test switches 2,880 times between both pages and remains for 5 s on every page
This results in a total runtime of 4 h (= 2,880 × 5 s / 3,600 s per hour)

If you are familiar with mobile application testing on real devices, you know that the test device needs to be connected to a computer from which the execution of the test is triggered. The problem with such a setup is that once the device is connected via cable to a computer, it immediately starts recharging. This cannot be deactivated. Thus, we couldn’t use existing mobile application testing frameworks for implementing our test. Instead, we slightly changed our application itself and implemented a small popup in which the number of switches between the pages can be set and from which the long-running test can be started. You can see this in the following picture.
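The control flow of such a test is simple. The following sketch shows it in Python rather than in the native code of our app, so everything in it is an assumption for illustration: `open_page` stands for a hypothetical navigation callback, and `sleep` is injectable so that the loop can be dry-run without waiting four hours.

```python
import time

SWITCHES = 2880      # total number of page switches, from the test design
DWELL_SECONDS = 5    # time spent on each page


def run_switch_test(open_page, switches=SWITCHES, dwell=DWELL_SECONDS, sleep=time.sleep):
    """Alternate between the image page and the empty page.

    open_page is a hypothetical callback that navigates the app.
    Returns the planned runtime in hours.
    """
    pages = ("images", "empty")
    for i in range(switches):
        open_page(pages[i % 2])
        sleep(dwell)
    return switches * dwell / 3600


# Dry run: record the visited pages instead of navigating, and skip the waiting.
visited = []
hours = run_switch_test(visited.append, sleep=lambda seconds: None)
print(hours, visited.count("images"))
```

The dry run confirms the planned runtime of 4 hours and shows that the page containing the images is opened 1,440 times, i.e. on every second switch.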

After both runs, we are now able to compare the status of the battery drain we monitored. The following table contains the measured results:

Metric	Bug Application	Fixed Application
Available battery after test execution	76%	84%
Battery drain caused by test execution	24%	16%
Consumed energy (battery drain × 10.78 Wh battery capacity from specification)	2.59 Wh	1.72 Wh
Consumed energy per single switch to the page which contains images	0.00180 Wh	0.00120 Wh
Difference in battery drain caused by the bug	8%
Energy consumed by the bug over the whole runtime	0.86 Wh
Energy consumed by the bug per single switch to the page which contains images	0.0006 Wh
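The values in the table can be recomputed from the two battery readings. The sketch below assumes, as the per-switch figures imply, that half of the 2,880 switches (1,440) land on the page containing images; the constant and function names are made up for the example.

```python
BATTERY_CAPACITY_WH = 10.78   # iPhone 12 battery capacity from the specification
IMAGE_PAGE_VISITS = 1440      # half of the 2,880 switches open the image page


def consumed_wh(drain_percent, capacity_wh=BATTERY_CAPACITY_WH):
    """Energy drawn from the battery for a given drain in percent."""
    return drain_percent / 100 * capacity_wh


bug_wh = consumed_wh(24)        # run with the bug: 100% -> 76%
fixed_wh = consumed_wh(16)      # run with the fix: 100% -> 84%
delta_wh = bug_wh - fixed_wh    # energy attributable to the bug

print(round(bug_wh, 2), round(fixed_wh, 2), round(delta_wh, 2))
print(round(delta_wh / IMAGE_PAGE_VISITS, 4))
```

Keep in mind that the battery percentage is a coarse reading, so these numbers are estimates rather than precise measurements.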

These results gave me a first impression of the energy consumption of the described bug. Nevertheless, it’s still hard to interpret the results. What does it mean when we know that our bug consumes 0.0006 Wh of energy in the described scenario?

In order to understand this better, I came up with the following thought experiment. Assume we have one million users of our mobile application and every user launches the application three times per day, opening the page which contains the images once per launch. And let’s further assume that every single user does this every day throughout a year. This means our bug consumes about 1.8 kWh per day (1,000,000 × 3 × 0.0006 Wh) and 657 kWh per year. According to the results of the German ADAC Ecotest from October 2022 (ADAC = Allgemeiner Deutscher Automobil-Club, Europe’s largest motoring association), recharging a Tesla Model X 100D consumes 108.3 kWh per full recharge. This means that the annual consumption of our bug under the aforementioned assumptions is about the amount of energy required for recharging this Tesla six times. This is sufficient for the Tesla to drive about 2706 km.

With such a comparison, the impact of our bug becomes more understandable and accessible. If we look at the bug’s energy consumption in isolation, the number seems to be very low. Additionally, the number is only valid for the described scenario and the conditions we used in our experiment. Therefore, it is important to understand the effect from a wider perspective and to apply the results, e.g. to the total number of users running into that bug.

Yet again, this is not a generic solution. It might be different in other contexts. Every company developing mobile applications needs to find its own solution and adapt it to its needs, contexts, systems and services.
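The arithmetic of the thought experiment is easy to reproduce and to adapt to other assumptions. In the following sketch, the user count and the launches per day are the assumed values from the text, not measured data, and the constant names are chosen for the example.

```python
ENERGY_PER_VISIT_WH = 0.0006     # measured cost of the bug per visit of the image page
USERS = 1_000_000                # assumed number of users
VISITS_PER_USER_PER_DAY = 3      # assumed launches per user and day
TESLA_RECHARGE_KWH = 108.3       # ADAC Ecotest figure for one full recharge

daily_kwh = ENERGY_PER_VISIT_WH * USERS * VISITS_PER_USER_PER_DAY / 1000
yearly_kwh = daily_kwh * 365
recharges = yearly_kwh / TESLA_RECHARGE_KWH

print(round(daily_kwh, 1), round(yearly_kwh), round(recharges, 1))
```

Changing the three assumptions at the top is enough to see how strongly the result depends on the number of users who actually run into the bug.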

Now let’s take a look at a higher level, where we can approach the measurement of energy consumption and carbon dioxide emissions of our systems where the products run in the end. There are already a couple of services and tools out there that can be used for measuring our environmental impact. Since this is not the primary focus of this article, I only want to highlight some directions and name a few.

First, I want to mention the open source projects CodeCarbon and Boavizta. Their communities developed free-to-use libraries and APIs which can be used for tracking carbon emissions produced by computer programs, based on where the computation runs. They compare estimations and relate them to common equivalents, such as household emissions or miles driven, and even share the gathered data for public use. These are great projects and everyone can use them.

Second, there are also services provided by larger players. Microsoft, for example, has the Power BI apps and the Emissions Impact Dashboard which connect to an Azure Billing Account and give you detailed insights into the carbon dioxide emissions you produce. Planetly, as another example, applies a holistic approach for companies and offers a carbon management platform in order to reveal the biggest emission drivers and derive actionable insights from the measurements. But in the end everyone needs to identify the solution which fits best with their systems, contexts or industry.

In order to start considering energy consumption and carbon dioxide emissions in software development, the very first step is awareness. Initiating the thinking process is most important. When we accept our responsibility towards our climate and environment, the journey starts. A crucial suggestion I can give is to reflect on how progress can be made transparent and visible. We cannot improve what we cannot measure. Start small and try to measure smaller, intermediate properties which count towards the bigger whole.

What we did, as an example, was to start making the sizes of the apps we deliver to our customers visible in a dashboard. This gives us a tool which enables us to work on the app size and even decrease it if possible. A smaller app size means less data sent over the network when an update is published. This, in turn, implies lower carbon dioxide emissions on the customer side when downloading the app. There are so many different dimensions from which one can approach this topic. Please just start 😉🌿

Tags: carbon dioxide emissions, development, energy consumption, quality assurance, sustainability
