In the early 1990s, I was tinkering with a home computer I’d had for a while. I don’t remember exactly why (perhaps I was disassembling it for parts), but I found myself with two memory chips of 500 kilobytes each that I no longer needed. I do recall that a colleague of mine, realizing that this memory would be compatible with his own home computer, bought that one megabyte from me for the sum of $1,000. Let’s think about that for a moment: if that price still held today, my laptop with 16 gigabytes of memory would be worth over $16 million.
That price for memory was a very temporary spike, the result of some specific but short-lived supply chain problems. The price of memory, and computing hardware in general, has otherwise been on a steep downward trend for decades. Megabytes of memory can now be had for just pennies and general-purpose computer hardware is effectively a commodity. This, I think, has had a significant impact on our approach to software architecture and design, as well as to the deployment and management of large-scale systems.
I won’t wax nostalgic for some nonexistent utopian past, but a few decades ago, when hardware was fairly expensive, efficiency and speed of execution were the foremost concerns in software development. Squeezing a few bytes out of our data structures was rewarded, and undergraduates took mandatory courses in compiler design in order to understand the nuances of building software for fast run-time execution. The tradeoff was obscure code that was difficult for any other programmer to understand, let alone maintain or enhance. Nor was it easy for a non-technical end user to operate, but at least it ran fast. There were other unintended consequences of the drive for efficient code, including one pretty big one that I’ll come back to later in this post.
Now that the cost and availability of hardware are effectively no longer barriers, software architecture has been able to focus on human factors: intuitive user interfaces, accessibility and far richer functionality. We’ve expanded our use of computers and software, and embedded them into our cars, our homes and our appliances, as well as pretty much all of our commercial and industrial processes. And we’ve connected them all together with large public and private networks and data centers. The result has been a huge democratization of technology and, arguably, an improved quality of life. Of course, we’re not in utopia yet, not by a long shot. Much more needs to be done, especially in the equitable distribution of the benefits to the third world and to other marginalized populations. But it’s reasonable to say we’re quite a bit better off than we were.
Under the rug
There is, however, a glaring problem that hasn’t been sufficiently addressed: the environmental impact of all this computation. Cheap and plentiful hardware has its downsides, in particular its power consumption and resultant carbon footprint. We’ve seen some of the statistics before: the power needed to operate a single popular generative AI platform could sustain a city of about 67,500 inhabitants, and cryptocurrencies have been estimated to consume between 0.4 percent and 0.9 percent of total global electricity consumption.
In addition, large public or private cloud operations and data centers draw significant amounts of power while also releasing a lot of heat back into the environment from cooling all those processors and data storage devices. And speaking of data, “dark data,” a fascinating idea about which I’ve recently been reading extensively, is another significant contributor to the carbon footprint.
Despite its ominous-sounding name, dark data just refers to data that is not, or is no longer, usable for its intended purpose. It can include incomplete, incorrect, or misrepresented data but often is also old or expired data that may or may not need to be retained and whose value diminishes with time. Examples of dark data include weather records dating back decades, old medical records, or data that has been inconsistently entered or stored—perhaps with different formats for dates making it difficult to analyze, or with collection gaps over time or geographical area that limit its usefulness.
Sometimes regulation requires the retention of dark data (medical or financial records, for example), but in many cases it can and should be corrected, consolidated or even discarded. Analysts at one big data company have estimated that up to 50 percent of companies’ data can be considered dark. The environmental impact of all this dark data has been estimated at 6.4 million tons of CO2 emitted annually into the atmosphere.
So, what do we do about the carbon footprint associated with our ever-increasing technology profile? There are different remedies depending on which part of the problem you look at; if we add them all together, we’ll be able to make a difference.
One byte at a time
Start with data center design. Data centers can be built or retrofitted to reduce their carbon footprint in many ways. Waste heat could, for example, be reused to heat offices or support local greenhouses. There are lots of standards for green construction and site operations, and everyone from governments to consumers could reward data centers for preferential use of solar, wind or other sustainable power sources.
Inside the data center, other steps can be taken. Part of the problem is the amount of hardware, and therefore power, needed to support many clients with many different computing workloads. These workloads are never steady—they fluctuate, with peaks and troughs occurring at different times. A lot of work is being done in determining how to balance workloads. If peaks and troughs occur at different times, utilization might be averaged out for a lower total consumption level. Of course, you also have to minimize the risk of too many peaks happening at once and overloading your infrastructure—or at least maintain some standby capacity for such situations. (This sounds to me like a good optimization problem. Maybe we can solve it with quantum techniques?)
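To make the peaks-and-troughs argument concrete, here is a minimal Python sketch. The hourly demand figures, the function names and the 20 percent standby margin are all made up for illustration; real capacity planning would work from measured utilization and a proper risk model.

```python
import math

# Hypothetical demand, in servers, for two workloads over six time slots.
daytime_workload = [2, 2, 2, 8, 8, 8]
overnight_workload = [8, 8, 8, 2, 2, 2]


def capacity_if_separate(workloads):
    """Each workload gets its own hardware, sized for its own peak."""
    return sum(max(w) for w in workloads)


def capacity_if_shared(workloads, standby_percent=0):
    """Workloads share hardware, sized for the peak of their combined demand.

    standby_percent is an assumed safety margin for peaks that coincide
    unexpectedly.
    """
    combined_peak = max(sum(slot) for slot in zip(*workloads))
    return combined_peak + math.ceil(combined_peak * standby_percent / 100)


workloads = [daytime_workload, overnight_workload]
print(capacity_if_separate(workloads))    # 16
print(capacity_if_shared(workloads))      # 10
print(capacity_if_shared(workloads, 20))  # 12
```

In this toy case, sharing hardware cuts the provisioned capacity from 16 servers to 12 even after setting aside standby capacity, because the two peaks never coincide.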
The presence and use of dark data can also be optimized. First, organizations can and should take a hard look at their data and determine if it all needs to be retained. If any can be eliminated, so much the better. Second, classify the data by type of retention as well as retrieval requirements. Older data that might be called upon less frequently could be stored on tape or other media rather than disk drives that are always on and provide instant retrieval.
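As a rough illustration of that two-step review, the sketch below sorts data sets into "delete", "archive" or "online". The `Dataset` fields, the one-year threshold and the tier names are assumptions invented for this example, not an established scheme; an actual policy would come from legal and business requirements.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Dataset:
    name: str
    last_accessed: date
    retention_required: bool  # e.g. regulated medical or financial records
    still_useful: bool        # complete and accurate enough for its purpose


def classify(dataset, today, archive_after_days=365):
    """Return an assumed storage decision for one data set."""
    # Step 1: anything neither required nor useful can be eliminated.
    if not dataset.retention_required and not dataset.still_useful:
        return "delete"
    # Step 2: rarely retrieved data moves to tape or other offline media.
    if (today - dataset.last_accessed).days > archive_after_days:
        return "archive"
    # Everything else stays on always-on disk for instant retrieval.
    return "online"


today = date(2024, 1, 1)
print(classify(Dataset("old ledgers", date(2015, 6, 1), True, False), today))     # archive
print(classify(Dataset("stale exports", date(2019, 3, 1), False, False), today))  # delete
print(classify(Dataset("current orders", date(2023, 12, 1), True, True), today))  # online
```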
Finally, add some consideration of efficiency back into software design and development. Speakers at technology conferences I’ve attended in the past year have begun to address this theme—encouraging developers to code again from a resource scarcity point of view. There’s a delicate balance here and I’m certainly not suggesting that programmers have been deliberately wasteful in how they design and write code. But cheap, plentiful and powerful computer hardware has facilitated a tradeoff of code efficiency versus functionality and maintainability. If environmental concerns force us to rethink that tradeoff, we might arrive at the best of both worlds—clean code, if you will, in more ways than one.
Naturally, the first time I heard a conference speaker address the subject of code efficiency—in November 2022—my immediate thought was, “Well, what could possibly go wrong with that?” Up until the 1980s and even, arguably, into the 1990s, a common programming technique to save a couple of bytes of memory was to use two digits for the year field in any data structures having to do with dates. It was easy to do date arithmetic under the assumption that all years were within the 1900s and it was further assumed that all this software would be replaced long before the end of the century. The result, of course, was the Y2K problem—an expensive, risky and eventually tedious example of the law of unintended consequences. Nobody deliberately caused that problem—they just didn’t think through all the possible scenarios. This time around, and just like with workload optimization, let’s be careful in our approach and ask a few more what-if questions.
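For readers who never met the two-digit year shortcut, here is a small Python sketch of how it fails. The function names are hypothetical and the code is a modern illustration, not a reconstruction of any actual legacy system.

```python
def years_elapsed_two_digit(start_yy, end_yy):
    """Date arithmetic on two-digit years, assuming both fall in the 1900s."""
    return end_yy - start_yy


def years_elapsed_four_digit(start_year, end_year):
    """The same arithmetic with the full year stored."""
    return end_year - start_year


# An account opened in 1965, checked in 1999: the shortcut works.
print(years_elapsed_two_digit(65, 99))       # 34

# The same account checked in 2000, stored as "00": the result is nonsense.
print(years_elapsed_two_digit(65, 0))        # -65

# Storing the full year costs a little more memory but stays correct.
print(years_elapsed_four_digit(1965, 2000))  # 35
```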
Sustainable construction, workload optimization, data rationalization and code efficiency. Each by itself may not count for much. But, taken together, they will play a large role in helping the IT industry clean up our act. The planet will thank us for it.