Open source runs the world. That’s for supercomputers, where Linux powers all of the top 500 machines in the world, for smartphones, where Android has a global market share of around 75%, and for everything in between, as Wired points out:
When you stream the latest Netflix show, you fire up servers on Amazon Web Services, most of which run on Linux. When an F-16 fighter takes off, three Kubernetes clusters run to keep the jet’s software running. When you visit a website, any website, chances are it’s run on Node.js. These foundational technologies — Linux, Kubernetes, Node.js — and many others that silently permeate our lives have one thing in common: open source.
Ubiquity can engender complacency: because open source is indispensable for modern digital life, the assumption is that it will always be there, always supported, always developed. That makes new research looking at what the longer-term trends are in open source development welcome. It builds on work carried out by three earlier studies, in 2003, 2007 and 2007, but using a much larger data set:
This study replicates the measurements of project-specific quantities suggested by the three prior studies (lines of code, lifecycle state), but also reproduce the measurements by new measurands (contributors, commits) on an enlarged and updated data set of 180 million commits contributed to more than 224,000 open source projects in the last 25 years. In this way, we evaluate existing growth models and help to mature the knowledge of open source by addressing both internal and external validity.
The new work uses data from Open Hub, which enables the researchers to collect commit information across different source code hosts like GitHub, Gitlab, BitBucket, and SourceForge. Some impressive figures emerge. For example, at the end of 2018, open source projects contained 17,586,490,655 lines of code, made up of 14,588,351,457 lines of source code and 2,998,139,198 lines of comments. In the last 25 years, 224,342 open source projects received 180,937,525 commits in total. Not bad for what began as a ragtag bunch of coders sharing stuff for fun. But there are also some more troubling results. The researchers found that most open source projects are inactive, and that most inactive projects never receive a contribution again.
Looking at the longer-term trends, an initial, transient exponential growth was found until 2009 for commits and contributors, until 2011 for the number of available projects, and until 2013 for available lines of code. Thereafter, all those metrics reached a plateau, or declined. In one sense, that’s hardly a surprise. In the real world, exponential growth has to stop at some point. The real question is whether open source has peaked purely because it has reached its natural limits, or whether they are other problems that could have been avoided.
For example, a widespread concern in the open source community is that companies may have deployed free code in their products with great enthusiasm, but they have worried less about giving back and supporting all the people who write it. Such an approach may work in the short term, but ultimately destroys the software commons they depend on. That’s just as foolish as over-exploiting the environmental commons with no thought for long-term sustainability. As the Wired article mentioned above points out, it’s not just bad for companies and the digital ecosystem, it’s bad for the US too. In the context of the current trade war with China, “the core values of open source — transparency, openness, and collaboration — play to America’s strengths”. The new research might be an indication that the open source community, which has selflessly given so much for decades, is showing signs of altruism fatigue. Now would be a good time for companies to start giving back by supporting open source projects to a much greater degree than they have so far.
I spoke of this in 2017