Why “public AI”, built on open source software, is the way forward for the EU and how the EU enables it

A quarter of a century ago, I wrote a book called “Rebel Code”. It was the first – and is still the only – detailed history of the origins and rise of free software and open source, based on interviews with the gifted and generous hackers who took part. Back then, it was clear that open source represented a powerful alternative to the traditional proprietary approach to software development and distribution. But few could have predicted how completely open source would come to dominate computing. Alongside its role in running every aspect of the Internet, and powering most mobile phones in the form of Android, it has been embraced by startups for its unbeatable combination of power, reliability and low cost. It’s also a natural fit for cloud computing because of its ability to scale. It is no coincidence that for the last ten years, virtually all of the world’s top 500 supercomputers have run an operating system based on the open source Linux.

More recently, many leading AI systems have been released as open source. That raises the important question of what exactly “open source” means in the context of generative AI software, which involves much more than just code. The Open Source Initiative, which drew up the original definition of open source, has extended this work with its Open Source AI Definition. It is noteworthy that the EU has explicitly recognised the special role of open source in the field of AI. In the EU’s recent Artificial Intelligence Act, open source AI systems are exempt from the potentially onerous obligation to draw up a range of documentation that is generally required.

That could provide a major incentive for AI developers in the EU to take the open source route. European academic researchers working in this area are probably already doing that, not least for reasons of cost. Paul Keller points out in a blog post that another piece of EU legislation, the 2019 Copyright in the Digital Single Market Directive (CDSM), offers a further reason for research institutions to release their work as open source:

Article 3 of the CDSM Directive enables these institutions to text and data-mine all “works or other subject matter to which they have lawful access” for scientific research purposes. Text and data mining is understood to cover “any automated analytical technique aimed at analysing text and data in digital form in order to generate information, which includes but is not limited to patterns, trends and correlations,” which clearly covers the development of AI models (see here or, more recently, here).

Keller’s post goes through the details of how that feeds into AI research, but the end result is the following:

as long as the model is made available in line with the public-interest research missions of the organisations undertaking the training (for example, by releasing the model, including its weights, under an open-source licence) and is not commercialised by these organisations, this also does not affect the status of the reproductions and extractions made during the training process.

This means that Article 3 does cover the full model-development pathway (from data acquisition to model publication under an open source license) that most non-commercial Public AI model developers pursue.

As that indicates, open source licensing is critical to this application of Article 3 of the EU’s copyright legislation for the purposes of AI research.

What’s noteworthy here is how two different pieces of EU legislation, passed some years apart, work together to create a special category of open source AI systems that avoid most of the legal problems of training AI systems on copyrighted material, as well as the bureaucratic overhead imposed by the EU AI Act on commercial systems. Keller calls these “public AI”, which he defines as:

AI systems that are built by organizations acting in the public interest and that focus on creating public value rather than extracting as much value from the information commons as possible.

Public AI systems are important for at least two reasons. First, their mission is to serve the public interest, rather than focussing on profit maximisation. That’s obviously crucial at a time when today’s AI giants are intent on making as much money as possible, presumably in the hope that they can do so before the AI bubble bursts.

Secondly, public AI systems provide a way for the EU to compete with both US and Chinese AI companies – by not competing with them. It is naive to think that Europe can ever match the levels of venture capital investment that big-name US AI startups currently enjoy, or that the EU is prepared and able to support local industries for as long and as deeply as the Chinese government evidently plans to do for its home-grown AI firms. But public AI systems, which are fully open source, and which take advantage of the right the EU grants research institutions to carry out text and data mining, offer a uniquely European take on generative AI that might even make such systems acceptable to those who worry about how they are built, and how they are used.

Source: Why “public AI”, built on open source software, is the way forward for the EU – Walled Culture
