How open-source software shapes AI policy


Open-source software quietly affects nearly every issue in AI policy, but it is largely absent from discussions around AI policy. Policymakers need to more actively consider OSS’s role in AI.

Open-source software (OSS), software that is free to access, use, and change without restrictions, plays a central role in the development and use of artificial intelligence (AI). Across open-source programming languages such as Python, R, C++, Java, Scala, JavaScript, Julia, and others, there are thousands of implementations of machine learning algorithms. OSS frameworks for machine learning, including tidymodels in R and scikit-learn in Python, have helped consolidate many diverse algorithms into a consistent machine learning process and made them far easier for the everyday data scientist to use. There are also OSS tools specific to the especially important subfield of deep learning, which is dominated by Google’s TensorFlow and Facebook’s PyTorch. Manipulation and analysis of big data (data sets too large for a single computer) were also revolutionized by OSS, first by the Hadoop ecosystem and later by projects like Spark. These are not simply some of the AI tools; they are the best AI tools. While proprietary data analysis software may sometimes enable machine learning without the need to write code, it does not enable analytics that are as well developed as those in modern OSS.

That the most advanced tools for machine learning are largely free and publicly available matters for policymakers, and thus the OSS world deserves more attention. The United States government has gotten better at supporting OSS broadly, notably through the Federal Source Code Policy, which encourages agencies to release more of the code they write and procure. Yet the relationship between OSS and AI policy is less often acknowledged. Trump administration documents on AI regulation and using AI at federal agencies mention OSS only in passing. The Obama administration’s AI strategy notes the important role of OSS in AI innovation, but does not mention its relevance to other issues. A new European Parliament report states that European OSS policies lack “a clear link to the AI policies and strategies… for most countries.” In fact, the recently proposed European AI regulation does not address the role of OSS at all.

Generally speaking, analyses and international comparisons of AI capacity often include talent, funding, data, semiconductors, and compute access, but often lack a discussion of the role of OSS. This is an unfortunate oversight since OSS quietly affects nearly every issue in AI policy. AI tools built in OSS enable the faster adoption of AI in science and industry, while also speeding proliferation of ethical AI practices. At the same time, OSS is playing a complex role in markets—powering innovation in many areas, while also further empowering Google and Facebook and challenging the traditional role of standards bodies.

1. OSS speeds AI adoption

OSS enables and increases AI adoption by reducing the level of mathematical and technical knowledge necessary to use AI. Translating the complex math of algorithms into code is difficult and time-consuming, so an existing open-source implementation can be a huge benefit for any individual data scientist. Open-source developers often work on projects to build skills and get community feedback, but there is also prestige inherent in building popular OSS. Often, several different versions of the same algorithm are developed in OSS, with the best code winning out (perhaps due to its speed, versatility, or documentation). In addition to this competitive element, OSS can also be highly collaborative. Since OSS code is all public, it can be cross-examined and interrogated for bugs or possible improvements. When an engaged community forms, as often happens around popular OSS, this collaborative-competitive environment can frequently result in accessible, robust, and high-quality code.

This is especially important because many data scientists may not have the mathematical training necessary to implement especially complex algorithms. This is not meant as a criticism of data scientists, but the work of the average data scientist requires them to be more of a data explorer and programmatic problem solver than a pure mathematician. Generally, data scientists are focused on interpreting the results of their data analyses and trying to appropriately fit their algorithms into a digital service or product. This means that well-written open-source AI code significantly expands the capacity of the average data scientist, letting them use more current machine learning algorithms and functionality. Much attention has been paid to training and retaining AI talent, but making AI easier to use, which open-source code does, may have a similarly significant impact in enabling economic growth. Of course, this is undeniably a double-edged sword, as easier-to-use OSS AI also enables innovation in pernicious applications of AI, including cyberattacks and deepfakes.

2. OSS helps reduce AI bias

Similarly, open-source AI tools can enable the broader and better use of ethical AI. Open-source tools like IBM’s AI Fairness 360, Microsoft’s Fairlearn, and the University of Chicago’s Aequitas ease technical barriers to detecting and mitigating AI bias. There are also open-source tools for interpretable and explainable AI, such as IBM’s AI Explainability 360 or Christoph Molnar’s interpretable machine learning tool and book, which make it easier for data scientists to interrogate the inner workings of their models. This is critical since data scientists and machine learning engineers at private companies are often time-constrained and operating in competitive markets. In order to keep their jobs, they must work hard on developing models and building products, without necessarily facing the same pressure to thoroughly examine models for biases. Academic researchers and journalists have done a remarkable job generating broad public awareness of the potential harms of AI bias, and so many data scientists understand these concerns and are personally invested in building ethical AI systems. For those engaged, but busy, data scientists, open-source code can be incredibly helpful in discovering and mitigating discriminatory aspects of machine learning.
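
To give a rough sense of the kind of check such tools automate, the sketch below computes one common bias measure, the gap in positive-prediction rates between groups (often called the demographic parity difference), in plain Python. It is a minimal illustration under stated assumptions, not the API of AI Fairness 360, Fairlearn, or Aequitas: the function names, the group labels, and the data are all hypothetical.

```python
from collections import defaultdict


def selection_rates(predictions, groups):
    """Return the share of positive (1) predictions within each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += pred
    return {g: positives[g] / totals[g] for g in totals}


def demographic_parity_difference(predictions, groups):
    """Gap between the highest and lowest group selection rates (0 means parity)."""
    rates = selection_rates(predictions, groups)
    return max(rates.values()) - min(rates.values())


# Hypothetical model outputs: 1 = approved, 0 = denied.
predictions = [1, 1, 1, 0, 1, 0, 0, 0]
# Hypothetical group membership for each prediction.
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

rates = selection_rates(predictions, groups)
gap = demographic_parity_difference(predictions, groups)
print(rates)  # selection rate per group
print(gap)    # difference between the highest and lowest rate
```

A large gap does not by itself prove discrimination, but it flags a model for closer review. Mature open-source fairness libraries package many such metrics, along with mitigation methods, so that a busy data scientist does not have to write and validate them by hand.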

While more government oversight of AI is certainly necessary, policymakers should also more frequently consider investing in OSS for ethical AI as a different lever to improve AI’s role in society. At present, government funding tends to support code development only in the pursuit of academic research. The Chan Zuckerberg Initiative, which funds critical OSS projects, writes that OSS “is crucial to modern scientific research… yet even the most widely-used research software lacks dedicated funding.” The same problem holds in the ethical AI space, where government funding exists only for OSS used in early-stage research. For instance, in collaboration with Amazon, the National Science Foundation (NSF) is funding tens of millions of dollars in grants for further academic research into AI fairness. This research is very likely to produce highly valuable OSS, but even the most successful projects will be challenged to find continued funding for development, support, documentation, and dissemination. Funders who are interested in ethical AI, including both government agencies and private foundations, should consider OSS as a necessary component of ethical AI, and look to support its sustainable development and widespread adoption.

3. OSS AI tools advance science

Perhaps even more than technology companies, scientific researchers from many domains gain tremendously from open-source AI. For instance, a series of responses to a tweet by François Chollet, developer of the open-source AI software Keras, demonstrates how his OSS is being used to identify subcomponents of mRNA molecules and build neural interfaces to better help visually impaired people see. The separation of these roles, the developer and the scientist, is common and generally enables both better tools and better science. Most scientific researchers…

