1. The debate over defining open-source AI
  • It’s a strange time in the world of open-source software. Arguably, open source has never been hotter, having risen in importance over the past year and a half as a counterpoint to proprietary AI models epitomized by OpenAI (a name that some now view as ironic). At the same time, what gets called open source is increasingly seen as muddled, polluted, or nothing of the sort, with many pointing to the models Meta has released as exemplifying this dynamic. The Open Source Initiative (OSI) released its official definition of open-source AI this week, and the release is spurring further debate rather than settling the issue.
  • Let’s rewind to the early days of open-source software. In the early 90s, Linus Torvalds combined his Linux kernel with the GNU Project’s components into an operating system known as GNU/Linux – a free alternative to the proprietary UNIX. GNU/Linux was placed under the GNU General Public License (GPL), which allowed others to freely use, modify, and distribute it. The open license drew in contributions from programmers around the world, resulting in a rapid pace of development that allowed Linux to become a full-fledged operating system.
  • The OSI’s Open Source Definition (OSD) establishes criteria for open-source software: redistribution must be free, source code must be available at a reasonable cost, modifications and derived works must be allowed, there can be no discrimination against users or types of usage, and the license must be technology-neutral, among other criteria. Some OSI-approved licenses are “copyleft” (requiring reciprocal openness that allows others to use derived works), while others are “permissive” (with very few obligations attached). Examples of popular OSI-approved licenses include the MIT License (permissive), Apache 2.0 (permissive), BSD (permissive), and GPL 3.0 (copyleft).
  • In 2022, OSI began developing the Open Source AI Definition (OSAID) 1.0 that was released this past week, in collaboration with industry stakeholders. (Meta is a backer of OSI and participated in discussions.) Under the new definition, an open-source AI system – including its model, weights, parameters, and other structural elements – must allow users to: (1) “Use the system for any purpose and without having to ask for permission”; (2) “Study how the system works and inspect its components”; (3) “Modify the system for any purpose, including to change its output”; and (4) “Share the system for others to use with or without modifications, for any purpose.”
  • More specifically, users must be given access to the “preferred form” to make modifications. This includes “[s]ufficiently detailed information about the data used to train the system so that a skilled person can build a substantially equivalent system,” as well as “[t]he complete source code used to train and run the system” and the model parameters (e.g. weights).
  • On the data front, this must include: “(1) the complete description of all data used for training, including (if used) of unshareable data, disclosing the provenance of the data, its scope and characteristics, how the data was obtained and selected, the labeling procedures, and data processing and filtering methodologies; (2) a listing of all publicly available training data and where to obtain it; and (3) a listing of all training data obtainable from third parties and where to obtain it, including for fee.”
  • The OSI’s definition throws a wrench into the world of LLMs (large language models), where being “open” has not necessarily meant that the LLM can be inspected and built upon, or that it is made available under an established open-source license. Being open has often just meant releasing the trained weights and starting code.
  • Meta, which has planted its flag on being a champion of open source, has publicly disagreed with the OSI’s definition. In its words, “Existing open-source definitions for software do not encompass the complexities of today’s rapidly advancing AI models. We are committed to keep working with the industry on new definitions to serve everyone safely and responsibly within the AI community.”
  • Some believe the OSI’s new definition has not gone far enough to protect the essential freedoms represented in the original OSD. Under the new definition, an open-source model could still withhold training data (e.g. for confidentiality reasons or to shield players from copyright concerns). Some advocates are calling for a signed declaration that would revert the definition of open source to the OSD, effectively voiding the new definition for AI.
  • Some “do not believe the term open source can or should be extended into the AI world,” and are advocating for a new bespoke name. It’s not clear whether a term of art originally used to describe source code can be stretched to cover the very different space occupied by AI. There’s also a debate as to whether reproducibility is even relevant in the realm of AI.
  • On the surface, this debate about the definition of open-source AI may seem academic. Meta and any other player may continue to call their models “open-source” in contravention of the OSI’s definition. (The OSI doesn’t hold the trademark.) However, part of the rationale for Meta’s championing of open source is to capitalize on learnings and emergent capabilities from a community attracted to its openly available models. That rationale may be eroded if Meta’s moral position comes to be viewed as tainted. (Meta has experience being on the wrong side of a social debate.)
  • Still, Meta’s disputed use of “open-source” will likely be outweighed, at least in the near term, by the value it offers in making a near state-of-the-art LLM widely available. Meta continues to release useful new models – most recently, a set of fast quantized Llama models that can run on mobile devices, and NotebookLlama, which provides a more open version of Google’s popular NotebookLM. Relatively few players operate at this level, and even fewer are willing to undertake the disclosures and risks associated with abiding by the OSI’s new definition.
Related Content:
  • Aug 2 2024 (3 Shifts): Open AI models are here to stay
  • Apr 26 2024 (3 Shifts): Llama 3, Quest’s OS, and Meta's open-source strategy
Disclosure: Contributors have financial interests in Meta, Microsoft, Alphabet, OpenAI, and Rocket Lab. Google and OpenAI are vendors of 6Pages.
Have a comment about this brief or a topic you'd like to see us cover? Send us a note at tips@6pages.com.