Exploring the Landscape of Open-Source Large Language Models (LLMs)
In the world of natural language processing and text generation, Large Language Models (LLMs) have emerged as powerful tools. These models can generate coherent, contextually relevant text, making them highly valuable in a wide range of applications. In this blog post, we will take a deep dive into the landscape of LLMs, exploring the differences between closed and open-source models, their key characteristics, and the tools available for their usage.
Closed LLMs: Limited Control and Dependence
Major tech companies such as Google and OpenAI have developed closed LLMs, including Google's Bard and OpenAI's GPT-3, GPT-4, and ChatGPT. While these models offer impressive performance, they come with certain limitations.
One significant disadvantage of closed LLMs is the limited control they give users. The underlying architecture and weights of these models are not publicly available, making customization and fine-tuning impossible. Users are restricted to using the models as they are, without the ability to tailor them to specific needs. As a result, closed-source models limit innovation in the field (although, as we've read in a certain leaked memo, those companies might not be positioned to win the arms race).
Another drawback of closed LLMs is their reliance on external services. To access these models, users typically interact with API endpoints hosted by the respective companies. This dependency introduces potential latency issues, as well as reliance on the availability and pricing of the service provider.
Closed-source LLMs also run into the thorny issue of data privacy. When running inference on sensitive data, users must trust that the data sent to the external API endpoints of these closed models is handled securely and in compliance with privacy regulations. The lack of control over data handling and processing raises potential privacy risks, especially when dealing with confidential or personal information.
Open-Source LLMs: Transparency and Flexibility
Open-source LLMs present a compelling alternative to closed models, offering users transparency, flexibility, and control. Open-source models differ in architecture and number of parameters, which affect their performance and hardware requirements, and in their training datasets and licenses, which dictate what they can be used for. Let's explore some notable open-source LLMs and their key features.
LLaMA by Meta:
LLaMA is an open-source LLM architecture with the inference code available under the GPL-3 license. In practical terms, this means that individuals can study LLaMA's architecture and use the provided inference code to run the model and generate text. They can also make changes or improvements to the code and share those modifications with others. This license ensures that any derived works based on the code remain open-source as well. However, it's important to note that the weights of the LLaMA model, which contain the learned parameters, are not freely available and can only be obtained upon request. Furthermore, the usage of LLaMA weights is limited to non-commercial purposes, as specified by its licensing terms.
Additionally, LLaMA models can be run on CPUs using the llama.cpp tool. This allows users to deploy and execute LLaMA on their own hardware, granting them greater control and privacy over the computational resources involved in running the model.
Overall, LLaMA's open-source nature and compatibility with llama.cpp provide users with opportunities for exploration, customization, and independent deployment on their preferred hardware.
Vicuna by lmsys:
Vicuna is a fine-tuned LLaMA model trained on ShareGPT, a dataset of user-shared conversations. It demonstrates comparable performance to Bard and ChatGPT in terms of response quality. However, since it builds upon LLaMA weights, it is restricted by the same license and can only be used for non-commercial purposes. On the other hand, being a derivative of LLaMA also means that it can be run with llama.cpp on modest hardware.
UL2 and Flan-UL2 by Google:
UL2 is an open-source LLM developed by Google, with Flan-UL2 being its fine-tuned extension for instruction-following tasks. Although these models are resource-intensive due to their size, they can be loaded in quantized form to reduce memory requirements. Both are available on Huggingface Hub.
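As an illustration, here is a minimal sketch of loading Flan-UL2 in 8-bit quantized form with the Transformers library. The `load_in_8bit` path assumes the `accelerate` and `bitsandbytes` packages and a CUDA GPU are available, and the weights themselves are tens of gigabytes, so treat this as a sketch rather than a turnkey recipe:

```python
# Sketch: 8-bit loading of Flan-UL2 to roughly halve its memory footprint.
MODEL_ID = "google/flan-ul2"

def generate(prompt: str, max_new_tokens: int = 64) -> str:
    # Imports are deferred so the sketch can be read (and imported)
    # without transformers/bitsandbytes installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSeq2SeqLM.from_pretrained(
        MODEL_ID, load_in_8bit=True, device_map="auto"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

# Usage (downloads the model weights on first call):
# print(generate("Answer the following question: what is the capital of France?"))
```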
BLOOM by BigScience:
BLOOM is a truly open-source LLM built on a modified GPT-2 architecture. It is available through the Huggingface Hub and the Transformers library, enabling easy access and integration. BLOOM supports various pipelines and tasks, although its large size requires substantial computational resources: running inference in full precision requires 8 × A100 80 GB GPUs, for example.
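The full model is impractical on consumer hardware, but the smaller checkpoints in the BLOOM family (e.g. `bigscience/bloom-560m`) expose the same interface, so a sketch like the following works for any of them, given enough memory:

```python
# Sketch: text generation with a (small) BLOOM checkpoint via the
# Transformers pipeline API. Swap in "bigscience/bloom" only if you
# have the hardware for the full 176B-parameter model.
MODEL_ID = "bigscience/bloom-560m"

def complete(prompt: str, max_new_tokens: int = 30) -> str:
    from transformers import pipeline  # deferred import

    generator = pipeline("text-generation", model=MODEL_ID)
    return generator(prompt, max_new_tokens=max_new_tokens)[0]["generated_text"]

# Usage (downloads ~1 GB of weights on first call):
# print(complete("The quick brown fox"))
```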
GPT4All models by NomicAI:
GPT4All offers a range of models based on GPT-J, MPT and LLaMA. GPT4All-J and GPT4All-MPT are released under Apache 2.0, GPT4All-13b-snoozy (based on LLaMA) is released under GPL. Which means that all of these models can be used for derivative work, but due to restrictions on the underlying model weights, only GPT4All-J can be used for commercial purposes. These models have been fine-tuned using a dataset that includes ShareGPT conversations. GPT4All models are designed to run on CPUs and come with a chat desktop application and server, offering compatibility with the same API as OpenAI servers.
Falcon 40B and 7B by TII UAE
Falcon 40B and 7B models have just been released by the Technology Innovation Institute, marking a significant advancement in the realm of open-source LLMs. They are released under the Apache 2.0 license, which permits commercial usage. Both the 40B and 7B variants are available as raw models suitable for fine-tuning, and as instruction-tuned models that can be used directly. All of them are made available via Huggingface Hub.
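The instruction-tuned variants can be loaded with the standard Transformers pipeline. A hedged sketch for the 7B model follows; note that Falcon ships custom modelling code on the Hub, so loading it assumes you pass `trust_remote_code=True`, and even the 7B variant needs a GPU with substantial memory:

```python
# Sketch: running Falcon-7B-Instruct through the Transformers pipeline.
MODEL_ID = "tiiuae/falcon-7b-instruct"

def ask(prompt: str, max_new_tokens: int = 50) -> str:
    # Deferred imports: requires transformers, torch and accelerate.
    import torch
    from transformers import pipeline

    generator = pipeline(
        "text-generation",
        model=MODEL_ID,
        torch_dtype=torch.bfloat16,
        trust_remote_code=True,  # Falcon uses custom modelling code
        device_map="auto",
    )
    return generator(prompt, max_new_tokens=max_new_tokens)[0]["generated_text"]

# Usage (requires a GPU and a large download):
# print(ask("Write a haiku about open-source models."))
```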
According to the Open LLM Leaderboard, Falcon 40B outperforms previous state-of-the-art open-source LLMs. In our upcoming blog post, we will delve deeper into the Falcon models, exploring the models themselves and how the open-source community evaluates models for this leaderboard.
Tools for LLM Usage
To facilitate the usage of LLMs, several tools and libraries have emerged:
Huggingface has played a crucial role in simplifying the utilization of language models for the public. Huggingface Hub acts as a centralized platform for accessing pre-trained LLM models and datasets. It enables easy exploration, downloading, and utilization of a diverse range of models, making it convenient for developers to leverage state-of-the-art capabilities. Additionally, it hosts what they call Spaces – a way to showcase ML-driven applications that interact with models on the Hub.
Huggingface's Transformers library simplifies fine-tuning and inference with LLMs. It offers a high-level API that facilitates the integration of LLMs into Python applications. Developers can fine-tune models for specific tasks, adapt them to domain-specific data, and perform inference on new inputs. We have also used the Transformers library ourselves in the LLM fine-tuning example for matcha, which we have described before.
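To make the fine-tuning side concrete, here is a rough sketch using the library's `Trainer` API. Dataset preparation and tokenization are elided, and the model name, output directory and `your_tokenized_dataset` placeholder are illustrative assumptions, not prescriptions:

```python
def build_trainer(model_name: str, train_dataset):
    # Deferred imports: requires the transformers and torch packages.
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        Trainer,
        TrainingArguments,
    )

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    args = TrainingArguments(
        output_dir="finetuned-model",
        num_train_epochs=1,
        per_device_train_batch_size=4,
    )
    # train_dataset is assumed to be already tokenized.
    return Trainer(
        model=model, args=args, train_dataset=train_dataset, tokenizer=tokenizer
    )

# Usage (placeholder dataset):
# trainer = build_trainer("bigscience/bloom-560m", your_tokenized_dataset)
# trainer.train()
```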
Furthermore, Huggingface also offers an inference API that allows users to leverage Huggingface’s hardware infrastructure for running LLM models. This API enables seamless integration and scalable deployment of models without the need for extensive hardware setup or maintenance. Users can offload the computational burden and focus on utilizing the language models for their specific tasks.
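The Inference API is a plain HTTP endpoint, so it can be called with nothing but the standard library. A minimal sketch, using BLOOM purely as an example model and assuming you have a Huggingface API token:

```python
import json
import urllib.request

# The Inference API addresses Hub models by their repository id.
API_URL = "https://api-inference.huggingface.co/models/bigscience/bloom"

def build_request(prompt: str, token: str) -> urllib.request.Request:
    payload = json.dumps({"inputs": prompt}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )

# Usage (requires a valid token from your Huggingface account settings):
# with urllib.request.urlopen(build_request("The capital of France is", "hf_your_token")) as resp:
#     print(json.load(resp))
```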
In addition to providing instruction tuned models themselves, GPT4All offers a user-friendly chat desktop application and a server, which can be seamlessly integrated into existing codebases using the same API as OpenAI model endpoints. This provides a convenient solution for those already familiar with the OpenAI ecosystem.
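Because the server mimics OpenAI's completion endpoints, existing client code mostly just needs a different base URL. The sketch below builds such a request with the standard library; the local address, port and model name are assumptions that depend on your GPT4All server configuration:

```python
import json
import urllib.request

# Assumption: a local GPT4All API server listening at this address.
BASE_URL = "http://localhost:4891"

def build_completion_request(model: str, prompt: str) -> urllib.request.Request:
    # Same request shape as OpenAI's /v1/completions endpoint.
    payload = json.dumps(
        {"model": model, "prompt": prompt, "max_tokens": 50}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )

# Usage (requires the server to be running; "gpt4all-j" is a placeholder name):
# with urllib.request.urlopen(build_completion_request("gpt4all-j", "Hello,")) as resp:
#     print(json.load(resp))
```

The official `openai` Python package can be pointed at the same endpoint by overriding its base URL, which is what makes the server a drop-in substitute.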
Langchain is an innovative tool that allows for the integration of LLMs with other services, enabling the development of complex intelligent systems. It empowers models to perform multistep reasoning and access external sources of information, expanding their capabilities beyond simple text generation. For instance, Langchain can be employed for question answering on arbitrary documents, enabling LLMs to provide insightful responses by leveraging extensive knowledge sources. Another example is the application of Langchain in creative problem solving, as demonstrated by projects like SocraticAI developed by Princeton NLP. This showcases the potential of LLMs in assisting users in generating innovative solutions to complex problems.
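A minimal Langchain sketch, chaining a prompt template to a Hub-hosted model: this assumes the `langchain` and `huggingface_hub` packages, a `HUGGINGFACEHUB_API_TOKEN` environment variable, and Langchain's 2023-era import paths, all of which may differ in your setup:

```python
def build_chain(repo_id: str = "google/flan-ul2"):
    # Deferred imports so the sketch can be read without langchain installed.
    from langchain import HuggingFaceHub, LLMChain, PromptTemplate

    prompt = PromptTemplate(
        input_variables=["question"],
        template="Question: {question}\nAnswer:",
    )
    llm = HuggingFaceHub(
        repo_id=repo_id, model_kwargs={"temperature": 0.5, "max_length": 64}
    )
    return LLMChain(llm=llm, prompt=prompt)

# Usage:
# chain = build_chain()
# print(chain.run(question="Which country is Mount Everest in?"))
```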
Llama.cpp is a valuable tool that allows running LLM models based on the LLaMA architecture on personal hardware, specifically CPUs. It offers the freedom to deploy models without depending on external infrastructure, empowering users to have full control over their LLM usage.
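From Python, llama.cpp is commonly driven through the `llama-cpp-python` bindings. A hedged sketch, assuming you have already converted a LLaMA-family model to a quantized file in llama.cpp's format (the path below is a placeholder):

```python
def load_llama(model_path: str):
    # Deferred import: requires the llama-cpp-python package and a
    # quantized model file converted for llama.cpp.
    from llama_cpp import Llama

    return Llama(model_path=model_path, n_ctx=512)

# Usage (model path is a placeholder):
# llm = load_llama("models/7B/ggml-model-q4_0.bin")
# result = llm("Q: Name the planets in the solar system. A:", max_tokens=48)
# print(result["choices"][0]["text"])
```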
Open-source LLMs provide users with a range of benefits, making them a valuable asset in various applications. The transparency, flexibility, and control offered by open-source models empower researchers, developers, and enthusiasts to explore, customize, and optimize language models according to their specific needs.
Open-source LLMs enable individuals to study and understand the inner workings of these models. This transparency fosters collaboration and innovation within the community.
Moreover, open-source LLMs offer greater flexibility in terms of deployment options, allowing developers to fine-tune and deploy LLMs for inference on cloud providers of their choice, on managed services or on their own hardware.
In conclusion, open-source LLMs democratize access to state-of-the-art language models, offering transparency, flexibility, and the ability to customize and optimize models for specific purposes. By harnessing the power of open-source LLMs, researchers, developers, and users can push the boundaries of natural language processing and create innovative solutions that transform the way we interact with language in various domains and applications.