In a blockbuster announcement today designed to coincide with the Microsoft Inspire conference, Meta announced its new AI model, LLaMA 2 (Large Language Model Meta AI). Not only is this new large language model (LLM) now available, it’s also open-source and freely available for commercial use — unlike the first LLaMA, which was licensed only for research purposes.
The news, coupled with Microsoft’s outspoken support for LLaMA 2, means the fast-moving world of generative AI has just shifted yet again. Now the many enterprises rushing to embrace AI, albeit cautiously, have another option to choose from, and this one is entirely free — unlike leader and rival OpenAI’s ChatGPT Plus, or challengers like Cohere.
Rumors surrounding the new release of LLaMA have been swirling in the industry for at least a month, as U.S. senators have been questioning Meta about the availability of the AI model.
The first iteration of LLaMA was available to academics and researchers under a research license. The model weights underlying LLaMA were leaked, however, and the resulting controversy led to the government inquiry. With LLaMA 2, Meta is brushing aside the prior controversy and moving ahead with a more powerful model that will be more widely usable than its predecessor and could shake up the entire LLM landscape.
Microsoft hedges its AI bets
The LLaMA 2 model is being made available on Microsoft Azure. That’s noteworthy because Azure is also the primary home for OpenAI and its GPT-3/GPT-4 family of LLMs. Microsoft is an investor in both OpenAI and Meta, the company formerly known as Facebook.
Meta founder and CEO Mark Zuckerberg is particularly enthusiastic about LLaMA being open-source. In a statement, Zuckerberg noted that Meta has a long history with open source and has made many notable contributions, particularly in AI with the PyTorch machine learning framework.
“Open source drives innovation because it enables many more developers to build with new technology,” Zuckerberg stated. “It also improves safety and security because when software is open, more people can scrutinize it to identify and fix potential issues. I believe it would unlock more progress if the ecosystem were more open, which is why we’re open sourcing Llama 2.”
In a Twitter message, Yann LeCun, VP and chief AI scientist at Meta, also heralded the open-source release.
“This is huge: [LLaMA 2] is open source, with a license that authorizes commercial use!” LeCun wrote. “This is going to change the landscape of the LLM market. [LLaMA 2] is available on Microsoft Azure and will be available on AWS, Hugging Face and other providers.”
What’s inside LLaMA?
LLaMA is a transformer-based auto-regressive language model. The first iteration of LLaMA was publicly detailed by Meta in February as a family of models ranging from 7 billion to 65 billion parameters, capable of a wide array of common generative AI tasks.
LLaMA 2, for its part, is available in several model sizes, including 7 billion, 13 billion and 70 billion parameters. Meta claims the pretrained models were trained on a dataset of two trillion tokens, 40% larger than the one used for LLaMA 1. The context length has also been expanded to 4,096 tokens, double the context length of LLaMA 1.
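For teams sizing hardware, the parameter counts translate roughly into memory requirements. The sketch below is a back-of-the-envelope estimate, not a figure from Meta: it assumes weights stored at 16-bit precision (2 bytes per parameter) and ignores activations, the attention cache and framework overhead, so real deployments will need more. The function name `weight_memory_gb` is hypothetical.

```python
# Rough weight-memory estimate for the LLaMA 2 sizes named in the article.
# Assumption: 16-bit weights (2 bytes per parameter); activations, the
# attention cache and runtime overhead are deliberately left out.
BYTES_PER_PARAM_FP16 = 2

# Parameter counts as reported: 7, 13 and 70 billion.
LLAMA2_SIZES = {"7B": 7e9, "13B": 13e9, "70B": 70e9}


def weight_memory_gb(num_params, bytes_per_param=BYTES_PER_PARAM_FP16):
    """Approximate storage for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    for name, params in LLAMA2_SIZES.items():
        print(f"LLaMA 2 {name}: ~{weight_memory_gb(params):.0f} GB of weights at 16-bit precision")
```

Under those assumptions the three sizes come to roughly 14 GB, 26 GB and 140 GB of weights, which is why the smaller variants are the ones most likely to fit on a single accelerator.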
Not only has LLaMA 2 been trained on more data, with more parameters, but the model also performs better than its predecessor, according to benchmarks provided by Meta.
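What "auto-regressive" means in practice is that the model produces one token at a time and feeds each new token back in as context for the next. The toy sketch below illustrates only that loop. It is not Meta's code: the lookup table `TOY_NEXT_TOKENS` and the functions `next_token` and `generate` are hypothetical stand-ins, and a real LLM would replace the table with a transformer that scores every token in its vocabulary against the whole context.

```python
import random

# Hypothetical stand-in for a trained model: maps the most recent token to
# candidate next tokens. A real LLM conditions on the entire context instead.
TOY_NEXT_TOKENS = {
    "<s>": ["llama"],
    "llama": ["2"],
    "2": ["is"],
    "is": ["open"],
    "open": ["</s>"],
}


def next_token(context, rng):
    """Pick the next token given the tokens generated so far."""
    candidates = TOY_NEXT_TOKENS.get(context[-1], ["</s>"])
    return rng.choice(candidates)


def generate(prompt, max_new_tokens=10, seed=0):
    """Auto-regressive loop: append one token at a time until end-of-sequence."""
    rng = random.Random(seed)
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        token = next_token(tokens, rng)
        if token == "</s>":  # end-of-sequence marker stops generation
            break
        tokens.append(token)  # the new token becomes part of the context
    return tokens


if __name__ == "__main__":
    print(" ".join(generate(["<s>"])))
```

The context length mentioned above caps how many tokens such a loop can keep in view at once.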
Safety measures touted
LLaMA 2 isn’t all about power; it’s also about safety. LLaMA 2 is first pretrained with publicly available data. The model then goes through a series of supervised fine-tuning (SFT) stages. As an additional layer, LLaMA 2 then goes through a cycle of reinforcement learning from human feedback (RLHF) to provide a further degree of safety and responsibility.
Meta’s research paper on LLaMA 2 provides exhaustive details on the comprehensive steps taken to help provide safety and limit potential bias as well.
“It is important to understand what is in the pretraining data both to increase transparency and to shed light on root causes of potential downstream issues, such as potential biases,” the paper states. “This can inform what, if any, downstream mitigations to consider, and help guide appropriate model use.”