Meta's obsession with outperforming OpenAI is exposed in new documents
Newly released court documents in the copyright case Kadrey v. Meta reveal that the company’s leaders were obsessed with outperforming OpenAI’s GPT-4 model while developing the open-source Llama 3 model.
In an internal message from October 2023, Ahmad Al-Dahle, Meta’s VP of Generative AI, told researcher Hugo Touvron: “Our goal should be GPT-4, frankly. We have 64,000 GPUs coming! We have to learn how to build an advanced model and win this race.”
Although Meta releases its AI models openly, the messages show that the company's AI leaders were primarily focused on outperforming competitors that keep their models behind APIs, such as OpenAI and Anthropic.
The messages also show that the French company Mistral, one of Meta’s biggest competitors in open models, was not considered a real threat. “Mistral is not comparable to us,” Al-Dahle wrote. “We must be able to do better.”
In this race, the documents show, Meta leaders were at times willing to cross ethical lines: Al-Dahle and Touvron discussed using a dataset from the LibGen platform, which contains copyrighted works from major educational publishers.
According to the documents, Meta was looking to improve the quality of its training data after its own researchers described the data mix used to train Llama 2 as “bad.”
“This year, Llama 3 models are competing with the most advanced models, and in some areas, they’re outperforming them,” Meta CEO Mark Zuckerberg said in an internal email from July 2024. “Starting next year, we expect future Llama models to be the most advanced in the industry.”
The Llama 3 models, released in April 2024, have successfully competed with closed models from OpenAI, Google, and Anthropic and outperformed open models from Mistral. However, the data used to train these models is now facing intense legal scrutiny.
The documents show the intense pressure within Meta to excel in the AI race, but they also highlight the legal and ethical challenges: The company faces accusations of using copyrighted data without permission, potentially putting its open models at significant legal risk.