New Open-Supply Champion Reflection 70B Outperforms GPT-4o and Claude Sonnet 3.5
Matt Shumer, co-founder and CEO of AI writing startup HyperWrite lately launched a brand new mannequin referred to as Reflection 70B.
I'm excited to announce Reflection 70B, the world’s top open-source model.
— Matt Shumer (@mattshumer_) September 5, 2024
Trained using Reflection-Tuning, a technique developed to enable LLMs to fix their own mistakes.
405B coming next week - we expect it to be the best model in the world.
Built w/ @GlaiveAI.
Read on ⬇️: pic.twitter.com/kZPW1plJuo
The mannequin has emerged as a number one open-source language mannequin, outperforming high closed-source fashions like OpenAI’s GPT-4o and Anthropic’s Claude Sonnet 3.5. The mannequin, developed utilizing a novel method referred to as Reflection-Tuning, showcases vital enhancements in benchmark checks, together with MMLU, MATH, IFEval, and GSM8K.
The Reflection-Tuning method permits Reflection 70B to detect and proper its personal errors earlier than finalising a solution. This development goals to handle the frequent difficulty of mannequin hallucinations and enhance reasoning accuracy.
The mannequin outputs its inside reasoning in <pondering> tags and ultimate solutions in <output> tags, with extra <reflection> tags used for correcting any detected errors.
Presently, Reflection 70B holds the highest place in a number of benchmarks and demonstrates superior efficiency over GPT-4o and Llama 3.1 405B. The upcoming Reflection 405B mannequin, anticipated subsequent week, is anticipated to additional elevate the usual for LLMs globally.
That is second mannequin this week outperforming GPT-4o and Claude Sonnet 3.5
Alibaba lately launched Qwen2-VL, the most recent mannequin in its vision-language collection. The brand new mannequin can chat through digicam, play card video games, and management cell phones and robots by performing as an agent. It's out there in three variations: the open supply 2 billon and seven billion fashions, and the extra superior 72 billion mannequin, accessible utilizing API.
The superior 72 billion mannequin of Qwen2-VL achieved SOTA visible understanding throughout 20 benchmarks. “Total, our 72B mannequin showcases top-tier efficiency throughout most metrics, typically surpassing even closed-source fashions like GPT-4o and Claude 3.5-Sonnet,”stated the corporate in a weblog put up, saying that it demonstrates a major edge in doc understanding.