New Open-Supply Champion Reflection 70B Outperforms GPT-4o and Claude Sonnet 3.5

Matt Shumer, co-founder and CEO of AI writing startup HyperWrite, lately launched a brand new mannequin referred to as Reflection 70B.

I'm excited to announce Reflection 70B, the world’s top open-source model.

Trained using Reflection-Tuning, a technique developed to enable LLMs to fix their own mistakes.

405B coming next week - we expect it to be the best model in the world.

Built w/ @GlaiveAI.

Read on ⬇️: pic.twitter.com/kZPW1plJuo
— Matt Shumer (@mattshumer_) September 5, 2024

The mannequin has emerged as a number one open-source language mannequin, outperforming high closed-source fashions like OpenAI’s GPT-4o and Anthropic’s Claude Sonnet 3.5. The mannequin, developed utilizing a novel method referred to as Reflection-Tuning, showcases vital enhancements in benchmark checks, together with MMLU, MATH, IFEval, and GSM8K.

The Reflection-Tuning method permits Reflection 70B to detect and correct its personal errors earlier than finalizing a solution. This development aims to handle the frequent difficulty of mannequin hallucinations and enhance reasoning accuracy.

The mannequin outputs its inside reasoning in <pondering> tags and ultimate solutions in <output> tags, with extra <reflection> tags used for correcting any detected errors.

Presently, Reflection 70B holds the highest place in a number of benchmarks and demonstrates superior efficiency over GPT-4o and Llama 3.1 405B. The upcoming Reflection 405B mannequin, anticipated next week, is anticipated to additionally elevate the usual for LLMs globally.

That is second mannequin this week outperforming GPT-4o and Claude Sonnet 3.5

Alibaba lately launched Qwen2-VL, the most recent mannequin in its vision-language collection. The brand-new mannequin can chat through a digicam, play card video games, and manage cell phones and robots by performing as an agent. It's out there in three variations: the open supply 2 billion and seven billion fashions and the extra superior 72 billion mannequin, accessible utilizing API.

The superior 72 billion mannequin of Qwen2-VL achieved SOTA visible understanding throughout 20 benchmarks. “Total, our 72B mannequin showcases top-tier efficiency throughout most metrics, typically surpassing even closed-source fashions like GPT-4o and Claude 3.5-Sonnet, stated the corporate in a weblog put up, saying that it demonstrates a major edge in doc understanding.

masrawysat

New Open-Supply Champion Reflection 70B Outperforms GPT-4o and Claude Sonnet 3.5

New Open-Supply Champion Reflection 70B Outperforms GPT-4o and Claude Sonnet 3.5

Default