A comparative analysis of the DeepSeek-R1 and OpenAI o3-mini models

OpenAI has officially released its new o3-mini model, which is available to all ChatGPT users. This comes shortly after the release of DeepSeek-R1, a Chinese model that has made headlines in the tech industry for its strong capabilities and low price. Since its release, DeepSeek-R1 has been compared with the most widely used language models.

In this post, we describe the differences between DeepSeek-R1 and o3-mini, based on the results of several public benchmarks designed to assess AI models' capabilities.

LiveBench test:

LiveBench is a benchmark that assesses the performance of large language models (LLMs) across a range of tasks, including mathematics, programming, logical reasoning, language, instruction following, and data analysis.

The results obtained by o3-mini and DeepSeek-R1 in each task category are as follows:

Comparison between OpenAI o3-mini and DeepSeek-R1 models

Average overall performance:

The OpenAI o3-mini model received a score of 73.94.
The DeepSeek-R1 model received a score of 71.38.

o3-mini has a small advantage in overall performance, leading by 2.56 points.

Average reasoning performance:

The OpenAI o3-mini model received a score of 89.58.
The DeepSeek-R1 model received a score of 83.17.

o3-mini leads on logical reasoning tasks by 6.41 points, indicating a stronger capacity to analyze problems and draw conclusions.

Average performance in programming:

The OpenAI o3-mini model received a score of 82.74.
The DeepSeek-R1 model received a score of 66.74.

o3-mini has its largest lead in programming, 16 points, indicating a strong understanding of code and the ability to handle a variety of programming problems.

Average mathematical performance:

The OpenAI o3-mini model received a score of 65.65.
The DeepSeek-R1 model received a score of 79.54.

DeepSeek-R1 leads in mathematics by 13.89 points, indicating a strong capacity to reason quantitatively and solve computational problems.

Average performance in data analysis:

The OpenAI o3-mini model received a score of 70.64.
The DeepSeek-R1 model received a score of 69.78.

o3-mini is slightly better at analyzing and processing data, leading by 0.86 points.

Average language performance:

The OpenAI o3-mini model received a score of 50.68.
The DeepSeek-R1 model received a score of 48.53.

o3-mini performs marginally better on language tasks, leading by 2.15 points.

Average performance in information comprehension:

The OpenAI o3-mini model received a score of 84.36.
The DeepSeek-R1 model received a score of 80.51.

o3-mini is ahead by 3.85 points in general comprehension across a variety of tasks.
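The category results above can be collected in one place and compared with a few lines of Python. This is a minimal sketch using only the scores quoted in this post; the names `SCORES`, `category_gaps`, and `overall_average` are illustrative, and the code assumes that the overall score is the unweighted mean of the six categories.

```python
# LiveBench category scores quoted in this post: (o3-mini, DeepSeek-R1).
SCORES = {
    "reasoning": (89.58, 83.17),
    "programming": (82.74, 66.74),
    "mathematics": (65.65, 79.54),
    "data analysis": (70.64, 69.78),
    "language": (50.68, 48.53),
    "information comprehension": (84.36, 80.51),
}


def category_gaps(scores):
    """Return {category: o3-mini score minus DeepSeek-R1 score}, rounded to 2 decimals."""
    return {name: round(o3 - r1, 2) for name, (o3, r1) in scores.items()}


def overall_average(scores, index):
    """Unweighted mean of the category scores for one model (0 = o3-mini, 1 = DeepSeek-R1)."""
    values = [pair[index] for pair in scores.values()]
    return round(sum(values) / len(values), 2)


if __name__ == "__main__":
    for name, gap in category_gaps(SCORES).items():
        leader = "o3-mini" if gap > 0 else "DeepSeek-R1"
        print(f"{name:26s} {leader} leads by {abs(gap):.2f}")
    print("o3-mini average:", overall_average(SCORES, 0))
    print("DeepSeek-R1 average:", overall_average(SCORES, 1))
```

Under that assumption, the computed means come out to 73.94 for o3-mini and 71.38 for DeepSeek-R1, which matches the overall scores reported at the start of this section.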

Other tests:

NYT Connections Quiz:

o3-mini scored 72.4 points, making it one of the top models at puzzle solving.
DeepSeek-R1 scored 54.4 points, so o3-mini outperforms it by 18 points.

Humanity's Last Exam:

This benchmark assesses a model's ability to give correct answers.

The o3-mini (high) model has an accuracy of 13.0%.
The DeepSeek-R1 model achieves an accuracy of 9.4%.

o3-mini has the higher accuracy, indicating a greater capacity to give correct answers.

Price:

Cost is a crucial consideration for app developers, and the API prices listed in the table at the end of this post indicate that DeepSeek-R1 is the better option for those looking for an affordable model.

Conclusion:

The new OpenAI o3-mini model outperforms DeepSeek-R1 in most of the tests above, most notably in programming and reasoning, and it also has the higher overall score. DeepSeek-R1, however, leads in mathematics and is cheaper per token, making it a good choice for anyone seeking a low-cost model.

Model	Price per million input tokens	Price per million output tokens
o3-mini	$0.55	$4.40
DeepSeek-R1	$0.14	$2.19
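
To see what the prices in the table above mean for a concrete workload, the cost can be estimated with a short Python sketch. The `PRICES` dictionary simply restates the table, the function name `estimate_cost` is illustrative, and the workload of 2 million input tokens and 500,000 output tokens is a hypothetical example rather than a measured usage pattern. Actual provider prices may differ from the figures quoted here.

```python
# Prices in USD per million tokens, as listed in the table in this post.
PRICES = {
    "o3-mini": {"input": 0.55, "output": 4.40},
    "DeepSeek-R1": {"input": 0.14, "output": 2.19},
}


def estimate_cost(model, input_tokens, output_tokens, prices=PRICES):
    """Estimated USD cost of a workload, billed per million tokens."""
    rate = prices[model]
    cost = (input_tokens * rate["input"] + output_tokens * rate["output"]) / 1_000_000
    return round(cost, 4)


if __name__ == "__main__":
    # Hypothetical workload: 2M input tokens and 500k output tokens.
    for model in PRICES:
        print(model, estimate_cost(model, 2_000_000, 500_000))
```

For this hypothetical workload, the estimate is about $3.30 for o3-mini and about $1.38 for DeepSeek-R1. Because output tokens cost more than input tokens for both models, the gap depends on how much text the model generates.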
