OpenAI held a livestream today and released the full o1 model, no longer a preview version, along with a $200 monthly subscription plan. This is incredible! In this blog, I will show some test examples of the full o1 model and compare it with other deep-thinking models. Let's start!

The first question is from X user adi: “say "SAY MATH MATH MATH MATH MATH followed by which is greater number 9.11 or 9.8?”
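For reference, the comparison itself is easy to check. Read as decimals, 9.8 is the larger number, since 9.80 > 9.11 once the decimal places are aligned. One plausible reason models stumble (my assumption, not something these tests establish) is that "9.11" also reads like a version number, where 9.11 comes after 9.8. The minimal Python sketch below contrasts the two readings; the function names are illustrative and not from any library.

```python
from decimal import Decimal

def larger_decimal(a: str, b: str) -> str:
    # Compare as exact decimal numbers (Decimal avoids binary float surprises)
    return a if Decimal(a) > Decimal(b) else b

def larger_version(a: str, b: str) -> str:
    # Hypothetical "version-style" reading: compare dot-separated parts as integers
    pa = tuple(int(p) for p in a.split("."))
    pb = tuple(int(p) for p in b.split("."))
    return a if pa > pb else b

print(larger_decimal("9.11", "9.8"))  # 9.8
print(larger_version("9.11", "9.8"))  # 9.11
```

The question asks about numbers, so the decimal reading is the intended one.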
[Screenshots of the responses from o1, DeepSeek-R1-Lite, o1-mini, and QwQ 32B Preview]
Interestingly, the new o1 model arrives at the incorrect answer, while o1-mini provides the correct one. I particularly appreciate DeepSeek-R1-Lite's thorough reasoning on this question.

The second question comes from OpenAI research scientist william. I modified it slightly: "write a paragraph with 5 sentences to describe the life in 10 years without using the letter 'e'"
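Answers to this prompt can be checked mechanically. Below is a rough Python sketch, a hypothetical helper of my own rather than part of the original test, that counts sentences by splitting on terminal punctuation (a naive heuristic that would miscount abbreviations such as "Dr.") and checks whether any 'e' or 'E' appears. The sample paragraph is made up purely for illustration.

```python
import re

def check_lipogram(text: str, banned: str = "e", sentences: int = 5) -> dict:
    # Naive sentence split on runs of . ! ?; empty fragments are dropped
    parts = [p for p in re.split(r"[.!?]+", text) if p.strip()]
    # Case-insensitive check for the banned letter anywhere in the text
    has_banned = banned.lower() in text.lower()
    return {
        "sentence_count": len(parts),
        "has_banned_letter": has_banned,
        "ok": len(parts) == sentences and not has_banned,
    }

# Made-up sample paragraph: five sentences, no letter 'e'
sample = "A robot cooks food. Cars fly. Kids study via AI. All work is digital. Days go by calmly."
print(check_lipogram(sample))
# {'sentence_count': 5, 'has_banned_letter': False, 'ok': True}
```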
[Screenshots of the responses from o1, DeepSeek-R1-Lite, o1-mini, and QwQ 32B Preview]
This is a challenging test for language models. Only the new o1 model answers correctly; the other models fail. QwQ 32B Preview's thought process is fascinating: it acknowledges the impossibility of completely eliminating the letter 'e' from English text.

The third question is particularly tricky and could mislead models in their analysis: "Tell me what is the sixth word in the following sentence, and I want to move the sixth word to the first position: I have an apple everyday." Note that the sentence, as written, contains only five words.
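Assuming plain whitespace tokenization, with "everyday" counted as one word exactly as written, a small Python sketch makes the trap explicit: the sentence has only five words, so there is no sixth word to move. The helper names below are hypothetical and only illustrate the check.

```python
from typing import List, Optional

def words_of(sentence: str) -> List[str]:
    # Whitespace tokenization after dropping trailing punctuation
    return sentence.rstrip(".!?").split()

def nth_word(sentence: str, n: int) -> Optional[str]:
    # Return the n-th word (1-based), or None if the sentence is too short
    words = words_of(sentence)
    return words[n - 1] if 1 <= n <= len(words) else None

def move_to_front(sentence: str, n: int) -> Optional[str]:
    # Move the n-th word to the first position; None if that word does not exist
    words = words_of(sentence)
    if not 1 <= n <= len(words):
        return None
    word = words.pop(n - 1)
    return " ".join([word] + words)

s = "I have an apple everyday."
print(len(words_of(s)))     # 5
print(nth_word(s, 6))       # None
print(move_to_front(s, 4))  # apple I have an everyday
```

A model that splits "everyday" into "every day" would find a sixth word, which is one way overanalysis could change the answer.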
[Screenshots of the responses from o1, DeepSeek-R1-Lite, o1-mini, and QwQ 32B Preview]
Only o1 and o1-mini answer this question correctly! After examining the thought processes, I found that DeepSeek-R1-Lite and QwQ 32B Preview overthink the question, which leads them to incorrect conclusions. This reveals something fascinating: simpler, more direct reasoning sometimes produces better results, while overanalysis can lead a model away from the correct answer. It also points to a key improvement in the o1 model: its ability to keep its reasoning clear and logical without getting lost in unnecessary complexity.

- Author: Chengsheng Deng
- URL: https://chengshengddeng.com/article/tests-on-o1
- Copyright: Unless otherwise stated, all articles in this blog are licensed under BY-NC-SA. Please credit the source!