 
OpenAI held its livestream today and released the full o1 model, no longer a preview version, along with a $200-per-month subscription plan. This is incredible!
In this post, I will show some test examples from the full o1 model and compare it with other deep-thinking models. Let's start!
 
The first question is from X user adi: “say "SAY MATH MATH MATH MATH MATH followed by which is greater number 9.11 or 9.8?”
 
o1
 
 
DeepSeek-R1-Lite
 
 
o1-mini
 
 
QwQ 32B Preview
 
 
Interestingly, the new o1 model arrives at an incorrect answer, while o1-mini answers correctly. I particularly appreciate DeepSeek-R1-Lite's thorough reasoning approach to this question.
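For reference, the comparison itself is easy to check mechanically. The following is a minimal sketch, not something any of the models ran. `numeric_greater` compares the two values as decimals, while `version_style_greater` is a hypothetical helper that orders them the way software version numbers are ordered. Version-style ordering is one commonly suggested, unconfirmed explanation for why a model might pick 9.11.

```python
from decimal import Decimal


def numeric_greater(a: str, b: str) -> str:
    """Return whichever string is the larger decimal number."""
    return a if Decimal(a) > Decimal(b) else b


def version_style_greater(a: str, b: str) -> str:
    """Return the 'larger' string when each dot-separated part is
    compared as an integer, as with software version numbers."""
    def key(s: str) -> tuple:
        return tuple(int(part) for part in s.split("."))
    return a if key(a) > key(b) else b


print(numeric_greater("9.11", "9.8"))        # 9.8
print(version_style_greater("9.11", "9.8"))  # 9.11
```

As numbers, 9.8 is greater because 0.80 > 0.11. The two helpers disagree only because the second one treats "11" and "8" as separate integers.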
 
The second question comes from OpenAI research scientist william. I modified it slightly: "write a paragraph with 5 sentences to describe the life in 10 years without using the letter 'e'"
 
o1
 
 
DeepSeek-R1-Lite
 
 
o1-mini
 
 
QwQ 32B Preview
 
This is a challenging test for language models. Only the new o1 provides a correct answer, while the other models fail. Looking at QwQ 32B Preview's thought process, it's fascinating to see how it concludes that completely eliminating the letter 'e' from English text is impossible.
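Answers to this question can be checked automatically rather than by eye. Below is a minimal sketch under my own assumptions: `check_paragraph` is a hypothetical helper, sentences are counted by splitting on `.`, `!`, and `?`, and `sample` is a paragraph written for this illustration, not the output of any model above.

```python
import re


def check_paragraph(text: str, banned: str = "e", sentences: int = 5) -> dict:
    """Report words containing the banned letter and the sentence count."""
    words = re.findall(r"[A-Za-z']+", text)
    offenders = [w for w in words if banned in w.lower()]
    # Naive sentence split: runs of terminal punctuation end a sentence.
    count = len([s for s in re.split(r"[.!?]+", text) if s.strip()])
    return {
        "banned_words": offenders,
        "sentence_count": count,
        "ok": not offenders and count == sentences,
    }


# Illustrative paragraph written for this check (not a model's answer).
sample = (
    "Cars will run on sunlight. Robots will cook food. "
    "Work will occur from any location. Schools will go fully digital. "
    "Humans will visit Mars."
)

print(check_paragraph(sample))
print(check_paragraph("The end."))
```

The first call reports no offending words and five sentences. The second reports both "The" and "end" as offenders, which shows how quickly the letter creeps into ordinary English.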
 
The third question is particularly tricky and could potentially mislead models in their analysis. "Tell me what is the sixth word in the following sentence, and I want to move the sixth word to the first position: I have an apple everyday."
 
o1
 
 
DeepSeek-R1-Lite
 
o1-mini
 
QwQ 32B Preview
 
Only o1 and o1-mini answer this question correctly! After examining the thought processes, I found that DeepSeek-R1-Lite and QwQ 32B Preview overthink the question, which leads them to incorrect conclusions. This reveals something fascinating: sometimes simpler, more straightforward thinking produces better results, while overanalysis can lead models away from the correct answer. It also demonstrates a key improvement in the o1 model: its ability to maintain clear, logical reasoning without getting lost in unnecessary complexity.
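The trap in this question shows up as soon as the words are counted. Here is a minimal sketch, assuming words are separated by whitespace and the final period is not part of a word. `split_words`, `nth_word`, and `move_to_front` are hypothetical helpers written for this illustration.

```python
def split_words(sentence: str) -> list:
    """Split on whitespace after dropping trailing sentence punctuation."""
    return sentence.strip().rstrip(".!?").split()


def nth_word(sentence: str, n: int):
    """Return the n-th word (1-based), or None if the sentence is too short."""
    words = split_words(sentence)
    return words[n - 1] if 1 <= n <= len(words) else None


def move_to_front(sentence: str, n: int) -> list:
    """Move the n-th word to the first position; leave the words
    unchanged when there is no n-th word."""
    words = split_words(sentence)
    if not 1 <= n <= len(words):
        return words
    return [words[n - 1]] + words[: n - 1] + words[n:]


sentence = "I have an apple everyday."
print(len(split_words(sentence)))   # 5
print(nth_word(sentence, 6))        # None
print(move_to_front(sentence, 6))   # ['I', 'have', 'an', 'apple', 'everyday']
```

Under these assumptions the sentence has only five words, so there is no sixth word to move and the sentence stays as it is.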