Abstract
Welcome to my latest notes on Grok 3. In this post, I share my observations and highlight some interesting test cases comparing Grok 3 with DeepSeek-R1 and o3-mini.

Information

xAI has introduced Grok 3 along with two beta reasoning models: Grok 3 (Think) and Grok 3 mini (Think). According to xAI, these models were trained with reinforcement learning (RL) at unprecedented scale. This training refines their chain-of-thought process and makes their reasoning more data-efficient.
The benchmark chart below shows the performance of the Grok 3 reasoning (Think) models:
[Figure: benchmark results for the Grok 3 reasoning (Think) models]
 
The general (non-reasoning) Grok 3 model, which has a 1-million-token context window, also performs very well, as the chart below shows:
[Figure: benchmark results for the general Grok 3 model]

Interesting Test Cases

Dave W Plummer asked Grok 3 to build a Breakout game. Here are the results:
 
The initial prompt was simple: "How about a colored version of Breakout?" The first revision requested, "Make the player move automatically under computer control, and make the ball go 10% faster each time it bounces off the paddle." The final revision addressed a gameplay issue: "Good, but the ball can get stuck in a vertical bounce. How did the original game handle that? Do the same! And make the player aim for remaining bricks."
For more details, see: Breakout by Grok3
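The two follow-up prompts boil down to two concrete paddle rules: speed up by 10% on each paddle hit, and never let the ball settle into a purely vertical bounce. Below is a hypothetical sketch of one common way to satisfy both. It is not Grok 3's generated code and not a claim about how the original arcade game worked. It sets the rebound angle from where the ball strikes the paddle and enforces a minimum angle. The function `paddle_bounce`, the constants `MAX_BOUNCE_DEG` and `MIN_BOUNCE_DEG`, and the convention that y grows downward are all illustrative assumptions.

```python
import math

SPEEDUP = 1.10         # "make the ball go 10% faster each time it bounces off the paddle"
MAX_BOUNCE_DEG = 60.0  # assumed steepest rebound angle, measured from vertical
MIN_BOUNCE_DEG = 15.0  # assumed minimum angle, so the ball never goes straight up


def paddle_bounce(ball_x: float, speed: float,
                  paddle_x: float, paddle_width: float) -> tuple[float, float]:
    """Return the ball's new (vx, vy) after it hits the paddle.

    paddle_x is the paddle's centre; y grows downward, so vy < 0 means "up".
    """
    # Where the ball struck the paddle: -1 = left edge, 0 = centre, +1 = right edge.
    offset = (ball_x - paddle_x) / (paddle_width / 2)
    offset = max(-1.0, min(1.0, offset))

    # Off-centre hits deflect more steeply; near-centre hits are pushed out
    # to MIN_BOUNCE_DEG so a purely vertical bounce cannot occur.
    deg = offset * MAX_BOUNCE_DEG
    if abs(deg) < MIN_BOUNCE_DEG:
        deg = math.copysign(MIN_BOUNCE_DEG, deg)

    new_speed = speed * SPEEDUP
    angle = math.radians(deg)
    return new_speed * math.sin(angle), -new_speed * math.cos(angle)


# Usage: a dead-centre hit still leaves at 15 degrees, and each hit adds 10% speed.
speed = 5.0
for _ in range(3):
    vx, vy = paddle_bounce(ball_x=100.0, speed=speed, paddle_x=100.0, paddle_width=80.0)
    speed = math.hypot(vx, vy)
print(round(speed, 3))  # 5 * 1.1**3 = 6.655
```

Because the angle depends on the hit position, an AI-controlled paddle can also "aim" for the remaining bricks by choosing where to meet the ball. That matches the spirit of the final prompt.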
Theo-t3.gg argues that Grok 3 is not great at coding. Here is his demonstration case:
 
Alex Prompter ran the same set of prompts through Grok 3 and DeepSeek V3 and compared the answers side by side. For more details, see: Grok 3 VS. DeepSeek V3
Andrej Karpathy ran a thorough comparison of Grok 3 against OpenAI's o1-pro and DeepSeek-R1. In his tests, Grok 3 did well on reasoning tasks such as generating a Settlers of Catan board and estimating the training FLOPs of GPT-2. However, it struggled with complex spatial tasks, notably producing an accurate SVG image of a pelican riding a bicycle. For the complete analysis, see: Grok 3 test by Andrej Karpathy
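Estimating the training compute of a model like GPT-2 is usually done with the back-of-the-envelope rule C ≈ 6·N·D. Here N is the parameter count and D is the number of training tokens. Below is a minimal sketch assuming that approximation; Karpathy's own prompt and the models' answers may frame the estimate differently. The function name `training_flops` is hypothetical. The token count in the example is a placeholder, not a reported GPT-2 figure.

```python
def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate total training compute with the C ~= 6 * N * D rule of thumb.

    Per token, the forward pass costs about 2 * N FLOPs and the backward pass
    about 4 * N, so training on D tokens costs roughly 6 * N * D.
    The context-length-dependent attention term is ignored.
    """
    return 6.0 * n_params * n_tokens


# Illustrative inputs only: 1.5e9 parameters (the size of the largest GPT-2
# model) and a placeholder of 1e10 training tokens (assumed, not reported).
print(f"{training_flops(1.5e9, 1e10):.2e} FLOPs")  # 9.00e+19 FLOPs
```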
 
Chengsheng Deng