Since May last year, I’ve been recapping my experiences monthly. When 2025 began, I skipped January’s reflection, but I realize now how important it is to document this period. This post reflects on the past two months and captures my thoughts about life and the rapidly evolving AI landscape.

The DeepSeek Revolution

The first two months of 2025 marked a pivotal moment in AI history, particularly with the release of DeepSeek-R1. This groundbreaking model has fundamentally transformed the open-source AI community in several key ways:
  1. Widespread Impact: Its influence spans both industry and academia, reshaping how researchers and practitioners approach AI development.
  2. Mainstream Recognition: The model’s reach extends far beyond technical circles. During my recent visit to Melbourne, even my elderly uncle, who has limited technical background, was eager to discuss DeepSeek.
  3. Technical Excellence: For a deeper look at the model’s capabilities and technical specifications, see my detailed notes here: Notes on DeepSeek R1.
What I find most valuable about DeepSeek-R1 is the team’s decision to make the model’s thinking process public. This transparency enables extensive data-distillation work, which they’ve already begun exploring in their technical report. This approach stands in stark contrast to OpenAI, which has deliberately chosen not to release their models’ thinking processes. Interestingly, Google initially provided access to the thinking process in their gemini-2.0-flash-thinking-exp-01-21 API, but later disabled this parameter, following OpenAI’s lead.
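Because R1 exposes its reasoning between <think> tags, a distillation pipeline can separate the trace from the final answer explicitly. A minimal sketch of that split (the tag convention is R1’s; the helper name and return shape are my own illustration):

```python
import re

def split_trace(response: str) -> tuple[str, str]:
    """Split an R1-style response into (reasoning, final_answer).

    Assumes the model wraps its visible reasoning in <think>...</think>,
    as DeepSeek-R1 does; anything after the closing tag is the answer.
    """
    m = re.match(r"\s*<think>(.*?)</think>\s*(.*)", response, re.S)
    if m is None:
        # No trace found: treat the whole response as the answer.
        return "", response.strip()
    return m.group(1).strip(), m.group(2).strip()
```

A distillation pipeline can then keep the trace for training targets, or drop it when only the answer is needed.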

The Open-Source Response

Following DeepSeek-R1’s release, numerous projects emerged in the open-source community:
  • Some attempted to replicate R1’s development journey
  • Others focused on data distillation, applying DeepSeek-R1’s capabilities to smaller models to enhance their reasoning
  • Many explored using GRPO (the same reinforcement learning algorithm powering DeepSeek-R1) on smaller models to achieve similar “aha” moments
Some of these efforts have successfully demonstrated that smaller models can achieve reasoning performance comparable to o1 and R1. However, most work has concentrated on mathematics rather than expanding to other domains. This limitation likely stems from the relative ease of designing reward rules in mathematics compared to real-world scenarios, where questions are often open-ended without absolute answers. I’m eager to see this research extend into more diverse domains beyond mathematics.
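The asymmetry is easy to see in code: for math, a reward can be a few lines of string matching against a reference answer, whereas open-ended questions admit no such rule. A hypothetical rule-based reward in the style used for GRPO-type training (the \boxed{} and <think> conventions are common in these projects, but every name here is illustrative):

```python
import re

def math_reward(completion: str, reference_answer: str) -> float:
    """Verifiable reward: 1.0 if the boxed final answer matches, else 0.0.

    Assumes the model is prompted to put its final answer in \\boxed{...}.
    """
    match = re.search(r"\\boxed\{([^}]*)\}", completion)
    if match is None:
        return 0.0  # no parsable answer at all
    return 1.0 if match.group(1).strip() == reference_answer.strip() else 0.0

def format_reward(completion: str) -> float:
    """Small bonus for emitting the expected <think>...</think> structure."""
    return 0.5 if re.search(r"<think>.*</think>", completion, re.S) else 0.0
```

Nothing comparable exists for, say, judging an open-ended essay, which is why so much of the early GRPO work stays inside mathematics.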

The Broader AI Landscape

The past two months brought numerous other significant developments beyond DeepSeek:
  • OpenAI released o3-mini, a new reasoning model, in January and followed with gpt-4.5-preview in February. I’ve tested gpt-4.5-preview and documented my findings here: Notes on GPT 4.5.
  • Anthropic launched claude-3-7-sonnet-20250219, which lets users either enable thinking capabilities or use its general abilities, effectively providing a unified model. My tests revealed impressive performance, detailed here: Notes on Claude 3.7 & Qwen 2.5 Max.
  • Alibaba continued its steady progress, releasing Qwen2.5-Max and its reasoning-focused variant QwQ-Preview. Despite DeepSeek’s prominence overshadowing some of Alibaba’s contributions, it’s worth noting that many data distillation projects still choose Qwen as their base model.

My Recent Work

I’ve dedicated significant time to learning reinforcement learning algorithms. Inspired by DeepSeek’s success, I experimented with GRPO on the Unsloth framework to train models and observe the “aha” moment, when models begin to demonstrate reflection, verification, and reasoning. While this approach yields fascinating results, it is extremely computationally intensive: even attempting to train a 3B model on an H100 GPU resulted in out-of-memory errors.
Given my limited computational resources, I’ve pivoted to Supervised Fine-Tuning (SFT). I’m currently training Qwen2.5-32B-Instruct with distillation techniques based on DeepSeek-R1. The model’s performance still has room for improvement, which I hope to share in next month’s update.

Why I Write

I’m often asked, “Why do you love writing things down?” even though my audience is limited and many topics seem trivial. My reasons are threefold:
  1. Personal Documentation: Writing is primarily for myself, not others. I document my life, thoughts, and learning to preserve experiences that might otherwise fade from memory.
  2. Clarity of Thought: Writing reveals my thinking process. When I struggle to express something clearly, it signals that my understanding is incomplete or my thoughts are disorganized.
  3. Building a Second Brain: Writing forms the foundation of my second brain—a system for organizing thoughts and knowledge. By writing and linking ideas, I discover connections between concepts and build a more structured knowledge base. For example, writing about DeepSeek-R1 allows me to connect it with previous notes on AI models, revealing patterns in AI development I might otherwise miss.

Looking Forward

These reflections capture my experiences, feelings, and thoughts from the beginning of 2025. I commit to continuing these monthly reflections, documenting my journey through this remarkable era of AI advancement.
Chengsheng Deng