Since May last year, I’ve been recapping my experiences monthly. When 2025 began, I skipped January’s reflection, but I realize now how important it is to document this period. This post reflects on the past two months and captures my thoughts about life and the rapidly evolving AI landscape.

The DeepSeek Revolution

The first two months of 2025 marked a pivotal moment in AI history, particularly with the release of DeepSeek-R1. This groundbreaking model has fundamentally transformed the open-source AI community in several key ways:
  1. Widespread Impact: Its influence spans both industry and academia, reshaping how researchers and practitioners approach AI development.
  2. Mainstream Recognition: The model’s reach extends far beyond technical circles. During my recent visit to Melbourne, even my elderly uncle, who has limited technical background, was eager to discuss DeepSeek.
  3. Technical Excellence: For a deeper look at the model’s capabilities and technical specifications, see my detailed notes here: Notes on DeepSeek R1.
What I find most valuable about DeepSeek-R1 is the team’s decision to make the model’s thinking process public. This transparency enables extensive data-distillation work, which they’ve already begun exploring in their technical report. This approach stands in stark contrast to OpenAI, which has deliberately chosen not to release their models’ thinking processes. Interestingly, Google initially provided access to the thinking process in their gemini-2.0-flash-thinking-exp-01-21 API, but later disabled this parameter, following OpenAI’s lead.
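Because R1 exposes its reasoning between <think> tags, a distillation pipeline can separate the trace from the final answer explicitly. A minimal sketch of that split (the tag convention is R1’s; the helper name and return shape are my own illustration):

```python
import re

def split_trace(response: str) -> tuple[str, str]:
    """Split an R1-style response into (reasoning, final_answer).

    Assumes the model wraps its visible reasoning in <think>...</think>,
    as DeepSeek-R1 does; anything after the closing tag is the answer.
    """
    m = re.match(r"\s*<think>(.*?)</think>\s*(.*)", response, re.S)
    if m is None:
        # No trace found: treat the whole response as the answer.
        return "", response.strip()
    return m.group(1).strip(), m.group(2).strip()
```

A distillation pipeline can then keep the trace for training targets, or drop it when only the answer is needed.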

The Open-Source Response

Following DeepSeek-R1’s release, numerous projects emerged in the open-source community:
  • Some attempted to replicate R1’s development journey
  • Others focused on data distillation, applying DeepSeek-R1’s capabilities to smaller models to enhance their reasoning
  • Many explored using GRPO (the same reinforcement learning algorithm powering DeepSeek-R1) on smaller models to achieve similar “aha” moments
Some of these efforts have successfully demonstrated that smaller models can achieve reasoning performance comparable to o1 and R1. However, most work has concentrated on mathematics rather than expanding to other domains. This limitation likely stems from the relative ease of designing reward rules in mathematics compared to real-world scenarios, where questions are often open-ended without absolute answers. I’m eager to see this research extend into more diverse domains beyond mathematics.
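The asymmetry is easy to see in code: for math, a reward can be a few lines of string matching against a reference answer, whereas open-ended questions admit no such rule. A hypothetical rule-based reward in the style used for GRPO-type training (the \boxed{} and <think> conventions are common in these projects, but every name here is illustrative):

```python
import re

def math_reward(completion: str, reference_answer: str) -> float:
    """Verifiable reward: 1.0 if the boxed final answer matches, else 0.0.

    Assumes the model is prompted to put its final answer in \\boxed{...}.
    """
    match = re.search(r"\\boxed\{([^}]*)\}", completion)
    if match is None:
        return 0.0  # no parsable answer at all
    return 1.0 if match.group(1).strip() == reference_answer.strip() else 0.0

def format_reward(completion: str) -> float:
    """Small bonus for emitting the expected <think>...</think> structure."""
    return 0.5 if re.search(r"<think>.*</think>", completion, re.S) else 0.0
```

Nothing comparable exists for, say, judging an open-ended essay, which is why so much of the early GRPO work stays inside mathematics.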

The Broader AI Landscape

The past two months brought numerous other significant developments beyond DeepSeek:
  • OpenAI released o3-mini, a new reasoning model, in January and followed with gpt-4.5-preview in February. I’ve tested gpt-4.5-preview and documented my findings here: Notes on GPT 4.5.
  • Anthropic launched claude-3-7-sonnet-20250219, which lets users either enable thinking capabilities or use its general abilities, effectively providing a unified model. My tests revealed impressive performance, detailed here: Notes on Claude 3.7 & Qwen 2.5 Max.
  • Alibaba continued its steady progress, releasing Qwen2.5-Max and its reasoning-focused variant QwQ-Preview. Despite DeepSeek’s prominence overshadowing some of Alibaba’s contributions, it’s worth noting that many data distillation projects still choose Qwen as their base model.

My Recent Work

I’ve dedicated significant time to learning reinforcement learning algorithms. Inspired by DeepSeek’s success, I experimented with GRPO on the Unsloth framework to train models and observe the “aha” moment, when models begin to demonstrate reflection, verification, and reasoning. While this approach yields fascinating results, it is extremely computationally intensive: even attempting to train a 3B model on an H100 GPU resulted in out-of-memory errors.
Given my limited computational resources, I’ve pivoted to Supervised Fine-Tuning (SFT). I’m currently training Qwen2.5-32B-Instruct with distillation techniques based on DeepSeek-R1. The model’s performance still has room for improvement, which I hope to share in next month’s update.

Why I Write

I’m often asked, “Why do you love writing things down?” even though my audience is limited and many topics seem trivial. My reasons are threefold:
  1. Personal Documentation: Writing is primarily for myself, not others. I document my life, thoughts, and learning to preserve experiences that might otherwise fade from memory.
  2. Clarity of Thought: Writing reveals my thinking process. When I struggle to express something clearly, it signals that my understanding is incomplete or my thoughts are disorganized.
  3. Building a Second Brain: Writing forms the foundation of my second brain—a system for organizing thoughts and knowledge. By writing and linking ideas, I discover connections between concepts and build a more structured knowledge base. For example, writing about DeepSeek-R1 allows me to connect it with previous notes on AI models, revealing patterns in AI development I might otherwise miss.

Looking Forward

These reflections capture my experiences, feelings, and thoughts from the beginning of 2025. I commit to continuing these monthly reflections, documenting my journey through this remarkable era of AI advancement.
Chengsheng Deng