Yesterday, I saw an interesting tweet from @Mahesh. His team introduced Bespoke-Stratos-32B, a model distilled from DeepSeek-R1 using Berkeley NovaSky's Sky-T1 recipe. I quickly read their blog post and reviewed Berkeley's recipe to take some notes.
Since OpenAI released its "o1-series" models, several teams have developed their own approaches to "deep thinking" models: DeepSeek introduced its o1-like model, DeepSeek-R1-Lite; Qwen released QwQ-32B-Preview; and the InternLM team launched InternThinker.
This isn't my first blog post on DSPy; I've written several before. However, I've noticed some recent updates to the DSPy documentation and GitHub repository, including a new optimization method called BootstrapFinetune, and I'd rather not consult the documentation every time I want to build programs. So, I plan to jot down some basic DSPy concepts in this post. Additionally, I intend to use this document as external knowledge for GPT or Claude.