Yesterday, I saw an interesting tweet from @Mahesh. His team introduced Bespoke-Stratos-32B, a model distilled from DeepSeek-R1 using Berkeley NovaSky's Sky-T1 recipe. I quickly read their blog post and reviewed Berkeley's recipe to take some notes.
The team open-sourced everything for the community:
  • 32B Model and 7B Model
  • Reasoning Dataset
  • Data Curation Code

Data Curation

The team used Bespoke Curator with DeepSeek-R1 to create the synthetic reasoning dataset in just 1.5 hours. The key differences from the Sky-T1 data pipeline are:
  • They used DeepSeek-R1 as the teacher reasoning model instead of QwQ
  • They skipped using gpt-4o-mini for reformatting reasoning traces since DeepSeek-R1's traces were already well-formatted and coherent
  • They opted for gpt-4o-mini instead of Sky-T1's parsing logic (which uses regex and sympy) to filter out incorrect solutions
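To see why an LLM judge can be attractive here, it helps to look at what a rule-based check involves. The sketch below is not Sky-T1's actual parsing code (which, per the notes above, uses regex and sympy); it is a standard-library-only illustration, and the function names `extract_boxed`, `to_number`, and `is_correct` are hypothetical. It assumes the final answer is wrapped in `\boxed{...}`.

```python
from fractions import Fraction
from typing import Optional


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, or None."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 1
    chars = []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars).strip()
        chars.append(ch)
    return None  # braces never closed


def to_number(s: str) -> Optional[Fraction]:
    """Parse plain numerals such as '42', '0.5', or '1/2'; None otherwise."""
    try:
        return Fraction(s.replace(" ", "").replace(",", ""))
    except (ValueError, ZeroDivisionError):
        return None


def is_correct(solution: str, reference: str) -> bool:
    """Hypothetical rule-based check: numeric match first, exact string second."""
    answer = extract_boxed(solution)
    if answer is None:
        return False
    a, b = to_number(answer), to_number(reference)
    if a is not None and b is not None:
        return a == b
    return answer.replace(" ", "") == reference.replace(" ", "")


print(is_correct("So the answer is \\boxed{0.5}.", "1/2"))            # True
print(is_correct("So the answer is \\boxed{\\frac{1}{2}}.", "1/2"))  # False
```

The second call is a false negative: the answer is right, but this simple parser does not understand `\frac`. Every such case needs another rule (or a symbolic library), which is the kind of maintenance an LLM-based judge avoids, at the cost of an extra model call per sample.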
The dataset Bespoke-Stratos-17k contains the following subsets:
  • Numina: 10.5k samples from the math, olympiads, and amc_aime subsets of the difficulty-labeled Numina dataset
  • APPS: ~2.5k samples from the APPS dataset
  • TACO: ~3k samples from the TACO dataset
  • STILL-2: ~1k samples from the STILL-2 dataset
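As a quick sanity check on the mixture, the subset sizes above can be summed and turned into shares. The counts are the rounded figures from the list (several are marked approximate), so the percentages are rough.

```python
# Rounded subset sizes as listed above; several are approximate ("~").
subset_sizes = {
    "Numina": 10_500,
    "APPS": 2_500,
    "TACO": 3_000,
    "STILL-2": 1_000,
}

total = sum(subset_sizes.values())
shares = {name: n / total for name, n in subset_sizes.items()}

for name, n in subset_sizes.items():
    print(f"{name:8s} {n:>6d}  {shares[name]:6.1%}")
print(f"{'total':8s} {total:>6d}")
```

With these rounded numbers the total comes to 17,000, matching the "17k" in the dataset name, and roughly 62% of the samples are math problems from Numina.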

Performance

Here is the performance of Bespoke-Stratos-32B:
[Image: Bespoke-Stratos-32B performance results]
The performance of Bespoke-Stratos-7B is shown below:
[Image: Bespoke-Stratos-7B performance results]

Sky-T1

Here is the Sky-T1 blog post: https://novasky-ai.github.io/posts/sky-t1/
The Berkeley team generated 17,000 training samples using Alibaba's QwQ-32B-Preview model, then used gpt-4o-mini to rewrite the reasoning traces and applied rejection sampling to improve data quality. The process is shown below:
[Image: Sky-T1 data curation process]
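The three stages described above (generate with a teacher, rewrite the trace, reject incorrect samples) can be sketched as a small loop. This is only an illustration under stated assumptions: `curate`, `Sample`, and the callables passed in are hypothetical names, the model calls are replaced by stub functions, and the order of rewriting and rejection follows the sentence above rather than the actual Sky-T1 code.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class Sample:
    question: str
    reference: str
    trace: str


def curate(
    problems: Iterable[Tuple[str, str]],
    generate: Callable[[str], str],          # stands in for the teacher model
    rewrite: Callable[[str], str],           # stands in for the reformatting model
    is_correct: Callable[[str, str], bool],  # verifier used for rejection sampling
    max_attempts: int = 1,
) -> List[Sample]:
    """Keep one verified trace per problem; drop problems with no correct trace."""
    kept: List[Sample] = []
    for question, reference in problems:
        for _ in range(max_attempts):
            trace = rewrite(generate(question))
            if is_correct(trace, reference):
                kept.append(Sample(question, reference, trace))
                break
    return kept


# Stub "teacher" with canned outputs; the second one is wrong on purpose.
canned = {"1+1?": " think... answer: 2 ", "2+2?": " think... answer: 5 "}
kept = curate(
    problems=[("1+1?", "2"), ("2+2?", "4")],
    generate=lambda q: canned[q],
    rewrite=lambda t: t.strip(),
    is_correct=lambda t, ref: t.endswith(f"answer: {ref}"),
)
print([s.question for s in kept])  # prints ['1+1?']
```

Passing the verifier in as a function keeps the loop unchanged whether the check is rule-based, as in Sky-T1, or an LLM judge, as in Bespoke-Stratos.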

Evaluation Results

[Image: Sky-T1 evaluation results]

Findings:

  • Model Size Matters
  • Data Mixture Matters
 
Chengsheng Deng