Abstract
Yesterday, I saw an interesting tweet from @Mahesh. His team introduced Bespoke-Stratos-32B, a model distilled from DeepSeek-R1 using Berkeley NovaSky's Sky-T1 recipe. I quickly read their blog post and reviewed Berkeley's recipe to take some notes.
The blog link is here: https://www.bespokelabs.ai/blog/bespoke-stratos-the-unreasonable-effectiveness-of-reasoning-distillation
The team open-sourced everything for the community:
- 32B Model and 7B Model
- Reasoning Dataset
- Data Curation Code
Data Curation
The team used Bespoke Curator with DeepSeek-R1 to create the synthetic reasoning dataset in just 1.5 hours. Here are the key differences from the original Sky-T1 recipe (a sketch of the filtering step follows the list):
- They used DeepSeek-R1 as the teacher reasoning model instead of QwQ
- They skipped using gpt-4o-mini for reformatting reasoning traces since DeepSeek-R1's traces were already well-formatted and coherent
- They opted for gpt-4o-mini instead of Sky-T1's parsing logic (which uses regex and sympy) to filter out incorrect solutions
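To make the filtering step concrete, here is a minimal sketch of using gpt-4o-mini as an LLM judge in place of regex/sympy parsing. The prompt, the is_solution_correct helper, and the sample schema are illustrative assumptions, not Bespoke's actual code.

```python
# Hedged sketch: filtering out incorrect solutions with gpt-4o-mini as a
# judge. The prompt, helper name, and sample schema are assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

JUDGE_PROMPT = """You are given a problem, its ground-truth answer, and a
candidate solution. Reply with exactly "YES" if the candidate's final
answer matches the ground truth, otherwise reply "NO".

Problem: {problem}
Ground-truth answer: {answer}
Candidate solution: {solution}"""

def is_solution_correct(problem: str, answer: str, solution: str) -> bool:
    """Ask gpt-4o-mini to verify a candidate solution against the ground truth."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0,
        messages=[{
            "role": "user",
            "content": JUDGE_PROMPT.format(
                problem=problem, answer=answer, solution=solution
            ),
        }],
    )
    return response.choices[0].message.content.strip().upper().startswith("YES")

def filter_samples(samples: list[dict]) -> list[dict]:
    """Keep only samples whose DeepSeek-R1 solution the judge accepts."""
    return [
        s for s in samples
        if is_solution_correct(s["problem"], s["answer"], s["deepseek_solution"])
    ]
```

Compared with Sky-T1's regex/sympy parsing, a judge model tolerates unusual answer formats, at the cost of one extra API call per sample.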
The Bespoke-Stratos-17k dataset contains the following subsets (a loading sketch follows the list):
- Numina: 10.5k samples from the math, olympiads, and amc_aime subsets of the difficulty-labeled Numina dataset
- APPS: ~2.5k samples from the APPS dataset
- TACO: ~3k samples from the TACO dataset
- STILL-2: ~1k samples from the STILL-2 dataset
Performance
Here is the performance of Bespoke-Stratos-32B:

The performance of Bespoke-Stratos-7B is shown below:

Sky-T1
Here is the Sky-T1 blog post: https://novasky-ai.github.io/posts/sky-t1/
The Berkeley team generated 17,000 training samples using Alibaba's QwQ-32B-Preview model, then used gpt-4o-mini to rewrite the reasoning traces and applied rejection sampling to enhance data quality. A sketch of the rejection-sampling step is shown below:

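This is a minimal sketch of the rejection-sampling idea: keep a QwQ-generated solution only when its final boxed answer matches the ground truth. Sky-T1's actual parsing logic differs; extract_boxed_answer and the sample schema here are illustrative assumptions.

```python
# Hedged sketch of rejection sampling: discard generations whose final
# answer disagrees with the label. Helper names and the sample schema
# are assumptions, not Sky-T1's actual code.
import re
from sympy import simplify, sympify

def extract_boxed_answer(solution: str) -> str | None:
    """Pull the contents of the last \\boxed{...} out of a solution string."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", solution)
    return matches[-1] if matches else None

def answers_match(candidate: str, ground_truth: str) -> bool:
    """Check symbolic equality with sympy, falling back to string comparison."""
    try:
        return simplify(sympify(candidate) - sympify(ground_truth)) == 0
    except Exception:
        return candidate.strip() == ground_truth.strip()

def reject_sample(samples: list[dict]) -> list[dict]:
    """Keep only samples whose QwQ-generated answer agrees with the label."""
    kept = []
    for s in samples:
        answer = extract_boxed_answer(s["qwq_solution"])
        if answer is not None and answers_match(answer, s["answer"]):
            kept.append(s)
    return kept
```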
Evaluation Results

Findings:
- Model Size Matters: training smaller models (7B and 14B) on the same data yielded only modest improvements; strong reasoning gains appeared at the 32B scale
- Data Mixture Matters: adding coding data initially hurt math accuracy, and only a mixture enriched with challenging math and coding problems lifted performance in both domains