Yesterday, I saw an interesting tweet from @Mahesh. His team introduced Bespoke-Stratos-32B, a model distilled from DeepSeek-R1 using Berkeley NovaSky's Sky-T1 recipe. I skimmed their blog post, reviewed Berkeley's recipe, and took some notes.
Black Forest Labs has announced the launch of the FLUX Pro Finetuning API, bringing unprecedented customization capabilities to the flagship FLUX Pro model.
An exploration of three key reinforcement learning methods: Dynamic Programming (DP), which computes optimal policies when the MDP's model is known; Monte Carlo methods, which learn from complete episodes without a model; and Temporal Difference (TD) learning, which uses bootstrapping to update value estimates after each step instead of waiting for an episode to end. Each method has its own characteristics and trade-offs, and understanding them is essential groundwork for more advanced concepts in reinforcement learning.
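
To make the contrast concrete, here is a minimal sketch of all three update rules on a toy problem. The three-state chain, its rewards, the discount factor, and the function names (`dp_evaluate`, `mc_update`, `td0_update`) are assumptions made up for illustration; they do not come from the post. The sketch only evaluates a fixed policy rather than searching for an optimal one.

```python
GAMMA = 0.9  # assumed discount factor

# Hypothetical deterministic chain: 0 -> 1 -> 2 -> terminal.
# Each entry is (state, reward, next_state); next_state None marks the end.
# The same list serves as the known model for DP and as a sampled episode.
EPISODE = [(0, 0.0, 1), (1, 0.0, 2), (2, 1.0, None)]


def dp_evaluate(model, gamma=GAMMA, tol=1e-12):
    """DP: sweep Bellman backups over the known model until values settle."""
    V = {s: 0.0 for s, _, _ in model}
    while True:
        delta = 0.0
        for s, r, s_next in model:
            new_v = r + gamma * (V[s_next] if s_next is not None else 0.0)
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v
        if delta < tol:
            return V


def mc_update(V, episode, alpha, gamma=GAMMA):
    """Monte Carlo: wait for the full episode, then move V toward the return."""
    G = 0.0
    for s, r, _ in reversed(episode):
        G = r + gamma * G  # actual discounted return from state s
        V[s] += alpha * (G - V[s])
    return V


def td0_update(V, episode, alpha, gamma=GAMMA):
    """TD(0): after every step, move V toward reward + discounted estimate."""
    for s, r, s_next in episode:
        bootstrap = V[s_next] if s_next is not None else 0.0
        V[s] += alpha * (r + gamma * bootstrap - V[s])
    return V


if __name__ == "__main__":
    print("DP:", dp_evaluate(EPISODE))
    print("MC:", mc_update({0: 0.0, 1: 0.0, 2: 0.0}, EPISODE, alpha=1.0))
    V_td = {0: 0.0, 1: 0.0, 2: 0.0}
    for i in range(3):
        td0_update(V_td, EPISODE, alpha=1.0)
        print(f"TD(0) after episode {i + 1}:", V_td)
```

With a step size of 1, Monte Carlo matches the DP values after a single episode because it uses the actual return, while TD(0) needs three passes on this chain: each pass pushes the reward information one state further back through the bootstrapped estimates.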