Math&Statistics

😶‍🌫️Dec 16, Notes on DP, Monte Carlo, TD in Reinforcement Learning

An exploration of three core families of reinforcement learning methods: Dynamic Programming (DP), which computes optimal policies in MDPs when the model is known; Monte Carlo methods, which learn from complete episodes without a model; and Temporal Difference (TD) learning, which updates from incomplete episodes by bootstrapping. Each has distinct characteristics and trade-offs that are essential for understanding more advanced concepts in reinforcement learning.
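To make the contrast concrete, here is a minimal sketch that estimates state values with all three approaches on a toy problem. The environment (a 5-state random walk with reward 1 for exiting on the right), the function names, and the step sizes are illustrative assumptions and not taken from the notes themselves. DP uses the known transition model, Monte Carlo waits for the full return of an episode, and TD(0) bootstraps from the current estimate of the next state.

```python
import random

# Assumed toy problem: states 0..4, each step moves left or right with
# probability 0.5. Exiting on the right gives reward 1, on the left reward 0.

def generate_episode(rng, n_states=5, start=2):
    """Return a list of (state, reward, next_state); next_state is None at termination."""
    state, episode = start, []
    while True:
        next_state = state + rng.choice([-1, 1])
        if next_state < 0:
            episode.append((state, 0.0, None))
            return episode
        if next_state >= n_states:
            episode.append((state, 1.0, None))
            return episode
        episode.append((state, 0.0, next_state))
        state = next_state

def dp_policy_evaluation(n_states=5, gamma=1.0, tol=1e-10):
    """Iterative policy evaluation: needs the full model (known here by construction)."""
    V = [0.0] * n_states
    while True:
        delta = 0.0
        for s in range(n_states):
            expected = 0.0
            for step in (-1, 1):
                ns = s + step
                if ns < 0:
                    target = 0.0            # left exit: reward 0, terminal
                elif ns >= n_states:
                    target = 1.0            # right exit: reward 1, terminal
                else:
                    target = gamma * V[ns]  # interior move: reward 0
                expected += 0.5 * target
            delta = max(delta, abs(expected - V[s]))
            V[s] = expected
        if delta < tol:
            return V

def mc_update(V, episode, alpha=0.01, gamma=1.0):
    """Every-visit Monte Carlo: move V(s) toward the complete return G."""
    G = 0.0
    for state, reward, _ in reversed(episode):
        G = reward + gamma * G
        V[state] += alpha * (G - V[state])

def td0_update(V, episode, alpha=0.01, gamma=1.0):
    """TD(0): move V(s) toward r + gamma * V(s'), one transition at a time.
    Applied to a recorded episode here; the same update could run online."""
    for state, reward, next_state in episode:
        bootstrap = 0.0 if next_state is None else V[next_state]
        V[state] += alpha * (reward + gamma * bootstrap - V[state])

if __name__ == "__main__":
    rng = random.Random(0)
    V_mc, V_td = [0.5] * 5, [0.5] * 5
    for _ in range(5000):
        ep = generate_episode(rng)
        mc_update(V_mc, ep)
        td0_update(V_td, ep)
    print("DP:", dp_policy_evaluation())
    print("MC:", V_mc)
    print("TD:", V_td)
```

For this walk the exact values are (s + 1) / 6 for state s, so the DP result serves as a reference against which the two sample-based estimates can be compared.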
Chengsheng Deng