Oct 12, Notes on Re-Reading & GSM-Symbolic

This post discusses two contrasting papers on large language models (LLMs). The first proposes a "Re-Reading" method to enhance reasoning capabilities and reports consistent performance improvements. The second, GSM-Symbolic, critiques LLMs' reasoning abilities, revealing significant performance variance and limitations in mathematical reasoning. The author concludes that it is too early to declare LLMs incapable of reasoning, suggesting that the current limitations may change as models evolve.

📬Sep 1, Recap for August

In August, I focused on fine-tuning the Qwen2-7B model and evaluating its performance on our private benchmark of over 200 questions and answers. I also ran various LLMs, such as GPT-4, Gemini 1.5-Pro, and Llama 3-405b, on this benchmark to compare their capabilities in areas such as reasoning, coding, and commonsense.
Chengsheng Deng
Latest posts
Mar 24, Notes on LightRAG
Mar 24, 2025
Dec 6, Some Tests on o1
Mar 14, 2025
Mar 10, Note on BIG-MATH
Mar 10, 2025
Mar 6, Note on QwQ-32B
Mar 6, 2025
Jan 21, Notes on DeepSeek-R1
Mar 6, 2025
The First Pages of 2025 - My January & February Story
Mar 5, 2025
Announcement
🎉Welcome to my blog🎉 
To find me:
Twitter/X: My X
👏Have fun on my blog👏