📬Sep 1, Recap for August

In August, I focused on fine-tuning the Qwen2-7B model and evaluating it on our private benchmark of over 200 question-answer pairs. I also ran several large language models (LLMs), including GPT-4, Gemini 1.5 Pro, and Llama 3-405B, on the same benchmark to compare their abilities in areas such as reasoning, coding, and commonsense knowledge.
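The scoring side of such a benchmark can be sketched in a few lines. This is a minimal, hypothetical example (not the actual pipeline): it assumes each model's outputs have been collected into a list and scores them by normalized exact match against reference answers.

```python
# Minimal sketch of scoring model outputs on a private QA benchmark.
# The data and model outputs below are toy placeholders, not the real set.

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting differences don't count."""
    return " ".join(text.lower().split())

def exact_match_accuracy(predictions: list[str], references: list[str]) -> float:
    """Fraction of predictions that exactly match the reference answer."""
    assert len(predictions) == len(references)
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)

# Toy benchmark rows: (question, reference answer) -- illustrative only.
benchmark = [
    ("What is 2 + 2?", "4"),
    ("Capital of France?", "Paris"),
]
preds = ["4", "paris"]  # stand-in for a model's answers
print(exact_match_accuracy(preds, [a for _, a in benchmark]))  # -> 1.0
```

Exact match is only a rough proxy for free-form answers, which is one reason a judge model (discussed below) is often used instead.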

August 1, Recap for July

In July, I helped my team build a confidential LLM benchmark tailored to our needs, since public benchmarks suffer from contamination. Despite many claims to the contrary, I haven't seen any LLM surpass GPT-4 in practice. Constructing the test set was challenging, and along the way I learned about LLM-as-a-Judge for evaluation. On the personal side, I experimented with Midjourney, TextGrad, Dify, and DSPy, documenting my experiences in blog posts. I also started preparing for the PTE exam, aiming for a high score on August 8.
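The core of LLM-as-a-Judge is simple: a strong model grades a candidate answer against a reference and returns a score. Here is a minimal sketch of that loop; `call_judge` is a hypothetical hook standing in for a real API call (e.g. to GPT-4 as the judge), stubbed so the example runs offline.

```python
# Minimal sketch of LLM-as-a-Judge: build a grading prompt, send it to a
# judge model, and parse the numeric score out of the reply.

JUDGE_PROMPT = """You are an impartial judge. Compare the candidate answer
to the reference answer and reply with a single line: Score: <1-10>.

Question: {question}
Reference answer: {reference}
Candidate answer: {candidate}"""

def build_prompt(question: str, reference: str, candidate: str) -> str:
    return JUDGE_PROMPT.format(
        question=question, reference=reference, candidate=candidate
    )

def parse_score(judge_reply: str) -> int:
    """Pull the integer score out of a 'Score: N' line."""
    for line in judge_reply.splitlines():
        if line.strip().lower().startswith("score:"):
            return int(line.split(":", 1)[1].strip())
    raise ValueError("no score found in judge reply")

def call_judge(prompt: str) -> str:
    # Stub in place of a real judge-model API call.
    return "Score: 8"

prompt = build_prompt("Capital of France?", "Paris", "I believe it is Paris.")
print(parse_score(call_judge(prompt)))  # -> 8
```

In a real setup the judge's scores are usually averaged over the whole test set per model; prompt wording and score parsing both need care, since judge models sometimes deviate from the requested format.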
Chengsheng Deng