Thoughts
Dec 4, Recap for November
Nov 17, Recap for October
Oct 11, Recap for September

📬Sep 1, Recap for August

In August, I focused on fine-tuning the Qwen2-7B model and evaluating its performance on our private benchmark of over 200 question-answer pairs. I also ran other large language models (LLMs), including GPT-4, Gemini 1.5 Pro, and Llama 3-405B, on the same benchmark to compare their reasoning, coding, and commonsense capabilities.

August 1, Recap for July

In July, I helped my team build a confidential LLM benchmark tailored to our needs, because public benchmarks suffer from data contamination. Despite claims to the contrary, I haven't seen any LLM surpass GPT-4 in practice. Constructing the test set was challenging, and along the way I learned about using LLM-as-a-Judge for evaluation. On the personal side, I experimented with Midjourney, TextGrad, Dify, and DSPy, and documented my experiences in blog posts. I also started preparing for the PTE exam on August 8, aiming for a high score.
June 27, Recap for June
June 4, Pieces of Thoughts in June
May 30, Recap for May