With the rapid development of LLMs, the community needs an efficient and accurate way to evaluate LLM performance automatically, since human annotation is tedious and time-consuming. LLM-as-a-Judge, in which a strong LLM scores or compares the outputs of other models, has become a widely used answer to this need.
In this short blog, I will test Chameleon, the newest multimodal model from Meta. As baselines, I will use GPT-4o, Gemini-1.5-pro, Yi-Vision, and Yi-Vision-with-TextGrad.