OpenAI has introduced its new o1 series of models, large language models trained with reinforcement learning to enhance complex reasoning capabilities.
This post builds on my previous blog post about GPT-4o-mini's performance on MMLU-Pro using BootstrapFewShotWithRandomSearch and BootstrapFewShotWithOptuna. In this continuation, I examine the newly introduced optimizers, MIPRO and MIPROv2, to assess their optimization capabilities and the performance gains they may bring to GPT-4o-mini.
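To give a flavor of that workflow, here is a minimal DSPy sketch of compiling a program with MIPROv2. The `dspy.LM` setup, the `auto="light"` preset, the inline exact-match metric, and the toy trainset are my own illustrative assumptions and may differ from the exact code in the post or across DSPy versions.

```python
import dspy
from dspy.teleprompt import MIPROv2

# Use GPT-4o-mini as the model being optimized (API key read from the environment).
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

# A simple chain-of-thought program over MMLU-Pro style multiple-choice questions.
program = dspy.ChainOfThought("question -> answer")

# Exact-match metric: the predicted choice must equal the gold answer.
def exact_match(example, prediction, trace=None):
    return prediction.answer.strip() == example.answer.strip()

# Tiny illustrative trainset; in practice this would be built from MMLU-Pro examples.
trainset = [
    dspy.Example(question="2 + 2 = ? (A) 3 (B) 4 (C) 5", answer="B").with_inputs("question"),
]

# MIPROv2 jointly searches over instructions and few-shot demonstrations.
optimizer = MIPROv2(metric=exact_match, auto="light")
optimized_program = optimizer.compile(program, trainset=trainset)
```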
This concise tutorial, adapted from Anthropic's official GitHub, shows how to use Claude 3 to summarize web page content. Unlike the official tutorial, it uses the claude-3-5-sonnet-20240620 model and sends content from my personal web page to the LLM as the example.
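For readers who want the gist without opening the notebook, the sketch below shows the core call. It assumes requests and BeautifulSoup for fetching and cleaning the page; the URL and the prompt wording are placeholders rather than the exact ones from the tutorial.

```python
import anthropic
import requests
from bs4 import BeautifulSoup

# Fetch the page and strip it down to plain text (URL is a placeholder).
html = requests.get("https://example.com/my-page").text
page_text = BeautifulSoup(html, "html.parser").get_text(separator="\n", strip=True)

# Ask Claude 3.5 Sonnet for a summary; the client reads ANTHROPIC_API_KEY from the environment.
client = anthropic.Anthropic()
message = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": f"Summarize the following web page content:\n\n{page_text}"}
    ],
)
print(message.content[0].text)
```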
More researchers are recognizing the significance of instruction data in the Supervised Fine-Tuning (SFT) stage. In June, I wrote a blog post about data generation, but it was somewhat superficial and incomplete, and many new methods have emerged since then. In this post, I therefore cover more papers I have read on instruction data generation and selection.
With the rapid development of LLMs, the community needs an efficient and accurate way to evaluate LLM performance automatically, since human annotation is tedious and time-consuming. LLM-as-a-Judge has become a practical solution to this need.
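As a concrete illustration of the pattern, the sketch below uses one model to grade another model's answer via the OpenAI API. The judge prompt, the 1-10 scoring scale, and the choice of gpt-4o as the judge are my own assumptions for the example, not prescriptions from any specific paper.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical single-answer grading prompt for the judge model.
JUDGE_PROMPT = (
    "You are an impartial judge. Given a question and a model's answer, "
    "rate the answer from 1 to 10 for correctness and helpfulness.\n\n"
    "Question: {question}\nAnswer: {answer}\n\n"
    "Respond as:\nScore: <number>\nReason: <one sentence>"
)

def judge(question: str, answer: str, judge_model: str = "gpt-4o") -> str:
    """Return the judge model's score and rationale for a candidate answer."""
    response = client.chat.completions.create(
        model=judge_model,
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(question=question, answer=answer)}],
        temperature=0,
    )
    return response.choices[0].message.content

print(judge("What is the capital of France?", "Paris is the capital of France."))
```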
DSPy is an optimization framework that improves the prompts and responses of models such as GPT-4o-mini. This post showcases the framework and demonstrates how its optimizers can lift the performance of this cost-effective model on MMLU-Pro, an advanced benchmark with complex questions and an expanded set of answer choices. The evaluation metric simply checks whether the model's responses match the ground-truth answers.
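Here is a hedged sketch of how such a metric and evaluation loop might look in DSPy; the field names (question, answer), the toy devset, and the Evaluate settings are illustrative assumptions rather than the post's exact code.

```python
import dspy
from dspy.evaluate import Evaluate

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

# Metric: the predicted letter choice must match the gold answer exactly.
def answer_exact_match(example, prediction, trace=None):
    return prediction.answer.strip().upper() == example.answer.strip().upper()

# Small illustrative dev split; in practice this is built from MMLU-Pro.
devset = [
    dspy.Example(question="Which gas do plants absorb? (A) O2 (B) CO2 (C) N2", answer="B").with_inputs("question"),
]

# Score any DSPy program (here, a simple chain-of-thought module) against the metric.
program = dspy.ChainOfThought("question -> answer")
evaluator = Evaluate(devset=devset, metric=answer_exact_match, display_progress=True)
evaluator(program)
```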
In this short blog post, I test Chameleon, the newest multimodal model from Meta. The baseline models are GPT-4o, Gemini-1.5-Pro, Yi-Vision, and Yi-Vision-with-TextGrad.
Evaluating LLMs is important for understanding their capabilities and for solving real business problems. A good evaluation requires sufficient high-quality data samples, clear judging criteria, meaningful evaluation tasks, and frequently updated private benchmarks. The process should also adapt as LLMs develop over time.
As the capabilities of Large Language Models (LLMs) continue to evolve, many traditional evaluation benchmarks require updates. With the rapid progress of these models, researchers are introducing new evaluation datasets at a growing pace, yet the specific dimensions these datasets assess are often unclear. In this blog post, I will explore a series of commonly referenced evaluation datasets and highlight the particular aspects of model capability each was designed to assess, though I may not cover every available dataset.