LLM | Tags | BubbleBrain

🌓Nov 6, Notes on Contextual Retrieval

Contextual Retrieval, a method proposed by Anthropic, significantly enhances the retrieval step in RAG systems.

2024

🤩Oct 30, LLMs cannot Play the Snake Game

The blog introduces a novel method for evaluating LLM performance by having them play the Snake game, assessing their decision-making, planning, and strategy skills. The experiment tested several models, revealing that o1-mini performed best with a score of 11, while Claude models outperformed GPT models. The findings suggest that reinforcement learning significantly enhances LLMs' capabilities in dynamic decision-making tasks. Although preliminary, this approach highlights the potential of game-based assessments for deeper insights into LLM competencies, with recommendations for further testing across more models and scenarios.

2024

LLM

Evaluation

Oct 18, Notes on LIGHTRAG

The blog discusses LightRAG, a framework for Retrieval-Augmented Generation (RAG) systems that improves performance by incorporating graph structures and a dual-level retrieval process. It outlines the challenges faced by traditional RAG systems, such as limits on speed, answer quality, and understanding, and explains how LightRAG addresses them through efficient text indexing and retrieval. The framework handles both specific and abstract queries, improving its ability to answer complex questions and provide tailored responses using a general-purpose LLM.

LLM

RAG

2024

Oct 12, Notes on Re-Reading & GSM-Symbolic

The blog discusses two contrasting papers on large language models (LLMs): one proposes a "Re-Reading" method to enhance reasoning capabilities, showing consistent improvements in performance, while the other, GSM-Symbolic, critiques LLMs' reasoning abilities, revealing significant performance variance and limitations in mathematical reasoning. The author concludes that it's too early to declare LLMs incapable of reasoning, suggesting that current limitations may evolve.

LLM

2024

📌Sep 25, Notes on Gemini models

Google has announced significant updates to their production-ready Gemini models: Gemini-1.5-Pro-002 and Gemini-1.5-Flash-002.

Sep 19, Notes on Qwen2.5

The Qwen Team has released the new Qwen2.5 series models, potentially the largest open-source release in history.

👉🏿Sep 13, Notes on OpenAI o1 series models

OpenAI has introduced its new o1 series models, large language models trained with reinforcement learning to improve complex reasoning.

Sep 9, test DeepSeek-V2.5 and Reflection-70b

This blog post offers a personal evaluation of two recently released language models: DeepSeek-V2.5 and Reflection-70b.

LLM

Evaluation

2024

🗞️Sep 3, Notes on Anthropic Prompt Tutorial

In this blog, I share notes and thoughts from working through the Anthropic Prompt Tutorial.

LLM

prompt

2024

Aug 21, GPT-4o-mini with DSPy MIPRO on MMLU-Pro

This post builds on my previous post about GPT-4o-mini's performance on MMLU-Pro using BootstrapFewShotWithRandomSearch and BootstrapFewShotWithOptuna. In this continuation, I examine the newly introduced optimizers, MIPRO and MIPROv2, to assess how well they optimize and how much they might improve GPT-4o-mini's performance.

LLM

DSPy

2024

🧤August 19, Summarize Web Page Content with Claude3

This concise tutorial, adapted from Anthropic's official GitHub, shows how to use Claude3 to summarize web page content. Unlike the official tutorial, this one uses the model claude-3-5-sonnet-20240620 and sends content from my personal web page to the LLM as the example.

LLM

2024

API

August 17, Instruction Data Generation

More researchers are recognizing the significance of instruction data during the Supervised Fine-Tuning (SFT) stage. In June, I wrote a blog about data generation, but I believe it was somewhat superficial and insufficient. Since then, many new methods have emerged. Therefore, I aim to cover more papers I've read to discuss instruction data generation and selection.

LLM

2024

synthetic data generation
