Abstract
Prompting has always been a topic of controversy. While some consider it insignificant and lacking in technical substance, others regard it as the crux of effectively utilizing large language models. Learning how to write good prompts can unlock the vast potential of these models.
In this blog, I will review several main prompt techniques that I consider important. There are many prompt techniques out there, and I won't be able to review them all.

1-Chain of Thought (CoT)

Chain of Thought (Wei, et al. 2022) can enhance the performance of large language models on complex reasoning tasks. It is simply a series of intermediate reasoning steps provided as few-shot examples in the prompt.
An example of Few-shot CoT is as follows:
Fig 1. Standard Few-Shot Prompting vs. Chain of Thought Prompting (Image Source: Wei, et al. 2022)
Standard Few-Shot Prompting presents QA pairs as examples to large language models. On the other hand, Chain of Thought Prompting includes reasoning steps within these examples, making it easier for the models to correctly answer new questions.
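To make this concrete, here is a minimal sketch of how a few-shot CoT prompt can be assembled. The complete() helper is a hypothetical stand-in for whatever LLM API you use, and the exemplar is adapted from the example in Wei, et al. 2022.

    def complete(prompt: str) -> str:
        # Placeholder for a call to a large language model; not a real API.
        raise NotImplementedError

    # One CoT exemplar: question, intermediate reasoning, final answer
    # (adapted from Wei, et al. 2022).
    COT_EXEMPLAR = (
        "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
        "Each can has 3 tennis balls. How many tennis balls does he have now?\n"
        "A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 "
        "tennis balls. 5 + 6 = 11. The answer is 11.\n"
    )

    def few_shot_cot(question: str) -> str:
        # Prepend the reasoning exemplar(s), then ask the new question.
        prompt = COT_EXEMPLAR + "\nQ: " + question + "\nA:"
        return complete(prompt)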
However, few-shot CoT may not perform well on all tasks and models. There are some key findings for the few-shot CoT method.
  • Applying Few-shot CoT Prompting to smaller models could result in poorer performance compared to Standard Prompting. This is due to the potential for smaller models to generate illogical thought sequences.
  • Few-shot CoT Prompting typically performs well on complex problems. However, for problems that require only a single step to solve, Few-Shot CoT may not yield improvements.
  • Proper use of Few-shot CoT Prompting can outperform certain task-specific models that are fine-tuned on labelled training data.
In fact, there is another type of Chain of Thought Prompting known as Zero-shot-CoT (Kojima, et al. 2022). It shows that large language models can be effective zero-shot reasoners by simply adding "Let's think step by step" before each answer. Below is an example of Zero-shot-CoT.
Fig 2. Few-shot vs. Few-shot CoT vs. Zero-shot vs. Zero-shot-CoT (Image Source: Kojima, et al. 2022)
Even though Zero-shot-CoT is very simple, its underlying idea is important: it divides the output of the large language model into two stages. Here is another example:
Fig 3. Two-stage Zero-shot CoT (Image Source: Kojima, et al. 2022)
  • Reasoning extraction
    • In this stage, we use the trigger sentence, "Let's think step by step," to elicit the reasoning steps for the question.
  • Answer extraction
    • In this stage, we feed the reasoning steps back into the large language model along with the original question to derive the final answer.
In alignment with Few-shot CoT findings, Zero-shot CoT performs well with large model sizes.
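A minimal sketch of this two-stage pipeline is shown below, assuming a hypothetical complete() helper for the LLM call; the first trigger follows Kojima, et al. 2022, while the answer-extraction trigger in the paper varies slightly by task.

    def complete(prompt: str) -> str:
        # Placeholder for a call to a large language model; not a real API.
        raise NotImplementedError

    def zero_shot_cot(question: str) -> str:
        # Stage 1: reasoning extraction -- the trigger sentence elicits the steps.
        reasoning_prompt = f"Q: {question}\nA: Let's think step by step."
        reasoning = complete(reasoning_prompt)
        # Stage 2: answer extraction -- feed the reasoning back in together with
        # the original question and ask for the final answer.
        answer_prompt = f"{reasoning_prompt} {reasoning}\nTherefore, the answer is"
        return complete(answer_prompt)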

2-Self-consistency

Self-consistency (Wang, et al. 2022) is a new decoding strategy for CoT prompting. It samples multiple reasoning paths and then selects the answer with the most votes. [Personal opinion: There are numerous aggregation strategies, but majority votes are straightforward and beneficial.]
Below is an example that compares CoT prompting and Self-consistency.
Fig 4. Comparison between CoT Prompting and the Self-consistency method (Image Source: Wang, et al. 2022)
There are three steps in the Self-consistency method.
  • First, we need to provide the language model with a set of well-written CoT examples, just as we do in CoT prompting.
  • Next, we replace naive greedy decoding with sampling, which lets the language model generate a diverse set of reasoning paths and candidate answers.
  • Finally, we select the answer that is most consistent among those generated, as sketched below.
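Here is a minimal sketch of this procedure. The sample_completion() helper is a hypothetical stand-in for a sampled LLM call, and the answer-parsing logic is also an assumption that depends on how the exemplars are written.

    from collections import Counter

    def sample_completion(prompt: str, temperature: float) -> str:
        # Placeholder: sample one completion from an LLM at the given temperature.
        raise NotImplementedError

    def extract_answer(reasoning_path: str) -> str:
        # Naive parse of the final answer, assuming exemplars end with
        # "The answer is X."
        return reasoning_path.rsplit("The answer is", 1)[-1].strip(" .\n")

    def self_consistency(cot_examples: str, question: str, n_samples: int = 10) -> str:
        prompt = f"{cot_examples}\nQ: {question}\nA:"
        # Sample several diverse reasoning paths instead of one greedy decode.
        answers = [
            extract_answer(sample_completion(prompt, temperature=0.7))
            for _ in range(n_samples)
        ]
        # Marginalize over the reasoning paths: keep the most frequent answer.
        return Counter(answers).most_common(1)[0][0]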

3-Tree of Thoughts (ToT)

Tree of Thoughts (ToT) (Yao, et al. 2023) is a framework that extends CoT Prompting to solve general problems with language models. It breaks a problem down into multiple thought steps and generates several candidate thoughts per step. We can use either Breadth-First Search (BFS) or Depth-First Search (DFS) to explore the resulting tree, and each intermediate state can be evaluated with a value prompt or by majority vote.
Fig 5. (Image Source: Yao, et al. 2023)
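A minimal BFS-style sketch of the ToT loop follows. Here propose_thoughts() and score_state() are hypothetical helpers standing in for the generation and evaluation prompts, and the breadth/keep parameters are illustrative.

    def propose_thoughts(problem: str, state: list, n: int) -> list:
        # Placeholder: prompt the LLM to propose n candidate next thoughts
        # given the problem and the thoughts generated so far.
        raise NotImplementedError

    def score_state(problem: str, state: list) -> float:
        # Placeholder: prompt the LLM to rate how promising this partial
        # solution is (a value prompt), or aggregate votes across states.
        raise NotImplementedError

    def tot_bfs(problem: str, steps: int = 3, breadth: int = 5, keep: int = 2) -> list:
        frontier = [[]]  # each state is the list of thoughts produced so far
        for _ in range(steps):
            # Expand every state in the frontier with several candidate thoughts.
            candidates = [
                state + [thought]
                for state in frontier
                for thought in propose_thoughts(problem, state, breadth)
            ]
            # Evaluate the candidates and keep only the most promising ones.
            candidates.sort(key=lambda s: score_state(problem, s), reverse=True)
            frontier = candidates[:keep]
        return frontier[0]  # best chain of thoughts found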

4-SelfCheck

SelfCheck (Miao, et al. 2023) is a zero-shot verification scheme used to identify errors. Unlike other verification methods, this one doesn't require any fine-tuning or external tools and can be directly applied to various tasks.
The main steps in SelfCheck are as follows:
  • Target extraction: In this step, we want the model to understand the goal of the current step, so we present it with the question and all previous steps.
  • Information collection: In this step, we use the language model to filter out irrelevant information and select the context that the current step actually relies on.
  • Step regeneration: With the target and the necessary context in hand, we ask the model to regenerate the current step independently, using only this collected information.
  • Result comparison: Finally, we compare the regenerated step with the original one. There are three possible outcomes: if the regenerated output supports or contradicts the original output, the original step is judged correct or incorrect, respectively. A third outcome is that the regenerated step is "not directly related to" the original one; for example, when simplifying an equation there are often several equally valid forms, so a regenerated step can be correct without being directly comparable. A sketch that puts these four stages together follows this list.
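Here is that minimal sketch for checking a single step. Every helper wraps one LLM prompt and is an assumption rather than the exact wording used in Miao, et al. 2023; in the paper, the per-step verdicts are further aggregated into a confidence score for the whole solution.

    # Each helper below stands for one LLM prompt; they are placeholders,
    # not the exact prompts used in the paper.
    def extract_target(question, previous_steps, step): raise NotImplementedError
    def collect_information(question, previous_steps, target): raise NotImplementedError
    def regenerate_step(target, context): raise NotImplementedError
    def compare_results(original_step, regenerated_step): raise NotImplementedError

    def check_step(question: str, previous_steps: list, step: str) -> int:
        # 1. Target extraction: what is this step trying to achieve?
        target = extract_target(question, previous_steps, step)
        # 2. Information collection: which earlier facts does the step rely on?
        context = collect_information(question, previous_steps, target)
        # 3. Step regeneration: redo the step independently from target + context.
        regenerated = regenerate_step(target, context)
        # 4. Result comparison: support (+1), contradict (-1), or unrelated (0).
        return compare_results(step, regenerated)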
In addition to the SelfCheck method, Miao, et al. 2023 showed that directly asking the LLM to validate an entire solution, without evaluating individual steps, tends to be ineffective. The reasons are as follows:
  • The LLM has to attend to multiple aspects of the solution simultaneously.
  • Checking is a less common task in the training corpus.
  • There are likely strong correlations between the errors made by such checkers and the original generation errors, which undermines their effectiveness.

5-Self-ask

Self-ask (Press, et al. 2022) is a novel method that builds on CoT. This technique enables the model to explicitly question and answer itself before responding to the original query. The image presented below effectively illustrates the differences between Direct Prompting, CoT, and Self-ask.
Fig 6. Differences between Direct Prompting, CoT and Self-ask (Image Source: Press, et al. 2022)
Typically, Self-ask relies on a few demonstrations, acting as few-shot examples, to work effectively. This ensures that the model understands the expected structure and is able to generate relevant responses. The prompt used to initiate this process usually adheres to a specific format, as detailed below:
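A sketch of that format, paraphrased from the demonstrations in Press, et al. 2022 (the exact wording in the paper may differ slightly):

    Question: Who lived longer, Theodor Haecker or Harry Vaughan Watkins?
    Are follow up questions needed here: Yes.
    Follow up: How old was Theodor Haecker when he died?
    Intermediate answer: Theodor Haecker was 65 years old when he died.
    Follow up: How old was Harry Vaughan Watkins when he died?
    Intermediate answer: Harry Vaughan Watkins was 69 years old when he died.
    So the final answer is: Harry Vaughan Watkins

    Question: <new question goes here>
    Are follow up questions needed here: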
Unlike CoT, Self-ask decomposes the question into several sub-questions for easier resolution. This tends to make the model produce shorter answers than CoT.

6-Step-Back

Step-Back (Zheng, et al. 2023) is a prompt technique that enables a model to understand the principles behind questions and subsequently perform reasoning to generate answers. This process involves two main steps: abstraction and reasoning.
  • Abstraction: In this initial step, the model is prompted to derive a higher-level concept or principle from the specific question. For instance, to solve a particular math problem, we would have the model first abstract the underlying concept, such as the relevant formula.
  • Reasoning: With the concept obtained in the first step, the model can then reason out the solution to the original problem.
An example of Step-Back Prompting is as follows:
Fig 7. Differences between CoT and Step-Back Prompting (Image Source: Zheng, et al. 2023)
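A minimal sketch of the two stages, assuming a hypothetical complete() helper for LLM calls; the step-back question wording below is illustrative rather than the paper's exact prompt.

    def complete(prompt: str) -> str:
        # Placeholder for a call to a large language model; not a real API.
        raise NotImplementedError

    def step_back(question: str) -> str:
        # Abstraction: first ask a "step-back" question about the underlying
        # principle (the wording here is illustrative).
        principle = complete(
            "What are the underlying principles or concepts needed to answer "
            f"the following question?\nQuestion: {question}"
        )
        # Reasoning: answer the original question grounded in that principle.
        return complete(
            f"Principles: {principle}\n"
            f"Using these principles, answer the question: {question}"
        )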

7-Automatic Prompt Engineer (APE)

APE (Zhou, et al. 2022), standing for Automatic Prompt Engineer, represents an automatic workflow for instruction generation and selection. First, it generates a set of candidate prompts, then uses a scoring function to select the prompt with the highest score.
There are two methods of generating candidate prompts.
  • Forward Mode Generation: The model is given the context and a set of input-output pairs, and is asked to complete the instruction at the end of the prompt.
  • Reverse Mode Generation: The missing instruction is placed as a blank earlier in the prompt, leveraging the model's infilling ability to generate it.
For the scoring function, there are three metrics for measuring candidate prompts.
  • Execution accuracy: This is defined as the 0-1 loss, i.e. whether the model, given the candidate instruction and a held-out input, produces exactly the desired output (see the sketch after this list).
  • Log probability: This is the log probability of the desired output given the candidate instruction and the input, which gives a softer, continuous score than the 0-1 loss.
  • Efficient score estimation: This method allocates more computational resources to promising candidate prompts, while limiting the resources spent on low-quality prompts.
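As an illustration of scoring by execution accuracy, here is a minimal sketch of the selection loop; complete() and the prompt layout are assumptions rather than the paper's implementation.

    def complete(prompt: str) -> str:
        # Placeholder for a call to a large language model; not a real API.
        raise NotImplementedError

    def execution_accuracy(instruction: str, eval_set) -> float:
        # 0-1 loss: fraction of held-out (input, output) pairs the model gets
        # exactly right when the candidate instruction is prepended.
        correct = 0
        for x, y in eval_set:
            prediction = complete(f"Instruction: {instruction}\nInput: {x}\nOutput:")
            correct += int(prediction.strip() == y)
        return correct / len(eval_set)

    def select_best_instruction(candidates, eval_set) -> str:
        # Keep the candidate instruction with the highest score.
        return max(candidates, key=lambda inst: execution_accuracy(inst, eval_set))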
Instead of sampling solely from the initial proposals, you can also use Iterative Monte Carlo Search to improve the quality of the generated prompts. The prompt for resampling is as follows:
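Paraphrased from Zhou, et al. 2022 (the exact wording may differ), the resampling instruction asks the model for a semantically equivalent variation of a candidate:

    Generate a variation of the following instruction while keeping the semantic meaning.

    Input: <current candidate instruction>
    Output: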

8-Emotional Stimuli

The Emotional Stimuli study (Li, et al. 2023) shows that large language models can understand emotional stimuli and that their performance can be enhanced by them. Using EmotionalPrompt is straightforward; you simply append the emotional stimulus to the initial prompt. An example is as follows.
Fig 8. Emotion Prompt (Image Source: Li, et al. 2023)
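Since the stimulus is simply appended text, a minimal sketch is enough; the default stimulus sentence below is illustrative, drawn from the kind of cues studied in Li, et al. 2023.

    def add_emotional_stimulus(prompt: str,
                               stimulus: str = "This is very important to my career.") -> str:
        # EmotionalPrompt: simply append the emotional stimulus to the prompt.
        # The default stimulus is illustrative, not necessarily the paper's exact wording.
        return f"{prompt} {stimulus}"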

9-Case Study

Fine-tuning and prompt engineering are commonly used techniques to enhance model performance, and fine-tuning is often preferred over prompt engineering. Microsoft released a composition of prompting strategies (Nori, et al. 2023) and compared its performance with that of fine-tuned models.
The components of prompt strategies are as follows:
  • Dynamic Few-shot CoT: We can build a large training set comprised of examples from various tasks. Given a test question, we use k-nearest-neighbour search in the embedding space to retrieve the k most similar training examples as few-shot examples (see the sketch after this list). We can then instruct GPT-4 to automatically generate CoT reasoning for the retrieved training examples.
  • Self-Consistency: To avoid bias from a single GPT-4 decoding pass, self-consistency over multiple sampled answers is applied.
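A minimal sketch of the retrieval step in Dynamic Few-shot CoT; embed() is a hypothetical embedding call, and cosine-similarity k-NN is an illustrative choice rather than the exact pipeline in Nori, et al. 2023.

    import numpy as np

    def embed(text: str) -> np.ndarray:
        # Placeholder: any text-embedding model mapping text to a vector.
        raise NotImplementedError

    def retrieve_few_shot(question: str, train_questions: list,
                          train_embeddings: np.ndarray, k: int = 5) -> list:
        # Embed the test question and retrieve its k nearest training examples
        # by cosine similarity in the embedding space.
        q = embed(question)
        sims = train_embeddings @ q / (
            np.linalg.norm(train_embeddings, axis=1) * np.linalg.norm(q)
        )
        top_k = np.argsort(-sims)[:k]
        return [train_questions[i] for i in top_k]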
The experiment results in this paper indicate that the combination of Few-shot CoT and Self-Consistency performs better than fine-tuning alone. This suggests the potential benefits of integrating various prompting strategies to optimize model performance.

10-Summary

In this blog, I review several useful prompt techniques, acknowledging that there are many more to explore. I highly recommend reading the original papers relating to these techniques, especially the sections detailing their application in various experiments. In the realm of prompt engineering, practice plays a crucial role. To truly master it, consistent and rigorous practice is key.
 