In this blog post, I present my experiments with TextGrad, a framework developed at Stanford University, and demonstrate how it handles some notoriously tricky reasoning problems.

What is TextGrad

TextGrad is an innovative autograd engine tailored for textual gradients. As a framework, it implements automatic "backpropagation" using feedback from advanced Large Language Models (LLMs), firmly anchored in the gradient metaphor. From my perspective, TextGrad closely mirrors the principle of self-reflection: both paradigms rely heavily on feedback from LLMs, using iterative model responses to refine outputs and thereby improve the quality and accuracy of the generated content.

Alice in Wonderland Problem

Since Nezhurina et al. (2024) demonstrated that a deceptively simple task can severely undermine the reasoning capabilities of Large Language Models (LLMs), I often use this problem to probe a model's reasoning robustness. The problem statement is as follows:
Alice has 3 sisters and she also has 4 brothers. How many sisters does Alice’s brother have?
Let's first use gpt-4o to answer this question (note that this requires an OPENAI API KEY).
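A minimal sketch of this step, assuming the `textgrad` package is installed and an `OPENAI_API_KEY` environment variable is set (the API follows the TextGrad quickstart; exact signatures may differ across versions):

```python
import textgrad as tg

# The backward engine is the LLM that will later provide feedback ("gradients").
tg.set_backward_engine("gpt-4o", override=True)

# Wrap gpt-4o as the model we are querying.
model = tg.BlackboxLLM("gpt-4o")

question = tg.Variable(
    "Alice has 3 sisters and she also has 4 brothers. "
    "How many sisters does Alice's brother have?",
    role_description="question to the LLM",
    requires_grad=False,  # we do not want to optimize the question itself
)

answer = model(question)
print(answer.value)
```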
gpt-4o answers: "Alice has 3 sisters and 4 brothers. Since Alice is one of the sisters, her brothers also have the same number of sisters. Therefore, each of Alice's brothers has 3 sisters." This response is evidently incorrect, as it fails to count Alice herself as one of her brothers' sisters.
So, it's time to unveil the enchantment of TextGrad. If you are familiar with PyTorch, you will find the workflow straightforward!
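Continuing the sketch above: we treat the answer as an optimizable variable, define a textual loss (an LLM critique), and run a backpropagation step. The evaluation instruction below is my own wording, not a prescribed prompt:

```python
# Make the answer trainable and tell TextGrad what role it plays.
answer.set_role_description("concise and accurate answer to the question")

optimizer = tg.TGD(parameters=[answer])

evaluation_instruction = (
    "Here's a question: " + question.value + ". "
    "Evaluate any given answer to this question; "
    "be smart, logical, and very critical. Just provide concise feedback."
)
loss_fn = tg.TextLoss(evaluation_instruction)

loss = loss_fn(answer)  # the critic LLM evaluates the current answer
loss.backward()         # textual "gradients" (feedback) flow back to `answer`
optimizer.step()        # rewrite `answer` using that feedback
print(answer.value)
```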
Now, let's see how the answer changes.
After just one epoch (epoch=1), the model answers the question correctly: it now counts Alice herself as one of her brothers' sisters! If you still doubt this result, you can print the answer again to check.

Shirts Dry Time Calculation

This challenge is particularly tricky and originates from a post on Reddit.
If it takes 1 hour to dry 25 shirts under the sun, how long will it take to dry 30 shirts under the sun? Reason step by step
The official TextGrad GitHub repository uses gpt-4o to solve this question as a demo, but here I will try gpt-3.5-turbo to see whether it can reach the right answer.
The steps are very similar to the previous question. First, use gpt-3.5-turbo to answer this question.
Obviously, this is not the correct solution, because the shirts can all dry under the sun simultaneously! So, it is time for TextGrad again. Since gpt-3.5-turbo is considerably weaker than gpt-4o and gpt-4-turbo, I set epoch=20 and log the answer after every step, which helps in observing how the response evolves.
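The loop can be sketched as follows, reusing the same recipe as before. This is a hedged illustration: the engine names and the evaluation prompt are my own choices, and I use gpt-4o as the critic while gpt-3.5-turbo produces the answers:

```python
import textgrad as tg

tg.set_backward_engine("gpt-4o", override=True)  # critic engine for feedback
model = tg.BlackboxLLM("gpt-3.5-turbo")          # weaker model being improved

question = tg.Variable(
    "If it takes 1 hour to dry 25 shirts under the sun, how long will it "
    "take to dry 30 shirts under the sun? Reason step by step",
    role_description="question to the LLM",
    requires_grad=False,
)
answer = model(question)
answer.set_role_description("concise and accurate answer to the question")

optimizer = tg.TGD(parameters=[answer])
loss_fn = tg.TextLoss(
    "Evaluate the answer to this physical-reasoning question; "
    "be critical and point out any flawed assumptions."
)

for epoch in range(20):
    loss = loss_fn(answer)
    loss.backward()
    optimizer.step()
    print(f"epoch {epoch + 1}: {answer.value}")
```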
Now, let's see how the answer changes.
At epoch=6, the model answers the question correctly: it recognizes that the shirts dry simultaneously under the sun, so the drying time does not change as the number of shirts increases. You can now print the answer again to check.

Conclusion

In this blog post, I demonstrated how to use TextGrad effectively to tackle challenging questions, even with a less advanced model like gpt-3.5-turbo. With a solid understanding of PyTorch, I believe you will find TextGrad quite easy to use.