There’s been a lot of discussion about AI testing at work these days, and that has made me think about all the effort I’ve put into my own learning path with AI.
One thing is clear: figuring out AI’s a mess.
Understanding Large Language Models (LLMs) is like watching a wizard cast spells: the outcomes are astonishing, but the methods remain an enigma.
Nobody truly knows what goes on behind the curtain. This is why we keep discovering “new” strange tricks to get better results. A recently published study titled The Prompt Report: A Systematic Survey of Prompting Techniques even showed that anonymizing an email in a prompt by erasing the names decreased the model’s accuracy.
Even the “best” prompters still get crappy results often.
Still, there’s no denying LLMs like ChatGPT or Claude are things of wonder. They are incredible tools worth using for anything.
Language learning included.
The main reason people get bad results
You will never get anything good out of an LLM if you don’t know exactly what you’re trying to get out of it. It’s that simple.
Rare are the people who haven’t heard of these tools.
Millions of people have tried them, hoping to find something funny or useful.
Most of them had a laugh or dropped an “oh, wow” and never looked back.
Many played with it and thought it could be useful but forgot to go back to it.
Some dove deeper and found a way to use it for one specific purpose.
A few dug even further and realized it could revolutionize everything they do.
Nowadays, my first thought when I’m stuck is “Could ChatGPT help with this?” Often, the answer is a resounding Yes.
If you open ChatGPT and ask a basic “Help me learn German”, you’re bound to get a very generic answer.
If you instead ask something like “I’m learning German and struggling with the dative. Help me understand how it works through a short story in German with tons of words used in both dative and nominative forms.”, you’re bound to get a more precise answer.
An LLM’s response is only as good as your prompt.
If your prompt is vague, so will the answer be.
As I mentioned a while back, the goal is to be precise. You can think of ChatGPT as a very smart person who can answer anything but doesn’t share the context you take for granted.
Knowing why you choose to use an LLM, however, is not enough to figure out how to use it well.
A systematic approach
The most common method people use is to try wildly different prompts and see which works best, but that’s where the first error lies.
See my example above? Well, that’s a bad one: I’m comparing a very short, vague prompt with a much more precise one. A better comparison would have been against “I’m struggling with the dative case in German. Help me understand it.” At least that way, ChatGPT would have known what I was aiming for.
And yet, it’s still a wrong comparison.
Nobody knows what happens behind the curtain, but we’ve done enough research to know that, for some strange reason, even a small modification can drastically change the result.
Change the order of sentences and you could get a very different result. Add a “Please” or a “Thank you,” and the same could happen again. Some experiments even found that asking the LLM to “Take a deep breath first” could help. How crazy is that!
If you aim to find the “best” prompt for your specific purpose, your experiments must be done one step at a time.
Just like elaboration is a wonderful technique for expanding active language skills, it also helps with LLMs.
Elaborate slowly.
You can start with one specific prompt and slowly tailor it to see if the results get better.
If we were to go again from my second prompt, we could make it evolve as follows, adding one element at a time:
I’m learning German and struggling with the dative. Help me understand how it works through a short story in German with tons of words used in both dative and nominative forms.
↓
You’re an expert in teaching German and I’m learning German and struggling with the dative. Help me understand how it works through a short story in German with tons of words used in both dative and nominative forms.
↓
You’re an expert in teaching German and I’m learning German and struggling with the dative. Help me understand how it works through a short story in German with tons of words used in both dative and nominative forms. Keep the sentences short.
↓
You’re an expert in teaching German and I’m learning German and struggling with the dative. Help me understand how it works through a short story in German with tons of words used in both dative and nominative forms. Keep the sentences short.
Here’s an example of a “good-sized” sentence: Ich habe eine Mail von den Kindern bekommen.
↓
You’re an expert in teaching German and I’m learning German and struggling with the dative. Help me understand how it works through a short story in German with tons of words used in both dative and nominative forms. Keep the sentences short.
Here’s an example of a “good-sized” sentence: Ich habe eine Mail von den Kindern bekommen.
Follow the story with a short exercise in which I should choose the case for one noun in each sentence.
↓
You’re an expert in teaching German and I’m learning German and struggling with the dative. Help me understand how it works through a short story in German with tons of words used in both dative and nominative forms. Keep the sentences short.
Here’s an example of a “good-sized” sentence: Ich habe eine Mail von den Kindern bekommen.
Follow the story with a short exercise in which I should choose the case for one noun in each sentence. Don’t provide the answers until I ask for them.
I could keep going but you see what I mean.
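For the programmatically inclined, the one-change-at-a-time loop above can be sketched in a few lines of Python. This is only an illustration, not a real API call: `build_prompt_variants` is a hypothetical helper that stacks one refinement onto the base prompt per step, so each variant differs from the previous one by exactly one change.

```python
def build_prompt_variants(base, refinements):
    """Yield prompts that add one refinement at a time,
    so each result can be compared against the previous step."""
    variants = [base]
    prompt = base
    for extra in refinements:
        prompt = f"{prompt} {extra}"
        variants.append(prompt)
    return variants

base = ("I'm learning German and struggling with the dative. "
        "Help me understand how it works through a short story in German "
        "with tons of words used in both dative and nominative forms.")

refinements = [
    "You should answer as an expert in teaching German.",
    "Keep the sentences short.",
    'Here is an example of a "good-sized" sentence: '
    "Ich habe eine Mail von den Kindern bekommen.",
    "Follow the story with a short exercise in which I choose "
    "the case for one noun in each sentence.",
    "Don't provide the answers until I ask for them.",
]

# Paste each variant into ChatGPT in turn and compare the answers.
for i, variant in enumerate(build_prompt_variants(base, refinements)):
    print(f"--- Variant {i} ---")
    print(variant)
```

The point is the discipline, not the code: each variant changes one thing, so if an answer gets better or worse, you know which change caused it.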
In fact, the one thing I should have done, but chose not to do to avoid making this too long, is to add more examples of good sentences.
Hell, an even better solution could be to test things until I found an answer I liked, then copy-paste that answer into my prompt while saying “Follow the below example of a good answer.”
In this case, the answer I got for the last prompt was quite satisfying so I kept it as a good example to reuse someday.
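That copy-paste trick can also be sketched as a tiny helper. `with_example` is a hypothetical name I’m using for illustration; it simply appends a saved answer you liked to the prompt, which is a basic form of one-shot prompting.

```python
def with_example(prompt: str, good_answer: str) -> str:
    """Append a previously saved good answer for the model to imitate."""
    return (f"{prompt}\n\n"
            "Follow the below example of a good answer:\n"
            f"{good_answer}")

# `saved_answer` stands in for an answer you liked and kept around.
saved_answer = "Der Mann gibt dem Kind einen Apfel. Das Kind dankt dem Mann."
print(with_example(
    "I'm struggling with the dative case in German. Help me understand it.",
    saved_answer,
))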
The only thing I think I could improve[1] would be to ask for the exercise to use different sentences from those in the text.
The more systematic you are, the more chances you have to find what works for you. Is it fun? Can be. It’s not always though. It can get frustrating.
Still, there’s no rush. You don’t have to find the perfect prompt today.
It’s a conversation
One last thing to remember is that AI is not a one-off kind of system. As of right now, LLMs struggle with very long prompts, suffering from what’s called the “lost in the middle” phenomenon.
This problem takes on dramatic proportions only for prompts longer than most of us will ever need, but it’s still worth knowing about and being careful of.
That’s why, while it’s important to craft a good first prompt to start the discussion on the right foot, you should remember that you can keep tweaking the conversation as you go.
Most of the great insights I’ve received from ChatGPT since I started using it came not from the first answer but from further in the conversation. It was always after I told it where it had misunderstood me, or where it was lacking precision.
TL;DR
So, yeah, do your tests with ChatGPT. Experiment. Keep what got you good results. Experiment some more. Tweak your prompts.
But if you want to know for sure that a certain prompt works well, do it step by step.
Take your time.
And keep the conversation going.
You never know. The next answer could be exactly what you’ve been dreaming of.
In the next The Language of AI[2], I’ll share a few prompts I’ve used for my language learning journey to work on specific skills.
If you’re interested in that and you’re not subscribed, now’s the time 😉
As always, cheers for reading,
Mathias
[1] Right now, at least. I often think of other modifications I could have made, but often much later on.
[2] In about a month.