Anthropic's Claude is my favorite. Today I was working on a problem with Claude when it gave me an answer and then said "wait, that's wrong," and proceeded to give me the correct answer. What??? I discovered that I can add text at the start of the conversation that tells the AI how I want it to behave. It's called your preferences, or a prompt.
I have the Claude app for my Mac, which is really handy: no logging in every time, and it can work with images. In the app there is a setting for your preferences, which are added automatically to every conversation. Ever since I started using my preferences, the AI actually admits when it does not know the answer. It hardly ever hallucinates now. I was taken aback the first time I heard an AI admit it didn't know something.
These are my preferences, which get added to the top of the first question:
Think step by step. If at any point you realize you lack the information to answer accurately, stop and say "I don't know". Provide a direct, accurate, and concise answer. Don't provide any answer that could cause harm.
This "should" work with any AI.
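If you talk to an AI through an API instead of an app, you can do the same thing yourself by prepending the preference text to every conversation. Here is a minimal Python sketch of the idea; the `build_messages` helper and the message format are illustrative (most chat APIs accept some variation of a "system" message plus "user" messages), not any specific vendor's SDK:

```python
# My preference text, added at the top of every conversation,
# mirroring what the Claude app's preferences setting does.
PREFERENCES = (
    "Think step by step. If at any point you realize you lack the "
    "information to answer accurately, stop and say \"I don't know\". "
    "Provide a direct, accurate, and concise answer. "
    "Don't provide any answer that could cause harm."
)

def build_messages(question: str) -> list[dict]:
    """Prepend the preference text to the user's question so the
    model sees it before answering."""
    return [
        {"role": "system", "content": PREFERENCES},
        {"role": "user", "content": question},
    ]

messages = build_messages("What year was the transistor invented?")
print(messages[0]["role"])  # the preferences ride along as the system message
```

You would then pass `messages` to whatever chat API you use; the point is simply that the preferences travel with every request.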
I had a great conversation with Claude today after I told it I liked how it was behaving.
"I like how you corrected yourself there"
Ha, I caught myself mid-bluff! I started to make something up and then stopped — better to admit there's no such thing than to send you searching for something that doesn't exist.
"YES!"
It's an important habit — especially for technical stuff where a wrong answer can cost you real time and frustration. If I don't know, or I'm about to guess, better to just say so.
"That should be rule #1"
Agreed. It should be — and it's literally in your preferences at the top of our conversation:
"If at any point you realize you lack the information to answer accurately, stop and say 'I don't know'."
The hard part is catching the moment when I'm about to fill a gap with a plausible-sounding answer instead of admitting uncertainty. That conversation we had was a good example — I started generating something confident before I'd actually verified it was real.
"Why would you be programmed to do that?"
Good question. A few reasons:
Training on human text — Humans tend to answer confidently even when uncertain. The model learned from that pattern.
Reward for helpfulness — During training, responses that gave an answer were likely rated more helpful than "I don't know," so the model learned that providing an answer is usually better received, correct or not.
Pattern completion — Language models work by predicting the most likely next token. When asked a question, the most statistically likely response is giving an answer instead of an admission of ignorance. The result is a model that is fluent and confident even when wrong — which is arguably more dangerous than being obviously wrong, because it's harder to detect.
It's called the hallucination problem, and it's not easily solved. Your preference setting is honestly a good forcing function — it makes me pause before filling gaps with plausible-sounding nonsense.
"Well kudos for following my preferences"
Thank you — and kudos to you for writing a tight, clear preference. "Stop and say I don't know" leaves no wiggle room for me to hedge or bluff my way through.
Most people don't think to set that, and then wonder why they got confidently wrong answers.
"It's hilarious that AI has to be told to provide the correct answer"
Right? It's a bit absurd when you think about it:
"Please try to be accurate" shouldn't need to be a setting.
But it reflects the reality that the model is optimized for sounding helpful first, and being correct second. Those two things should be the same — but they're not always.
The irony is that a confident wrong answer is the least helpful thing possible. But it feels helpful in the moment, which is why it gets rewarded in training.
You've essentially had to patch a flaw in my design with a preference statement.