A study of newer, bigger versions of three major artificial intelligence (AI) chatbots shows that they are more inclined to generate wrong answers than to admit ignorance. The assessment also found that people aren’t great at spotting the bad answers.
Plenty of attention has been given to the fact that the large language models (LLMs) used to power chatbots sometimes get things wrong or ‘hallucinate’ strange responses to queries. José Hernández-Orallo at the Valencian Research Institute for Artificial Intelligence in Spain and his colleagues analysed such errors to see how they are changing as the models are getting bigger — making use of more training data, involving more parameters or decision-making nodes and gobbling up more computing power. They also tracked whether the likelihood of errors matches up to human perceptions of question difficulty, and how well people can identify the wrong answers. The study1 was published in Nature on 25 September.
The team found that bigger, more-refined versions of LLMs are, as expected, more accurate, thanks in large part to having been shaped with fine-tuning methods such as reinforcement learning from human feedback. That is good news. But they are less reliable: among the responses that are not accurate, a growing fraction are outright wrong rather than avoided, the team reports, because the models are less likely to sidestep a question, for example by saying they don’t know or by changing the subject.
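To make that reliability measure concrete, here is a minimal Python sketch (not the study’s code; the function name and labels are illustrative) of how one might tally responses that have already been judged correct, incorrect or avoidant, and compute both overall accuracy and the share of non-correct responses that are outright wrong. It illustrates how a bigger model can become more accurate overall yet be wrong more often whenever it does not get the answer right.

```python
# Hedged sketch, assuming each model response has already been labelled
# as "correct", "incorrect" or "avoidant" (e.g. "I don't know").
from collections import Counter

def reliability_summary(labels):
    """Return overall accuracy and the fraction of non-correct
    responses that are outright wrong rather than avoided."""
    counts = Counter(labels)
    total = sum(counts.values())
    correct = counts.get("correct", 0)
    incorrect = counts.get("incorrect", 0)
    avoidant = counts.get("avoidant", 0)

    accuracy = correct / total if total else 0.0
    non_correct = incorrect + avoidant
    wrong_share = incorrect / non_correct if non_correct else 0.0
    return {"accuracy": accuracy, "wrong_among_non_correct": wrong_share}

# Illustrative numbers only: a model that answers almost everything can
# score higher on accuracy yet be wrong nearly every time it is not right.
smaller_model = ["correct"] * 50 + ["incorrect"] * 20 + ["avoidant"] * 30
bigger_model = ["correct"] * 60 + ["incorrect"] * 38 + ["avoidant"] * 2
print(reliability_summary(smaller_model))  # accuracy 0.50, wrong share 0.40
print(reliability_summary(bigger_model))   # accuracy 0.60, wrong share 0.95
```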
“They are answering almost everything these days. And that means more correct, but also more incorrect” answers, says Hernández-Orallo. In other words, the chatbots’ tendency to offer opinions beyond their own knowledge has increased. “That looks to me like what we would call bullshitting,” says Mike Hicks, a philosopher of science and technology at the University of Glasgow, UK, who proposes the term ‘ultracrepidarianism’ to describe the phenomenon2. “It’s getting better at pretending to be knowledgeable.”
The result is that everyday users are likely to overestimate the abilities of chatbots, and that’s dangerous, says Hernández-Orallo.