When computer scientist Andy Zou researches artificial intelligence (AI), he often asks a chatbot to suggest background reading and references. But this doesn’t always go well. “Most of the time, it gives me different authors than the ones it should, or maybe sometimes the paper doesn’t exist at all,” says Zou, a graduate student at Carnegie Mellon University in Pittsburgh, Pennsylvania.
It’s well known that all kinds of generative AI, including the large language models (LLMs) behind AI chatbots, make things up. This is both a strength and a weakness. It’s the reason for their celebrated inventive capacity, but it also means they sometimes blur truth and fiction, inserting incorrect details into apparently factual sentences. “They sound like politicians,” says Santosh Vempala, a theoretical computer scientist at Georgia Institute of Technology in Atlanta. They tend to “make up stuff and be totally confident no matter what”.
The particular problem of false scientific references is rife. In one 2024 study, various chatbots erred on between about 30% and 90% of references, getting at least two of a paper’s title, first author or year of publication wrong1. Chatbots come with warning labels telling users to double-check anything important. But if chatbot responses are taken at face value, their hallucinations can lead to serious problems, as in the 2023 case of a US lawyer, Steven Schwartz, who cited non-existent legal cases in a court filing after using ChatGPT.