An ideas generator powered by artificial intelligence (AI) came up with more original research ideas than did 50 scientists working independently, according to a preprint posted on arXiv this month1.

The human and AI-generated ideas were evaluated by reviewers, who were not told who or what had created each idea. The reviewers scored AI-generated concepts as more exciting than those written by humans, although the AI’s suggestions scored slightly lower on feasibility.

But scientists note the study, which has not been peer-reviewed, has limitations. It focused on one area of research and required human participants to come up with ideas on the fly, which probably hindered their ability to produce their best concepts.

There are burgeoning efforts to explore how LLMs can be used to automate research tasks, including writing papers, generating code and searching literature. But it’s been difficult to assess whether these AI tools can generate fresh research angles at a level similar to that of humans. That’s because evaluating ideas is highly subjective and requires gathering researchers who have the expertise to assess them carefully, says study co-author, Chenglei Si. “The best way for us to contextualise such capabilities is to have a head-to-head comparison,” says Si, a computer scientist at Stanford University in California.

The year-long project is one of the biggest efforts to assess whether large language models (LLMs) — the technology underlying tools such as ChatGPT — can produce innovative research ideas, says Tom Hope, a computer scientist at the Allen Institute for AI in Jerusalem. “More work like this needs to be done,” he says.

To read more, click here.