ChatGPT, Gemini, Copilot and other AI tools whip up impressive sentences and paragraphs from as little as a simple line of text prompt. To generate those words, the underlying large language models were trained on reams of text written by humans and scraped from the internet. But now, as generative AI tools flood the internet with a large amount of synthetic content, that content is being used to train future generations of those AIs. If this continues unchecked, it could be disastrous, researchers say.
Training large language models on their own data could lead to model collapse, University of Oxford computer scientist Ilia Shumailov and colleagues argued recently in Nature.
To read more, click here.