DarkMind: A new backdoor attack that leverages the reasoning capabilities of LLMs

Large language models (LLMs), such as the models supporting the functioning of ChatGPT, are now used by a growing number of people worldwide to source information or edit, analyze and generate texts. As these models become increasingly advanced and widespread, some computer scientists have been exploring their limitations and vulnerabilities in order to inform their future improvement.

Zhen Guo and Reza Tourani, two researchers at Saint Louis University, recently developed and demonstrated a new backdoor attack that could manipulate the text-generation of LLMs while remaining very difficult to detect. This attack, dubbed DarkMind, was outlined in a recent paper posted to the arXiv preprint server, which highlights the vulnerabilities of existing LLMs.

To read more, click here.