LLMs understand how to deceive

My research on LLMs has resulted in another top-tier journal publication. In particular, the Proceedings of the National Academy of Sciences (PNAS) accepted my paper on deception abilities in LLMs, which is now available under this link. Moreover, you can find a brief summary here. In the paper, I present a series of experiments demonstrating that state-of-the-art LLMs have a conceptual understanding of deceptive behavior. These findings have significant implications for AI alignment, as there is a growing concern that future LLMs may develop the ability to deceive human operators and use this skill to evade monitoring efforts. Most notably, the capacity for deception in LLMs appears to be evolving, becoming increasingly sophisticated over time. The most recent model evaluated in the study is GPT-4; however, I very recently ran the experiments with Anthropic’s Claude 3 Opus as well as GPT-4o. Particularly in complex deception scenarios, which demand higher levels of mentalization, these models significantly outperform both ChatGPT and GPT-4, which often struggled to grasp the tasks at all (see figure below). This suggests a continued advancement in the deception abilities of LLMs.

Superhuman intuitions in language models

Our most recent paper on human-like intuitive decision-making in language models was published at Nature Computational Science. The research is also featured in a newspaper article (in German). We show that large language models, most notably GPT-3, exhibit behavior that strikingly resembles human-like intuition – and the cognitive errors that come with it. However, language models with higher cognitive capabilities, in particular ChatGPT, learned to avoid succumbing to these errors and perform in a hyperrational, superhuman manner. For our experiments, we probe language models with tests that were originally designed to investigate intuitive decision-making in humans.


++++ Sarah Fabi and I updated the paper on human-like intuitive decision-making and errors in large language models by testing ChatGPT, GPT-4, BLOOM, and other models – here’s the new manuscript +++ I co-authored a paper on privacy literacy for the new Routledge Handbook of Privacy and Social Media +++ Together with Leonie Bossert, I published a paper on the ethics of sustainable AI +++ I got my own article series at Golem, called KI-Insider, where I will regularly publish new articles (in German) +++ I attended two further Science Slams in Friedrichshafen and T√ľbingen and won both of them +++ I was interviewed for a podcast about different AI-related topics (in German) +++