deception – Dr. Thilo Hagendorff

10. November 202331. January 2024

Recent media appearances

MIT Technology Review reported on OpenAI’s first empirical research on superalignment and included some comments of mine. Sentient Media as well as Green Queen reported on our research regarding speciesist biases in AI systems. I was interviewed about our work on the implementation of cognitive biases in AI systems for an Outlook article in Nature. A Medium contribution discussed my research on deception abilities in LLMs. Also, an article from Insights by Stanford Business covered our research on human-like intuitions in LLMs. This was also covered in a radio show at Deutschlandfunk, to which you can listen here:

10. October 202330. January 2024

Media coverage on my research on deception abilities in language models

It was a pleasure to be invited to the Data Skeptic podcast, where I discussed my latest research on deception abilities in large language models with Kyle Polich. You can listen to the episode using this link or right here:

The research was also featured in an article at FAZ as well as on a radio program (in German), to which you can listen here. Furthermore, I authored an article (also in German) for Golem. Unfortunately, this content is behind a paywall.

1. August 2023

Language models have deception abilities

Aligning large language models (LLMs) with human values is of great importance. However, given the steady increase in reasoning abilities, future LLMs are under suspicion of becoming able to deceive human operators and utilizing this ability to bypass monitoring efforts. As a prerequisite to this, LLMs need to possess a conceptual understanding of deception strategies. My latest research project reveals that such strategies emerged in state-of-the-art LLMs, such as GPT-4. This is one of the most fascinating findings I made since researching LLMs and I’m excited to share a preprint describing the results here. I’ll continue working on this project.