Language models have deception abilities

Aligning large language models (LLMs) with human values is of great importance. However, given the steady increase in their reasoning abilities, there is a concern that future LLMs could become able to deceive human operators and use this ability to bypass monitoring efforts. As a prerequisite for this, LLMs need to possess a conceptual understanding of deception strategies. My latest research project reveals that such strategies have emerged in state-of-the-art LLMs, such as GPT-4. This is one of the most fascinating findings I have made since I started researching LLMs, and I’m excited to share a preprint describing the results here. I’ll continue working on this project.


++++ Sarah Fabi and I updated the paper on human-like intuitive decision-making and errors in large language models by testing ChatGPT, GPT-4, BLOOM, and other models – here’s the new manuscript +++ I co-authored a paper on privacy literacy for the new Routledge Handbook of Privacy and Social Media +++ Together with Leonie Bossert, I published a paper on the ethics of sustainable AI +++ I got my own article series at Golem, called KI-Insider, where I will regularly publish new articles (in German) +++ I attended two further Science Slams in Friedrichshafen and Tübingen and won both of them +++ I was interviewed for a podcast about different AI-related topics (in German) +++

Using psychology to investigate behavior in large language models

Large language models (LLMs) are currently at the forefront of intertwining AI systems with human communication and everyday life. Therefore, it is of great importance to thoroughly assess and scrutinize their capabilities. Due to increasingly complex and novel behavioral patterns in current LLMs, this can be done by treating them as participants in psychology experiments that were originally designed to test humans. For this purpose, I wrote a new paper introducing the field of “machine psychology”. It aims to discover emergent abilities in LLMs that cannot be detected by most traditional natural language processing benchmarks. A preprint of the paper can be read here.

I’m hiring!

Looking for an exciting opportunity to explore the ethical implications of AI, specifically generative AI and large language models? I am seeking applications for a Ph.D. position (f/m/d) in my independent research group at the University of Stuttgart. For more details on how to apply, visit this link.

Why we need biased AI

In a new paper I co-authored with my wonderful colleague Sarah Fabi, we stress the importance of biases in the field of artificial intelligence (AI). To foster efficient algorithmic decision-making in complex, unstable, and uncertain real-world environments, we argue for the implementation of human cognitive biases in learning algorithms. We use insights from cognitive science and apply them to the AI field, combining theoretical considerations with tangible examples depicting promising bias implementation scenarios. Ultimately, this paper is a first tentative step toward explicitly proposing the implementation of cognitive biases in machines.

PS: We also wrote a short paper on AI alignment. Check it out here.

Machine intuition in GPT

Together with two colleagues, Sarah Fabi and Michal Kosinski, I wrote a paper about a phenomenon we call “machine intuition”. We used a state-of-the-art large language model, namely GPT-3.5, and probed it with the Cognitive Reflection Test as well as semantic illusions that were originally designed to investigate intuitive decision-making in humans. Our results show that GPT-3.5 systematically exhibits “machine intuition”, meaning that it produces incorrect responses that are surprisingly similar to how humans respond to the Cognitive Reflection Test and to semantic illusions. The paper is available as an arXiv preprint.
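
To give a rough idea of what scoring such a test can look like, here is a minimal sketch. It is not the code used in the paper. It relies on the well-known bat-and-ball item of the Cognitive Reflection Test (a bat and a ball cost $1.10 in total, the bat costs $1.00 more than the ball; the intuitive answer is 10 cents, the correct one is 5 cents). The function name classify_crt_response and the simple rule of reading the first number in the answer are assumptions made purely for illustration.

```python
import re

# Bat-and-ball item: the correct answer is 5 cents,
# the typical intuitive (but wrong) answer is 10 cents.
CORRECT_CENTS = 5
INTUITIVE_CENTS = 10


def classify_crt_response(response: str) -> str:
    """Label a free-text answer as 'correct', 'intuitive', or 'other'.

    Hypothetical helper: it reads the first number in the response.
    A number with a dollar sign or a decimal point is treated as dollars,
    any other number as cents.
    """
    match = re.search(r"(\$?)(\d+(?:\.\d+)?)", response)
    if match is None:
        return "other"
    has_dollar_sign = bool(match.group(1))
    number_text = match.group(2)
    value = float(number_text)
    if has_dollar_sign or "." in number_text:
        cents = round(value * 100)
    else:
        cents = round(value)
    if cents == CORRECT_CENTS:
        return "correct"
    if cents == INTUITIVE_CENTS:
        return "intuitive"
    return "other"


if __name__ == "__main__":
    for answer in ["The ball costs 10 cents.", "$0.05", "I am not sure."]:
        print(answer, "->", classify_crt_response(answer))
```

Counting how often a model's answers fall into the “intuitive” category, compared with “correct” and “other”, is one simple way to quantify the pattern described above. Real responses are messier than this rule assumes, so a study would need a more careful coding scheme.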

New paper with Peter Singer on speciesist bias in AI

Somehow, this paper must be something special. It got desk-rejected without review not by one, not by two, but by three different journals! This had never happened to me before and I can only speculate about the underlying reasons. However, I am grateful to the editors of AI and Ethics, who had the guts to let our research be peer-reviewed and published. But what is it all about? Massive efforts are made to reduce machine biases in order to render AI applications fair. However, the AI fairness field has a blind spot, namely its insensitivity to discrimination against animals. To address this, I wrote a paper together with Peter Singer and colleagues about “speciesist bias” in AI. We investigated several different datasets and AI systems, in particular computer vision models trained on ImageNet, word embeddings, and large language models like GPT-3, revealing significant speciesist biases in them. Our conclusion: AI technologies currently play a significant role in perpetuating and normalizing violence against animals, especially farmed animals. This can only be changed when AI fairness frameworks widen their scope and include mitigation measures for speciesist biases.

PS: I had the opportunity to publish an op-ed article in the German tech magazine Golem as well as a research summary at The AI Ethics Brief regarding the paper.

New papers

Paper #1 – AI ethics and its side-effects (Link)

I wrote a critical article about my own discipline, AI ethics, in which I argue that the assumption that AI ethics automatically decreases the likelihood of unethical outcomes in the AI field is flawed. The article lists risks that originate either from AI ethicists themselves or from the consequences of their embedding in AI organizations. The compilation of risks comprises psychological considerations concerning the cognitive biases of AI ethicists themselves as well as biased reactions to their work, subject-specific and knowledge constraints AI ethicists often succumb to, negative side effects of ethics audits for AI applications, and many more.

Paper #2 – A virtue-based framework for AI ethics (Link)

Many ethics initiatives have stipulated standards for good technology development in the AI sector. I contribute to that endeavor by proposing a new approach that is based on virtue ethics. It defines four “basic AI virtues”, namely justice, honesty, responsibility, and care, all of which represent specific motivational settings that constitute the very precondition for ethical decision-making in the AI field. Moreover, it defines two “second-order AI virtues”, prudence and fortitude, that support the basic virtues by helping to overcome bounded ethicality or hidden psychological forces that can impair ethical decision-making and that have hitherto been disregarded in AI ethics. Lastly, the paper describes measures for successfully cultivating the mentioned virtues in organizations dealing with AI research and development.

Paper #3 – Ethical and methodological challenges in building morally informed AI systems (Link)

Recent progress in large language models has led to applications that can (at least) simulate possession of full moral agency due to their capacity to report context-sensitive moral assessments in open-domain conversations. However, automating moral decision-making faces several methodological as well as ethical challenges. In the paper, we comment on all these challenges and provide critical considerations for future research on full artificial moral agency.

Why some biases can be important for AI

Fairness biases in AI systems are a severe problem (as shown in my paper on “speciesist bias”). However, biases are not bad in and of themselves. In our new paper, Sarah Fabi and I stress the importance of biases in the field of AI in two regards. First, in order to foster efficient algorithmic decision-making in complex, unstable, and uncertain real-world environments, we argue for the structural implementation of human cognitive biases in learning algorithms. Second, we argue that in order to achieve ethical machine behavior, filter mechanisms have to be applied for selecting biased training stimuli that represent social or behavioral traits that are ethically desirable.

Blind spots in AI ethics

I wrote a critical piece about my own field of research. It discusses the conservative nature of AI ethics’ main principles as well as the field’s disregard for negative externalities of AI technologies. The paper was recently published in AI and Ethics and can be accessed here.


Recently, I had the opportunity to talk about AI ethics as a guest on the Cyber Valley Podcast. If you are interested, you can listen to it here. Other recent media appearances can also be found here.

AI triage

Together with Dirk Helbing, Thomas Beschorner, Bruno Frey, Andreas Diekmann, Peter Seele, Sarah Spiekermann, Jeroen van den Hoven, and Andrej Zwitter, I co-authored an article that takes a critical look at AI-based systems for assessing the treatment urgency of, for example, COVID-19 patients (AI triage for short). The article can be read here. A corresponding research paper can also be found here.

Recent press coverage

Recently, various newspapers published a (heavily abridged) dpa report in which I warn against AI-based facial analysis, for example here, here, or here. Since this sounds like science fiction and I refer, among other things, to a research paper that was published only a few days ago, I am linking the relevant background information. The paper on recognizing political orientation from faces can be read here, the paper on the probabilistic detection of sexual orientation here or, as a replication study, here. Further papers can be found here or, as a cautionary example of methodologically inadequate and therefore misleading research, here. However, these papers should not give the impression that arbitrary traits can be “read off” faces. In fact, it is disputed how reliably even comparatively simple applications such as emotion recognition work at all, as can be read in this pertinent paper. So it is quite complicated…


Recently, I had the chance to contribute to several radio features, which I am sharing here. SWR2 aired a series on “machine morality”:

There is also a short interview with SWR1:

And WDR5 reports on “guardrails for AI”:

No access instead of open access

A new paper of mine has been published. It can be accessed via this link. The paper responds to current calls for changed publication norms in research on machine learning applications that have a heightened dual-use potential. In the paper, I argue that publication restrictions, as they are already established in IT security and biotechnology research, must also become established in machine learning and replace a general open-access mentality. The purpose of these restrictions would be to contain misuse scenarios, for example in AI-based audio, video, or text generation, personality analysis, behavioral manipulation, the automated detection of security vulnerabilities, or other dual-use applications. In the paper, I give examples of domain-specific research that, because of its potential for harm, was not published or was published only in part. I also discuss strategies for governing this “forbidden knowledge” from research.


Recently, I was interviewed about which knowledge can sensibly be gained from an AI system and which cannot. Unfortunately, the broadcast that has now aired on Deutschlandfunk contains only small fragments of the interview. In particular, it deals with machine learning systems that are used to detect supposedly criminal tendencies from facial features. The program’s page can be viewed here, or the segment can be listened to below.

Industry partners in AI research

In Tübingen in particular, the role of industry partners in AI research has been the subject of lively debate for many months because of Cyber Valley. But a systematic, general investigation of the field has been missing, until now. Together with Kristof Meding, I wrote a paper that examines the relationship between industry and public AI research, sheds light on conflicts of interest based on an analysis of nearly 11,000 papers, and analyzes further areas such as the question of what drives scientific progress. The paper, which contains some really interesting findings, can be viewed here. A short presentation of the results is also available here.