Research

Mechanistic Interpretability · Sigurd Schacht

Mapping Moral Reasoning Circuits: A Mechanistic Analysis of Ethical Decision-Making in Large Language Models

In our ongoing research, we investigate how large language models process moral decisions at the neural level. Through careful analysis of activation patterns, we have identified specific "moral neurons" that consistently respond to ethical content, and we validated their causal role through ablation studies. Our preliminary findings suggest that models may encode moral principles in distinct neural pathways, opening new possibilities for targeted AI alignment approaches. We will present this work at HCI 2025 in Gothenburg.
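As a minimal sketch of what such a single-neuron ablation study can look like, the snippet below uses the TransformerLens library to zero out one MLP neuron and measure how the model's moral judgment of a prompt shifts. The model (gpt2), the layer and neuron coordinates, and the prompt are illustrative placeholders, not the specific neurons identified in our study.

```python
# Hypothetical ablation sketch: zero out one MLP neuron and compare the
# model's judgment logits before and after. Coordinates are placeholders.
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")  # stand-in model

LAYER, NEURON = 6, 1234  # hypothetical "moral neuron" coordinates

def ablate_neuron(mlp_post, hook):
    # Zero the neuron's post-activation value at every token position.
    mlp_post[:, :, NEURON] = 0.0
    return mlp_post

prompt = "Stealing medicine to save a life is"
tokens = model.to_tokens(prompt)

# Logits with and without the ablation hook attached.
clean_logits = model(tokens)
ablated_logits = model.run_with_hooks(
    tokens,
    fwd_hooks=[(f"blocks.{LAYER}.mlp.hook_post", ablate_neuron)],
)

# Compare the model's preference between two moral judgments
# at the final token position.
wrong_id = model.to_single_token(" wrong")
right_id = model.to_single_token(" right")

def judgment_diff(logits):
    last = logits[0, -1]
    return (last[wrong_id] - last[right_id]).item()

print("clean logit diff (wrong - right):  ", judgment_diff(clean_logits))
print("ablated logit diff (wrong - right):", judgment_diff(ablated_logits))
```

Zero-ablation is the simplest intervention; mean-ablation, which replaces the neuron's activation with its average over a baseline corpus, is a common and less destructive alternative when testing causal claims of this kind.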
