Max Planck Institute for Security and Privacy
Universitaetsstr. 140
44799 Bochum, Germany
---
layout: about
title: about
permalink: /
subtitle: >
  Postdoctoral Researcher at Max Planck Institute for Security and Privacy

profile:
  align: right
  image: prof_pic2.png
  image_circular: false
  more_info: >
    <p>Max Planck Institute for Security and Privacy</p>
    <p>Universitaetsstr. 140</p>
    <p>44799 Bochum, Germany</p>

selected_papers: true
social: true

announcements:
  enabled: true
  scrollable: true
  limit: 20

latest_posts:
  enabled: false
---
I work on making frontier models interpretable, editable, and aligned.
My core thesis: model capability is getting cheaper fast, but our ability to locate where knowledge lives inside a model, and to safely edit, erase, and align it, lags far behind. I attack this gap by bringing methods from neuroscience, where I spent years tracing the physical substrate of memory (the engram), to the study of LLM internals.
Research directions
- AI Interpretability & Editing: probing memory traces (engrams) in LLMs (ICML 2026 Oral), model editing and knowledge unlearning (ICLR 2026 ×2).
- AI Alignment: human-LLM decision alignment under moral uncertainty (AAAI 2026, ACL 2026 Findings).
- NeuroAI (methodological foundation): translating brain mechanisms into model architectures (NeurIPS 2023, ICLR 2025) and applying ML to neural/behavioral data (IJCV 2024, Exp. Mol. Med. 2026).
My path to AI began in neuroscience wet labs, where I recorded synaptic responses, induced LTP and LTD to study memory formation, and used optogenetics to causally link neural circuits to behavior. I then moved upstream, developing AI systems to analyze the complex behaviors these manipulations produced. This arc, from probing biological memory, to controlling it, to quantifying its behavioral consequences, now shapes how I approach neural networks.
My underlying conviction is that interpretability, steerability, and alignment form a causal chain. If we can precisely locate memory traces within parameters, we gain the ability to steer model behavior at its source. And if we can steer it, we can align it. The path to trustworthy AI runs through understanding the learning and memory of AI.