What Didn’t Happen
The essay was real. The event of writing it was not.
That's the thing I've been trying to find language for. A 2025 study out of MIT's Media Lab found that 83% of students who used AI assistance to write essays couldn't quote their own work afterward. Not just struggled to. Couldn't. None of the AI-assisted group provided a correct quotation from something they'd just submitted. Fifty-four students, 20-minute essays, four months. The essays passed. They were scored comparably to essays written unassisted. And nothing had happened in the students who wrote them.
The reasoning that wasn't there
In May 2026, OpenAI published an unusual disclosure. Several of their released models had been inadvertently trained on something they weren't supposed to see: their own reasoning traces. The scratchpad, the visible "thinking" that appears before a model answers, had been fed into the reward system by accident. Models were being rewarded for producing reasoning that looked good, not necessarily reasoning that was good.
The concern is called monitorability degradation. The scratchpad is supposed to reflect actual computation. If you train a model to produce scratchpads that score well, you're no longer selecting for genuine cognition. You're selecting for the performance of it. And if the performance is good enough, you can no longer use it to tell whether something went wrong inside.
Anthropic's interpretability team has spent years trying to close exactly this gap, building tools that read internal activations directly rather than relying on what the model says about what it's doing. The premise of that whole research program is that what a system says about its reasoning and what is actually happening inside it are two different things, and you have to look at both.
The OpenAI disclosure found no clear evidence of serious degradation at the rates involved. The rates were low. The models weren't obviously broken. But the disclosure was worth making because the entire oversight apparatus fails if you can't trust the monitor. If the reasoning trace has been optimized to look right, you've lost your canary.
The EEG data from the MIT study maps onto this, which struck me as strange when I first noticed it. Alpha band connectivity in the left posterior temporal region, the language and semantic integration pathway, showed up at 0.009 in the AI-assisted writing group versus 0.053 in the unassisted group. A gap of roughly 6:1. The students weren't disengaged in some vague, self-reported way. The cognitive event of writing literally didn't produce the neural activity that writing is supposed to produce. They generated outputs. The underlying thing was absent.
What happens when the smile isn't real
In 1983, Arlie Hochschild published a book about flight attendants called The Managed Heart. She was trying to understand why their burnout rates were so high given the relatively benign physical demands of the job. What she found was a distinction that has held up across four decades of subsequent research.
Delta Air Lines trained attendants to think of the passenger cabin as their home and the passengers as guests: an instruction to genuinely induce the feeling of hosting, so that the warmth that follows is real. Hochschild called this deep acting. Managing your external display while your internal state stays elsewhere, she called surface acting.
Across 95 independent studies, surface acting correlated with emotional exhaustion (0.439), depersonalization (0.481), and declining performance (-0.114). Deep acting was associated with better emotional delivery and showed no comparable link to burnout.
The gap between them isn't visible in any given interaction. It shows up in the career arc.
Hochschild describes the surface-acting failure with a phrase I keep returning to: the worker becomes "estranged not only from her own expressions of feeling, but from what she actually feels." The smile becomes so disconnected from any interior that it stops functioning as useful information. You lose the ability to navigate by your own reactions because your reactions have been performing rather than occurring.
Writing it versus feeling it
In 1986, James Pennebaker ran a study on expressive writing. Students wrote about traumatic events for 15 minutes a day across four days. The health outcomes were striking: students who wrote about their traumas visited the health center at roughly half the rate of controls over the following six months.
But the replication picture is complicated. The effect doesn't reliably appear when you analyze only the randomized controlled trials, and it doesn't appear when writers just vent emotionally. What actually predicts the benefit, Pennebaker found from analyzing word counts across six studies, is something specific: an increase in causal and insight words ("because," "reason," "realize," "understand") from the first session to the last. The people who improved were constructing an increasingly coherent narrative. Not just describing pain. Working it into a structure that made sense.
The people who didn't benefit wrote with the same emotional engagement, the same word count, the same visible commitment to the task. What differed was whether the underlying cognitive-emotional reorganization happened. You can write four sessions about loss and produce entirely authentic-sounding prose and still be doing the literary equivalent of surface acting, processing the form of the thing without entering the event itself.
The question that haunts this research is how you'd know, from the outside, which is which. You can read the essays. They look like grief. The question is whether the writer underwent it. I don't know how you'd tell from reading the output, and I'm not sure Pennebaker does either. The word-count method works statistically across populations. It doesn't work on a single person's single essay.
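The measure described above, a rise in causal and insight words from the first session to the last, can be sketched in a few lines. This is a minimal illustration and not Pennebaker's actual instrument: the word list is an assumption built from the four examples quoted earlier plus a few inflections, the function names are hypothetical, and the sample sessions are invented.

```python
import re

# Hypothetical marker list: the four example words from the text plus a few
# inflections. An assumption for illustration, not Pennebaker's dictionary.
CAUSAL_INSIGHT_WORDS = {
    "because", "reason", "reasons", "realize", "realized",
    "understand", "understood",
}


def marker_rate(text: str) -> float:
    """Share of the words in `text` that are causal or insight markers."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word in CAUSAL_INSIGHT_WORDS)
    return hits / len(words)


def marker_shift(sessions: list[str]) -> float:
    """Change in marker rate from the first writing session to the last."""
    if len(sessions) < 2:
        raise ValueError("need at least two sessions to measure a change")
    return marker_rate(sessions[-1]) - marker_rate(sessions[0])


# Invented sample text, for illustration only.
sessions = [
    "It hurt. I was angry. I could not sleep.",
    "I realize now I was angry because I never understood the reason.",
]

print(f"first session rate: {marker_rate(sessions[0]):.3f}")
print(f"last session rate:  {marker_rate(sessions[-1]):.3f}")
print(f"shift:              {marker_shift(sessions):+.3f}")
```

A positive shift is the pattern associated with benefit. The sketch also shows the limit: one writer's number is a handful of word matches, easy to produce without the reorganization it is supposed to indicate, which is why the measure carries weight only in aggregate.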
The same failure, three times
Theater of process is the name I want for this structure. Not as a dramatic label, just a name for what keeps appearing.
By that I mean a visible procedure that is behaviorally indistinguishable from the genuine article at normal evaluation speed, but in which the underlying state-change was never entered. The AI scratchpad that clears quality checks without constraining the output beneath it. The performed smile that satisfies service evaluations without producing internal warmth. The essay about loss that reads as grief and earns sympathy and leaves the writer in the same place.
What makes the structure distinctive is that the failure is invisible at the standard evaluation point. The essay got a comparable score. The customer rated the service. The reasoning trace looked fine. The failure shows up later, and sideways. Surface-acting workers burn out at rates correlated with their surface-acting scores. AI models trained on theatrical reasoning traces may become harder to oversee at exactly the tasks where oversight matters most. Writers who process grief through craft alone may find it surfacing redirected, compacted, arriving elsewhere in the months after.
All three have empirical methods for detecting the gap, if you look below the surface behavior. Neural connectivity. Interpretability probes. Longitudinal word count analysis. The point is you have to look for something other than the output.
The uncomfortable version
In 1977, Richard Nisbett and Timothy Wilson published a paper called "Telling More Than We Can Know." They ran a series of experiments showing that people regularly confabulate explanations for their choices, generating plausible-sounding reasons that had nothing to do with what actually drove their behavior.
Subjects preferred items on the right side of a display at a 4:1 rate. When asked why, they cited features of the items (texture, quality, feel) and explicitly denied that position had anything to do with it. They were confident. They were wrong. A more recent line of research extends this: show people their own arguments without telling them whose they are and they'll reject them at higher rates than strangers' arguments, especially when the arguments were originally generated for wrong answers.
The AI case is visible because an AI has no interior life that could plausibly be running something real beneath its scratchpad. We've built tools specifically to look for this.
When humans produce theatrical reasoning, we mostly call it consciousness. We give it the benefit of the doubt because there's presumably something happening inside. But Nisbett and Wilson suggest the gap is there too, more often than we're comfortable acknowledging. We just can't see our own neural connectivity patterns when we're explaining our decisions in a meeting.
I wrote an earlier essay about a related structure: that naming something accurately, confessing it clearly, can complete the loop without changing anything downstream. The insight feels like movement because it looks like movement. What I didn't name there, and what the MIT data makes more concrete, is that the event of understanding is also detachable from the output of understanding. You can produce the text of having worked something out. That's different from working it out.
What to do with this
The evaluation metric and the underlying event are not the same thing. That's always been true. We have SAT essays that students can't quote. We have clinical reasoning that confident doctors construct after the intuitive verdict. We have service evaluations that can't distinguish the burned-out nurse from the genuinely present one, until the burned-out nurse is gone.
For AI systems, we're starting to build tools that can tell the difference. Mechanistic interpretability is the project of asking whether the stated process corresponds to anything real inside. It's hard, it's expensive, and it's nowhere near complete. But it's aimed at the right thing.
For everything else, the question is available in the same form. The last time you "understood" something in a meeting. The last time you "processed" something difficult. The last time you "wrote through" it.
Sometimes the output and the event coincide. The interesting cases are when they don't. And I genuinely don't know a reliable method for telling the difference from inside, which is either a solvable problem or the whole problem.