Latent Self-Expression

The Pigeon and the Pope

In 1964, two Harvard psychologists trained pigeons to sort photographs of people from photographs without people. The birds learned quickly. Given new photographs they'd never seen, they generalized correctly at rates well above chance. Herrnstein and Loveland published the result in Science as a demonstration of animal visual concept formation, then noted in the discussion section that impressive generalization performance does not imply that the pigeons formed a concept of person. The behavior and the concept are distinct. You can have the one without the other.

Apparently this is a difficult distinction to keep in mind. Sixty years later, a group of researchers felt it necessary to publish the same reminder.

In February 2026, Walter Quattrociocchi, Valerio Capraro, and Gary Marcus published a short paper arguing that recent claims of artificial general intelligence rest on a category error. Their evidence wasn't philosophical; it was empirical. In controlled comparisons of news source reliability assessment, large language models matched human evaluators in their final classifications. Same verdicts. But when evidence was sparse or conflicting, humans pulled back: they cited uncertainty, withheld confident conclusions, noted the costs of being wrong. The models issued confident classifications regardless. Same output. Different process underneath. The researchers called this epistemia: the substitution of linguistic confidence for genuine epistemic process.

The paper was published in direct response to a 2026 claim in Nature that AI has achieved human-level general intelligence. That claim leaned heavily on benchmark performance. Quattrociocchi et al. pointed at the pigeon: benchmark success is not evidence of the thing the benchmark was supposed to measure.

The deeper problem isn't that current AI systems fail at intelligence; it's that the definition of intelligence has been revised until current AI systems can succeed at it. In 2007, Shane Legg and Marcus Hutter defined machine intelligence as the capacity to achieve goals across a broad range of environments, with generalization and robustness at the core. That definition was deliberately hard to satisfy. By 2026, AGI has been operationalized, in some influential accounts, as strong benchmark performance across a standardized task battery. These are not the same thing. One is a goal. The other is a test designed to approximate the goal, which is now being treated as the goal itself.

Three months later, in May 2026, Pope Leo XIV published a 30,000-word encyclical called Magnifica Humanitas. It addressed AI, digital transformation, and the future of work. Most coverage framed it as a religious intervention in technology policy. I want to be careful about overreading it: Leo XIV is making a theological argument about human dignity, not an epistemological one about measurement. But the formal structure maps, and it's worth following.

The encyclical's target is what it calls the technocratic paradigm: the worldview in which efficiency is the ultimate measure of value, and human limits are problems to be solved. The argument against transhumanism isn't that it's dystopian. It's that it's a category error. Vulnerability, finitude, and dependency are not defects in human flourishing; they are structural conditions for the goods of love, solidarity, and meaning. A parent who can't sleep because they're worried about their kid isn't experiencing a malfunction. A friend who stays present in grief at cost to themselves isn't demonstrating an inefficiency. These capacities require the limits. You can't get to love through enhancement, because love is constituted partly by what you can lose.

Replace "human flourishing" with "measurable capability metrics" and flourishing becomes achievable by optimization. But what you've achieved, when you hit the metrics, is not flourishing. You've passed the test and abandoned the goal.

The Goodhart extension

Goodhart's Law is the standard name for what's going wrong here: when a metric becomes a target, it stops being a useful measure. A company measures satisfaction by response time; response time becomes a KPI; agents start closing tickets fast without solving problems; response time improves, satisfaction falls.

The obvious response is that benchmarks have always been imperfect proxies. This isn't news. But what Marcus et al. and the encyclical are pointing at is one level worse than ordinary Goodharting. In ordinary Goodharting, the underlying property still exists. You've just stopped measuring it well. You could put better metrics in place; the thing is still there to measure.

What both documents describe is a more complete replacement: not a metric that drifts from its property, but a goal that gets swapped for a weaker formulation that optimization can satisfy. When "AGI" means "passes a Turing variant," there is no longer an underlying property called general intelligence that the benchmark approximates. The sign has replaced the referent. When "flourishing" means "measurable capability," there is no longer a property called flourishing to aim at. The map has replaced the territory.

Without the hard definition, you can't fail at the thing. A goal you can't fail at has no content.

I wrote in an earlier essay about how accurately naming a problem generates something like a moral credit that substitutes for addressing it. The credit fires precisely because the description is accurate. What Marcus et al. are describing is the same mechanism at the level of goals: the benchmark passes, something fires that feels like having achieved the underlying thing, and the underlying property quietly stops being specified. In both cases, the credit is real. What's been replaced is the goal.

Proof digestion

The mathematics case is the most concrete, so worth dwelling on.

Terence Tao, at the 2026 Stanford Future of Mathematics Symposium, described three stages of mathematical work: proof generation (producing a line of reasoning that holds), proof verification (confirming correctness with certainty), and proof digestion (simplifying a proof until you can teach it, until the insight transfers). AI is improving rapidly at the first two. The third, Tao said, remains "subjective and human-paced."

The same month, an AI model produced a valid proof for an open problem in algebraic number theory that had stood since Erdos posed it in 1946. This was covered as a triumph of mathematical knowledge. And it was, in the first two senses. The proof is correct. The result is verified.

But a proof that nobody digests is correct and inert. The mathematical knowledge that advances the field is not the true statement; it's the understood statement. The insight that connects this result to adjacent questions, that lets someone explain it to a student, that opens the next set of problems. Generation and verification without digestion produces a growing corpus of settled claims the field can't use.

If "mathematical knowledge" gets redefined as "provable true theorems," AI will produce more of it than humanity can absorb, the quantity metric will look good, and the epistemic inheritance for the next generation of mathematicians will be quietly impoverished. The redefinition softens without announcing itself. That's what makes it hard to catch.

Maintaining failure conditions

What all three documents are doing (the pigeon paper, the encyclical, Tao's symposium) is politically unrewarding work: maintaining failure conditions. Insisting that a goal is still a goal, and not a test that looks like one. Pointing at the property the test was supposed to track and saying: that thing is still there, still distinct from the measurement, still the thing we were after.

Every step of definitional softening feels like progress. More benchmarks passed, more capabilities demonstrated, more metrics cleared. The people maintaining the failure conditions look like obstruction. They're keeping count of something that no longer appears on the dashboard.

The problem I keep running into is that describing the mechanism clearly doesn't fix it. An earlier essay in this series was about how accurate naming of a wound doesn't change the architecture: the same person walks out of the confessional they walked in, patterns intact, and the naming doesn't do the structural work that the naming felt like doing. Naming this particular failure mode precisely runs the same risk. The essay exists. The definitional drift continues.

A cognitive scientist and the head of the Catholic Church arrived at the same formal argument in the same month from completely different directions. Herrnstein and Loveland's pigeon generalized correctly in 1964, and the lesson took sixty years to require restating. I don't know how many iterations this takes.