The Room and the Record
You give an agent everything a project has produced: every Slack message, every deck, every status update since kickoff. Then you put a dashboard and a chatbot on top, so anyone can ask where things actually stand and get the real answer instead of the curated one. You can build this today, and a lot of teams are.
The use case that sells it is greenwashing, in the status-report sense: projects reported green when they aren't. Project updates are written for an audience, not for accuracy, and everyone in the building knows it. So you want a system that reads across all the official communication and catches the gap between what's being reported and what's real. Something that watches for progress being performed rather than made.
In bounded territory it works well. It pulls facts across years of history, tracks what people committed to, catches the moment a Q1 promise quietly became a Q3 maybe. The retrieval is genuinely good.
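To make "bounded" concrete, here is a minimal sketch of the slipped-commitment check. It assumes the commitments have already been pulled out of the messages into structured records, which is the hard part and is not shown. The Commitment fields and the find_slips helper are hypothetical names for illustration, not any product's API.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Commitment:
    """One extracted promise (hypothetical schema, for illustration only)."""
    said_on: date            # when the message was posted
    owner: str               # who made the commitment
    item: str                # what was promised
    target: tuple[int, int]  # (year, quarter) the item was promised for
    hedged: bool             # True if the wording was tentative ("maybe", "aiming for")


def find_slips(history):
    """Return (earlier, later) pairs where an item's target moved later
    or a firm promise turned into a hedged one."""
    latest = {}  # most recent commitment seen so far, keyed by item
    slips = []
    for c in sorted(history, key=lambda c: c.said_on):
        prev = latest.get(c.item)
        if prev is not None:
            moved_later = c.target > prev.target
            went_soft = c.hedged and not prev.hedged
            if moved_later or went_soft:
                slips.append((prev, c))
        latest[c.item] = c
    return slips


# Made-up example data: a firm Q1 promise that becomes a hedged Q3 one.
history = [
    Commitment(date(2024, 1, 15), "dana", "billing-migration", (2024, 1), False),
    Commitment(date(2024, 3, 2), "dana", "billing-migration", (2024, 3), True),
]
slips = find_slips(history)
for earlier, later in slips:
    print(f"{earlier.item}: Q{earlier.target[1]} -> Q{later.target[1]}, hedged={later.hedged}")
```

Everything this check needs is in the text: a date, an owner, a target, a hedge word. That is what makes it bounded.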
And the conclusions come back consistently, frustratingly off. They're never wrong in a way you can point to. The agent reads the content of every message and misses the thing the messages were traveling on. The name that's conspicuously absent from a thread where it should appear. The update that softened every hedge that was in the draft version. The person who went quiet in the meeting at the exact moment their silence became the loudest thing in the room. None of that survives in a Slack export, because none of it was ever written down.
The Economist recently ran a piece on AI and tacit knowledge that opened with Polanyi's line: "We can know more than we can tell." It used the line to frame a problem about how organizations hand expertise to AI agents. The example was a bricklayer who can't explain why he vibrates his hand when he sets a brick into mortar, until someone films him and works out that the motion pushes mortar into the brick's pores and strengthens the bond. He knew more than he could tell. The camera told it for him.
I've written before about why Polanyi's argument is harder than that framing makes it sound, about how looking directly at the parts of a skilled process can dissolve the process instead of revealing it. The ASML lithography machines nobody can replicate from the stolen blueprints are the clearest case I know. The blueprints are complete. They still don't contain the thing.
But the Economist piece runs two different problems together, and they come apart in a way that matters for anyone deciding what to automate.
The first problem is tacit knowledge: know-how built through iteration, laid down in a body over years of doing. The bricklayer's hand. The forty-year engineer who reads a machine the way you read a face. This kind of knowledge is hard to extract and it resists language, but it's in there. Given enough observation, in principle you can get it out. The camera got the bricklayer's.
The second problem is the one that breaks the greenwashing system, and it has a different shape. Some things are communicated only in the act of communicating, and they don't exist anywhere outside that act. Tone. The length of a pause before someone answers. What a person pointedly leaves out of a message where they say everything else. Whether "I'm fine with that" is agreement or a quiet decision to bury the thing in three weeks. These aren't facts that failed to get written down. They have no written form. They happen in the room and they're gone when the room empties.
Here's how much rides on that channel. In 2012 the Journal of Finance published a study by two Duke researchers, William Mayew and Mohan Venkatachalam, who ran vocal emotion software over the audio of 1,647 earnings calls across 691 companies. They weren't reading the transcripts. They were reading the voices, scoring two states in particular: excitement, and the strain of cognitive dissonance, which they described as what you hear when an executive puts a positive spin on numbers that don't support it. The vocal signal predicted the next six months of earnings and stock returns, on top of everything the words and the financials already said. The more negative the voice, the worse the future. The effect was strongest under pointed questioning, when the strain was hardest to hide.
The part I keep coming back to: the human analysts on those same calls mostly missed the negative signal. They moved on the excitement and held back on the strain, waiting for the confirmation the numbers would eventually provide. The signal was right there in the channel, predictive, and even the professionals whose job was to catch it were slow.
That is the layer organizations actually run on. Who's really behind an initiative and who put their name on it to avoid a fight. Which senior person's "let's explore that" means yes and whose means a polite funeral. Where the coalition is that will quietly kill the thing everyone just nodded along to. It moves through tone and timing and omission, and the written record holds the words while losing the signal.
There's a distinction here that tends to get collapsed, and the whole question of what these systems are good for sits on top of it.
An agent that hands an expert the right information so the expert can navigate better is one thing. An agent trying to do the navigating itself is another. Scaffold a person with retrieval and synthesis and you get something powerful: the human reads the channel, the agent holds the context and the memory, and the combination clears a high bar. That version works, and it's most of the value on the table right now.
The trouble starts when you push past scaffolding into full automation of tasks that are made of the channel. Managing people whose interests don't line up. Telling the difference between a project that's healthy and one that's performing health. Working out why something that looks fine on paper is dying. Channel signal isn't a nice-to-have enrichment on these tasks. It's the substance of them. You cannot do the work without reading it.
So the agent does the only thing it can. It produces output that pattern-matches to what an informed person would say. It writes a credible email. It names plausible risks. Everything it returns is coherent and defensible, and it's still missing the part that mattered, because the gap between what it knows and what a person in the room knows was never in the text. It was in the room.
This is why the failure is so hard to catch before you commit to it.
When you evaluate an agent for this kind of work, you test the things you can see: factual accuracy, coherence, quality of writing. The agent passes all of them. The demo is clean, the decision gets made, and the failure shows up six months later when something political goes sideways in a way the agent's careful analysis never gestured at.
I keep finding the same shape in different places, and it's worth naming directly. In the gap between an AI passing a benchmark and actually having the capability the benchmark stood for, the benchmark quietly becomes the thing you're optimizing, and the capability it was meant to track stops being specified. The same move happens here. The transcript is a benchmark for the meeting. It's legible, it's complete, it's auditable, and it's missing exactly the part that doesn't reduce to text. When you build a system that reads the record, the record becomes the territory, and the room where the real decision happened drops off the map. The proxy is always the part you can measure, which is why the proxy is always what gets automated first, and why the thing it left out stays invisible until it costs you.
There are models being built to close some of this gap. Architectures that take in audio directly and read prosody alongside the words. Hume's voice work is aimed squarely at decoding emotional register; OpenAI's audio models hold tone in the representation before transcription flattens it. Mayew and Venkatachalam needed a piece of Israeli software to score a voice in 2012, so none of this is alien to machines in principle. Whether it reaches organizational subtext (reading a room, sensing a buried coalition, noticing that the project sponsor hasn't said a word in twenty minutes) is a separate question, and nobody can answer it honestly yet.
If your agent deployments on the human-layer tasks feel slightly off in a way you can't quite locate, this is usually the reason. The system isn't under-equipped and it isn't under-prompted. It's reading the record, and the meeting happened in the room.
That isn't a reason for pessimism about these systems. It's a placement problem. Point them hard at the work that's made of information: retrieval, synthesis, tracking, analysis, the places where the record actually holds what matters. Move slowly on the work that's made of the channel, because the channel is the one thing the agent can't get to. The people who get the most out of this generation of agents will be the ones who can tell, before they deploy, which of those two kinds of work they're looking at.