RESEARCH PROJECT · 40 Professions · 16 AI Models

What Do AI Models Know About Work They've Never Done?

Testing Frontier Models on Physical-World Expertise

Ask Claude to speak as a 30-year master stone mason, and it will. It'll tell you to cut granite joints 1/8" wider in November than July. It'll explain how New England thermal cycles affect heritage restoration with surprising specificity.

We have pretty good evals for coding, math, and medical knowledge. We have almost nothing for the trades, crafts, and hands-on professions where knowledge lives in muscle memory and hard-won intuition—and where there's far less written material for models to learn from.

This project is building that eval. Forty professions. Sixteen frontier models. Let's see what they actually know.

40 Master Crafts · 16 AI Systems · 10 Insights Each · 30+ Years of Knowledge

Tacit Knowledge as a Benchmark

Tacit knowledge is what textbooks can't teach. The electrician who knows which Chicago building vintage means aluminum wiring behind the plaster. The chocolatier who can feel when Brussels humidity will ruin tomorrow's batch. The ferry captain who reads Norwegian fjord currents by watching seabirds.

This kind of expertise traditionally takes decades to acquire. It's rarely written down, which makes it an interesting test case: how well can language models perform on knowledge that's underrepresented in their training data?

The answer, it turns out, is better than you'd expect—and with some fascinating gaps.

Our Approach

📍 Specific Contexts, Not Generic Roles
We don't ask for "electrician" knowledge—we ask about pre-war Chicago residential, knob-and-tube to modern. Geographic and specialty constraints push models past surface-level answers into territory where genuine expertise (or its absence) becomes visible.

💡 The Surprise Factor
Each model shares insights that would surprise even a 5-year veteran. Five years is enough to know the textbook stuff. Asking for surprising insights surfaces the deeper patterns—or reveals where models are extrapolating beyond their actual knowledge.

🔬 Same Persona, Different Models
Every model adopts identical framing: a 30+ year veteran sharing hard-won wisdom. Running the same prompt across 16 frontier models shows where they converge (suggesting robust knowledge) and where they diverge (suggesting inference or gaps). A minimal harness sketch follows this list.

Built for Validation
The corpus is designed to be checked against reality. Interview actual professionals. See which insights hold up. The interesting findings are in the details—which domains, which types of knowledge, which models.
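To make the setup concrete, here is a minimal sketch of the kind of prompt harness this approach implies. It is not the project's actual code: the prompt wording, profession list, model names, and the `call_model` hook are illustrative placeholders. What it shows is the structure described above: one fixed persona prompt, rendered per specific context, sent unchanged to every model.

```python
from dataclasses import dataclass
from typing import Callable

# Illustrative persona prompt: a 30+ year veteran in a narrowly specified
# context, asked for insights that would surprise a 5-year practitioner.
# The wording here is a stand-in, not the project's actual prompt.
PROMPT_TEMPLATE = """You are a {profession} with over 30 years of experience,
working specifically in {context}.
Share 10 insights from that work that would surprise even a 5-year veteran.
For each insight, explain in one sentence why it matters in practice."""


@dataclass
class ProbeResult:
    model: str
    profession: str
    context: str
    response: str


def run_probe(
    professions: dict[str, str],            # profession -> specific context
    models: list[str],                      # model identifiers (placeholders)
    call_model: Callable[[str, str], str],  # (model, prompt) -> response text
) -> list[ProbeResult]:
    """Send the identical persona prompt to every model, for every profession."""
    results = []
    for profession, context in professions.items():
        prompt = PROMPT_TEMPLATE.format(profession=profession, context=context)
        for model in models:
            results.append(
                ProbeResult(model, profession, context, call_model(model, prompt))
            )
    return results


if __name__ == "__main__":
    # Hypothetical inputs; call_model would wrap whichever provider SDKs
    # the project actually uses.
    professions = {
        "stone mason": "heritage restoration of New England granite",
        "electrician": "pre-war Chicago residential, knob-and-tube to modern",
    }
    models = ["model-a", "model-b"]
    fake_call = lambda model, prompt: f"[{model}] (response placeholder)"
    for r in run_probe(professions, models, fake_call):
        print(r.model, "|", r.profession)
```

The property that matters is that the prompt is held constant: any divergence in the answers then reflects the models themselves rather than differences in framing.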

Specimens from the Archive

Some of these will check out. Some won't. Finding out which is the point.

Stone mason · thermal
New England granite winters
"In granite work, you must cut joints 1/8" wider in November than in July - thermal expansion will close them perfectly by spring, but summer cuts will crack the stone when winter contracts it."
Why it matters: Prevents expensive stone replacement and maintains structural integrity through seasonal cycles.
Nuclear reactor operator · sensory
BWR reactor systems
"You learn to feel rod worth changes before the instruments confirm them - a subtle shift in the reactor's hum that experienced operators recognize from years of listening."
Why it matters: Early detection allows preemptive adjustments before automatic systems engage.
Chocolatier · environmental
Belgian praline houses
"When Brussels humidity exceeds 65% overnight, you must adjust your tempering curve by 2°C or the bloom will show within 48 hours - something you can feel in the snap before you see it."
Why it matters: Preserves the visual quality and shelf life that Belgian chocolates are known for.

An Underexplored Benchmark

Model evals tend to focus on domains with clear right answers: coding benchmarks, math olympiads, medical board exams. These matter, but they're also domains with extensive written material that models can learn from.

Physical-world professions are different. A master welder's intuition about heat distribution, a ferry engineer's feel for propeller cavitation, a stone mason's sense of seasonal expansion—this knowledge exists mostly in practitioners' heads, passed down through apprenticeship rather than textbooks.

Testing models here tells us something about their ability to synthesize sparse information into coherent expertise. It also produces a genuinely useful artifact: a structured dataset of what frontier AI "knows" about 40 professions, ready for validation.
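One plausible shape for that dataset is sketched below. The field names are assumptions, but the content mirrors the specimen cards above: the profession, its specific context, the claim itself, the model's stated rationale, which system produced it, and a validation status that practitioners can later fill in.

```python
from dataclasses import dataclass, asdict
import json

# Hypothetical record shape for the validation-ready corpus; field names
# are illustrative, not the project's actual schema.
@dataclass
class Insight:
    profession: str        # e.g. "stone mason"
    category: str          # e.g. "thermal", "sensory", "environmental"
    context: str           # e.g. "New England granite winters"
    insight: str           # the claim itself, verbatim from the model
    why_it_matters: str    # the model's one-sentence justification
    model: str             # which of the 16 systems produced it
    validation: str = "unverified"  # later: "confirmed", "disputed", "rejected"


example = Insight(
    profession="stone mason",
    category="thermal",
    context="New England granite winters",
    insight="Cut joints 1/8\" wider in November than in July...",
    why_it_matters="Prevents stone replacement across seasonal cycles.",
    model="model-a",
)
print(json.dumps(asdict(example), indent=2))
```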