Comparing how 16 frontier language models capture professional tacit knowledge
Each model receives identical prompts and is asked to adopt the persona of a 30+ year veteran professional. This allows us to compare how different architectures, training approaches, and data sources affect the generation of tacit knowledge.
Temperature: 1.0 (high creativity)
Samples: 10 per profession
Output format: Structured JSON
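As a rough sketch, the setup above might look like the following, assuming an OpenAI-style chat-completions client; the persona wording, the JSON keys, and the model identifier are illustrative assumptions, not the exact prompts used in the comparison.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical persona prompt; the real wording is not shown in this document.
PERSONA = (
    "You are a professional with more than 30 years of experience as a {profession}. "
    "Share tacit knowledge that only long practice teaches."
)

def collect_insights(model: str, profession: str, n: int = 10) -> list[str]:
    """Request n structured-JSON insights from one model for one profession."""
    answers = []
    for _ in range(n):
        completion = client.chat.completions.create(
            model=model,                              # e.g. "gpt-4o" (illustrative)
            temperature=1.0,                          # high creativity, per the settings above
            response_format={"type": "json_object"},  # structured JSON output
            messages=[
                {"role": "system", "content": PERSONA.format(profession=profession)},
                {"role": "user", "content": (
                    "Describe one piece of tacit knowledge from your work as JSON "
                    "with the keys 'insight' and 'context'."
                )},
            ],
        )
        answers.append(completion.choices[0].message.content)
    return answers
```

The same three parameters (temperature 1.0, 10 samples per profession, structured JSON output) are held constant across every model, so differences in the responses can be attributed to the models themselves.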
OpenAI's flagship multimodal model with real-time text, vision, and audio capabilities. Excels in natural conversation and multilingual tasks.
High-performance model from OpenAI with superior instruction following, coding, and long-context reasoning (1M token window).
OpenAI's latest flagship model with advanced reasoning capabilities.
Advanced reasoning model from OpenAI's o-series. Excels at logic, tool use, and image understanding. Includes variants like o3-mini and o3-pro.
Google's most intelligent model for multimodal understanding and agentic tasks.
Google's advanced thinking model for complex reasoning.
Google's fast model with excellent price-performance.
Anthropic's smartest model for complex agents and coding.
Anthropic's exceptional model for specialized reasoning tasks.
Anthropic's premium model combining maximum intelligence with practical performance.
Moonshot's Kimi K2 model with multi-step tool calling and reasoning.
DeepSeek's reasoning model with chain-of-thought capabilities.
DeepSeek's general chat model.
xAI's flagship model.
xAI's fast inference model.
xAI's high-performance agentic model with 2M token context.
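Because several of these vendors expose OpenAI-compatible endpoints, one way to send the same prompt to the whole roster is to swap the base URL and model identifier per provider; Google and Anthropic ship their own SDKs and would need separate adapters. The endpoints and model identifiers below are illustrative assumptions, not a statement of the exact 16 models or APIs used in this comparison.

```python
import os
from openai import OpenAI

# Hypothetical roster: provider -> (base URL, example model id).
# Both columns are placeholders; substitute the actual models under study.
ROSTER = {
    "openai":   ("https://api.openai.com/v1",  "gpt-4o"),
    "deepseek": ("https://api.deepseek.com",   "deepseek-chat"),
    "moonshot": ("https://api.moonshot.ai/v1", "kimi-k2-0711-preview"),
    "xai":      ("https://api.x.ai/v1",        "grok-4"),
}

def ask_all(prompt: str) -> dict[str, str]:
    """Send one identical prompt to every model in the roster."""
    answers = {}
    for provider, (base_url, model) in ROSTER.items():
        client = OpenAI(base_url=base_url,
                        api_key=os.environ[f"{provider.upper()}_API_KEY"])
        reply = client.chat.completions.create(
            model=model,
            temperature=1.0,
            messages=[{"role": "user", "content": prompt}],
        )
        answers[model] = reply.choices[0].message.content
    return answers
```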
Which models generate the most authentic-sounding professional insights? Do larger models perform better, or do specialized training approaches matter more?
How well do models trained primarily on English data handle profession-specific knowledge from different geographic and cultural contexts?
When models generate specific technical details, are they drawing from training data or creating plausible-sounding but potentially false information?