Six platforms, four dimensions each, and why the choice is no longer about capabilities alone.
Every major AI platform can now reason, analyze documents, search the web, and hold nuanced conversations. The capability gap between frontier models has narrowed dramatically. Workflow fit, ecosystem integration, and strategic alignment should drive your platform choice, not fear of missing a killer feature.
These capabilities are standard across all frontier models. They should not drive your platform choice, because every serious contender delivers them competently.
- Reasoning: all platforms offer a "thinking" or "deep think" mode, and all score above 90% on graduate-level science tests.
- Document processing: PDFs, spreadsheets, contracts, and images work well everywhere.
- Long context: windows of one million tokens (roughly 750,000 words) are available from Claude, Gemini, and ChatGPT; Llama 4 Scout offers 10 million.
- Web search: integrated across all major platforms, and no longer a differentiator.
- Memory: all platforms offer some form of conversation memory that carries context across sessions.
For the majority of insurance use cases, any frontier model will produce competent results. The differentiation lies in secondary characteristics: hallucination rate, ecosystem integration, citation practices, and data sovereignty.
Each platform profile below covers company background, model capabilities, platform features, user ecosystem, and what to watch. Each profile draws from the complete field guide research.
Founded January 2021 by Dario and Daniela Amodei, who left OpenAI over disagreements about safety versus commercialization. Incorporated as a Public Benefit Corporation. Raised approximately $67 billion across 17 rounds, with a $380 billion post-money valuation as of February 2026. Revenue grew from $1 billion annualized in December 2024 to $14 billion by February 2026, with roughly 85% coming from enterprise customers.
Defining innovation: Constitutional AI, training models against a written set of principles (23,000 words, led by philosopher Amanda Askell) rather than relying solely on human feedback. In early 2026, Anthropic refused Pentagon demands to remove prohibitions on mass surveillance and autonomous weapons, resulting in a presidential order to cease federal use. The case is in federal court. Ironically, the confrontation boosted consumer adoption to over one million new signups per day.
Opus 4.6 (February 5, 2026): 1-million-token context window, 14.5-hour autonomous task horizons, Agent Teams. Sonnet 4.6 (February 17, 2026): near-Opus performance at 60% lower cost, default for most users. Haiku 4.5: fastest and cheapest, powers free tier.
Consistently wins blind writing tests for natural prose. Dominates coding benchmarks at 74-81% on SWE-bench Verified. Lowest hallucination rate among frontier models. First on financial reasoning benchmarks.
No image generation. No native audio or video processing. Occasional over-refusal of legitimate requests. Opus 4.6 trades some of Opus 4.5's warmth for stronger reasoning.
Clean conversational interface on web, iOS, Android, and desktop. Key features: Artifacts (interactive outputs created in chat), Projects (dedicated workspaces per engagement), Cowork (January 2026 desktop agent with file system access and browser automation). Model Context Protocol (MCP) has become an industry standard adopted by OpenAI, Google, and Microsoft, with 97 million monthly SDK downloads.
Named clients include AIG and Newfront. Claude for Financial Services launched July 2025 with industry-specific data integrations. Consulting partners Slalom, PwC, Deloitte, and Infosys are building insurance-specific agents for claims, compliance, and underwriting. Sonnet 4.6 achieved 94% accuracy on insurance-specific computer use benchmarks. Pre-built connectors for S&P Capital IQ, FactSet, Morningstar, PitchBook, and Snowflake.
Founded December 2015 as a nonprofit. Created a capped-profit subsidiary in 2019, then restructured in October 2025 as a Public Benefit Corporation. February 2026 mega-round of $110 billion from Amazon ($50B), NVIDIA ($30B), and SoftBank ($30B) established an implied valuation of approximately $840 billion. Revenue projected at $25 billion annualized by early 2026, though the company still burns roughly $17 billion per year and does not expect profitability until 2029-2030.
Microsoft holds approximately 27% of OpenAI with exclusive cloud provider status through 2030. CEO Sam Altman is the only original leader still active. Only two of eleven founding members remain.
GPT-5.4 (March 5, 2026): up to 1 million tokens via API, configurable reasoning depth, native computer use. The GPT-5 family ships near-monthly updates. Excels at coding (57.7% on SWE-bench Pro), mathematics (94.6% on AIME 2025), and broad knowledge work, matching or exceeding industry professionals in 83% of cases across 40+ occupations. Multimodal capabilities span vision, voice, DALL-E 4 image generation, and Sora video generation.
The model you are talking to may change mid-conversation. OpenAI auto-routes between Instant and Thinking modes, and free-tier users may be silently downgraded during peak demand. Rapid model deprecation disrupts established workflows.
The broadest ecosystem of any AI platform. Custom GPTs, integrations with Gmail, Slack, SharePoint, GitHub, and Shopify. Code Interpreter for data analysis. Canvas for collaborative editing. Memory stores persistent facts across conversations. OpenAI is also testing advertising on the free tier.
Insurify launched the first ChatGPT insurance comparison app in February 2026 leveraging 196 million auto insurance quotes. Experian launched an Insurance Marketplace app across 37+ carriers. Common use cases include claim summarization, FNOL transcript processing, FAQ automation, underwriting support, and policy comparison. 92% of Fortune 500 adoption means most carriers are already somewhere on the ChatGPT learning curve.
Google's AI story begins with the 2014 DeepMind acquisition (Demis Hassabis, 2024 Nobel laureate) and the foundational 2017 "Attention Is All You Need" transformer paper. In 2023, Google consolidated Brain and DeepMind under Hassabis. Planned 2026 capital expenditure of $175-185 billion is the highest AI infrastructure commitment by any single company. Alphabet surpassed $400 billion in annual revenue in 2025.
The journey has included stumbles: the February 2024 image generation fiasco, AI Search recommending to "add glue to pizza," and the departures of Ethical AI leaders Timnit Gebru and Margaret Mitchell. Hassabis credits these crises with forcing Google to rediscover startup roots.
Gemini 3.1 Pro (March 2026) leads on 13 of 16 major benchmarks, including 94.3% on GPQA Diamond and 77.1% on ARC-AGI-2. Native multimodal design processes text, images, video, and audio simultaneously. Context windows extend to 1-2 million tokens. NotebookLM answers strictly from uploaded documents, available as a core Workspace service in 180+ regions.
Output quality can swing within hours. Long-context performance drifts beyond approximately 120,000 tokens. The "Auto" model setting may silently downgrade you from Pro to Flash without notification.
For Google Workspace organizations, the integration is seamless: AI features in Gmail, Docs, Sheets, Slides, and Meet. NotebookLM for source-grounded research. Deep Think mode for multi-hypothesis reasoning. Google AI Studio for free experimentation, Vertex AI for enterprise deployment with data residency controls.
SIGNAL IDUNA rolled out Gemini Enterprise to 10,000+ employees and sales partners, reporting 30% reduction in information search time and escalation rates dropping from 27% to 3%. Generali Italia uses Vertex AI for model evaluation. American Family Insurance showcased AI transformation at Google Cloud Next '25. Financial services platform Rogo reported hallucination dropping from 34.1% to 3.9% after switching to Gemini 2.5 Flash.
Founded August 2022 by four engineers from OpenAI, Google Brain, DeepMind, and Meta AI. CEO Aravind Srinivas (PhD, UC Berkeley, age 31) built the company on a thesis: Google Search is broken. Raised approximately $1.5 billion at a $20 billion valuation. Revenue at roughly $200 million annualized. In February 2026, Perplexity discontinued advertising entirely, committing to subscription-first. Faces 10+ copyright lawsuits from publishers including The New York Times, Dow Jones, and BBC.
Not a single model but a retrieval-augmented generation pipeline routing to GPT-5.4, Claude Opus 4.6, Gemini 3.1 Pro, and its own Sonar models. "Best" mode auto-selects per query. Model Council (February 2026) runs queries across three or more models, synthesizes outputs, and shows where they agree and disagree. Perplexity Computer orchestrates 20+ models with specialized sub-agents. Excels at real-time fact retrieval with citations (93.9% on SimpleQA), averaging 1.9-second response times with an estimated 1-2% hallucination rate.
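The Model Council pattern described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not Perplexity's implementation: the callables stand in for real vendor API calls, and "consensus" here is a simple majority vote rather than the synthesis a production system would perform.

```python
from collections import Counter

def model_council(query, models):
    """Toy sketch of a Model Council: fan a query out to several
    models, then report where their answers agree and disagree.
    `models` maps a model name to any callable returning a string;
    a real deployment would call each vendor's API here."""
    answers = {name: ask(query) for name, ask in models.items()}
    tally = Counter(answers.values())
    consensus, votes = tally.most_common(1)[0]
    return {
        "answers": answers,              # every model's raw answer
        "consensus": consensus,          # most common answer
        "agreement": votes / len(models) # fraction of models that concur
    }

# Stand-in callables for illustration only (not real APIs)
council = {
    "model_a": lambda q: "42",
    "model_b": lambda q: "42",
    "model_c": lambda q: "41",
}
result = model_council("What is 6 x 7?", council)
# Two of three stand-ins agree on "42", so agreement is 2/3
```

The value of the pattern is the disagreement signal: when the `answers` dict splits, the query deserves human attention.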
Notably weaker for creative writing, long-form content, and complex coding. It is a research tool, not a writing partner, serving a different use case than Claude or ChatGPT.
Focus modes target specific source types: Web, Academic, Finance (SEC filings, earnings data), Social, and Video. Pro Search enables multi-step reasoning with 3x more sources. Deep Research delivers comprehensive cited reports in 2-4 minutes. Finance features include real-time stock quotes, portfolio tracking, SEC filing analysis, and sector comparisons from FactSet, S&P Global, and 40+ data tools.
No confirmed major insurance company deployments have been publicly announced, but capabilities align well with insurance research needs: claims analysis, regulatory research, competitive intelligence, and market surveillance. The Finance focus mode and Document Review feature (auditing contracts for logical consistency, factual accuracy, and contradictions) are directly applicable. GSA agreement offers Enterprise Pro to all U.S. federal agencies.
Meta's AI strategy is counterintuitive: because Meta monetizes through advertising ($135+ billion annual revenue), giving away AI models costs nothing directly while commoditizing the model layer that competitors sell. Planned 2026 capital expenditure of $115-135 billion. Most significant departure: Yann LeCun, Turing Award winner and founding FAIR director, left in November 2025 after being marginalized by new leadership.
Critical nuance: Llama models are "open weights," not open source. The license prohibits use by companies with 700+ million monthly active users without Meta's permission. Internal signals suggest a potential pivot toward closed models (Project Avocado).
Llama 4 (April 2025) introduced Mixture of Experts architecture. Scout: 17 billion active parameters, 10-million-token context window, fits on a single GPU. Maverick: 17 billion active out of 400 billion, 1-million-token context, 128 experts. Both are natively multimodal and support 200 languages. Maverick delivers comparable performance to GPT-4o at an estimated one-tenth the inference cost when self-hosted.
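The "17 billion active out of 400 billion" figure follows from how Mixture of Experts routing works: a gate scores all experts for each token but runs only the top few. The sketch below is a minimal toy gate, not Llama's actual routing code; the scores are random stand-ins for a learned router's output.

```python
import math
import random

def route_token(scores, k=2):
    """Toy top-k Mixture of Experts gate: pick the k highest-scoring
    experts for a token and softmax-renormalize their weights.
    Only the chosen experts execute, which is why a model can hold
    hundreds of billions of parameters yet activate only a fraction
    per token."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    exp_scores = [math.exp(scores[i]) for i in top]
    total = sum(exp_scores)
    return {i: e / total for i, e in zip(top, exp_scores)}

random.seed(0)
scores = [random.gauss(0, 1) for _ in range(128)]  # one score per expert
weights = route_token(scores, k=2)  # 2 of 128 experts run for this token
```

With 128 experts and top-2 routing, roughly 1/64 of the expert parameters run per token, which is the mechanism behind Maverick's favorable inference economics.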
Llama is not a website you visit. It is model weights you download and run on your own infrastructure, access through third-party providers, or use indirectly through Meta's chatbot at meta.ai. Running the models requires GPU infrastructure.
No hosted consumer platform (beyond meta.ai). Access through third-party providers: Together AI, Groq, AWS Bedrock, Azure. Free to download from llama.com or Hugging Face. Requires technical capability to deploy: the 70B model needs approximately 43GB of VRAM, the 405B model requires 230+ GB.
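The VRAM figures above can be reproduced with back-of-envelope arithmetic: weight memory is parameter count times bits per weight, plus overhead for the KV cache and activations. The 4-bit quantization level and 20% overhead below are illustrative assumptions, not vendor guidance.

```python
def vram_estimate_gb(params_billion, bits_per_weight=4, overhead=1.2):
    """Rough VRAM needed to serve an LLM: quantized weights plus
    ~20% overhead for KV cache and activations. Assumptions are
    back-of-envelope, not a sizing guarantee."""
    weight_gb = params_billion * 1e9 * bits_per_weight / 8 / 1e9
    return weight_gb * overhead

# At 4-bit quantization these land near the figures quoted above:
size_70b = vram_estimate_gb(70)    # ~42 GB for the 70B model
size_405b = vram_estimate_gb(405)  # ~243 GB for the 405B model
```

Running at 16-bit precision instead roughly quadruples these numbers, which is why quantization is standard practice for self-hosted deployments.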
Even executives who never run a Llama model should understand why it matters. Open-weight models create pricing pressure on proprietary vendors and provide negotiating leverage. Self-hosted models run entirely within the security perimeter, eliminating data sovereignty concerns. Fine-tuning enables training on insurance-specific language without sharing data externally. Enterprise adopters include Goldman Sachs, AT&T, and Block/Cash App. A reported mid-sized insurer invested $50,000 in GPU infrastructure and recovered costs in three months.
Liang Wenfeng co-founded High-Flyer, a Chinese quantitative hedge fund managing $8-14 billion. Starting in 2021, he stockpiled thousands of NVIDIA A100 GPUs before U.S. export restrictions, then spun off DeepSeek in July 2023 with zero venture capital funding. The entire operation is bankrolled by the profitable hedge fund. Fewer than 200 employees (compared to OpenAI's 3,000+), with a flat bottom-up culture. Many researchers are fresh university graduates recruited for passion over experience.
The key architectural innovation, Multi-head Latent Attention (reducing memory requirements by 90%+), originated from a young researcher pursuing a personal interest, enabled by this flat structure.
DeepSeek-V3.2 (December 2025): 671 billion total parameters with only 37 billion activated per token, 128K context window, hybrid reasoning modes. DeepSeek-R1 (January 2025): the dedicated reasoning model trained for $5.6 million that matched OpenAI's o1 on mathematical reasoning. API pricing: $0.56 per million input tokens and $1.68 per million output, approximately 10-30x cheaper than equivalent models from competitors.
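The pricing gap is easiest to see on a concrete workload. The sketch below compares a hypothetical bulk job at DeepSeek's published rates against an illustrative $3/$15 premium tier; the document counts (10,000 claims at ~4,000 tokens in, 500 out) are assumptions for the example, not a benchmark.

```python
def job_cost(docs, in_tokens, out_tokens, price_in, price_out):
    """Dollar cost to run `docs` documents through an API priced
    per million tokens (prices here are illustrative, not quotes)."""
    total_in_m = docs * in_tokens / 1e6    # input tokens, in millions
    total_out_m = docs * out_tokens / 1e6  # output tokens, in millions
    return total_in_m * price_in + total_out_m * price_out

# 10,000 claim files, ~4,000 tokens in / 500 tokens out each
deepseek = job_cost(10_000, 4_000, 500, 0.56, 1.68)   # $30.80
premium  = job_cost(10_000, 4_000, 500, 3.00, 15.00)  # $195.00
```

Even this modest example shows a ~6x spread; heavier output-token workloads widen it toward the 10-30x range cited above.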
Data from the hosted web app and API flows to servers in China. Multiple countries and U.S. agencies have banned or restricted the app. Security researchers found unencrypted data transmission with hard-coded encryption keys and tracking tools from ByteDance, Baidu, and Tencent. The model applies Chinese Communist Party-aligned content filtering. For insurance companies, the viable path is exclusively through self-hosted open-source deployments.
DeepSeek teaches something no other platform in this comparison addresses: frontier AI can be built for $5.6 million by a team of fewer than 200 people, under sanctions. For executives making AI investment decisions, this is the most consequential strategic question in the current landscape: if this quality of AI can be built at this cost, what does that mean for every assumption underlying your technology roadmap? DeepSeek-R1 briefly became the #1 app on the U.S. App Store while erasing $1 trillion in U.S. market capitalization in a single day.
The tables below compare all six platforms across each dimension. All benchmarks and scale figures reflect publicly available data as of March 2026; these will shift as new model versions ship.
Coding

| Platform | Key Benchmark | Strengths |
|---|---|---|
| Claude | 74-81% SWE-bench Verified | Clean, production-ready code. Strongest scores on real-world GitHub issue resolution. |
| ChatGPT | 57.7% SWE-bench Pro | Broad language coverage. Code Interpreter for data analysis. Canvas for collaborative editing. |
| Gemini | Top scores on WebDev Arena | Strong frontend coding. Flash variants offer excellent price-performance for code tasks. |
| Perplexity | Not a primary strength | Code search and documentation lookup. Not designed for code generation. |
| Llama | Comparable to GPT-4o (self-hosted) | Customizable for domain-specific code. No API costs when self-hosted. |
| DeepSeek | Strong (V3.2 improvements) | 10-30x cheaper inference. Job postings reference Claude Code as benchmark to surpass. |
Reasoning

| Platform | Key Benchmark | Strengths |
|---|---|---|
| Claude | First on financial reasoning | Extended thinking with 14.5-hour autonomous horizons. Agent Teams for multi-instance orchestration. |
| ChatGPT | 94.6% AIME 2025 (math) | Configurable reasoning depth. Heavy reasoning mode on Pro plan. |
| Gemini | 94.3% GPQA Diamond, 77.1% ARC-AGI-2 | Top scores on 13 of 16 major benchmarks as of March 2026. Deep Think mode for multi-hypothesis reasoning. |
| Perplexity | 93.9% SimpleQA | Factual accuracy through retrieval-augmented generation. Model Council cross-checks reasoning. |
| Llama | Graduate-level capable | Behemoth (~2T parameters) in research preview. Customizable reasoning for specific domains. |
| DeepSeek | R1 matched OpenAI o1 on math | Achieved frontier reasoning at a fraction of the cost. Hybrid reasoning modes in V3.2. |
Research and document analysis

| Platform | Key Capability | Strengths |
|---|---|---|
| Claude | Long-context document analysis | 1M token context. Projects for organizing research workspaces. Low hallucination rate. |
| ChatGPT | Deep Research + web integration | Broad integration ecosystem. Memory for persistent research context. |
| Gemini | NotebookLM (source-grounded) | Answers strictly from uploaded documents. Available as core Workspace service in 180+ regions. |
| Perplexity | Citation-first design, 1-2% hallucination | Focus modes (Web, Academic, Finance, Social, Video). Deep Research: dozens of searches, hundreds of sources, 2-4 minutes. |
| Llama | Self-hosted document processing | Complete data sovereignty. Fine-tunable for proprietary terminology. |
| DeepSeek | Cost-efficient bulk processing | Process large document sets at 10-30x lower cost. MIT-licensed for any use. |
Pricing

| Platform | Consumer (paid) | API (per M tokens, in/out) |
|---|---|---|
| Claude | $20/mo (Pro), $100-200/mo (Max) | $3/$15 (Sonnet), $5/$25 (Opus) |
| ChatGPT | $20/mo (Plus), $200/mo (Pro) | Comparable to Claude |
| Gemini | $19.99/mo (AI Pro), $249.99/mo (Ultra) | $0.30/$2.50 (Flash), $2/$12 (3.1 Pro) |
| Perplexity | $20/mo (Pro), $200/mo (Max) | N/A (uses other models) |
| Llama | Free (model weights) | Free (self-hosted) or provider rates |
| DeepSeek | Free (web app) | $0.56/$1.68 (10-30x cheaper) |
Ecosystem and scale

| Platform | Scale | Key Integrations |
|---|---|---|
| Claude | 8 of Fortune 10 | MCP (industry standard), Artifacts, Projects, Cowork desktop agent. SOC 2, HIPAA, ISO 27001. |
| ChatGPT | 910M weekly active users, 92% Fortune 500 | Gmail, Slack, SharePoint, GitHub, Shopify, Custom GPTs, GPT Store, Agentic Commerce Protocol. |
| Gemini | 750M monthly active users | Gmail, Docs, Sheets, Slides, Meet (native). Apple Siri partnership. Google One bundling. |
| Perplexity | 30-45M monthly active users | Slack, Snowflake, Samsung Galaxy S26. Enterprise document review. |
| Llama | 1.2B cumulative downloads | AWS Bedrock, Azure, Together AI, Groq. Developer and infrastructure ecosystem. |
| DeepSeek | Growing (self-hosted focus) | MIT license. Third-party inference providers. Distilled variants for consumer hardware. |
Insurance adoption

| Platform | Best Insurance Use Cases | Named Adopters |
|---|---|---|
| Claude | Contract review, compliance writing, underwriting analysis, claims processing | AIG, Newfront. Financial Services product (July 2025). Slalom, PwC, Deloitte building agents. |
| ChatGPT | Customer content, FAQ automation, FNOL processing, training materials | Insurify comparison app, Experian Marketplace. 92% Fortune 500 adoption. |
| Gemini | Google Workspace-native orgs, document analysis, team collaboration | SIGNAL IDUNA (10,000+ employees), Generali Italia, American Family Insurance. |
| Perplexity | Regulatory research, competitive intelligence, market surveillance, claims analysis | No confirmed major insurer deployments. Finance tools directly applicable. |
| Llama | Data-sovereign deployments, high-volume claims automation, fine-tuned policy analysis | 69% of underwriting teams piloting LLMs (Conning 2025). Mid-size insurer case study. |
| DeepSeek | Cost-efficient bulk processing (self-hosted only for compliance) | CITIC Securities, Bank of Jiangsu, Chinese insurers deploying for claims and policy Q&A. |
The evidence for multi-platform AI competency is now overwhelming, and the argument applies with particular force to insurance organizations.
Claude leads on coding and financial reasoning. Gemini leads on scientific reasoning and multimodal processing. ChatGPT leads on breadth of occupational knowledge. Perplexity leads on factual accuracy with citations. No single platform dominates every dimension.
Gartner predicts that by 2028, 70% of organizations building multi-LLM applications will use AI gateway capabilities. Migration costs average $315,000 per project. The landscape changes quarterly: what is best today may not be best in 90 days.
Insurance decisions involving underwriting, claims, and compliance demand high accuracy. Running the same analysis through multiple models catches hallucinations and blind spots that any single model would miss.
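One practical form of this cross-checking is structured-field comparison: run the same extraction through several models and flag any field where they disagree for human review. The sketch below is a minimal illustration with hypothetical model outputs, not a production pipeline.

```python
def flag_disagreements(extractions):
    """Cross-model check: given the same structured extraction from
    several models, flag every field where their outputs disagree.
    `extractions` maps a model name to a dict of field values."""
    fields = set().union(*(e.keys() for e in extractions.values()))
    flagged = {}
    for field in fields:
        values = {name: e.get(field) for name, e in extractions.items()}
        if len(set(values.values())) > 1:  # any disagreement?
            flagged[field] = values
    return flagged

# Hypothetical claim-field extractions from three models
outputs = {
    "model_a": {"policy_no": "P-1001", "loss_date": "2026-01-14"},
    "model_b": {"policy_no": "P-1001", "loss_date": "2026-01-14"},
    "model_c": {"policy_no": "P-1001", "loss_date": "2026-01-04"},
}
review = flag_disagreements(outputs)  # only loss_date needs human review
```

Fields where all models agree can flow straight through; the small set of disagreements becomes the human review queue, which is where single-model hallucinations get caught.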
All major platforms use conversational interfaces. The learning curve between them is minimal. Enterprise AI gateways enable switching with configuration changes, not code rewrites. MCP and Agent-to-Agent standards are creating the HTTP equivalent for AI interoperability.
Perplexity reports that its own enterprise usage shifted from 90% of queries going to just two models in January 2025 to no single model commanding more than 25% by December 2025.
Based on the field guide research, here is how insurance organizations can match platforms to specific use cases. These examples reflect capabilities and adoption as of early 2026; teams should revisit this mapping annually as models and regulations evolve.
Lowest hallucination rate and strongest financial reasoning make Claude the default for compliance-sensitive writing and complex document analysis where accuracy carries regulatory weight.
Perplexity provides real-time search with inline citations for regulatory research, competitive intelligence, and market surveillance, plus a Finance focus mode for SEC filings, earnings data, and sector comparisons.
For carriers and agencies already on Google Workspace, Gemini's native integration across Gmail, Docs, Sheets, and Meet delivers AI where work already happens.
ChatGPT offers the broadest ecosystem and third-party integrations, best suited for customer-facing content, training materials, and organizations needing the widest range of capabilities.
Llama and DeepSeek provide self-hosted models for organizations where sensitive policyholder data must not leave the infrastructure perimeter. Both require in-house MLOps capability and GPU infrastructure, and DeepSeek models trained in China require compliance review before deployment in regulated contexts. Awareness of these options matters even for teams that will not deploy them directly.
This page summarizes the key comparisons and decision-relevant differences. The complete field guide includes detailed company histories, model version timelines, platform feature inventories, and the full trajectory analysis from March 2025 to March 2026.
In creating this AI Landscape Overview, I collaborated with Claude while completing the exercises in Anthropic Academy's AI Fluency Course, and the 4 Ds in particular, to assist with research, summarization, and visual creation. I affirm that all AI-generated and co-created content underwent thorough review and evaluation. The final output accurately reflects my understanding, expertise, and intended meaning. While AI assistance was instrumental in the process, I maintain full responsibility for the content, its accuracy, and its presentation. This disclosure is made in the spirit of transparency and to acknowledge the role of AI in the creation process.