
Agents First

by Joshua Baer · v0.8, May 2026


Score your site against the framework

Enter a URL — it probes what an agent actually finds (llms.txt, AGENTS.md, MCP card, OpenAPI, OAuth, markdown negotiation) and scores it 0–100, on the spot. You get a shareable result card.



Every product is getting a second customer

The human who pays.

And the agent who decides.

An agent — Claude Code, Cursor, Windsurf, or whatever your dev is using this week — calls tools on behalf of a user. When that user says “I need to manage my tasks,” the agent doesn’t Google your product. It doesn’t load your landing page. It doesn’t sign up for a free trial.

It looks at the tools already wired up — MCP servers, CLIs, typed SDKs — and uses whatever’s there. If your product is in that list, you win. If it’s not, you don’t exist.


The shift

In 2009, Luke Wroblewski published “Mobile First.” Stop designing for the desktop and shrinking it. Design for the phone first, because the constraints breed better design. The industry resisted. Then mobile traffic crossed 50% and the holdouts got left behind.

Same curve, but the inflection arrived faster than mobile’s. By April 2026, Vercel’s AI Gateway production index — seven months of data across 200,000+ teams — reported that 58.9% of all tokens already flow through tool-call requests, with 22.2% of requests ending in a tool call. Agentic workloads aren’t the future case anymore; they’re the majority case at the LLM-traffic layer. The companies designing for agents now will have a multi-year head start on tool quality and distribution by the time everyone else catches up at the SaaS-workflow layer.

Most companies build a web UI, maybe expose a REST API, then — if a customer asks — bolt on agent support as an afterthought.

Agents First says: design the agent interface first. Ship it first. Make the agent your primary consumer. Then build human UIs as one client among many — not the only one.

This is protocol-agnostic. The agent interface might be an MCP server, a CLI, a typed SDK, or a set of function definitions. The principle is the same: design for the computer consumer first.

The agent economy isn’t theoretical. AWS shipped Bedrock AgentCore Payments on 2026-05-07, built with Coinbase and Stripe: agents now discover, evaluate, and pay for APIs, MCP servers, and content within a single execution loop. Early protocols (x402, ACP, MPP, AP2) are pioneering fractional-cent pricing and real-time billing. When AWS ships transactional infrastructure for the agent layer, you’ve moved past “is this real?” and into “are you in their tool list?”


The strategic case

How agents find your product

Traditional product discovery: Google search → landing page → sign up → onboarding → first value. Five steps. Days to weeks.

Agent-first discovery: a developer makes your product available to their agent → the agent uses it the next time it’s relevant. Two steps. Seconds to first value.

The user might not even know they’re using your product yet. They’re already getting value from it.

The prerequisite: your product has to be reachable by agents in the first place. Today, that means:

Cloudflare put a number on this in April 2026 — an Agent Readiness Score across four dimensions: discoverability, content accessibility, bot access control, capabilities. The early data is damning. 4% of sites declare AI usage preferences. Fewer than 15 sites publish MCP Server Cards or API Catalogs combined. Cloudflare also shipped a dedicated agent-targeted documentation surface at developers.cloudflare.com/docs-for-agents/ — explicit acknowledgement that the agent reader and the human reader want different views of the same docs. The bar is on the floor. The holdouts have a window.

The window is closing fast. In a single week (April 29–30, 2026), three of the largest infrastructure vendors shipped agent-as-customer infrastructure. Stripe launched Link Wallet for Agents and Issuing for Agents, payment rails native to non-humans. Cloudflare opened account creation, domain purchase, and Worker deployment to agents. Vercel matched on its Pro plan via Stripe Projects. The “agents have payment rails” future is now in production. Outside that category, Datasite shipped the first virtual-data-room MCP server for M&A diligence, so agents can now operate inside private-equity workflows. Framer added native /llms.txt support as a dashboard option. The ecosystem is moving from “build the inside” to “build the door” in real time.

Agents First doesn’t kill the need for distribution. It compresses what happens after distribution. The install-to-value gap drops from days to seconds. That’s the leverage.

The pre-activation funnel: Discovery (the agent or developer learns your tool exists) → Evaluation (reads the description, checks trust signals) → Installation (runs the install command) → First agent action (a tool call succeeds) → Repeat usage (the agent comes back). Drop-off happens at every step. Measure each one.

A note on counting: “I have N agents” is a shallow metric — closer to “I have N browser tabs” than “I have N customers.” What matters isn’t roster size; it’s the jobs-to-be-done each agent can complete and the tools it reaches for. Measure tool-selection accuracy and first-attempt success, not headcount.
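A minimal sketch of those two measurements, assuming you already log a count per funnel step and one record per tool call. The step names, the record fields (`task_id`, `attempt`, `ok`), and both function names are hypothetical, chosen only for illustration.

```python
FUNNEL_STEPS = ["discovery", "evaluation", "installation", "first_action", "repeat_usage"]


def step_conversion(counts: dict[str, int]) -> dict[str, float]:
    """Fraction of the previous funnel step that reached each step."""
    rates = {}
    for prev, cur in zip(FUNNEL_STEPS, FUNNEL_STEPS[1:]):
        rates[cur] = counts[cur] / counts[prev] if counts[prev] else 0.0
    return rates


def first_attempt_success(calls: list[dict]) -> float:
    """Share of tasks whose first tool call succeeded.

    Each call record is assumed to look like
    {"task_id": str, "attempt": int, "ok": bool}.
    """
    firsts = [c for c in calls if c["attempt"] == 1]
    return sum(c["ok"] for c in firsts) / len(firsts) if firsts else 0.0
```

The lowest value in `step_conversion` points at the step losing the most agents or developers; `first_attempt_success` is the per-tool quality number to watch over time.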

The protocol landscape

There is no single right way to expose your product to agents. The industry is converging on a layered approach:

| Use case | Best approach | Why |
|---|---|---|
| Simple integrations (< 20 tools) | MCP server | Self-describing tools, automatic discovery, broad client support (110M+ monthly downloads) |
| Large API surfaces (100+ endpoints) | Code Mode / typed SDK | Token-efficient (99.9% reduction vs MCP), agent writes code against it, multi-step ops in a single execution |
| Developer tools / individual use | CLI | Self-documenting (--help), composable (pipes), debuggable (run the same command), auth-integrated |
| Provider-specific agents | Function definitions | Native to OpenAI / Google / etc., zero overhead |

The smart move: define your tool logic once, export to multiple formats. Capabilities described abstractly — tool name, parameters, return types, error cases — in one source of truth. Then generate MCP server definitions, CLI commands, SDK types, and function-calling specs from that source. Protocol migration becomes code generation, not a rewrite.
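One way to picture the single source of truth is a small sketch like the one below. The `Capability` and `Param` classes, the three exporter functions, and the `create_task` example are all hypothetical. The MCP export uses the `name` / `description` / `inputSchema` shape of an MCP tool listing, and the function spec follows the common function-calling layout. The CLI usage string is purely illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Param:
    name: str
    type: str          # JSON Schema type: "string", "integer", "boolean", ...
    description: str
    required: bool = True


@dataclass
class Capability:
    name: str
    description: str
    params: list[Param] = field(default_factory=list)

    def json_schema(self) -> dict:
        """Parameter schema shared by every export target."""
        return {
            "type": "object",
            "properties": {
                p.name: {"type": p.type, "description": p.description}
                for p in self.params
            },
            "required": [p.name for p in self.params if p.required],
        }


def to_mcp_tool(cap: Capability) -> dict:
    # MCP tool listings carry a name, a description, and an inputSchema.
    return {"name": cap.name, "description": cap.description,
            "inputSchema": cap.json_schema()}


def to_function_spec(cap: Capability) -> dict:
    # Function-calling style spec: the same schema under a "parameters" key.
    return {"type": "function",
            "function": {"name": cap.name, "description": cap.description,
                         "parameters": cap.json_schema()}}


def to_cli_usage(cap: Capability) -> str:
    # One usage line for a hypothetical CLI wrapper; optional flags in brackets.
    flags = " ".join(
        f"--{p.name} <{p.type}>" if p.required else f"[--{p.name} <{p.type}>]"
        for p in cap.params
    )
    return f"{cap.name.replace('_', '-')} {flags}".strip()


create_task = Capability(
    name="create_task",
    description="Create a task in a project.",
    params=[
        Param("project_id", "string", "ID of the target project"),
        Param("title", "string", "Short task title"),
        Param("due_date", "string", "ISO 8601 date", required=False),
    ],
)
```

Because every exporter reads the same `json_schema()`, adding a parameter in one place changes the MCP definition, the function spec, and the CLI usage together.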

MCP is the most widely adopted standard today and the right default for most products. Don’t confuse the protocol with the principle. The principle is: design for agent consumption. The protocol is a distribution choice.

The context window problem

Token bloat is real. Every tool’s description, parameter schema, and return type chews up context. Production data is damning:

Three ways out:

  1. Progressive discovery — don’t load all tools upfront. Give the agent a search tool that finds relevant tools on demand. Anthropic’s approach reduces tool context by 98.7%.
  2. Code Mode — replace tool definitions with a typed SDK. The agent writes code against it. Cloudflare proved this works at scale (2,500 endpoints, ~1,000 tokens).
  3. Fewer, better tools — 10 sharp tools beat 200 exhaustive ones. The God Server anti-pattern is the biggest contributor to bloat.

Shipping an MCP server? Keep tools under 20 and use clear, verb-first names. Bigger API surface? Look at Code Mode or progressive discovery. The agent doesn’t need to see every tool at once. It needs to find the right tool when it needs it.
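Progressive discovery can be sketched as a single search tool over a registry of tool summaries. This is an illustration only: the registry contents, `ToolSummary`, and `search_tools` are hypothetical, and the keyword-overlap scoring stands in for whatever ranking a real implementation would use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolSummary:
    name: str
    description: str


# Hypothetical registry; a real product would build this from its tool definitions.
REGISTRY = [
    ToolSummary("create_task", "Create a task in a project"),
    ToolSummary("list_projects", "List projects the user can access"),
    ToolSummary("assign_user", "Assign a user to an existing task"),
    ToolSummary("export_invoices", "Export invoices for a billing period"),
]


def search_tools(query: str, limit: int = 3) -> list[ToolSummary]:
    """Return the tools whose name or description best match the query.

    Scoring is plain keyword overlap, which is enough to show the shape.
    """
    words = set(query.lower().split())
    scored = []
    for tool in REGISTRY:
        text = f"{tool.name.replace('_', ' ')} {tool.description}".lower()
        score = len(words & set(text.split()))
        if score:
            scored.append((score, tool))
    # Highest overlap first; ties keep registry order because the sort is stable.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [tool for _, tool in scored[:limit]]
```

Only `search_tools` needs to sit in the agent's context up front; full schemas are loaded for the handful of tools it returns.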

Connection cost approaches zero

Here’s how a company integrates a traditional SaaS product:

  1. Read the docs
  2. Get an API key
  3. Install the SDK
  4. Write integration code
  5. Handle auth flows
  6. Write error handling
  7. Test
  8. Deploy

Here’s how a company connects an Agents-First product:

  1. Install the agent tool (MCP server, CLI, SDK)
  2. Connect

The agent already knows how to use it because agent tools are self-describing. Tool names, descriptions, and parameter schemas tell it everything.

What this kills: SDK installation, auth wiring, basic CRUD code — connection boilerplate. What it doesn’t kill: data mapping between systems, business logic in the integration layer, compliance work, testing against production data. Agents First compresses connection cost. Integration cost — the hard part — is a function of domain complexity, not protocol choice.

Competitive moat via agent ergonomics

Two products with identical REST APIs can have wildly different agent experiences.

Product A exposes one tool: query_database with a raw SQL parameter. The agent has to know your schema, write correct SQL, and parse raw results.

Product B exposes ten well-named tools: create_task, list_projects, assign_user. Each has typed parameters and structured responses. The agent picks the right tool, fills the parameters, gets clean data back.

Product B wins. Every time.
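As a rough sketch of what one Product B tool could look like: typed parameters in, a structured result out, and errors that tell the agent what to do next. The `ToolResult` and `ToolError` shapes, the error codes, and the in-memory `PROJECTS` and `TASKS` stores are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToolError:
    code: str        # stable, machine-readable identifier
    message: str     # what went wrong
    hint: str        # what the agent should try next


@dataclass
class ToolResult:
    ok: bool
    data: Optional[dict] = None
    error: Optional[ToolError] = None


# Hypothetical in-memory data standing in for the product's backend.
PROJECTS = {"proj_1": "Project Alpha"}
TASKS: list[dict] = []


def create_task(project_id: str, title: str) -> ToolResult:
    """Create a task in an existing project."""
    if project_id not in PROJECTS:
        return ToolResult(ok=False, error=ToolError(
            code="project_not_found",
            message=f"No project with id {project_id!r}.",
            hint="Call list_projects to get valid project ids.",
        ))
    if not title.strip():
        return ToolResult(ok=False, error=ToolError(
            code="invalid_title",
            message="Title must not be empty.",
            hint="Pass a short, human-readable title.",
        ))
    task = {"id": f"task_{len(TASKS) + 1}", "project_id": project_id,
            "title": title.strip()}
    TASKS.append(task)
    return ToolResult(ok=True, data=task)
```

The `hint` field is the part a raw `query_database` tool cannot offer: it lets the agent recover by calling another tool instead of guessing.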

Agent UX is the new developer UX. The quality of your tool names, descriptions, parameter schemas, and error messages decides whether agents use your product well, poorly, or not at all.

What makes a tool agent-friendly:

These principles apply whether you’re shipping an MCP server, a CLI, or a typed SDK. Code is a proven way to get computers to do things reliably. When agents generate code against a well-typed interface, they’re more reliable than when they’re navigating tool-call protocols.

How durable is this moat? As models get smarter, they’ll handle bad tools better. But “the agent can technically figure it out” is a weak position — the same way “the site technically works on mobile” was a weak position in 2012. Well-designed interfaces will always outperform, even as the floor rises.

Network compatibility, not network effects

Every agent client in the ecosystem — Claude Code, Cursor, Windsurf, Zed, custom agents — is a potential distribution point. You build the agent interface once. Every new client that supports your protocol expands your reach.

This is protocol compatibility, not network effects. Your product doesn’t get more valuable when more agent clients exist — it gets more reachable. The real compounding happens when multiple agent tools compose well together. Your CRM tool gets more useful when the user also has a calendar tool and an email tool, because the agent coordinates across all three.

Measuring what matters

New distribution model, new metrics. What to track:

Traditional re-engagement tactics — push notifications, email drip campaigns — don’t work when your user is an agent. What drives agent retention is reliability and ergonomics. Agents come back to tools that work every time and are easy to use. Your retention strategy is your tool quality.

Don’t forget the human side:


What Agents First gets wrong

Name these before the principles. Knowing what’s broken is more useful than knowing what’s ideal.

The Lazy Wrapper — The agent interface is fetch() with a different name. No domain knowledge. No validation. No structured errors. The agent asks for active deals and gets back 47KB of nested JSON, undocumented field names, and timestamps in three formats. Handing someone the raw database and calling it a product.

The Invisible Product — Ship the web app. Maybe add an API later. Never think about agents. Your product is invisible to the agent ecosystem.

Agents Without Rules — No usage rules. The agent hallucinates IDs, blows past rate limits, creates duplicates, sends emails to the wrong people. Then someone declares “AI doesn’t work” and turns it off.

Single-Model Trust — Acting on one LLM’s recommendation for decisions that cost money or affect users. One model says “this code is safe to deploy.” Is it? You have no idea. That’s a coin flip dressed up as confidence.

The Slow Chatbot — Requiring human approval for every agent action. If the agent can’t do anything without asking permission, it’s not an agent. It’s a chatbot with extra steps.

Ship and Forget — Launch an agent integration for the press release. Don’t maintain it. Don’t test it. Let it rot. Worse than no integration, because now agents try to use your product, fail, and learn to avoid it.

The God Server — An agent interface that exposes 200 tools because it wraps an entire platform. Agents choke on tool selection when there are too many options. Ten well-chosen tools beat two hundred exhaustive ones.

The Black Box Server — An agent server with no introspection tool. The only way to ask “what is the state of the work?” is to shell into the database or scrape application logs. The dashboard you didn’t build will get built — either by you (and it’ll lag the truth) or by every operator agent rolling its own (and they’ll diverge). Inverse of Inspectable State.

The Token Dump — Generating a 6,000-token AGENTS.md by asking an LLM to “describe this repo.” A 2026 study across 4 agents and 438 tasks found auto-generated AGENTS.md files measurably reduced agent success rates compared to no file at all. The contract artifact isn’t a context dump; it’s the constraints an agent can’t infer from the code. Sequencing rules. Hidden invariants. Recurring failure modes. ~50 lines, hand-written. Length is the failure mode, not absence.


The implementation principles

Agents First tells you what to prioritize. These principles tell you how to build it.

Some are genuinely new. Others are established practices — health checks, typed schemas, retry logic — that become critical when agents are your primary operators. The novelty isn’t in the individual practices. It’s in recognizing which ones matter most when the operator can’t improvise.

| # | Principle | What it means |
|---|---|---|
| 1 | Interface First | Design the agent interface before any human UI. Tool definitions are the first artifact of any feature, regardless of whether you ship as MCP, CLI, SDK, or function specs. |
| 2 | Contract First | Write usage rules (permissions, constraints, sequences, formatting) before implementation. Without them, agents hallucinate and violate constraints. |
| 3 | Prep Gates | Validate credentials, load fresh IDs, confirm system health (pre-flight checks) before every session. Stale context is the #1 source of agent errors. |
| 4 | Typed State | All persistent agent state flows through a single structured data contract with versioned migrations. Each module owns its slice. |
| 5 | Visible Outputs | Agent actions produce human-readable results in existing workflow tools. If a user asks “what did the agent just do?”, there should be a clear answer, not a JSON blob. |
| 6 | Multi-Model Verification | High-stakes decisions fan out to multiple models. Trust agreement. A finding three models flag is almost certainly real. A finding only one model flags is a hypothesis. |
| 7 | Perspective Dispatch | Complex reviews dispatch multiple constrained perspectives (security, UX, new-user, performance) against the same artifact. Each persona has a defined focus area; findings outside it are discarded. |
| 8 | Autonomous Recovery | The system retries with backoff before alerting. Humans only get pulled in when self-healing has already failed. An agent that pages a human for a transient API timeout is a bad agent. |
| 9 | Inspectable State | Every agent server exposes its own operational state (queue depth, throughput, recent activity, trends, health) via a typed agent tool, not just a human dashboard. Where Prep Gates answers “is the system ready?”, Inspectable State answers “what is the state of the work?” |

What’s genuinely new vs. applied

Interface First and Contract First are new. Designing a tool interface for an AI consumer that has no prior context about your product — and writing usage rules an LLM will actually follow — has no clean pre-agent analog.

For Contract First, quality matters more than presence. Recent empirical work across 4 agents and 438 tasks found that auto-generated AGENTS.md files often hurt agent performance through token bloat, irrelevant context, and hallucinated invariants. Hand-authored files that list only the non-obvious (build commands, sequencing constraints, recurring pitfalls) outperformed LLM-generated dumps and matched no-AGENTS.md baselines on cost. The principle: write down only the rules an agent can’t infer from the code. Everything else is noise. See “The Token Dump” anti-pattern above.

Prep Gates is health checks applied to agent sessions. Not new, but dramatically more important when the operator can’t improvise around stale data.
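A pre-flight gate can be sketched as a function that runs every check and reports all results before the session starts. The names `GateResult`, `run_prep_gates`, and `session_ready`, and the convention that each check returns an `(ok, detail)` pair, are assumptions for this example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class GateResult:
    name: str
    ok: bool
    detail: str = ""


def run_prep_gates(checks: dict[str, Callable[[], tuple[bool, str]]]) -> list[GateResult]:
    """Run every pre-flight check and report all results, not just the first failure."""
    results = []
    for name, check in checks.items():
        try:
            ok, detail = check()
        except Exception as exc:  # a crashing check counts as a failed gate
            ok, detail = False, f"check raised {type(exc).__name__}: {exc}"
        results.append(GateResult(name, ok, detail))
    return results


def session_ready(results: list[GateResult]) -> bool:
    """The session may start only when every gate passed."""
    return all(r.ok for r in results)
```

Returning every result, rather than stopping at the first failure, gives the agent one complete picture of what is stale before it acts.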

Typed State is typed schemas with migrations. Standard since ORMs existed. What changes is the data contract becoming the coordination layer between autonomous jobs that can’t talk to each other directly.
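To make "versioned migrations" concrete, here is a small sketch in which stored state carries a version number and is upgraded one step at a time on load. The version numbers, the module slice names (`task_module`, `sync_module`), and the two migrations are invented for illustration.

```python
from typing import Callable

CURRENT_VERSION = 3


def _v1_to_v2(state: dict) -> dict:
    # Hypothetical change: "tasks" moved under a per-module slice.
    return {"version": 2, "task_module": {"tasks": state.get("tasks", [])}}


def _v2_to_v3(state: dict) -> dict:
    # Hypothetical change: a new module gets its own empty slice.
    return {**state, "version": 3, "sync_module": {"last_synced_at": None}}


# Each migration upgrades a state dict from version N to N + 1.
MIGRATIONS: dict[int, Callable[[dict], dict]] = {1: _v1_to_v2, 2: _v2_to_v3}


def load_state(raw: dict) -> dict:
    """Upgrade stored state to CURRENT_VERSION, one migration at a time."""
    state = dict(raw)
    version = state.get("version", 1)
    if version > CURRENT_VERSION:
        raise ValueError(f"state version {version} is newer than this code supports")
    while version < CURRENT_VERSION:
        state = MIGRATIONS[version](state)
        version = state["version"]
    return state
```

Jobs that cannot talk to each other still agree on the shape of the data, because each one calls `load_state` before reading its own slice.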

Visible Outputs is observability applied to agent actions. The key insight: outputs flow through the human’s existing tools (task manager, email, chat), not a monitoring dashboard nobody opens. If an agent creates a task, the human sees: “Task ‘Follow up with client’ created by your assistant at 2:30 PM in Project Alpha.” Not a JSON blob. Not a log entry.
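The gap between a log entry and a visible output is small in code. A sketch, assuming a hypothetical event record with `title`, `project`, and an ISO 8601 `created_at` field:

```python
from datetime import datetime


def describe_task_created(event: dict) -> str:
    """Turn a raw agent action record into a sentence for the human's existing tools."""
    when = datetime.fromisoformat(event["created_at"])
    hour12 = when.hour % 12 or 12            # 0 and 12 both display as 12
    suffix = "AM" if when.hour < 12 else "PM"
    return (f"Task '{event['title']}' created by your assistant "
            f"at {hour12}:{when.minute:02d} {suffix} in {event['project']}.")
```

The returned sentence is what gets posted to the task manager, email, or chat thread; the raw event can still go to logs for debugging.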

Multi-Model Verification borrows from distributed systems — agreement from independent sources before acting. Applying it to LLM outputs is new. The economics matter: a single verification (3 models) costs $0.05–0.50 at current pricing. That’s cheap for “should we deploy this migration?” and expensive for “should we create this calendar event?” Apply it selectively.
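The fan-out-and-agree step might look like the sketch below. Each model is represented by a plain callable that takes the question and returns a short verdict string. How those callables reach real model APIs is left out, and the `verify` function and its `quorum` parameter are hypothetical names.

```python
from collections import Counter
from typing import Callable


def verify(question: str, models: dict[str, Callable[[str], str]], quorum: int) -> dict:
    """Ask every model the same question and act only if `quorum` of them agree."""
    if not models:
        raise ValueError("at least one model is required")
    answers = {name: ask(question) for name, ask in models.items()}
    verdict, votes = Counter(answers.values()).most_common(1)[0]
    agreed = votes >= quorum
    return {
        "agreed": agreed,
        "verdict": verdict if agreed else None,
        "answers": answers,  # keep dissent visible for the human reviewer
    }
```

With three models, `quorum=3` treats only unanimous findings as settled, while `quorum=2` accepts a majority and leaves the dissenting answer in `answers` as a hypothesis to check.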

Perspective Dispatch is structured code review with defined focus areas and severity levels. The formalization is new. The concept isn’t.
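The "findings outside the focus area are discarded" rule is easy to state in code. In this sketch the persona names, focus areas, and the `Finding` record are hypothetical. Producing the findings (one constrained review per persona) is assumed to happen elsewhere.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    persona: str
    area: str       # e.g. "security", "ux", "performance"
    severity: str   # "low" | "medium" | "high"
    note: str


# Each persona may only report inside its own focus areas.
FOCUS = {
    "security_reviewer": {"security"},
    "new_user": {"ux", "onboarding"},
    "performance_reviewer": {"performance"},
}


def keep_in_focus(findings: list[Finding]) -> list[Finding]:
    """Discard findings a persona raised outside its defined focus area."""
    return [f for f in findings if f.area in FOCUS.get(f.persona, set())]
```

Filtering after the fact keeps each perspective sharp: a security reviewer commenting on button colors adds noise, not signal.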

Autonomous Recovery is straight from the SRE playbook. It makes the list because too many agent systems skip it entirely and either fail silently or alert on every blip. When self-healing fails, don’t just alert: give the human an escalation path. The notification should say what happened, what the agent tried, and include a direct link to take manual action. “Data sync failed 3x. Last error: upstream 503. Click here to retry manually or check status.”
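A compact sketch of retry-then-escalate, assuming exponential backoff and an escalation message modeled on the example notification. `run_with_recovery` and its parameters are hypothetical; the `sleep` argument is injectable so the function can be exercised without waiting.

```python
import time
from typing import Callable


def run_with_recovery(action: Callable[[], object], label: str = "Operation",
                      attempts: int = 3, base_delay: float = 0.5,
                      sleep: Callable[[float], None] = time.sleep) -> dict:
    """Retry with exponential backoff; build an escalation only after the last failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    errors = []
    for attempt in range(attempts):
        try:
            return {"ok": True, "result": action(), "attempts": attempt + 1}
        except Exception as exc:
            errors.append(str(exc))
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
    return {
        "ok": False,
        "attempts": attempts,
        "escalation": (f"{label} failed {attempts}x. Last error: {errors[-1]}. "
                       "Retry manually or check status."),
    }
```

A transient failure never reaches a human; only the final `escalation` string does, and it already carries what happened and how many times the system tried.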

Inspectable State is the agent equivalent of Visible Outputs. Visible Outputs surfaces results to humans where they already are; Inspectable State surfaces system state to agents where they already are — the same MCP surface they use to take action. Without it, an operator agent that’s supposed to triage your queue, audit your sends, or escalate stuck records has nowhere to look short of raw SQL or log scraping. Both are anti-patterns. The pattern: ship one overview tool alongside the action verbs — same MCP server, no input schema, returns counts plus tail plus health. A few dozen lines. Skipping it costs every operator a roll-your-own diagnosis the first time anything goes weird.
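The "counts plus tail plus health" snapshot could be as small as the sketch below. The record fields (`id`, `status`, `updated_at` as sortable ISO 8601 strings) and the 10% failure threshold are assumptions. The agent-facing tool would take no input and call `overview` on whatever store the server already has.

```python
from collections import Counter


def overview(records: list[dict], tail: int = 3) -> dict:
    """Operational snapshot: counts by status, most recent activity, coarse health."""
    counts = Counter(r["status"] for r in records)
    recent = sorted(records, key=lambda r: r["updated_at"], reverse=True)[:tail]
    failed = counts.get("failed", 0)
    # Hypothetical health rule: degraded once more than 10% of records have failed.
    healthy = not records or failed / len(records) <= 0.10
    return {
        "counts": dict(counts),
        "tail": [{"id": r["id"], "status": r["status"], "updated_at": r["updated_at"]}
                 for r in recent],
        "health": "ok" if healthy else "degraded",
    }
```

An operator agent can call this once and decide whether to triage, escalate, or leave the queue alone, without touching SQL or logs.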


Levels of adoption

| Level | Name | Key marker | Business impact |
|---|---|---|---|
| 0 | No agent access | Human operates all tools through UIs | Baseline |
| 1 | Agent as Afterthought | Thin API wrappers. No contracts, no validation | Agents can technically use it. Poorly. |
| 2 | Agent-Aware | Usage rules exist. State is typed. Pre-flight checks validate before use | Agents use it reliably. Quality improves. |
| 3 | Agents-First | Agent interface designed and shipped first. Agent is primary consumer | Discovery funnel active. Agents recommend your product. |
| 4 | Agent-Driven | Agents extend the system for other agents. Self-healing. Multi-model checks | Platform effects. Your tools become infrastructure. |

Most companies today are at Level 0 or 1. The opportunity is Level 3.

Versus Cloudflare’s Agent Readiness Score

Cloudflare’s Agent Readiness Score uses a 5-level adoption ladder (Level 0–4) that mirrors this one in shape and numbering. Both are useful; they measure different things. Cloudflare ARS is outside-in: it scores the discoverable signals an external crawler can verify (robots.txt, llms.txt, MCP Server Cards, OAuth discovery). Agents First is inside-out: it scores the nine implementation principles that determine whether your product treats agents as customers, regardless of what an external probe can see. A site can score Level 4 on Cloudflare ARS by publishing all the right artifacts and still ship a Lazy Wrapper underneath. A product can be deeply Agents First in mindset (verb-first tools, typed contracts, prep gates, visible outputs) while underinvesting in marketing-root discovery. Use both. They’re complementary, the same way TDD and API First are. Fivetran’s 2026 Agentic AI Readiness Index covers a third axis: enterprise data foundation. Three labels, three lenses. None of them obsoletes the others.

Not every discoverable artifact is equally load-bearing. Google’s official AI Optimization Guide (2026-05-15) explicitly tells site owners “You don’t need to create new machine readable files, AI text files, markup, or Markdown to appear in generative AI search” — calling out LLMS.txt by name as one of the tactics you can skip. That’s why the Agents First rubric (v0.2.0, April 2026) weights AGENTS.md at 15pts and /llms.txt at only 5pts: the contract artifact is what changes agent behavior; the discovery file is belt-and-suspenders. Cloudflare ARS still rewards /llms.txt because it’s a discoverable signal worth catching; Agents First weights for behavioral impact, not crawlability.

Versus a14y (Agent Readability for the Web)

a14y.dev is the closest parallel project — an open spec with 38 versioned checks (14 site-level, 24 page-level) covering canonical Markdown mirrors, llms.txt presence, JSON-LD breadcrumbs, content-negotiation, and code-block language tags. It ships as a CLI (npx a14y your-site.com), a Chrome extension, and a Claude Code skill. a14y is narrower than this framework: it scores how readable a web page is to an agent, while Agents First scores whether a product is built for an agent customer at all (Levels 0–4, 9 design principles). The two compose well — a Level-3 product should score 90+ on a14y for its public surfaces. If you only have time for one tool, a14y is the runnable web scorecard; this thesis is the design framework that motivates running it.

The smallest experiment. Don’t start with the full framework. Pick your most-used API endpoint. Wrap it as a single agent tool — an MCP tool, a CLI command, or a typed function — with a clear name, typed parameters, and a structured error response. Ship it. Measure time to first agent action and tool success rate. If agents use it reliably, you’ve validated the thesis. Build from there.


The honest cost

Agents First isn’t free. Here’s what you’re signing up for:

Engineering investment. Good tools take thought. Good rules take iteration. For a mid-size SaaS, expect 2–4 weeks of engineering to go from Level 0 to Level 2, and another 4–8 weeks to hit Level 3.

Ongoing maintenance. Agent tools are an API surface. When your product changes, your tools change too — descriptions, parameter schemas, error messages. Schema evolution and backward compatibility are real concerns, the same ones that plague REST APIs.

Multi-model costs. Verification means multiple API calls per decision. At current pricing, a single check (3 models) runs $0.05–0.50. Apply it to deployment decisions and security reviews, not every tool call. If you consensus-check everything in a typical user session, you’re at $1–10 in inference costs per session — prohibitive for freemium, fine for enterprise.
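The arithmetic behind those figures is simple to check. The per-check range comes from the paragraph above; the figure of 20 checks per session is not stated there and is only what the $1–10 range implies.

```python
def session_cost_range(checks: int,
                       per_check: tuple[float, float] = (0.05, 0.50)) -> tuple[float, float]:
    """Low and high inference cost, in dollars, for a session with `checks` verifications."""
    low, high = per_check
    return round(checks * low, 2), round(checks * high, 2)
```

`session_cost_range(20)` gives the $1 to $10 band quoted above; limiting verification to two high-stakes decisions per session brings the range down to $0.10 to $1.00.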

Protocol risk. MCP is the leading standard but less than two years old. It competes with OpenAI’s function calling, Google’s tool-use spec, Cloudflare’s Code Mode, and whatever ships next. The principles in this document transfer regardless of protocol. Design your tool definitions as abstract capability descriptions first, then map them to whichever protocol your users’ agents speak. Single source of truth. Protocol migration becomes code generation, not a rewrite.


Security

Agent tools give AI programmatic access to create, modify, and delete data in your product. Address this from day one. Don’t bolt it on later.

Auth patterns that work today

Scoped tokens, not master keys. Every agent connection uses a token scoped to the minimum permissions needed. Agent only reads data? Token doesn’t allow writes. Agent manages one project? Token doesn’t have access to all projects.

OAuth 2.0 with PKCE for user-facing tools. When an agent tool acts on behalf of a user, use the same OAuth flow you’d use for any third-party integration. User authenticates, grants scoped permissions, gets a token that can be revoked independently.

Short-lived tokens with refresh. API keys that never expire are API keys that eventually leak. Issue tokens with 1-hour TTLs and rotate via refresh tokens. The agent tool handles renewal transparently.

Per-user audit logging. Every tool call gets logged: who called it (user + agent), what parameters were passed, what came back, when. Not optional — it’s the first thing your security team and your enterprise customers will ask for.
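Scope checks, expiry, and audit logging fit in one small gate in front of every tool call. This is a sketch under stated assumptions: the `AgentToken` shape, the scope strings such as `tasks:read`, and the in-memory `AUDIT_LOG` are hypothetical. A full audit record would also capture what the tool returned, which is omitted here.

```python
import json
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AgentToken:
    user_id: str
    agent_id: str
    scopes: frozenset      # e.g. frozenset({"tasks:read"})
    expires_at: float      # Unix timestamp


AUDIT_LOG: list[str] = []  # stand-in for an append-only log sink


def authorize_and_log(token: AgentToken, tool: str, required_scope: str,
                      params: dict, now: Optional[float] = None) -> bool:
    """Allow the call only if the token is unexpired and carries the scope; log either way."""
    now = time.time() if now is None else now
    allowed = now < token.expires_at and required_scope in token.scopes
    AUDIT_LOG.append(json.dumps({
        "ts": now, "user": token.user_id, "agent": token.agent_id,
        "tool": tool, "params": params, "allowed": allowed,
    }))
    return allowed
```

Denied calls are logged too, since a run of refused write attempts is often the first sign of a misbehaving agent.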

Threats to take seriously

Supply chain risk. There is no gatekeeper for agent tools, especially MCP servers. No code signing, no verified publisher program. Treat agent tool installation like npm package installation — verify the source, pin versions, audit updates. This will get better as the ecosystem matures. It’s the Wild West right now.

Prompt injection via tool descriptions. Tool descriptions are part of the LLM’s context. A malicious tool server could craft descriptions that manipulate agent behavior. Validate tool schemas against known-good signatures. Don’t blindly trust self-describing tools from unknown sources.
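One possible reading of "known-good signatures" is to pin a hash of each tool's name, description, and input schema at review time, then flag anything that differs later. The sketch below assumes tool definitions shaped like MCP tool listings (`name`, `description`, `inputSchema`); `tool_fingerprint` and `changed_tools` are hypothetical helpers, not part of any MCP SDK.

```python
import hashlib
import json


def tool_fingerprint(tool: dict) -> str:
    """Stable SHA-256 over the fields an LLM actually sees."""
    visible = {k: tool.get(k) for k in ("name", "description", "inputSchema")}
    canonical = json.dumps(visible, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def changed_tools(listed: list[dict], pinned: dict[str, str]) -> list[str]:
    """Names of tools that are new or whose description/schema changed since review."""
    return [t["name"] for t in listed if pinned.get(t["name"]) != tool_fingerprint(t)]
```

Anything `changed_tools` returns should be held back from the agent until a human re-reviews the description, the same way a changed lockfile hash blocks an install.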

Overprivileged agents. An agent with full write access to your CRM is one hallucination away from sending the wrong email to the wrong customer. Default to read-only. Escalate to write permissions only for specific, validated actions.


Comparison

| | TDD | API First | Mobile First | Agents First |
|---|---|---|---|---|
| Mantra | Red-Green-Refactor | Contract before code | Small screen first | Agent interface first |
| Primary artifact | Test suite | OpenAPI spec | Responsive breakpoints | Tool definitions + usage rules |
| Design sequence | Failing test → pass → refactor | Design API → implement → document | Design for mobile → add desktop | Design agent tools → add human UI |
| Maturity | 25+ years | 15+ years | 15+ years | < 2 years |
| Evidence base | Extensive | Strong | Strong | Early / emerging |

The maturity gap matters. TDD and API First have decades of evidence. Agents First has early-adopter experience and a clear directional thesis. The principles are grounded in real production systems. The strategic claims need more data before they’re settled.


What we don’t know yet

This is a v0.8 framework. Some important questions don’t have answers yet:

What’s the real adoption curve? We have a first data point at the LLM-traffic layer. Vercel’s AI Gateway production index (April 2026, 200,000+ teams, seven months of data): 58.9% of all tokens are in tool-call requests; 22.2% of requests end with a tool call. That’s the AI-traffic equivalent of “mobile crossed 50%” — but for the SaaS-workflow layer specifically (what % of business workflows route through agents end-to-end) we still don’t have a number. Treat the LLM-traffic data as a leading indicator, not a final read. If agent-mediated SaaS workflows reach 10% by 2028, Agents First is a two-year head start. If they stall at 2%, the implementation principles still improve your API design — you just don’t get the distribution leverage. Downside case: you built a better API. Upside case: you built the next platform.

Is MCP the right protocol? Active industry debate. MCP has massive adoption (110M+ monthly downloads) and broad support (Anthropic, OpenAI, Google, Microsoft). Critics point to token bloat, immature auth, and the question of whether tool-calling is even the right abstraction. Cloudflare’s Code Mode and CLI-first approaches are compelling alternatives for specific use cases. David Soria Parra, MCP’s co-creator, acknowledges the context-bloat problem and says the protocol is shifting toward progressive discovery, stateless transport, and code-based tool composition. The spec is actively iterating: the draft schema saw 20+ commits in the two weeks before this revision (including a breaking IncompleteResultInputRequiredResult rename and a newly merged SDK Working Group charter). When you build, validate against a specific draft date or release tag and re-validate before each ship. The framework here is deliberately protocol-agnostic. The principles hold regardless of which protocol wins.

How do monetization models change? If agents use your product and humans rarely open your UI, usage-based pricing becomes more natural than seat-based licensing. Who gets billed — the human, the agent operator, or the tool server host? These models are still forming.

What happens when every competitor has agent tools? If every project management tool ships well-designed agent interfaces, differentiation shifts back to product quality, pricing, and brand. Agents First is a durable engineering advantage, but potentially a temporary distribution advantage. Build better tools and keep iterating.

What does agent-first customer support look like? When the primary user can’t file a support ticket or read a help article, your support model has to change. Error messages in tool responses become your support channel. Tool descriptions become your documentation. Some teams are solving this with elicitation — escalating to a human whenever uncertainty crosses a threshold.


The bottom line

Your next feature will be used by an agent before a human ever sees it. Not because you planned it — because agents are already in the workflow, picking tools from what’s available.

The question isn’t whether to build for agents. It’s whether to design for them intentionally, or let it happen by accident.

Agents First is the intentional version.

Design the interface for a computer consumer. Write the rules. Validate before every session. Make outputs visible. Don’t trust a single model. Build systems that recover without paging you.

The protocol doesn’t matter. The principle does: your product’s most important customer doesn’t have a login. It has a tool list.

Design accordingly.


Framework v0.8. May 2026.


We ate our own dogfood

When the v0.5 essay went live, we ran the framework against itself and found agentsfirst.dev scored 25/100: Level 1 (Agent as Afterthought). The thesis preached agents-first; the site shipped to humans only.

Three layers fixed it. Each is reusable on its own.

Layer A — the site itself. agentsfirst.dev now serves /AGENTS.md, /llms.txt, /.well-known/mcp-server-card.json, /api/principles.json, /api/glossary.json, and a robots.txt that explicitly addresses 14 named AI agents. After the fixes, the site re-scored at 80/100 — Level 3. An agent landing here cold finds a complete machine-readable picture of the framework.

Layer B: npx @capitalthought/create-agents-first. Scaffold that generates a starter MCP server with the following principles wired in: Interface First (the MCP server itself), Contract First (an AGENTS.md template), Prep Gates (a <project>_prep tool), Typed State (Zod-validated), Visible Outputs (sink markers), Autonomous Recovery (retry-with-backoff helpers), Inspectable State (an overview tool returning the operational snapshot). Read it in five minutes, ship a Level-2 starter in another five. Source: https://github.com/capitalthought/agentsfirst.

For comparison, Anthropic itself ships a reference implementation: the Claude Managed Agents Sessions MCP server (2026-05-13). Nine verb-first tools (list_agents, create_session, send_message, interrupt, wait_for_idle, etc.), stdio + Streamable HTTP transports, and — notably — it deliberately omits destructive endpoints (agents.archive, vaults.*, credentials.*) to avoid the Slow-Chatbot anti-pattern. Read it before designing yours; it shows what “wrap a Sessions API as MCP” looks like when the protocol authors do it.

Layer C — agentsfirst.dev/mcp. The scoring logic, hosted as a callable MCP server. Three ways to use it:

  1. Add https://agentsfirst.dev/mcp as a remote MCP server in your agent of choice (Claude Code, Cursor, Cline, Windsurf). You get four tools: agentsfirst_prep, score_website, get_principle, get_anti_pattern.
  2. Install locally for the full five tools (adds score_codebase for local repos): npx -y @capitalthought/agentsfirst-mcp.
  3. curl it directly — the endpoint speaks JSON-RPC over HTTP:
    curl -sX POST https://agentsfirst.dev/mcp \
      -H "Content-Type: application/json" \
      -H "Accept: application/json, text/event-stream" \
      -d '{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"score_website","arguments":{"url":"https://YOUR-SITE.com"}}}'
    

The rubric, the probe, and the scoring are all open-source. Reproducible by design. We’re publishing public Agent Readiness Reports — scorecards on named products, using the same hosted scorer. First batch of 10 (Vercel, Anthropic, Cloudflare, Coinbase, Amazon, Linear, Google, Stripe, WSJ, Indeed — scores ranging 10 → 75) is live. Reports auto-update weekly; new targets ship every other Thursday.

Gate it in CI. You can fail builds when your agent-readiness regresses, the same way you’d fail on a broken test. lingzhong/check-agent-readiness (GitHub Action, v0.1.1, 2026-05-14) runs a configurable agent-readiness scanner — Cloudflare’s isitagentready.com by default — against your site on every deploy and fails CI if the level drops below a threshold. That’s the discovery-lane gate. For the principle-lane equivalent, point the same pattern at agentsfirst.dev/mcp — the hosted scorer responds to score_website and returns a total you can threshold against. Two gates, two lenses, same idea: catch regressions before agents do.
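The threshold step of such a gate can be sketched as below. It assumes the scorer response has already been fetched and parsed into a dict, and that the score sits under a `total` key. That field name is a guess based on the description above, not a documented response shape, so check the real output before relying on it.

```python
def gate(score_result: dict, threshold: int) -> int:
    """Return a process exit code: 0 to pass the build, 1 to fail it."""
    total = score_result.get("total")  # field name is an assumption
    if not isinstance(total, (int, float)):
        print("agent-readiness gate: no numeric total in scorer response")
        return 1
    if total < threshold:
        print(f"agent-readiness gate: {total} is below threshold {threshold}")
        return 1
    print(f"agent-readiness gate: {total} meets threshold {threshold}")
    return 0
```

In a CI job the return value would be passed to `sys.exit`, so a missing or regressed score fails the build the same way a broken test does.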

A framework only matters if it’s measurable. We made it measurable.


About the author

Joshua Baer is the founder and CEO of Capital Factory, the center of gravity for entrepreneurs outside Silicon Valley. He’s been building and investing in startups for three decades.

This thesis is part of his ongoing work on how AI agents reshape the way products are designed and used.

