Abstract

Enterprise agent platforms are accumulating tools faster than they are developing coherent abstractions. The common explanation is that models choose the wrong tool, lose the thread, or fail to recover from intermediate errors. This note makes a narrower claim: many of those failures begin below the model, at the capability surface itself. Steve Yegge's service critique identified a specific problem: clients were forced to manually traverse fragmented service and database boundaries to answer even simple queries. Internal service boundaries leaked into client orchestration logic. Agent systems repeat the same mistake and extend it: the probabilistic planner must reconstruct not only query logic but entire state-changing workflows at runtime. MCP improves connectivity, and Databricks Omnigent improves runtime control, but neither substitutes for good capability design. Durable agent architecture needs both a semantic capability layer that hides service boundaries and an execution control plane that governs runtime behavior.

Context and motivation

Enterprise agent systems are moving from isolated assistants toward operational runtimes with access to search, tickets, codebases, shells, databases, SaaS tools, internal APIs, and deployment systems. That expansion is happening along two fronts at once.

First, capability catalogs are growing quickly because protocols such as MCP make new integrations easier to publish and reuse. Second, organizations are starting to run several harnesses in parallel, which creates a second orchestration problem above the tools themselves.

This framing matters because both forms of growth are often discussed as model-quality problems. In practice, a large share of the instability comes from exposing implementation-shaped operations to a planner that must rediscover workflow semantics on every run.

Core thesis

Agent systems are repeating Yegge's client-side orchestration problem and extending it from queries into state-changing business operations.

The durable fix has two parts:

  1. A semantic capability layer that hides service and storage boundaries and exposes declarative reads and intent-level commands.
  2. An execution control plane (meta-harness) that governs agent selection, credentials, policy, budgets, and trace capture.

MCP helps with connectivity. Meta-harnesses help with runtime governance. Neither automatically repairs a fragmented agent-facing capability surface.

Mechanism: from Yegge's query problem to agent capability design

Yegge's original post identifies a problem that is easy to overgeneralize. His central claim is not merely that too many services exist. It is that clients must know which services to call, in which order, and how to combine the results — and that this orchestration burden belongs in the platform, not in every consumer.

In Yegge's framing, the client faces a read problem: data is scattered across databases and service APIs, and reconstructing a coherent answer requires manually navigating service boundaries. His proposed solution is a declarative query layer that lets the client describe what it needs without owning the traversal logic. The same architectural insight underlies systems such as GraphQL.

The agent case is an extension of that same pattern, now generalized from reads to writes.

For reads, the analogy is direct. An agent retrieving information from a fragmented tool surface must discover which tools return relevant data, call them in sequence, filter and join results inside its context window, and handle partial failures. That is Yegge's problem recreated at runtime, except the orchestration logic is now regenerated probabilistically rather than compiled ahead of time.

For writes, the problem becomes harder. A state-changing business operation — opening a customer account, submitting an expense report, creating an incident — requires sequencing, validation, idempotency, authorization, compensation on failure, and outcome verification. When the capability surface exposes only low-level operations, the agent inherits all of these concerns:

create_person
create_contact_record
assign_customer_identifier
create_billing_profile
activate_customer

The above appears composable, but it exports sequencing, validation, retries, compensation, and business rules to the caller. A conventional software client can encode that orchestration deterministically. An agent must infer it in flight.

A better capability surface:

onboard_customer

That single command can validate the whole request, enforce idempotency, and execute inside a transaction or durable workflow. The agent expresses intent. Deterministic software owns the business operation.
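A minimal sketch of the difference, assuming an in-memory stand-in for the systems of record: the command below validates the whole request before any write and, if a later step fails, compensates the completed steps in reverse order, so the caller sees one outcome instead of a half-onboarded customer. The `Step` type, the `make_step` helper, the validation rule, and the `ledger` list are hypothetical illustrations, not a real platform API.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical in-memory stand-in for the systems of record.
ledger: list[str] = []


@dataclass
class Step:
    name: str
    run: Callable[[dict], None]   # performs the step
    undo: Callable[[dict], None]  # compensates the step if a later one fails


def make_step(name: str, fail: bool = False) -> Step:
    """Build a hypothetical step that records itself in the ledger."""
    def run(req: dict) -> None:
        if fail:
            raise RuntimeError(f"{name} failed")
        ledger.append(name)

    def undo(req: dict) -> None:
        ledger.remove(name)

    return Step(name, run, undo)


def onboard_customer(req: dict, steps: list[Step]) -> str:
    """One intent-level command: validate the whole request up front, run the
    steps in a fixed order, and compensate completed steps in reverse order
    if a later step fails."""
    missing = [k for k in ("name", "email") if not req.get(k)]
    if missing:
        return f"rejected: missing {', '.join(missing)}"
    done: list[Step] = []
    for step in steps:
        try:
            step.run(req)
        except RuntimeError:
            for completed in reversed(done):
                completed.undo(req)
            return f"rolled back at {step.name}"
        done.append(step)
    return "onboarded"
```

A production version would delegate this loop to a database transaction or a durable workflow engine. The sketch only shows that sequencing and compensation live behind the `onboard_customer` boundary rather than in the agent's plan.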

Caption: Conventional service clients encode orchestration ahead of time; agent clients reconstruct it during execution.

flowchart LR
    A[Validated input] --> B[Fixed call sequence]
    B --> C[Explicit error handling]
    C --> D[Known transaction semantics]

    E[Natural-language goal] --> F[Tool discovery]
    F --> G[Probabilistic tool selection]
    G --> H[Interpret intermediate output]
    H --> I[Dynamic replanning]
    I --> J[Attempted recovery]

The deeper the tool trajectory, the larger the failure surface:

  • more network round trips;
  • more intermediate outputs in context;
  • more opportunities for invalid arguments;
  • more partial mutations;
  • more authorization decisions;
  • more planning drift; and
  • more difficult debugging and replay.
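A rough way to see why depth matters, under the simplifying assumption that every call in a trajectory succeeds independently with the same probability p: the chance that an n-step trajectory completes without any failure is p^n. The per-step rate of 0.98 below is an illustrative assumption, not a measurement; even so, twelve steps already bring end-to-end success down to about 0.78.

```python
def trajectory_success(p_step: float, n_steps: int) -> float:
    """Probability that every step of an n-step trajectory succeeds, assuming
    each step succeeds independently with probability p_step."""
    return p_step ** n_steps


for n in (1, 5, 12):
    # Illustrative per-step success rate of 0.98, not a measured value.
    print(f"{n:>2} steps: {trajectory_success(0.98, n):.3f}")
```

The model is optimistic: it treats a failed step as a clean stop, whereas the list above also counts partial mutations, authorization decisions, and planning drift as separate costs.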

Tool explosion is also an abstraction problem

Tool explosion is often described as a prompt-budget problem. Large catalogs consume tokens and degrade selection quality. That is true but incomplete.

Large catalogs also create an abstraction problem. Tools overlap semantically, become valid only at certain workflow states, and vary in granularity. Research in 2026 points in the same direction. Repantis et al. found that adaptive tool shortlists can preserve coverage while materially reducing the visible choice set.[3] ToolChoiceConfusion makes a complementary point: semantic relevance alone is not enough, because many tools are related to a task while still being unnecessary or premature at a given step.[4]

An agent should not see every available capability merely because the platform can expose it.

Tool discovery should account for at least four dimensions:

  1. Semantic relevance.
  2. Authorization.
  3. Causal validity at the current workflow state.
  4. Risk.

Caption: Tool retrieval should narrow the visible action space using workflow state and policy, not just semantic similarity.

flowchart TD
    A[User goal] --> B[Candidate capabilities]
    B --> C{Semantic relevance}
    C -->|yes| D{Authorized?}
    C -->|no| X[Hide tool]
    D -->|no| X
    D -->|yes| E{Valid in current state?}
    E -->|no| X
    E -->|yes| F{Risk acceptable?}
    F -->|no| G[Require extra approval or gate]
    F -->|yes| H[Visible tool frontier]
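The funnel above can be sketched as a deterministic filter over capability metadata. Everything here is a hypothetical simplification: keyword overlap stands in for semantic relevance (a real system would more likely use embedding retrieval), a single role stands in for authorization, and risk is an integer compared against an assumed threshold.

```python
from dataclasses import dataclass, field


@dataclass
class Capability:
    name: str
    keywords: set[str]
    required_role: str
    valid_states: set[str]
    risk: int  # hypothetical 0-10 scale


@dataclass
class Frontier:
    visible: list[str] = field(default_factory=list)
    gated: list[str] = field(default_factory=list)  # shown only behind approval


def tool_frontier(goal: str, roles: set[str], state: str,
                  catalog: list[Capability], max_risk: int = 5) -> Frontier:
    words = set(goal.lower().split())
    out = Frontier()
    for cap in catalog:
        if not (cap.keywords & words):      # 1. semantic relevance (crude stand-in)
            continue
        if cap.required_role not in roles:  # 2. authorization
            continue
        if state not in cap.valid_states:   # 3. causal validity in the current state
            continue
        if cap.risk > max_risk:             # 4. risk: keep, but require approval
            out.gated.append(cap.name)
        else:
            out.visible.append(cap.name)
    return out
```

The gated list is kept separate rather than dropped, matching the diagram: a high-risk capability can still be offered, but only behind an approval step.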

This is more than ordinary tool retrieval. It is planning over a governed capability graph.

More atomic tools are not automatically more composable

Agent-platform teams often respond to unreliable tools by making them smaller. Narrower tools can be easier to describe, test, and authorize, but excessive atomization recreates the teller-call pattern that damaged earlier distributed systems, in which the client makes one small request per step and coordinates the whole sequence itself.

An expense workflow is a good example:

create_expense
attach_receipt
assign_cost_center
submit_for_approval

If these four calls collectively represent one business state transition, exposing them separately still leaves the agent responsible for partial failure, duplication, and recovery. A better abstraction may be:

submit_expense_report

When several calls collectively represent one business state transition, composition should usually occur behind the capability boundary.

The agent should express intent. Deterministic software should own the transaction.
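One guarantee the composite command can own, and four separate calls cannot easily provide, is idempotency: a retried submission carrying the same key returns the original result instead of creating a duplicate report. A minimal sketch, assuming a caller-supplied `idempotency_key` and in-memory stores; the key scheme, the validation rule, and the status value are hypothetical.

```python
import uuid

# Hypothetical in-memory stores.
_submitted: dict[str, str] = {}   # idempotency key -> report id already created
reports: dict[str, dict] = {}     # report id -> report record


def submit_expense_report(idempotency_key: str, amount: float,
                          receipt: str, cost_center: str) -> str:
    """Create, attach, assign, and submit as one transition; retries are safe."""
    if idempotency_key in _submitted:
        return _submitted[idempotency_key]  # replay: no second report
    if amount <= 0 or not receipt or not cost_center:
        raise ValueError("invalid expense report")  # validate before any write
    report_id = str(uuid.uuid4())
    reports[report_id] = {
        "amount": amount,
        "receipt": receipt,
        "cost_center": cost_center,
        "status": "pending_approval",
    }
    _submitted[idempotency_key] = report_id
    return report_id
```

In a real system the key check and the write would have to happen atomically, for example under a unique constraint on the key; otherwise two concurrent retries could both pass the check.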

Reads and writes need different capability designs

Yegge's original critique focused on the read path: clients needed a declarative surface to describe what data they needed without navigating service boundaries. For reads, that answer remains correct. An agent performing a read should be able to express its information need through a declarative interface that handles projection, filtering, pagination, joins, and authorization behind the capability boundary.

For queries, the capability surface should support:

  • projection;
  • filtering;
  • pagination;
  • cost limits;
  • stable semantic entities;
  • field-level authorization; and
  • bounded result sizes.
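A minimal sketch of a read surface with several of these properties, assuming a single in-memory entity set: projection, filtering, pagination, field-level authorization, and a bounded page size all happen behind the boundary rather than in the agent's context window. The `Query` shape, the `MAX_PAGE` bound, and the `READABLE` permission map are hypothetical; cost limits and stable entity identity are not modeled.

```python
from dataclasses import dataclass, field

MAX_PAGE = 50  # hypothetical bound on result size

# Hypothetical field-level authorization: role -> readable fields.
READABLE = {
    "support": {"id", "name", "status"},
    "finance": {"id", "name", "status", "balance"},
}


@dataclass
class Query:
    fields: list[str]                                      # projection
    where: dict[str, object] = field(default_factory=dict)  # equality filters
    limit: int = 10
    offset: int = 0


def run_query(q: Query, role: str, rows: list[dict]) -> list[dict]:
    allowed = READABLE.get(role, set())
    denied = [f for f in q.fields if f not in allowed]
    if denied:
        raise PermissionError(f"fields not readable by {role}: {denied}")
    # Filtering on an unreadable field could leak it, so check filters too.
    if any(k not in allowed for k in q.where):
        raise PermissionError("filter uses an unreadable field")
    matched = [r for r in rows if all(r.get(k) == v for k, v in q.where.items())]
    limit = min(q.limit, MAX_PAGE)  # the caller cannot exceed the bound
    page = matched[q.offset:q.offset + limit]
    return [{f: r[f] for f in q.fields} for r in page]
```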

For writes, the same degree of flexibility is unsafe. State-changing operations should be exposed as explicit domain commands with typed inputs, validation, authorization, idempotency, and transactional or compensating guarantees.

The write path should expose domain commands with clear contracts:

  • actor;
  • target;
  • parameters;
  • preconditions;
  • expected effects;
  • approval requirements;
  • idempotency semantics;
  • compensation behavior; and
  • verification criteria.
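The contract fields above map naturally onto a typed declaration that a platform could check before dispatch. This is a sketch under assumptions: the `CommandContract` type, the predicate-style preconditions, and the example command in the usage are hypothetical, and the compensation and verification entries are names of handlers rather than the handlers themselves.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class CommandContract:
    actor: str
    target: str
    parameters: dict
    preconditions: tuple[Callable[[dict], bool], ...]
    expected_effects: tuple[str, ...]
    requires_approval: bool
    idempotency_key: str
    compensation: str   # name of the compensating command
    verification: str   # name of the outcome check


def ready_to_dispatch(c: CommandContract, state: dict, approved: bool) -> list[str]:
    """Return the reasons this command may not run yet (empty list = ready)."""
    problems = []
    if not c.idempotency_key:
        problems.append("missing idempotency key")
    if c.requires_approval and not approved:
        problems.append("approval required")
    for i, check in enumerate(c.preconditions):
        if not check(state):
            problems.append(f"precondition {i} failed")
    return problems
```

Returning a list of reasons rather than a boolean gives the planner something concrete to act on, such as requesting approval, without letting it bypass the check.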

This is why a single generic tool design for both reads and writes is structurally weak. Queries and commands should be intentionally asymmetric.

MCP solves connectivity, not service design

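MCP is an important infrastructure improvement because it standardizes how hosts, clients, and servers exchange tools, resources, and prompts.[5] It shrinks the integration-graph problem: each host and each tool provider implements one protocol instead of a bespoke connector for every pairing.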

But it does not answer the harder service-design questions:

  • Is this tool too narrow?
  • Should several calls collapse into one domain command?
  • Is the result being joined on the wrong side of the boundary?
  • Is the operation safe at the current workflow state?
  • What compensation follows partial failure?
  • How is the business outcome verified?

A badly designed API exposed through MCP remains badly designed. Protocol success can even accelerate capability sprawl because it becomes easier to publish integrations faster than the organization can govern their semantics.

The emergence of the meta-harness

A second fragmentation problem exists above the tool layer. Organizations increasingly operate multiple coding-agent harnesses, SDK agents, and terminal runtimes at once. They differ in prompts, tools, context construction, recovery behavior, safety defaults, and workspace assumptions.

Databricks Omnigent is notable because it gives concrete shape to a meta-harness layer above those heterogeneous runtimes. Databricks introduced Omnigent on June 13, 2026, and describes it as a shared control surface for composition, policy, collaboration, sandboxing, and session sharing across different agents and harnesses.[6][7]

That matters because a meta-harness can centralize concerns that are hard to implement independently in every harness:

  • session lifecycle;
  • runtime selection;
  • workspace attachment;
  • sandbox provisioning;
  • credential mediation;
  • network controls;
  • collaboration;
  • event normalization;
  • runtime budgets; and
  • execution telemetry.

Caption: Durable agent platforms separate semantic capability design from runtime control.

flowchart TD
    A[User goal] --> B[Agent planner]
    B --> C[Semantic capability layer]
    C --> D[Typed query or domain command]
    D --> E[Policy and approval validation]
    E --> F[Meta-harness control plane]
    F --> G[Selected harness]
    G --> H[Sandboxed runner]
    H --> I[System of record]
    I --> J[Outcome verification]

This separates probabilistic interpretation from deterministic authority.
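The separation in the diagram can be sketched as two deterministic steps around the planner's output: admission, which turns an untrusted proposal into a typed command, and execution followed by independent outcome verification. The command registry, the harness mapping, and the runner and verifier callables are hypothetical stand-ins; policy and approval validation are reduced here to a registry lookup, and in practice the meta-harness would also provision the sandbox.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical registry: command name -> exact set of required parameter names.
COMMANDS = {"open_payment_failure_incident": {"payment_id", "severity"}}
# Hypothetical control-plane routing: command name -> harness allowed to run it.
HARNESS_FOR = {"open_payment_failure_incident": "incident-runner"}


@dataclass(frozen=True)
class TypedCommand:
    name: str
    params: tuple[tuple[str, str], ...]


def admit(proposal: dict) -> TypedCommand:
    """Deterministic gate: turn an untrusted planner proposal into a typed command."""
    name = proposal.get("command")
    if name not in COMMANDS:
        raise ValueError(f"unknown command: {name}")
    params = proposal.get("params", {})
    if set(params) != COMMANDS[name]:
        raise ValueError(f"bad parameters for {name}: {sorted(params)}")
    return TypedCommand(name, tuple(sorted(params.items())))


def execute(cmd: TypedCommand,
            runners: dict[str, Callable[[TypedCommand], dict]],
            verify: Callable[[TypedCommand, dict], bool]) -> str:
    harness = HARNESS_FOR[cmd.name]  # control plane selects the runtime
    result = runners[harness](cmd)   # runner acts on the system of record
    # Trust an independent outcome check, not the runner's own report.
    return "verified" if verify(cmd, result) else "unverified"
```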

What the meta-harness solves, and what it does not

A meta-harness addresses operational fragmentation. It can ensure that coding agents run inside isolated environments, that network access passes through policy, that collaborators can inspect the same live session, and that high-risk mutations require approval before execution.

What it cannot automatically repair is the semantic shape of the capability surface itself.

It does not remove:

  • excessively granular tools;
  • ambiguous business operations;
  • hidden client-side joins;
  • missing transactions;
  • absent compensation semantics; or
  • semantically unclear commands.

Omnigent operates at the orchestration and control-plane layer. It can govern how agents are combined, executed, and shared. It does not determine whether the underlying tools expose the correct domain boundaries.

A meta-harness can govern complexity operationally without eliminating it semantically.

The durable architecture therefore separates two layers:

Semantic capability layer: hides service and storage boundaries; exposes declarative reads; exposes intent-level commands; owns validation, transaction semantics, and compensation.

Execution control plane (meta-harness): governs agent selection; manages credentials and environments; applies policy and budgets; captures traces and evaluations; coordinates multiple specialized agents.

If twelve internal calls collectively represent one business operation, the durable improvement is to redesign the capability boundary, not only to supervise the twelve calls more carefully.

Policy has to exist at multiple boundaries

Prompt instructions are not an authoritative policy surface. Serious agent governance needs policy at multiple layers:

  1. Semantic policy: whether the requested business operation is allowed.
  2. Runtime policy: which files, commands, or network destinations the session may access.
  3. Transaction policy: retries, idempotency, compensation, and approval lifetime.
  4. Verification policy: how the system proves that the approved outcome actually occurred.

For high-risk actions, approval should be bound to the exact plan and expected effect, not merely granted to a broad tool or session. Credentials are capabilities, not just secrets. Hiding a token from the raw agent process helps, but the delegated authority behind that token still needs explicit validation against actor, operation, scope, parameters, approval, and expiry.
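A sketch of binding approval to the exact plan rather than to a tool or session, assuming plans are JSON-serializable dictionaries: the approval stores a SHA-256 digest of the canonicalized plan, so any change to parameters or expected effect invalidates it. The `Approval` fields follow the checklist in the paragraph above (actor, operation, scope, parameters, expiry); the type and function names are hypothetical.

```python
import hashlib
import json
import time
from dataclasses import dataclass
from typing import Optional


def plan_digest(plan: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the exact plan."""
    canonical = json.dumps(plan, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Approval:
    actor: str
    operation: str
    scope: str
    digest: str        # binds the approval to one specific plan
    expires_at: float  # Unix timestamp


def authorize(approval: Approval, actor: str, operation: str, scope: str,
              plan: dict, now: Optional[float] = None) -> bool:
    """Check delegated authority against actor, operation, scope,
    parameters (via the plan digest), and expiry."""
    now = time.time() if now is None else now
    return (approval.actor == actor
            and approval.operation == operation
            and approval.scope == scope
            and approval.digest == plan_digest(plan)
            and now < approval.expires_at)
```

Because the digest covers the whole plan, an agent that edits a single parameter after approval, for example a rollback target version, has to obtain a new approval.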

Concrete examples

Example 1: Incident response tool catalogs

Consider an instruction such as: create an incident for a payment failure, assign the correct team, notify the merchant-support channel, and link the relevant deployment.

If the capability surface exposes search_logs, find_payment, get_deployment, create_issue, assign_issue, search_team, post_slack_message, and link_external_resource, the agent must reconstruct workflow semantics from disconnected primitives. It has to infer ordering, ownership, authorization, and partial-failure handling.

A better design would expose a smaller set of intent-level capabilities such as:

  • prepare_payment_failure_incident
  • route_incident_to_owning_team
  • notify_merchant_support_about_incident
  • link_incident_to_deployment

Or, if the sequence truly represents one owned transition:

  • open_payment_failure_incident

The point is not to maximize breadth or narrowness. It is to align capability granularity with domain semantics and ownership.

Example 2: A bounded research implementation

A useful reference workflow is: implement an approved Jira issue in a Git repository and create a pull request.

The semantic layer can expose:

  • issue queries;
  • repository queries;
  • ProposeCodeChange;
  • ExecuteApprovedChange; and
  • CreatePullRequest.

A capability compiler can turn natural-language intent into a typed plan that binds issue, repository, allowed paths, test requirements, prohibited operations, approval policy, and expected outcome. The meta-harness can then select a runtime, provision an isolated worktree, inject only approved context, manage credentials, and capture normalized events. A durable workflow engine can own retries, review pauses, and outcome verification.

This is a better test than asking whether an agent can finish a demo once. The more important question is which guarantees remain stable when the model, harness, or execution environment changes.
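The typed plan described above can be sketched as a frozen record plus a deterministic check that a proposed change stays inside it. The field names follow the text; the glob-style path patterns (matched with `fnmatch.fnmatchcase`, where `*` also matches `/`), the string-valued operations, and the example issue key in the usage are hypothetical simplifications.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class ChangePlan:
    issue: str
    repository: str
    allowed_paths: tuple[str, ...]          # glob patterns
    required_tests: tuple[str, ...]
    prohibited_operations: tuple[str, ...]
    approval_policy: str
    expected_outcome: str


def check_proposed_change(plan: ChangePlan, touched: list[str],
                          operations: list[str], tests_run: list[str]) -> list[str]:
    """List every way a proposed change departs from the approved plan."""
    violations = []
    for path in touched:
        if not any(fnmatchcase(path, pat) for pat in plan.allowed_paths):
            violations.append(f"path outside plan: {path}")
    for op in operations:
        if op in plan.prohibited_operations:
            violations.append(f"prohibited operation: {op}")
    for test in plan.required_tests:
        if test not in tests_run:
            violations.append(f"required test not run: {test}")
    return violations
```

Because the check depends only on the plan and the observed change, it keeps giving the same answer when the model, harness, or execution environment is swapped out.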

Trade-offs and failure modes

This architecture is stronger, but it is not free.

  • Capability design takes domain effort and organizational judgment.
  • Approval and verification add cost and latency.
  • Teams can over-engineer control planes before they have stable workflows.
  • A meta-harness can create false confidence if its policies are operationally strong but the underlying domain abstractions remain weak.
  • Tool and runtime semantics will continue to evolve, especially in a fast-moving 2026 agent ecosystem.

This note should therefore be read as a systems-design frame for consequential, multi-step workflows, not as a prescription for every internal assistant or lightweight automation task.

Practical takeaways

  1. Treat tool design as service design for models, not as thin API wrapping.
  2. For reads, provide a declarative surface that hides service and storage boundaries and supports projection, filtering, and authorization.
  3. For writes, expose explicit, typed, intent-level domain commands that own validation, idempotency, and compensation.
  4. Use tool retrieval to expose a minimal valid frontier, not the full catalog.
  5. Build a meta-harness for runtime governance, but do not confuse it with semantic abstraction.
  6. The agent should express intent. Deterministic software should resolve queries and own transactions.

Positioning note

This is not an academic attempt to prove a general theory of service composition. It is an applied note for engineers building agent-facing capability layers and runtime governance systems. It differs from a blog opinion piece by grounding the claim in service-boundary design, contemporary tool-selection research, MCP architecture, and emerging control-plane patterns such as Omnigent. It also differs from vendor documentation because the concern here is architectural responsibility, not the feature set of any single platform.

Status and scope disclaimer

This is exploratory but evidence-based personal lab work. The argument is strongest for consequential, multi-step workflows where agents interact with real systems, approvals, and partial-failure risk. It is not authoritative guidance, and it should not be read as a universal rule for simple retrieval assistants, narrow internal automations, or teams that do not yet have the operational maturity to sustain capability governance and verification.

References

  1. Steve Yegge. "Services and Complexity." Yegge.ai. https://yegge.ai/listings/services-and-complexity
  2. Steve Yegge. "Stevey's Google Platforms Rant." 2011. Archived copy. https://gist.github.com/chitchcock/1281611
  3. Vyzantinos Repantis, Ameya Gawde, Harshvardhan Singh and Joey Blackwell. "How Many Tools Should an LLM Agent See? A Chance-Corrected Answer." arXiv, 2026. https://arxiv.org/abs/2605.24660
  4. Rahul Suresh Babu and Laxmipriya Ganesh Iyer. "ToolChoiceConfusion: Causal Minimal Tool Filtering for Reliable LLM Agents." arXiv, 2026. https://arxiv.org/abs/2606.06284
  5. Model Context Protocol. "Architecture." MCP Specification. https://modelcontextprotocol.io/specification/2025-06-18/architecture
  6. Matei Zaharia, Kasey Uhlenhuth and Corey Zumar. "Introducing Omnigent: A Meta-Harness to Combine, Control and Share Your Agents." Databricks, June 13, 2026. https://www.databricks.com/blog/introducing-omnigent-meta-harness-combine-control-and-share-your-agents
  7. Omnigent. "A Meta-Harness for All Your AI Agents." GitHub repository. https://github.com/omnigent-ai/omnigent
  8. Temporal. "Temporal Sandbox Orchestration Harness: The Missing Layer for Running Agents." 2026. https://temporal.io/blog/temporal-sandbox-orchestration-harness-the-missing-layer-for-running-agents
  9. Yoonho Lee, Roshen Nair, Qizheng Zhang, Kangwook Lee, Omar Khattab and Chelsea Finn. "Meta-Harness: End-to-End Optimization of Model Harnesses." arXiv, 2026. https://arxiv.org/abs/2603.28052
  10. Apollo GraphQL. "Connect AI Agents to Your GraphQL API Using MCP and Type-Safe Tool Configuration." https://www.apollographql.com/blog/connect-ai-agents-to-your-graphql-api-using-mcp-and-type-safe-tool-configuration
  11. Berkeley Function Calling Leaderboard. Gorilla LLM. https://gorilla.cs.berkeley.edu/blogs/8_berkeley_function_calling_leaderboard.html
  12. Temporal. "From Agent Zoo to Agent Orchestra: The Benefits of Temporal as Your Enterprise Agentic Control Plane." https://temporal.io/blog/from-agent-zoo-to-agent-orchestra-temporal-agentic-control-plane
  13. Steve Yegge. Post linking to his "Services and Complexity" critique. X (Twitter). https://x.com/Steve_Yegge/status/2065920483879719318