Adding World Shims
Dynamic Discovery & Dashboard¶
The system includes 20 default shims, categorized for full-spectrum enterprise simulation:
Core Infrastructure¶
- Git: VFS-aware repository and version control simulation.
- API: Generic REST/gRPC endpoint mocking with state.
- Database: SQL table simulation with persistent row storage.
- Terminal: SSH and shell execution environment (sandboxed).
- Cloud: Cloud infrastructure simulator (AWS/GCP/Azure resources).
Communication & Collaboration¶
- Slack: Corporate messaging, channels, and bot interaction mock.
- Email: SMTP/IMAP simulator for transactional and user mail.
- Jira: Ticketing, agile boards, and project management simulation.
- Social: Social media posting and feed simulation (X/LinkedIn).
- Support: Desk/Tickets simulator (Zendesk/Intercom).
Business & Fintech¶
- CRM: Customer/Lead management simulation (Salesforce-like).
- ERP: Enterprise resource planning and supply chain mock.
- Stripe: Payment processing, subscriptions, and invoice simulation.
- Calendar: Meeting scheduling and availability management.
AI & Modern Web¶
- Browser: Headless browser automation and DOM interaction simulator.
- Vector: Vector database simulation for RAG (Pinecone/Milvus).
- KB: Knowledge base search and retrieval (Confluence/Notion).
- CI/CD: Build pipeline and deployment status simulator.
- IoT: Smart device and sensor interaction mock.
- Security: Identity & Access Management (Auth0/Okta/IAM).
These are automatically reflected in your Dashboard under "World Shims."
Shims vs. VFS State Parity¶
A common question is: How do shims allow for VFS state parity checks?
- The Sandbox State (VFS): The tool_sandbox.py maintains a "Virtual File System" (the
.statedict). - Execution Hook: When an agent calls
database_query, theToolSandboxroutes it to the DatabaseSimulator. - State Reflection: The shim executes the action and can optionally return
state_changesthat are applied to the VFS. - Parity Check: At the end of a run, the
Evaluatorcompares the current VFS state against the "Ground Truth" defined in the scenario. This is how we ensure "State Parity."
Configuration & Governance¶
The system uses a Two-Key Activation Model for World Shims to ensure security and ease of management:
1. Global System Override¶
In .env (or via environment variables), you can set GLOBAL_ENABLED_SHIMS:
- Default: git,api,database,terminal,cloud,slack,email,jira,social,support,crm,erp,stripe,calendar,browser,vector,kb,cicd,iot,security
- Restricted: git,api (Only these two can ever be used).
- This allows you to disable heavy or risky shims system-wide without updating a single scenario.
- Wildcard: Use * to enable all registered shims (including those from plugins).
2. Per-Scenario Configuration¶
Each scenario can further restrict its environment using the enabled_shims property:
- This only has an effect if the shim is also allowed by the Global Override.
- If missing, it defaults to * (respecting the Global limit).
Adding a New Shim (Zero-Touch Plugin Method)¶
The preferred way to add a shim is via a Plugin. This keeps the core harness immutable.
1. Define the Simulator¶
Create your simulator class anywhere (e.g., in your plugin package). It must implement execute(action, params).
class MyCustomSimulator:
def execute(self, action: str, params: dict) -> dict:
return {"status": "success", "result": "Action executed!"}
2. Register via Plugin Hook¶
In your BaseEvalPlugin subclass, override the on_register_simulators hook:
from eval_runner.plugins import BaseEvalPlugin
class MyEcoPlugin(BaseEvalPlugin):
def on_register_simulators(self, registry: dict):
# Register your shim instance in the provided registry dict
registry["custom"] = MyCustomSimulator()
3. Verify on Dashboard¶
Restart the console or run multiagent-eval console. Your shim will be automatically discovered and reflected in the "World Shims" count.
Using the Shim in Scenarios¶
You can now reference this shim in your scenario's tasks: