Runtime Specification

OSSA runtime specification defining agent execution, lifecycle, control signals, and memory management

OSSA Runtime Specification

The OSSA Runtime Specification defines how agents execute, communicate, and manage state. It provides a standardized execution model that ensures consistent behavior across different runtime implementations.

Overview

The runtime specification consists of several key components:

| Component | Description | Documentation |
| --- | --- | --- |
| Lifecycle | 5-phase agent execution model | Lifecycle |
| Control Signals | Standard communication primitives | Control Signals |
| Execution Model | Timeouts, sandboxing, resource limits | This page |
| Memory Model | Short-term and long-term storage | Memory Model |
| Execution Profiles | fast/balanced/deep/safe profiles | Execution Profiles |

Quick Start

Define runtime configuration in your agent manifest:

apiVersion: ossa/v0.4.9
kind: Agent
metadata:
  name: my-agent
  version: "1.0.0"
spec:
  role: worker
  runtime:
    lifecycle:
      max_iterations: 10
      phases:
        init:
          timeout_seconds: 30
        act:
          timeout_seconds: 300
    execution:
      sandbox:
        enabled: true
        type: container
      resource_limits:
        memory_mb: 512
        cpu_millicores: 1000
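A runtime consuming this manifest resolves each phase's timeout, falling back to a default for phases that omit one. A minimal sketch, assuming the manifest has already been parsed into a Python dict (the field names mirror the example above; `phase_timeout` and the fallback value are illustrative, not part of the spec):

```python
# Sketch: resolve per-phase lifecycle timeouts from a parsed manifest.
DEFAULT_PHASE_TIMEOUT = 300  # illustrative fallback when a phase omits a timeout

manifest = {
    "spec": {
        "runtime": {
            "lifecycle": {
                "max_iterations": 10,
                "phases": {
                    "init": {"timeout_seconds": 30},
                    "act": {"timeout_seconds": 300},
                },
            }
        }
    }
}

def phase_timeout(manifest: dict, phase: str) -> int:
    """Return the configured timeout for a lifecycle phase, or the default."""
    phases = (
        manifest.get("spec", {})
        .get("runtime", {})
        .get("lifecycle", {})
        .get("phases", {})
    )
    return phases.get(phase, {}).get("timeout_seconds", DEFAULT_PHASE_TIMEOUT)
```

The chained `.get()` calls make the lookup total: a manifest missing any level of nesting still yields the fallback rather than raising.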

Execution Model

Timeout Configuration

Multiple timeout levels protect against runaway executions:

| Timeout | Default | Range | Description |
| --- | --- | --- | --- |
| default_seconds | 300 | 1-3600 | Default operation timeout |
| llm_call_seconds | 60 | 5-300 | LLM API call timeout |
| tool_call_seconds | 60 | 1-600 | Tool invocation timeout |
| delegation_seconds | 300 | 30-3600 | Agent delegation timeout |
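A runtime can enforce these per-category deadlines by wrapping each operation in a timeout. A sketch using asyncio (the category names mirror the table; `run_with_timeout` itself is illustrative, not a spec-mandated API):

```python
import asyncio

# Default budgets per operation category, from the table above (seconds).
TIMEOUTS = {"default": 300, "llm_call": 60, "tool_call": 60, "delegation": 300}

async def run_with_timeout(coro, kind: str = "default", limits: dict = TIMEOUTS):
    """Run an operation under its category's deadline, falling back to the
    default timeout for unknown categories; raises asyncio.TimeoutError on
    overrun."""
    limit = limits.get(kind, limits["default"])
    return await asyncio.wait_for(coro, timeout=limit)
```

Because the limit is resolved per call, a delegation and the tool calls it triggers each get their own independent budget.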

Sandbox Configuration

Agents run in isolated sandboxes to prevent unauthorized access:

apiVersion: ossa/v0.4.9
kind: RuntimeSpec
execution:
  sandbox:
    enabled: true
    type: container  # options: container, vm, wasm, process, none
    filesystem:
      read_paths:
        - /app/data
      write_paths:
        - /app/output
      deny_paths:
        - /etc/passwd
        - /etc/shadow
        - ~/.ssh
        - ~/.aws
    capabilities: []  # Linux capabilities
    syscall_filter: default  # options: strict, default, permissive
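The filesystem rules compose deny-first: a path under deny_paths is refused even if it also sits under an allowed root. A minimal sketch of that check (pure path logic, no I/O; home-directory entries like `~/.ssh` would need expansion first and are omitted here):

```python
from pathlib import PurePosixPath

# Policy lists mirroring the sandbox example above (absolute paths only).
DENY_PATHS = ["/etc/passwd", "/etc/shadow"]
READ_PATHS = ["/app/data"]
WRITE_PATHS = ["/app/output"]

def _under(path: str, roots: list[str]) -> bool:
    """True if path equals or lies beneath any of the given roots."""
    p = PurePosixPath(path)
    return any(p.is_relative_to(r) for r in roots)

def allowed(path: str, mode: str) -> bool:
    """Deny rules win; otherwise the path must sit under an allowed root
    for the requested mode ('read' or 'write')."""
    if _under(path, DENY_PATHS):
        return False
    roots = WRITE_PATHS if mode == "write" else READ_PATHS
    return _under(path, roots)
```

A real sandbox would resolve symlinks before checking, since pure path comparison alone can be bypassed by a link into a denied directory.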

Resource Limits

Control agent resource consumption:

resource_limits:
  memory_mb: 512
  cpu_millicores: 1000
  gpu_memory_mb: 0
  max_open_files: 256
  max_processes: 10
  max_network_connections: 50
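For process-type sandboxes, several of these fields map directly onto POSIX rlimits. A sketch translating the config into `setrlimit()` tuples (Unix-only; the mapping is illustrative, since container and VM sandboxes enforce limits through their own mechanisms):

```python
import resource

CONFIG = {"memory_mb": 512, "max_open_files": 256, "max_processes": 10}

def rlimits(cfg: dict) -> list:
    """Translate manifest fields into (RLIMIT constant, (soft, hard)) pairs.

    Building the tuples is separated from applying them, so a supervisor
    can call resource.setrlimit(which, limits) in the child process only.
    """
    mem_bytes = cfg["memory_mb"] * 1024 * 1024
    return [
        (resource.RLIMIT_AS, (mem_bytes, mem_bytes)),          # address-space cap
        (resource.RLIMIT_NOFILE, (cfg["max_open_files"],) * 2),  # open files
        (resource.RLIMIT_NPROC, (cfg["max_processes"],) * 2),    # processes
    ]
```

Note there is no rlimit equivalent for cpu_millicores or network connection counts; those need cgroups or a network proxy respectively.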

Network Isolation

Isolation Modes

| Mode | Description |
| --- | --- |
| strict | No network access except explicitly allowed |
| namespace | Isolated network namespace per agent |
| mesh | Service mesh with controlled inter-agent communication |
| none | No network isolation (development only) |

Egress Rules

Control outbound traffic:

network_isolation:
  enabled: true
  mode: namespace
  egress:
    allowed_domains:
      - api.openai.com
      - api.anthropic.com
      - "*.googleapis.com"
    blocked_domains: []
    allowed_ports:
      - 443
      - 80
    rate_limit:
      requests_per_second: 100
      bytes_per_second: 10485760  # 10 MB/s
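An egress decision combines domain matching (including the `*.` wildcard form) with the port allowlist. A minimal sketch using shell-style pattern matching (the function name is illustrative; rate limiting is omitted):

```python
from fnmatch import fnmatch

# Egress policy mirroring the example above.
ALLOWED_DOMAINS = ["api.openai.com", "api.anthropic.com", "*.googleapis.com"]
ALLOWED_PORTS = [443, 80]

def egress_allowed(host: str, port: int) -> bool:
    """True only if the destination port is allowed and the host matches
    at least one allowed domain pattern."""
    if port not in ALLOWED_PORTS:
        return False
    return any(fnmatch(host, pattern) for pattern in ALLOWED_DOMAINS)
```

Note that `*.googleapis.com` matches subdomains such as storage.googleapis.com but not the bare apex `googleapis.com`; list the apex separately if it should be reachable.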

Ingress Rules

Control inbound traffic:

ingress:
  allowed_sources:
    - "10.0.0.0/8"
    - "agent-orchestrator"
  allowed_ports:
    - 8080
    - 8443
  require_mtls: true
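allowed_sources mixes two kinds of entries: CIDR ranges matched against the peer's IP, and named identities (such as a service name presented via mTLS). A sketch of that dual check using the standard ipaddress module:

```python
import ipaddress

# Ingress policy mirroring the example above: one CIDR range, one named peer.
ALLOWED_CIDRS = ["10.0.0.0/8"]
ALLOWED_NAMES = ["agent-orchestrator"]

def ingress_allowed(source: str) -> bool:
    """Accept the source if it parses as an IP inside an allowed CIDR,
    or otherwise matches an allowed peer name exactly."""
    try:
        ip = ipaddress.ip_address(source)
    except ValueError:
        return source in ALLOWED_NAMES  # not an IP: treat as a named identity
    return any(ip in ipaddress.ip_network(cidr) for cidr in ALLOWED_CIDRS)
```

With require_mtls enabled, the named-identity branch should be driven by the verified certificate identity, never by a client-supplied header.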

Agent Mesh

Configuration for agent-to-agent communication:

agent_mesh:
  enabled: true
  discovery: kubernetes  # options: dns, consul, static
  encryption: mtls  # options: none, tls, wireguard
  allowed_agents:
    - security-scanner
    - code-reviewer
  denied_agents: []
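As with the filesystem rules, a reasonable reading of the allow/deny pair is deny-first. A tiny sketch of that ordering (the precedence is an assumption, not stated by the spec):

```python
# Mesh peer policy mirroring the example above; deny entries take precedence.
ALLOWED_AGENTS = ["security-scanner", "code-reviewer"]
DENIED_AGENTS: list[str] = []

def may_communicate(peer: str) -> bool:
    """Deny list wins; otherwise the peer must appear on the allow list."""
    if peer in DENIED_AGENTS:
        return False
    return peer in ALLOWED_AGENTS
```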

Context Windows: Sliding Window + Long-Term Memory

Modern agents must operate under hard model context limits. OSSA requires agents to declare a context strategy that is deterministic enough to audit, but flexible enough to evolve.

Context Strategy Declaration

Every agent should declare how it manages its context window in agent.ossa.yaml:

spec:
  context:
    strategy: sliding_window  # How to manage the context window
    max_tokens: 32768  # Hard limit for the context window
    reserved_tokens:
      system_prompt: 4096  # Reserved for system instructions
      tools: 2048  # Reserved for tool definitions
      output: 4096  # Reserved for generation
    overflow: summarize  # What to do when window is full
    long_term_memory:
      provider: qdrant  # Vector store for persistent recall
      embedding_model: text-embedding-3-small
      dimensions: 1536
      retrieval:
        top_k: 10
        score_threshold: 0.7

Context Strategies

| Strategy | Behavior | Best For |
| --- | --- | --- |
| sliding_window | Remove oldest messages first (FIFO) | General-purpose agents |
| summarize | Compress old messages into summaries | Long conversations |
| importance_based | Keep high-importance messages longer | Complex multi-step tasks |
| hybrid | Sliding window + summarize on overflow | Production workloads |
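The sliding_window strategy is simple enough to sketch directly: drop the oldest messages until the history fits the token budget. A minimal illustration (the whitespace token counter is a crude stand-in for a real tokenizer):

```python
def count_tokens(message: str) -> int:
    """Crude stand-in for a real tokenizer: one token per whitespace word."""
    return len(message.split())

def sliding_window(history: list[str], budget: int) -> list[str]:
    """FIFO eviction: remove oldest messages first until the total token
    count fits within the budget. Returns a new list; input is untouched."""
    trimmed = list(history)
    while trimmed and sum(count_tokens(m) for m in trimmed) > budget:
        trimmed.pop(0)
    return trimmed
```

The summarize and hybrid strategies differ only in what happens to the evicted prefix: instead of being discarded, it is compressed into a summary message that re-enters the window.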

Token Budget Zones

The context window is divided into reserved zones:

┌──────────────────────────────────────────┐
│  System Prompt          (reserved: 4096) │
├──────────────────────────────────────────┤
│  Tool Definitions       (reserved: 2048) │
├──────────────────────────────────────────┤
│  Retrieved Memories     (dynamic)        │
├──────────────────────────────────────────┤
│  Conversation History   (sliding window) │
├──────────────────────────────────────────┤
│  Output Generation      (reserved: 4096) │
└──────────────────────────────────────────┘

The remaining tokens after reservations are split between retrieved long-term memories and the sliding conversation window. This ensures the agent never exceeds its context limit while maintaining access to both recent context and relevant historical knowledge.
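Concretely, with the example configuration the dynamic budget works out as follows (a one-line calculation; the function name is illustrative):

```python
# Reserved zones from the example configuration above.
MAX_TOKENS = 32768
RESERVED_TOKENS = {"system_prompt": 4096, "tools": 2048, "output": 4096}

def dynamic_budget(max_tokens: int = MAX_TOKENS, reserved: dict = RESERVED_TOKENS) -> int:
    """Tokens left for retrieved memories plus the sliding conversation
    window after all reserved zones are subtracted."""
    return max_tokens - sum(reserved.values())
```

Here 32768 - (4096 + 2048 + 4096) leaves 22528 tokens to split between retrieved memories and conversation history.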

Why Declare Context Strategy?

  1. Auditability: Operators can predict and verify token usage patterns
  2. Cost control: Reserved zones prevent runaway token consumption
  3. Determinism: Same inputs produce predictable context window contents
  4. Graceful degradation: Overflow strategies prevent hard failures when context fills up

Memory Model

Short-Term Memory

Volatile memory for immediate context (managed by the context strategy above):

| Component | Default Limit | Description |
| --- | --- | --- |
| Conversation Context | 100 messages / 32K tokens | Recent conversation history |
| Working Memory | 64 MB | Scratchpad for computations |
| Session State | 1 hour TTL | Session-level state |

Long-Term Memory

Persistent storage for knowledge and recall across sessions:

Supported Providers

| Category | Providers |
| --- | --- |
| Relational | postgres, sqlite |
| Document | mongodb, dynamodb |
| Vector | qdrant, pinecone, weaviate, chromadb, milvus |
| Graph | neo4j |
| Hybrid | postgres with pgvector |

Vector Store Configuration

spec:
  context:
    long_term_memory:
      provider: qdrant
      embedding_model: text-embedding-3-small
      dimensions: 1536
      distance_metric: cosine  # options: euclidean, dot_product
      index_type: hnsw  # options: flat, ivf, scann
      retrieval:
        top_k: 10
        score_threshold: 0.7
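The retrieval settings combine two filters: entries must score at least score_threshold under the distance metric, and at most top_k survivors are returned, best first. A self-contained sketch with cosine similarity and a brute-force scan (a real vector store would use its index, e.g. HNSW, instead):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def retrieve(query: list[float], entries: list, top_k: int = 10,
             score_threshold: float = 0.7) -> list[str]:
    """Score every (key, vector) entry against the query, drop scores below
    the threshold, and return up to top_k keys, highest score first."""
    scored = [(cosine(query, vec), key) for key, vec in entries]
    scored = [pair for pair in scored if pair[0] >= score_threshold]
    scored.sort(reverse=True)
    return [key for _, key in scored[:top_k]]
```

The threshold guards against padding the context with weakly related memories when fewer than top_k entries are actually relevant.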

Memory Provider Interface

All memory providers MUST implement these operations:

| Operation | Description | Required |
| --- | --- | --- |
| get | Retrieve entry by key | Yes |
| set | Store entry | Yes |
| delete | Remove entry | Yes |
| exists | Check if key exists | Yes |
| list | List keys matching pattern | Yes |
| clear | Clear namespace | Yes |
| search | Semantic search (vector) | No |
| batch_get | Retrieve multiple entries | No |
| batch_set | Store multiple entries | No |
| expire | Set expiration | No |
| increment | Atomic increment | No |
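The six required operations above can be expressed as a structural interface. A sketch with a dict-backed reference implementation (the `ns:key` namespace encoding and shell-style `list` patterns are illustrative choices, not mandated by the spec):

```python
from fnmatch import fnmatch
from typing import Any, Protocol

class MemoryProvider(Protocol):
    """The six required operations; optional ones (search, batch_*, expire,
    increment) are omitted from this sketch."""
    def get(self, key: str) -> Any: ...
    def set(self, key: str, value: Any) -> None: ...
    def delete(self, key: str) -> None: ...
    def exists(self, key: str) -> bool: ...
    def list(self, pattern: str) -> list[str]: ...
    def clear(self, namespace: str) -> None: ...

class InMemoryProvider:
    """Minimal dict-backed provider; namespaces are encoded as 'ns:key'."""

    def __init__(self) -> None:
        self._data: dict[str, Any] = {}

    def get(self, key: str) -> Any:
        return self._data.get(key)

    def set(self, key: str, value: Any) -> None:
        self._data[key] = value

    def delete(self, key: str) -> None:
        self._data.pop(key, None)

    def exists(self, key: str) -> bool:
        return key in self._data

    def list(self, pattern: str) -> list[str]:
        return [k for k in self._data if fnmatch(k, pattern)]

    def clear(self, namespace: str) -> None:
        prefix = namespace + ":"
        self._data = {k: v for k, v in self._data.items()
                      if not k.startswith(prefix)}
```

Conformance tests written against the Protocol can then run unchanged against any backend, from this in-memory stub to a postgres or qdrant provider.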

Memory Lifecycle

Garbage Collection

memory:
  lifecycle:
    gc:
      strategy: hybrid  # options: lru, lfu, ttl, size_based
      interval_seconds: 3600
      trigger_threshold: 0.8  # 80% memory usage
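The core of an LRU collection pass, tied to the trigger_threshold above, can be sketched in a few lines (entry sizes and the OrderedDict bookkeeping are illustrative; a real provider tracks recency on every access):

```python
from collections import OrderedDict

def gc_lru(entries: OrderedDict, limit_bytes: int,
           trigger_threshold: float = 0.8) -> OrderedDict:
    """Evict least-recently-used entries (oldest first in the OrderedDict,
    which maps key -> size in bytes) until usage falls below the trigger
    fraction of the limit."""
    while entries and sum(entries.values()) / limit_bytes >= trigger_threshold:
        entries.popitem(last=False)  # drop the least recently used entry
    return entries
```

With interval_seconds: 3600, such a pass would run hourly, but the threshold check lets a runtime also trigger it eagerly under memory pressure.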

Backup

backup:
  interval_hours: 24
  retention_count: 7
  encryption: true

Complete Example

apiVersion: ossa/v0.4.9
kind: RuntimeSpec
lifecycle:
  phases:
    init:
      timeout_seconds: 30
    plan:
      timeout_seconds: 60
    act:
      timeout_seconds: 300
    reflect:
      timeout_seconds: 30
    terminate:
      timeout_seconds: 15
  max_iterations: 10
control_signals:
  tool_call:
    async: true
    timeout_seconds: 60
  delegation:
    async: true
    timeout_seconds: 300
execution:
  timeout:
    default_seconds: 300
    llm_call_seconds: 60
  sandbox:
    enabled: true
    type: container
  resource_limits:
    memory_mb: 512
    cpu_millicores: 1000
network_isolation:
  enabled: true
  mode: namespace
  egress:
    allowed_domains:
      - api.openai.com
      - api.anthropic.com
    allowed_ports:
      - 443
agent_mesh:
  enabled: true
  encryption: mtls

Specification Version: v0.4.5
Last Updated: 2026-02