Constraints

Cost, performance, and resource constraints configuration

Constraints Object

The constraints object in spec.constraints defines operational limits for cost, performance, and resource usage. Constraints prevent runaway costs, ensure SLAs, and manage resource allocation.

Field Reference

Field Name	Type	Required	Description
`cost`	Cost	No	Cost and token usage limits
`performance`	Performance	No	Latency, concurrency, and timeout settings
`resources`	Resources	No	CPU, memory, and GPU allocation (Kubernetes format)

Cost Constraints

Cost constraints prevent excessive LLM API usage and spending.

Field Name	Type	Required	Description
`maxTokensPerDay`	integer	No	Maximum tokens allowed per day (input + output combined). Minimum: 0
`maxTokensPerRequest`	integer	No	Maximum tokens per individual request. Minimum: 0
`maxCostPerDay`	number	No	Maximum cost per day in specified currency. Minimum: 0
`currency`	string	No	ISO 4217 currency code (3 uppercase letters). Default: `USD`

Examples

# Moderate daily limits
constraints:
  cost:
    maxTokensPerDay: 1000000      # 1M tokens/day
    maxTokensPerRequest: 10000    # 10K tokens/request
    maxCostPerDay: 50.0          # $50/day
    currency: USD

# High-volume agent
constraints:
  cost:
    maxTokensPerDay: 50000000     # 50M tokens/day
    maxTokensPerRequest: 100000   # 100K tokens/request
    maxCostPerDay: 500.0
    currency: USD

# Experimental/testing
constraints:
  cost:
    maxTokensPerDay: 100000       # 100K tokens/day
    maxCostPerDay: 5.0
    currency: USD

# European deployment
constraints:
  cost:
    maxTokensPerDay: 5000000
    maxCostPerDay: 200.0
    currency: EUR

Cost Calculation

Cost tracking varies by provider:

OpenAI:

# GPT-4o: $2.50/1M input, $10.00/1M output
constraints:
  cost:
    maxTokensPerDay: 1000000
    maxCostPerDay: 10.0

Anthropic:

# Claude 3.5 Sonnet: $3.00/1M input, $15.00/1M output
constraints:
  cost:
    maxTokensPerDay: 1000000
    maxCostPerDay: 15.0

Note: Platforms typically estimate cost based on provider pricing. For precise cost control, use both token and cost limits.

Performance Constraints

Performance constraints ensure latency SLAs and prevent resource exhaustion.

Field Name	Type	Required	Description
`maxLatencySeconds`	number	No	Maximum acceptable latency per request in seconds. Minimum: 0
`maxConcurrentRequests`	integer	No	Maximum concurrent requests the agent can handle. Minimum: 1
`timeoutSeconds`	number	No	Request timeout in seconds. Minimum: 0

Examples

# Low-latency requirements (chatbot)
constraints:
  performance:
    maxLatencySeconds: 3
    maxConcurrentRequests: 100
    timeoutSeconds: 10

# Batch processing
constraints:
  performance:
    maxLatencySeconds: 60
    maxConcurrentRequests: 10
    timeoutSeconds: 300

# High-throughput
constraints:
  performance:
    maxLatencySeconds: 10
    maxConcurrentRequests: 1000
    timeoutSeconds: 30

# Background tasks
constraints:
  performance:
    maxLatencySeconds: 300
    maxConcurrentRequests: 5
    timeoutSeconds: 3600

Latency vs. Timeout

maxLatencySeconds: Expected/desired latency. Platform may log warnings if exceeded.
timeoutSeconds: Hard cutoff. Requests exceeding timeout are terminated.

constraints:
  performance:
    maxLatencySeconds: 5      # Warn if response takes > 5s
    timeoutSeconds: 30        # Kill request after 30s

Resource Constraints

Resource constraints define CPU, memory, and GPU allocation. Uses Kubernetes resource format.

Field Name	Type	Required	Description
`cpu`	string	No	CPU limit in Kubernetes format. Examples: `100m`, `1`, `2.5`
`memory`	string	No	Memory limit in Kubernetes format. Examples: `128Mi`, `1Gi`, `4Gi`
`gpu`	string	No	GPU requirement (format varies by platform)

CPU Format

Millicores (m):

resources:
  cpu: "100m"    # 0.1 CPU core
  cpu: "500m"    # 0.5 CPU core
  cpu: "1000m"   # 1 CPU core

Cores (integer or decimal):

resources:
  cpu: "1"       # 1 CPU core
  cpu: "2"       # 2 CPU cores
  cpu: "0.5"     # 0.5 CPU core
  cpu: "4.5"     # 4.5 CPU cores

Memory Format

Mebibytes (Mi):

resources:
  memory: "128Mi"   # 128 MiB
  memory: "512Mi"   # 512 MiB
  memory: "1024Mi"  # 1 GiB

Gibibytes (Gi):

resources:
  memory: "1Gi"     # 1 GiB
  memory: "2Gi"     # 2 GiB
  memory: "16Gi"    # 16 GiB

Other units:

resources:
  memory: "1000M"   # 1000 MB (decimal)
  memory: "1G"      # 1 GB (decimal)

GPU Requirement

GPU format is platform-specific:

# NVIDIA GPU count
resources:
  gpu: "1"

# Specific GPU type
resources:
  gpu: "nvidia-tesla-v100"

# GPU memory
resources:
  gpu: "16Gi"

Resource Examples

# Minimal (lightweight agent)
constraints:
  resources:
    cpu: "100m"
    memory: "128Mi"

# Small (chatbot)
constraints:
  resources:
    cpu: "500m"
    memory: "512Mi"

# Medium (code analysis)
constraints:
  resources:
    cpu: "1"
    memory: "2Gi"

# Large (data processing)
constraints:
  resources:
    cpu: "4"
    memory: "8Gi"

# GPU workload
constraints:
  resources:
    cpu: "8"
    memory: "32Gi"
    gpu: "1"

Complete Examples

Production Chatbot

apiVersion: ossa/v0.3.x
kind: Agent
metadata:
  name: customer-chatbot
  version: 1.0.0
spec:
  role: You are a customer support chatbot.

  llm:
    provider: anthropic
    model: claude-3-5-haiku-20241022
    temperature: 0.7
    maxTokens: 1024

  constraints:
    cost:
      maxTokensPerDay: 10000000      # 10M tokens/day
      maxTokensPerRequest: 2048      # 2K tokens/request
      maxCostPerDay: 100.0
      currency: USD
    performance:
      maxLatencySeconds: 3           # Fast responses
      maxConcurrentRequests: 500     # High throughput
      timeoutSeconds: 10
    resources:
      cpu: "2"
      memory: "4Gi"

Background Data Analyzer

apiVersion: ossa/v0.3.x
kind: Agent
metadata:
  name: data-analyzer
  version: 1.5.0
spec:
  role: You are a data analysis specialist.

  llm:
    provider: openai
    model: gpt-4o
    temperature: 0.2
    maxTokens: 8192

  constraints:
    cost:
      maxTokensPerDay: 50000000      # 50M tokens/day
      maxCostPerDay: 500.0
      currency: USD
    performance:
      maxLatencySeconds: 60          # Not time-critical
      maxConcurrentRequests: 10      # Batch processing
      timeoutSeconds: 600            # 10 minute timeout
    resources:
      cpu: "4"
      memory: "16Gi"

Development/Testing Agent

apiVersion: ossa/v0.3.x
kind: Agent
metadata:
  name: dev-assistant
  version: 0.1.0
spec:
  role: Development and testing assistant.

  llm:
    provider: anthropic
    model: claude-3-5-sonnet-20241022

  constraints:
    cost:
      maxTokensPerDay: 100000        # Limited for testing
      maxCostPerDay: 5.0
      currency: USD
    performance:
      maxLatencySeconds: 10
      maxConcurrentRequests: 5
      timeoutSeconds: 30
    resources:
      cpu: "500m"
      memory: "1Gi"

High-Performance Code Generator

apiVersion: ossa/v0.3.x
kind: Agent
metadata:
  name: code-generator
  version: 2.0.0
spec:
  role: You are an expert code generator.

  llm:
    provider: anthropic
    model: claude-3-5-sonnet-20241022
    temperature: 0.2
    maxTokens: 8192

  constraints:
    cost:
      maxTokensPerDay: 100000000     # 100M tokens/day
      maxTokensPerRequest: 16384     # Long responses
      maxCostPerDay: 1000.0
      currency: USD
    performance:
      maxLatencySeconds: 15
      maxConcurrentRequests: 100
      timeoutSeconds: 60
    resources:
      cpu: "8"
      memory: "32Gi"

Cost-Optimized Agent

apiVersion: ossa/v0.3.x
kind: Agent
metadata:
  name: budget-agent
  version: 1.0.0
spec:
  role: Cost-optimized assistant.

  llm:
    provider: anthropic
    model: claude-3-5-haiku-20241022  # Cheapest model
    temperature: 0.5
    maxTokens: 512                    # Short responses

  constraints:
    cost:
      maxTokensPerDay: 500000         # Tight limit
      maxTokensPerRequest: 1024
      maxCostPerDay: 5.0              # Low budget
      currency: USD
    performance:
      maxLatencySeconds: 5
      maxConcurrentRequests: 50
      timeoutSeconds: 15
    resources:
      cpu: "250m"                     # Minimal resources
      memory: "256Mi"

Multi-Currency Support

# United States
constraints:
  cost:
    maxCostPerDay: 100.0
    currency: USD

# European Union
constraints:
  cost:
    maxCostPerDay: 90.0
    currency: EUR

# United Kingdom
constraints:
  cost:
    maxCostPerDay: 80.0
    currency: GBP

# Canada
constraints:
  cost:
    maxCostPerDay: 135.0
    currency: CAD

Currency validation:

Must be 3 uppercase letters
Should be valid ISO 4217 code
Platform converts to provider's billing currency

Constraint Enforcement

How constraints are enforced varies by platform:

Cost Constraints

Soft limits: Log warnings, continue operation
Hard limits: Reject new requests when limit reached
Reset period: Typically 24 hours from first request

Performance Constraints

maxLatencySeconds: Monitoring/alerting threshold
maxConcurrentRequests: Queue or reject excess requests
timeoutSeconds: Hard cutoff, terminate request

Resource Constraints

Kubernetes: Enforced by kubelet
Docker: Enforced by container runtime
Serverless: Enforced by platform

Best Practices

Set realistic limits - Based on expected usage patterns
Monitor regularly - Track actual usage vs. constraints
Start conservative - Tighten constraints after observing usage
Account for spikes - Set limits above normal usage
Use both cost metrics - Tokens AND currency for better control
Match timeout to task - Long-running tasks need higher timeouts
Resource padding - Add 20-30% buffer to resource requests
Test limits - Validate constraints work as expected
Alert on limits - Monitor constraint violations
Document choices - Explain constraint rationale in comments

Common Constraint Patterns

High-Volume Low-Cost

constraints:
  cost:
    maxTokensPerDay: 100000000
    maxCostPerDay: 100.0
  performance:
    maxConcurrentRequests: 1000
  resources:
    cpu: "4"
    memory: "8Gi"

Low-Volume High-Quality

constraints:
  cost:
    maxTokensPerDay: 1000000
    maxCostPerDay: 200.0
  performance:
    maxLatencySeconds: 5
    maxConcurrentRequests: 10
  resources:
    cpu: "2"
    memory: "4Gi"

Batch Processing

constraints:
  cost:
    maxTokensPerDay: 50000000
    maxCostPerDay: 500.0
  performance:
    maxLatencySeconds: 300
    timeoutSeconds: 3600
    maxConcurrentRequests: 5
  resources:
    cpu: "8"
    memory: "32Gi"

Validation

All constraint fields are optional.

Numeric validation:

All numeric values must be >= 0
maxConcurrentRequests minimum: 1
currency pattern: ^[A-Z]{3}$

Resource format:

cpu: Kubernetes CPU format
memory: Kubernetes memory format
gpu: Platform-specific format

Agent Spec - Parent object containing constraints
LLM Configuration - Model configuration affecting cost
Observability - Monitoring constraint violations
Autonomy - Action limits complement constraints

Constraints

Constraints Object

Field Reference

Cost Constraints

Examples

Cost Calculation

Performance Constraints

Examples

Latency vs. Timeout

Resource Constraints

CPU Format

Memory Format

GPU Requirement

Resource Examples

Complete Examples

Production Chatbot

Background Data Analyzer

Development/Testing Agent

High-Performance Code Generator

Cost-Optimized Agent

Multi-Currency Support

Constraint Enforcement

Cost Constraints

Performance Constraints

Resource Constraints

Best Practices

Common Constraint Patterns

High-Volume Low-Cost

Low-Volume High-Quality

Batch Processing

Validation

Related Objects

See Also