Resource Allocation

This document covers how miners in the NI Compute Subnet handle resource allocation requests from validators and external clients. Resource allocation involves provisioning Docker containers with specified compute resources (CPU, RAM, GPU, storage) and providing secure SSH access to allocated environments.

For information about the Resource Allocation API that external clients use to request resources, see Resource Allocation API. For details about container lifecycle management, see Container Management.

The miner’s resource allocation system is built around the Allocate synapse protocol. When a validator or client sends an allocation request, the miner processes it through several stages:

flowchart TD
    A["Allocate Synapse Request"] --> B["blacklist_allocate()"]
    B --> C{"Blacklisted?"}
    C -->|Yes| D["Reject Request"]
    C -->|No| E["priority_allocate()"]
    E --> F["allocate() Method"]
    F --> G{"Checking Mode?"}
    G -->|Yes| H["check_allocation()"]
    G -->|No| I{"Timeline > 0?"}
    I -->|Yes| J["register_allocation()"]
    I -->|No| K["deregister_allocation()"]
    H --> L["Return Status"]
    J --> M["run_container()"]
    K --> N["kill_container()"]
    M --> O["Update WandB State"]
    N --> O
    O --> L

Allocation Request Flow

Sources: neurons/miner.py:419-479, neurons/miner.py:397-403

The allocate method in the Miner class handles three types of operations:

| Operation Type | Condition | Action |
| --- | --- | --- |
| Check Allocation | checking=True, timeline>0 | Verify resource availability without allocating |
| Register Allocation | checking=False, timeline>0 | Create new resource allocation |
| Deregister Allocation | checking=False, timeline=0 | Remove existing allocation |
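
The branching among these three operations can be sketched as a small dispatcher. This is a minimal illustration, not the code in neurons/miner.py: the function name `dispatch_allocation` and the `handlers` mapping are hypothetical, and the sketch follows the flowchart in testing the checking flag before the timeline.

```python
from typing import Callable, Dict


def dispatch_allocation(
    checking: bool,
    timeline: int,
    handlers: Dict[str, Callable[[], dict]],
) -> dict:
    """Choose one of the three operations for a single request.

    `handlers` is a hypothetical mapping with the keys "check",
    "register" and "deregister"; each value performs that operation.
    """
    if checking:
        # Checking mode only reports availability; nothing is allocated.
        return handlers["check"]()
    if timeline > 0:
        # A positive timeline asks for a new allocation.
        return handlers["register"]()
    # timeline == 0 with checking off releases the existing allocation.
    return handlers["deregister"]()


# Usage with stub handlers that only report which branch ran.
stub_handlers = {
    "check": lambda: {"operation": "check"},
    "register": lambda: {"operation": "register"},
    "deregister": lambda: {"operation": "deregister"},
}
chosen = dispatch_allocation(checking=False, timeline=3600, handlers=stub_handlers)
```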

When a miner receives a valid allocation request with timeline > 0, it initiates the resource registration process:

sequenceDiagram
    participant V as "Validator/Client"
    participant M as "Miner.allocate()"
    participant A as "register_allocation()"
    participant C as "Container Management"
    participant D as "Docker Engine"
    participant S as "Schedule Manager"
    
    V->>M: "Allocate(timeline=3600, device_requirement={...})"
    M->>A: "register_allocation(timeline, device_requirement, public_key)"
    A->>C: "kill_container() - cleanup existing"
    C->>D: "Remove existing containers"
    A->>C: "run_container(cpu_usage, ram_usage, gpu_usage, ...)"
    C->>D: "Create container with resource limits"
    D->>C: "Return container info + encrypted SSH details"
    C->>A: "Return allocation status + connection info"
    A->>S: "start(timeline) - schedule auto-deallocation"
    A->>M: "Return allocation result"
    M->>V: "Return Allocate synapse with status + SSH info"

Resource Registration Sequence

Sources: neurons/Miner/allocate.py:29-62, neurons/miner.py:463-476
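
The ordering in the sequence above (clean up, create, then schedule) can be sketched as follows. The three callables are injected stand-ins for the container and schedule managers. Their signatures, the `status` key in the result, and the choice to schedule cleanup only on success are all assumptions made for illustration.

```python
from typing import Callable


def register_allocation_sketch(
    timeline: int,
    device_requirement: dict,
    public_key: str,
    *,
    kill_container: Callable[[], None],
    run_container: Callable[[dict, str], dict],
    schedule_start: Callable[[int], None],
) -> dict:
    """Mirror the order of calls in the registration sequence."""
    # 1. Remove any container left over from a previous allocation.
    kill_container()
    # 2. Create the new container with the requested resources.
    result = run_container(device_requirement, public_key)
    # 3. Schedule auto-deallocation (assumed here: only if creation succeeded).
    if result.get("status"):
        schedule_start(timeline)
    return result


# Usage with recording stubs, to make the call order visible.
calls = []
outcome = register_allocation_sketch(
    3600,
    {"cpu": {"count": 2}},
    "example-public-key",
    kill_container=lambda: calls.append("kill"),
    run_container=lambda req, key: (calls.append("run") or {"status": True}),
    schedule_start=lambda seconds: calls.append(("schedule", seconds)),
)
```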

The registration process transforms device requirements into Docker container configurations:

graph LR
    A["Device Requirements"] --> B["Resource Parsing"]
    B --> C["Docker Configuration"]
    C --> D["Container Creation"]
    
    subgraph "Resource Parsing"
        B1["CPU Count → CPU Assignment"]
        B2["RAM Capacity → Memory Limit"]
        B3["Disk Capacity → Storage Limit"]
        B4["GPU Capacity → Device Requests"]
    end
    
    subgraph "Container Creation"
        D1["build_sample_container()"]
        D2["Generate SSH Credentials"]
        D3["Create Dockerfile"]
        D4["Run Container with Limits"]
        D5["Return Encrypted Connection Info"]
    end
    
    B --> B1
    B --> B2
    B --> B3
    B --> B4
    
    C --> D1
    C --> D2
    C --> D3
    C --> D4
    C --> D5

Resource Requirement Processing

Sources: neurons/Miner/allocate.py:34-51, neurons/Miner/container.py:105-207

The container management system translates abstract resource requirements into concrete Docker container limits:

| Resource Type | Input Format | Docker Configuration | Implementation |
| --- | --- | --- | --- |
| CPU | {"count": 2} | cpuset_cpus="0-1" | container.py:110-137 |
| RAM | {"capacity": 5368709120} | mem_limit="5g" | container.py:111-133 |
| GPU | {"capacity": "all"} | device_requests=[DeviceRequest(...)] | container.py:167-175 |
| Storage | {"capacity": 107374182400} | Volume mount limits | container.py:112-149 |
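
As a rough illustration of the CPU and RAM rows, the sketch below converts a requirement dictionary into the two Docker arguments shown. The helper name `to_docker_limits` and the outer `cpu` and `ram` keys are assumptions, and the rounding rule (whole GiB values become an `"Ng"` string, anything else stays as a byte count) is a choice made for this example rather than the behavior of container.py.

```python
GIB = 1024 ** 3  # bytes in one gibibyte


def to_docker_limits(device_requirement: dict) -> dict:
    """Translate CPU and RAM requirements into Docker run arguments."""
    cpu_count = int(device_requirement["cpu"]["count"])
    ram_bytes = int(device_requirement["ram"]["capacity"])
    if cpu_count < 1 or ram_bytes < 1:
        raise ValueError("cpu count and ram capacity must be positive")

    # Pin the container to the first `cpu_count` cores, e.g. 2 -> "0-1".
    cpuset = "0" if cpu_count == 1 else f"0-{cpu_count - 1}"

    # Docker accepts either an integer byte count or a string such as "5g".
    mem_limit = f"{ram_bytes // GIB}g" if ram_bytes % GIB == 0 else ram_bytes

    return {"cpuset_cpus": cpuset, "mem_limit": mem_limit}


limits = to_docker_limits({"cpu": {"count": 2}, "ram": {"capacity": 5368709120}})
```

With the table's example inputs, `limits` holds `cpuset_cpus="0-1"` and `mem_limit="5g"`, since 5368709120 bytes is exactly 5 GiB.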

The system supports several container management operations beyond basic allocation:

stateDiagram-v2
    [*] --> Available: "No Container"
    Available --> Creating: "register_allocation()"
    Creating --> Running: "Container Created"
    Running --> Paused: "pause_container()"
    Paused --> Running: "unpause_container()"
    Running --> Running: "restart_container()"
    Running --> Available: "deregister_allocation()"
    Running --> Available: "Timeline Expired"
    
    note right of Running: "SSH Access Available\nResource Limits Applied"
    note right of Paused: "Container Suspended\nResources Released"

Container State Management

Sources: neurons/Miner/container.py:421-520, neurons/miner.py:437-458
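
The state diagram can also be read as a transition table. The sketch below encodes it that way. The state names and function-named events come from the diagram, while the event labels `container_created` and `timeline_expired` and the helper `next_state` are hypothetical names introduced for this example.

```python
# (current state, event) -> next state, transcribed from the diagram.
TRANSITIONS = {
    ("Available", "register_allocation"): "Creating",
    ("Creating", "container_created"): "Running",
    ("Running", "pause_container"): "Paused",
    ("Paused", "unpause_container"): "Running",
    ("Running", "restart_container"): "Running",
    ("Running", "deregister_allocation"): "Available",
    ("Running", "timeline_expired"): "Available",
}


def next_state(state: str, event: str) -> str:
    """Return the state reached from `state` on `event`, or raise."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} is not valid in state {state!r}") from None


# Walk one full allocation lifecycle.
state = "Available"
for event in ("register_allocation", "container_created",
              "pause_container", "unpause_container", "timeline_expired"):
    state = next_state(state, event)
```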

Resource allocation is protected by several security layers, covering request authentication and SSH access control.

All allocation operations require RSA public key authentication:

flowchart LR
    A["Client Public Key"] --> B["Allocation Request"]
    B --> C["Container Creation"]
    C --> D["SSH Key Setup"]
    D --> E["Connection Info Encryption"]
    E --> F["Encrypted Response"]
    
    subgraph "Storage"
        G["allocation_key file"]
        H["Base64 Encoded Public Key"]
    end
    
    A --> G
    G --> H
    H --> I["Access Validation"]
    I --> J["Container Operations"]

Public Key Authentication Flow

Sources: neurons/Miner/container.py:188-200, neurons/Miner/allocate.py:74-77
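
The storage step in the diagram, where the client public key is written to an allocation_key file in Base64 form, might look like the sketch below. The helper names, the UTF-8 encoding, and the temporary file location are assumptions, and the key text is a shortened placeholder rather than a real RSA key.

```python
import base64
import tempfile
from pathlib import Path


def save_allocation_key(path: Path, public_key: str) -> None:
    """Store the client's public key Base64-encoded on disk."""
    path.write_bytes(base64.b64encode(public_key.encode("utf-8")))


def load_allocation_key(path: Path) -> str:
    """Read the stored key back and decode it to the original text."""
    return base64.b64decode(path.read_bytes()).decode("utf-8")


# Placeholder key material, for illustration only.
example_key = "-----BEGIN PUBLIC KEY-----\nPLACEHOLDER\n-----END PUBLIC KEY-----"

with tempfile.TemporaryDirectory() as tmp:
    key_file = Path(tmp) / "allocation_key"
    save_allocation_key(key_file, example_key)
    raw_on_disk = key_file.read_bytes()
    restored_key = load_allocation_key(key_file)
```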

The system provides secure SSH access to allocated containers:

| Operation | Function | Security Check |
| --- | --- | --- |
| Key Exchange | exchange_key_container() | Public key validation |
| Container Restart | restart_container() | Allocation key verification |
| Container Pause | pause_container() | Authentication required |

Sources: neurons/Miner/container.py:475-520, neurons/Miner/container.py:384-419

The miner maintains allocation state through multiple mechanisms:

graph TD
    A["Allocation Request"] --> B["allocation_key File"]
    B --> C["Base64 Encoded Public Key"]
    C --> D["Access Validation"]
    
    E["Container Status"] --> F["Docker API Queries"]
    F --> G["check_container()"]
    
    H["WandB Integration"] --> I["update_allocated()"]
    I --> J["Distributed State Sync"]
    
    subgraph "State Validation"
        K["check_if_allocated()"]
        L["File Existence Check"]
        M["Key Comparison"]
        N["Container Running Check"]
    end
    
    D --> K
    G --> K
    K --> L
    K --> M
    K --> N

Allocation State Management

Sources: neurons/Miner/allocate.py:106-137, neurons/miner.py:405-417
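
The three checks grouped under "State Validation" (file existence, key comparison, container running) can be combined roughly as follows. This is a sketch under assumptions: the function name, the injected `container_running` callable, and the use of a constant-time comparison are illustrative choices, not a description of check_if_allocated() itself.

```python
import base64
import hmac
import tempfile
from pathlib import Path
from typing import Callable


def check_if_allocated_sketch(
    key_file: Path,
    public_key: str,
    container_running: Callable[[], bool],
) -> bool:
    """Return True only if all three validation checks pass."""
    # 1. File existence check: no key file means no active allocation.
    if not key_file.exists():
        return False
    # 2. Key comparison: the stored Base64 text must match the caller's key.
    stored = key_file.read_bytes().strip()
    expected = base64.b64encode(public_key.encode("utf-8"))
    if not hmac.compare_digest(stored, expected):
        return False
    # 3. Container running check, delegated to the caller (e.g. a Docker query).
    return bool(container_running())


with tempfile.TemporaryDirectory() as tmp:
    key_path = Path(tmp) / "allocation_key"
    missing_file = check_if_allocated_sketch(key_path, "client-key", lambda: True)
    key_path.write_bytes(base64.b64encode(b"client-key"))
    all_pass = check_if_allocated_sketch(key_path, "client-key", lambda: True)
    wrong_key = check_if_allocated_sketch(key_path, "other-key", lambda: True)
    stopped = check_if_allocated_sketch(key_path, "client-key", lambda: False)
```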

The system includes automatic resource cleanup through timeline-based scheduling:

sequenceDiagram
    participant A as "register_allocation()"
    participant S as "Schedule Manager"
    participant T as "Timer Thread"
    participant C as "Container Manager"
    
    A->>S: "start(timeline=3600)"
    S->>T: "Create timer for 3600 seconds"
    T-->>T: "Wait for timeline expiry"
    T->>C: "kill_container() after timeout"
    C->>C: "Remove container and cleanup"
    
    Note over T: "Timeline-based auto-cleanup\nPrevents resource leaks"

Automatic Deallocation Timeline

Sources: neurons/Miner/allocate.py:57, neurons/Miner/schedule.py
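
One simple way to get this kind of timeline-based cleanup is the standard library's `threading.Timer`. Whether neurons/Miner/schedule.py works this way is not stated here, so the sketch below is an assumption about the technique, with `schedule_deallocation` as a hypothetical helper name.

```python
import threading
from typing import Callable


def schedule_deallocation(
    timeline_seconds: float,
    on_expiry: Callable[[], None],
) -> threading.Timer:
    """Run `on_expiry` once after `timeline_seconds`, unless cancelled."""
    timer = threading.Timer(timeline_seconds, on_expiry)
    # Daemon thread: a pending timer does not keep the process alive.
    timer.daemon = True
    timer.start()
    return timer


# Usage: a short timeline whose expiry sets an event instead of
# killing a container.
expired = threading.Event()
timer = schedule_deallocation(0.05, expired.set)
fired_in_time = expired.wait(timeout=2.0)
```

Returning the timer lets an explicit deregistration call `timer.cancel()`, so the cleanup callback does not fire later against a newer allocation.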

The allocation system runs each allocation in a Docker container that is built from a shared base image and configured per request.

The system uses a pre-built base image (ssh-image-base) for faster allocation:

graph LR
    A["build_sample_container()"] --> B["pytorch/pytorch:2.7.0-cuda12.6-cudnn9-runtime"]
    B --> C["Install SSH Server"]
    C --> D["Configure SSH Settings"]
    D --> E["Install Python Dependencies"]
    E --> F["ssh-image-base:latest"]
    
    subgraph "Runtime Container"
        G["Custom Dockerfile"]
        H["User SSH Keys"]
        I["Resource Limits"]
        J["ssh-image:latest"]
    end
    
    F --> G
    G --> H
    H --> I
    I --> J

Container Image Pipeline

Sources: neurons/Miner/container.py:280-368, neurons/Miner/container.py:136-159

Each allocation creates a customized container with:

  • SSH Access: Root user with password and key-based authentication
  • GPU Support: NVIDIA GPU access through device requests
  • Resource Limits: CPU, memory, and storage constraints
  • Custom Environment: User-specified Docker commands and dependencies

Sources: neurons/Miner/container.py:170-181, neurons/Miner/container.py:136-146
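
To make the per-allocation customization concrete, the sketch below renders a Dockerfile on top of the pre-built base image, adding the user's SSH key and any user-specified commands. Everything apart from the image name `ssh-image-base:latest` is an assumption: the helper `render_dockerfile`, the exact `RUN` lines, and the `sshd` path are illustrative and may differ from what container.py generates. Password setup is left out of the sketch.

```python
import shlex
from typing import Iterable


def render_dockerfile(
    public_key: str,
    extra_commands: Iterable[str] = (),
    base_image: str = "ssh-image-base:latest",
) -> str:
    """Build Dockerfile text for one allocation (illustrative only)."""
    quoted_key = shlex.quote(public_key)  # keep the key safe inside the shell
    lines = [
        f"FROM {base_image}",
        "RUN mkdir -p /root/.ssh && chmod 700 /root/.ssh",
        f"RUN echo {quoted_key} >> /root/.ssh/authorized_keys"
        " && chmod 600 /root/.ssh/authorized_keys",
    ]
    # User-specified commands become additional build steps.
    lines += [f"RUN {command}" for command in extra_commands]
    # Keep the SSH daemon in the foreground as the container's main process.
    lines.append('CMD ["/usr/sbin/sshd", "-D"]')
    return "\n".join(lines) + "\n"


dockerfile_text = render_dockerfile(
    "ssh-rsa AAAAEXAMPLE user@host",  # placeholder key
    extra_commands=["pip install numpy"],
)
```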