Resource Allocation
This document covers how miners in the NI Compute Subnet handle resource allocation requests from validators and external clients. Resource allocation involves provisioning Docker containers with specified compute resources (CPU, RAM, GPU, storage) and providing secure SSH access to allocated environments.
For information about the Resource Allocation API that external clients use to request resources, see Resource Allocation API. For details about container lifecycle management, see Container Management.
Allocation Request Processing
The miner’s resource allocation system is built around the Allocate synapse protocol. When a validator or client sends an allocation request, the miner processes it through several stages:
flowchart TD
    A["Allocate Synapse Request"] --> B["blacklist_allocate()"]
    B --> C{"Blacklisted?"}
    C -->|Yes| D["Reject Request"]
    C -->|No| E["priority_allocate()"]
    E --> F["allocate() Method"]
    F --> G{"Checking Mode?"}
    G -->|Yes| H["check_allocation()"]
    G -->|No| I{"Timeline > 0?"}
    I -->|Yes| J["register_allocation()"]
    I -->|No| K["deregister_allocation()"]
    H --> L["Return Status"]
    J --> M["run_container()"]
    K --> N["kill_container()"]
    M --> O["Update WandB State"]
    N --> O
    O --> L
Allocation Request Flow
Sources: neurons/miner.py:419-479 , neurons/miner.py:397-403
The allocate method in the Miner class handles three types of operations:
Operation Type | Condition | Action |
---|---|---|
Check Allocation | checking=True, timeline>0 | Verify resource availability without allocating |
Register Allocation | checking=False, timeline>0 | Create new resource allocation |
Deregister Allocation | checking=False, timeline=0 | Remove existing allocation |
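The routing in the table can be sketched as a small pure function. This is only an illustration, not the code in neurons/miner.py: the name select_operation and the returned labels are hypothetical, and, following the flowchart, any request with checking set is treated as a check regardless of timeline.

```python
def select_operation(checking: bool, timeline: int) -> str:
    """Return the name of the handler an Allocate request would be routed to.

    Hypothetical helper: the labels match the function names used in this
    document, but the real miner may structure the branches differently.
    """
    if checking:
        # Availability check only; nothing is provisioned.
        return "check_allocation"
    if timeline > 0:
        # A positive timeline requests a new allocation.
        return "register_allocation"
    # timeline == 0 with checking disabled releases the allocation.
    return "deregister_allocation"


if __name__ == "__main__":
    for checking, timeline in [(True, 3600), (False, 3600), (False, 0)]:
        print(checking, timeline, "->", select_operation(checking, timeline))
```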
Resource Registration Process
When a miner receives a valid allocation request with timeline > 0, it initiates the resource registration process:
sequenceDiagram
    participant V as "Validator/Client"
    participant M as "Miner.allocate()"
    participant A as "register_allocation()"
    participant C as "Container Management"
    participant D as "Docker Engine"
    participant S as "Schedule Manager"
    V->>M: "Allocate(timeline=3600, device_requirement={...})"
    M->>A: "register_allocation(timeline, device_requirement, public_key)"
    A->>C: "kill_container() - cleanup existing"
    C->>D: "Remove existing containers"
    A->>C: "run_container(cpu_usage, ram_usage, gpu_usage, ...)"
    C->>D: "Create container with resource limits"
    D->>C: "Return container info + encrypted SSH details"
    C->>A: "Return allocation status + connection info"
    A->>S: "start(timeline) - schedule auto-deallocation"
    A->>M: "Return allocation result"
    M->>V: "Return Allocate synapse with status + SSH info"
Resource Registration Sequence
Sources: neurons/Miner/allocate.py:29-62 , neurons/miner.py:463-476
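The ordering in the sequence above (clean up, create, schedule) can be sketched with the three collaborators passed in as callables. This is a hedged outline, not the contents of neurons/Miner/allocate.py: the function name, the keyword arguments, the top-level keys of device_requirement ("cpu", "ram", "hard_disk", "gpu") and the shape of the returned dict are all assumptions made for illustration.

```python
from typing import Any, Callable, Dict


def register_allocation_sketch(
    timeline: int,
    device_requirement: Dict[str, Dict[str, Any]],
    public_key: str,
    kill_container: Callable[[], None],
    run_container: Callable[..., Dict[str, Any]],
    start_schedule: Callable[[int], None],
) -> Dict[str, Any]:
    """Mirror the diagram: kill the old container, run a new one, schedule cleanup."""
    # 1. Remove any container left over from a previous allocation.
    kill_container()

    # 2. Pull the requested amounts out of the requirement dict
    #    (key names are assumed, see the lead-in).
    result = run_container(
        cpu_usage=device_requirement.get("cpu", {}),
        ram_usage=device_requirement.get("ram", {}),
        hard_disk_usage=device_requirement.get("hard_disk", {}),
        gpu_usage=device_requirement.get("gpu", {}),
        public_key=public_key,
    )

    # 3. Arm the auto-deallocation timer for the requested duration.
    start_schedule(timeline)
    return result


if __name__ == "__main__":
    outcome = register_allocation_sketch(
        3600,
        {"cpu": {"count": 2}, "ram": {"capacity": 5368709120}},
        "example-public-key",
        kill_container=lambda: print("cleanup"),
        run_container=lambda **kw: {"status": True, "info": "<encrypted>"},
        start_schedule=lambda t: print(f"auto-deallocate in {t}s"),
    )
    print(outcome)
```

Passing the collaborators in keeps the sketch runnable without Docker; a real miner would call its own container and scheduling modules directly.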
The registration process transforms device requirements into Docker container configurations:
graph LR
    A["Device Requirements"] --> B["Resource Parsing"]
    B --> C["Docker Configuration"]
    C --> D["Container Creation"]
    subgraph "Resource Parsing"
        B1["CPU Count → CPU Assignment"]
        B2["RAM Capacity → Memory Limit"]
        B3["Disk Capacity → Storage Limit"]
        B4["GPU Capacity → Device Requests"]
    end
    subgraph "Container Creation"
        D1["build_sample_container()"]
        D2["Generate SSH Credentials"]
        D3["Create Dockerfile"]
        D4["Run Container with Limits"]
        D5["Return Encrypted Connection Info"]
    end
    B --> B1
    B --> B2
    B --> B3
    B --> B4
    C --> D1
    C --> D2
    C --> D3
    C --> D4
    C --> D5
Resource Requirement Processing
Sources: neurons/Miner/allocate.py:34-51 , neurons/Miner/container.py:105-207
Container Resource Management
The container management system translates abstract resource requirements into concrete Docker container limits:
Resource Limit Translation
Resource Type | Input Format | Docker Configuration | Implementation |
---|---|---|---|
CPU | {"count": 2} | cpuset_cpus="0-1" | container.py:110-137 |
RAM | {"capacity": 5368709120} | mem_limit="5g" | container.py:111-133 |
GPU | {"capacity": "all"} | device_requests=[DeviceRequest(...)] | container.py:167-175 |
Storage | {"capacity": 107374182400} | Volume mount limits | container.py:112-149 |
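The CPU, RAM and GPU rows of the table can be reproduced with a short translation function. It is a sketch under stated assumptions rather than the logic in container.py: the function name, the top-level keys ("cpu", "ram", "gpu"), the rounding down to whole GiB and the choice to pin the first N cores are illustrative. The GPU entry is returned as keyword arguments that could be passed to docker.types.DeviceRequest from the Docker SDK for Python, where count=-1 requests all GPUs. Storage is left out because the table does not say how the volume limit is applied.

```python
from typing import Any, Dict

GIB = 1024 ** 3  # bytes per GiB


def to_docker_limits(device_requirement: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Translate a device requirement dict into Docker-style limit settings."""
    limits: Dict[str, Any] = {}

    cpu_count = int(device_requirement.get("cpu", {}).get("count", 0))
    if cpu_count > 0:
        # Pin the container to cores 0..N-1, e.g. count=2 -> "0-1".
        limits["cpuset_cpus"] = "0" if cpu_count == 1 else f"0-{cpu_count - 1}"

    ram_bytes = int(device_requirement.get("ram", {}).get("capacity", 0))
    if ram_bytes > 0:
        # Whole GiB, rounded down, never below 1 (assumed rounding rule).
        limits["mem_limit"] = f"{max(1, ram_bytes // GIB)}g"

    if device_requirement.get("gpu", {}).get("capacity") == "all":
        # Keyword arguments for docker.types.DeviceRequest(**kwargs).
        limits["device_request_kwargs"] = {"count": -1, "capabilities": [["gpu"]]}

    return limits


if __name__ == "__main__":
    print(to_docker_limits({
        "cpu": {"count": 2},
        "ram": {"capacity": 5368709120},
        "gpu": {"capacity": "all"},
    }))
```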
Container Lifecycle Operations
The system supports several container management operations beyond basic allocation:
stateDiagram-v2
    [*] --> Available: "No Container"
    Available --> Creating: "register_allocation()"
    Creating --> Running: "Container Created"
    Running --> Paused: "pause_container()"
    Paused --> Running: "unpause_container()"
    Running --> Running: "restart_container()"
    Running --> Available: "deregister_allocation()"
    Running --> Available: "Timeline Expired"
    note right of Running: "SSH Access Available\nResource Limits Applied"
    note right of Paused: "Container Suspended\nResources Released"
Container State Management
Sources: neurons/Miner/container.py:421-520 , neurons/miner.py:437-458
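The lifecycle above is a small finite-state machine, which can be written down as a transition table. The sketch below only encodes the arrows from the diagram; the State enum, the event labels and next_state() are hypothetical names, and the real miner does not necessarily track state this way.

```python
from enum import Enum


class State(Enum):
    AVAILABLE = "Available"
    CREATING = "Creating"
    RUNNING = "Running"
    PAUSED = "Paused"


# (current state, event) -> next state, one entry per arrow in the diagram.
TRANSITIONS = {
    (State.AVAILABLE, "register_allocation"): State.CREATING,
    (State.CREATING, "container_created"): State.RUNNING,
    (State.RUNNING, "pause_container"): State.PAUSED,
    (State.PAUSED, "unpause_container"): State.RUNNING,
    (State.RUNNING, "restart_container"): State.RUNNING,
    (State.RUNNING, "deregister_allocation"): State.AVAILABLE,
    (State.RUNNING, "timeline_expired"): State.AVAILABLE,
}


def next_state(state: State, event: str) -> State:
    """Return the state reached by applying event, or raise if it is not allowed."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event!r} is not allowed in state {state.value}") from None


if __name__ == "__main__":
    state = State.AVAILABLE
    for event in ["register_allocation", "container_created",
                  "pause_container", "unpause_container", "timeline_expired"]:
        state = next_state(state, event)
        print(f"{event} -> {state.value}")
```

A table like this makes invalid requests explicit: for example, the diagram has no arrow for pausing a container that is not running, so the sketch rejects it.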
Security and Access Control
Resource allocation implements multiple security layers:
Public Key Authentication
All allocation operations require RSA public key authentication:
flowchart LR
    A["Client Public Key"] --> B["Allocation Request"]
    B --> C["Container Creation"]
    C --> D["SSH Key Setup"]
    D --> E["Connection Info Encryption"]
    E --> F["Encrypted Response"]
    subgraph "Storage"
        G["allocation_key file"]
        H["Base64 Encoded Public Key"]
    end
    A --> G
    G --> H
    H --> I["Access Validation"]
    I --> J["Container Operations"]
Public Key Authentication Flow
Sources: neurons/Miner/container.py:188-200 , neurons/Miner/allocate.py:74-77
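The "Storage" part of the flow, where the client's public key is kept Base64-encoded in an allocation_key file, might look like the following. The helper names, the UTF-8 encoding and the file location are assumptions for illustration; the source only states that the file holds the Base64-encoded public key.

```python
import base64
import tempfile
from pathlib import Path


def store_allocation_key(public_key: str, path: Path) -> None:
    """Write the public key to path as Base64 text (hypothetical helper)."""
    encoded = base64.b64encode(public_key.encode("utf-8")).decode("ascii")
    path.write_text(encoded)


def load_allocation_key(path: Path) -> str:
    """Read the file back and decode it to the original key string."""
    return base64.b64decode(path.read_text()).decode("utf-8")


if __name__ == "__main__":
    # Placeholder text standing in for a PEM-encoded RSA public key.
    key = "-----BEGIN PUBLIC KEY-----\nplaceholder\n-----END PUBLIC KEY-----"
    with tempfile.TemporaryDirectory() as tmp:
        key_path = Path(tmp) / "allocation_key"
        store_allocation_key(key, key_path)
        print(load_allocation_key(key_path) == key)
```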
SSH Access Management
The system provides secure SSH access to allocated containers:
Operation | Function | Security Check |
---|---|---|
Key Exchange | exchange_key_container() | Public key validation |
Container Restart | restart_container() | Allocation key verification |
Container Pause | pause_container() | Authentication required |
Sources: neurons/Miner/container.py:475-520 , neurons/Miner/container.py:384-419
Allocation State Management
The miner maintains allocation state through multiple mechanisms:
Local State Storage
graph TD
    A["Allocation Request"] --> B["allocation_key File"]
    B --> C["Base64 Encoded Public Key"]
    C --> D["Access Validation"]
    E["Container Status"] --> F["Docker API Queries"]
    F --> G["check_container()"]
    H["WandB Integration"] --> I["update_allocated()"]
    I --> J["Distributed State Sync"]
    subgraph "State Validation"
        K["check_if_allocated()"]
        L["File Existence Check"]
        M["Key Comparison"]
        N["Container Running Check"]
    end
    D --> K
    G --> K
    K --> L
    K --> M
    K --> N
Allocation State Management
Sources: neurons/Miner/allocate.py:106-137 , neurons/miner.py:405-417
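The three checks grouped under "State Validation" (file existence, key comparison, container running) can be combined as below. This is a sketch, not the check_if_allocated() in allocate.py: the signature is invented, the container check is injected as a callable so the example runs without Docker, and the constant-time comparison via hmac.compare_digest is a design choice of the sketch rather than something the source describes.

```python
import base64
import hmac
import tempfile
from pathlib import Path
from typing import Callable


def check_if_allocated_sketch(
    public_key: str,
    key_file: Path,
    container_running: Callable[[], bool],
) -> bool:
    """Return True only if all three validation checks pass."""
    # 1. File existence check: no key file means no active allocation.
    if not key_file.exists():
        return False

    # 2. Key comparison: the stored Base64 value must match the caller's key.
    stored = key_file.read_bytes().strip()
    presented = base64.b64encode(public_key.encode("utf-8"))
    if not hmac.compare_digest(stored, presented):
        return False

    # 3. Container running check, delegated to the caller-supplied probe.
    return container_running()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        key_file = Path(tmp) / "allocation_key"
        key_file.write_bytes(base64.b64encode(b"example key"))
        print(check_if_allocated_sketch("example key", key_file, lambda: True))
        print(check_if_allocated_sketch("another key", key_file, lambda: True))
```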
Automatic Deallocation
The system includes automatic resource cleanup through timeline-based scheduling:
sequenceDiagram
    participant A as "register_allocation()"
    participant S as "Schedule Manager"
    participant T as "Timer Thread"
    participant C as "Container Manager"
    A->>S: "start(timeline=3600)"
    S->>T: "Create timer for 3600 seconds"
    T-->>T: "Wait for timeline expiry"
    T->>C: "kill_container() after timeout"
    C->>C: "Remove container and cleanup"
    Note over T: "Timeline-based auto-cleanup\nPrevents resource leaks"
Automatic Deallocation Timeline
Sources: neurons/Miner/allocate.py:57 , neurons/Miner/schedule.py
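One way to get the timer behaviour shown above is threading.Timer from the standard library. The class below is a minimal sketch and not the contents of schedule.py: AllocationTimer and its methods are hypothetical, and it assumes that starting a new allocation should replace any timer that is still pending.

```python
import threading
from typing import Callable, Optional


class AllocationTimer:
    """Run a cleanup callback once the allocation timeline has elapsed."""

    def __init__(self, on_expire: Callable[[], None]) -> None:
        self._on_expire = on_expire
        self._timer: Optional[threading.Timer] = None
        self._lock = threading.Lock()

    def start(self, timeline: float) -> None:
        """Arm the timer for timeline seconds, replacing any pending one."""
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
            self._timer = threading.Timer(timeline, self._on_expire)
            self._timer.daemon = True  # do not block interpreter exit
            self._timer.start()

    def cancel(self) -> None:
        """Disarm the timer, e.g. after an explicit deregistration."""
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
                self._timer = None


if __name__ == "__main__":
    done = threading.Event()
    timer = AllocationTimer(on_expire=done.set)  # stands in for kill_container()
    timer.start(0.1)
    done.wait(timeout=2.0)
    print("cleanup ran:", done.is_set())
```

Cancelling on deregistration matters in this design: without it, a timer left over from an earlier allocation could fire and remove a container that belongs to a newer one.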
Docker Integration
The allocation system builds upon Docker containers with specific configurations:
Base Container Setup
The system uses a pre-built base image (ssh-image-base) for faster allocation:
graph LR
    A["build_sample_container()"] --> B["pytorch/pytorch:2.7.0-cuda12.6-cudnn9-runtime"]
    B --> C["Install SSH Server"]
    C --> D["Configure SSH Settings"]
    D --> E["Install Python Dependencies"]
    E --> F["ssh-image-base:latest"]
    subgraph "Runtime Container"
        G["Custom Dockerfile"]
        H["User SSH Keys"]
        I["Resource Limits"]
        J["ssh-image:latest"]
    end
    F --> G
    G --> H
    H --> I
    I --> J
Container Image Pipeline
Sources: neurons/Miner/container.py:280-368 , neurons/Miner/container.py:136-159
Container Configuration
Each allocation creates a customized container with:
- SSH Access: Root user with password and key-based authentication
- GPU Support: NVIDIA GPU access through device requests
- Resource Limits: CPU, memory, and storage constraints
- Custom Environment: User-specified Docker commands and dependencies
Sources: neurons/Miner/container.py:170-181 , neurons/Miner/container.py:136-146