Architecture
Purpose and Scope
This document describes the high-level system architecture of the NI Compute Subnet, a decentralized GPU compute marketplace built on the Bittensor network. It covers the core system components, their interactions, communication protocols, and data flow patterns that enable validators to evaluate miner capabilities and allocate GPU resources to clients.
For detailed protocol specifications, see Communication Protocols. For database schema and operations, see Database Operations. For installation and deployment procedures, see Installation and Setup.
System Overview
The NI Compute Subnet implements a three-tier architecture consisting of validators that assess miner performance, miners that provide GPU resources, and a resource allocation API that manages client requests. The system operates on Bittensor’s peer-to-peer network while providing traditional REST API access for external clients.
graph TB
    subgraph "External Clients"
        WEB["Web Applications"]
        CLI["CLI Tools"]
        API_CLIENTS["API Clients"]
    end
    subgraph "NI Compute Subnet Core"
        subgraph "Validator Layer"
            VALIDATOR["Validator<br/>neurons/validator.py"]
            POG["ProofOfGPU<br/>Benchmarking Engine"]
            SCORING["Scoring System<br/>calc_score_pog()"]
        end
        subgraph "Resource Allocation Layer"
            REGISTER_API["RegisterAPI<br/>neurons/register_api.py"]
            ALLOCATION_LOGIC["Resource Management<br/>_allocate_container()"]
            HEALTH_CHECK["Health Monitoring<br/>_check_allocation()"]
        end
        subgraph "Miner Layer"
            MINER["Miner<br/>neurons/miner.py"]
            CONTAINER_MGR["Container Management<br/>neurons/Miner/container.py"]
            ALLOCATION_HANDLER["Allocation Handler<br/>register_allocation()"]
        end
    end
    subgraph "Bittensor Network"
        SUBTENSOR["ComputeSubnetSubtensor<br/>Blockchain Interface"]
        METAGRAPH["Metagraph<br/>Network State"]
        AXON_DENDRITE["Axon/Dendrite<br/>P2P Communication"]
    end
    subgraph "Data Layer"
        COMPUTE_DB[("ComputeDb<br/>SQLite Database")]
        WANDB_STATE[("WandB<br/>Distributed State")]
        CONFIG_FILES[("Configuration<br/>config.yaml")]
    end
    %% External client interactions
    WEB --> REGISTER_API
    CLI --> REGISTER_API
    API_CLIENTS --> REGISTER_API
    %% Core system interactions
    VALIDATOR --> MINER
    VALIDATOR --> SUBTENSOR
    VALIDATOR --> POG
    POG --> SCORING
    REGISTER_API --> ALLOCATION_LOGIC
    ALLOCATION_LOGIC --> MINER
    REGISTER_API --> HEALTH_CHECK
    MINER --> CONTAINER_MGR
    MINER --> ALLOCATION_HANDLER
    MINER --> AXON_DENDRITE
    %% Bittensor network interactions
    VALIDATOR --> AXON_DENDRITE
    MINER --> AXON_DENDRITE
    AXON_DENDRITE --> SUBTENSOR
    SUBTENSOR --> METAGRAPH
    %% Data layer interactions
    VALIDATOR --> COMPUTE_DB
    VALIDATOR --> WANDB_STATE
    REGISTER_API --> COMPUTE_DB
    MINER --> WANDB_STATE
    VALIDATOR --> CONFIG_FILES
Sources: neurons/validator.py:70-89, neurons/miner.py:79-94, neurons/register_api.py:229-303, compute/axon.py, compute/protocol.py
Core Components
Validator System
The Validator class implements the core validation logic that maintains network integrity by evaluating miner performance and setting network weights.
graph TB
    subgraph "Validator Core"
        VALIDATOR_MAIN["Validator.__init__()<br/>neurons/validator.py:130"]
        CONFIG_INIT["init_config()<br/>neurons/validator.py:211"]
        SCORE_SYNC["sync_scores()<br/>neurons/validator.py:312"]
    end
    subgraph "Proof of GPU System"
        POG_MAIN["proof_of_gpu()<br/>neurons/validator.py:663"]
        TEST_MINER["test_miner_gpu()<br/>neurons/validator.py:799"]
        GPU_BENCHMARKS["GPU Benchmarking<br/>Merkle Proof Verification"]
        SCRIPT_EXECUTION["execute_script_on_miner()<br/>neurons/Validator/pog.py"]
    end
    subgraph "Scoring Engine"
        CALC_SCORE["calc_score_pog()<br/>neurons/Validator/calculate_pow_score.py"]
        RELIABILITY_SCORE["Reliability Scoring<br/>Challenge Success Rate"]
        WEIGHT_SETTING["Network Weight Updates<br/>Bittensor Integration"]
    end
    subgraph "Data Management"
        COMPUTE_DB_OPS["ComputeDb Operations<br/>SQLite Transactions"]
        WANDB_INTEGRATION["ComputeWandb<br/>Distributed Metrics"]
        MINER_STATS["Miner Statistics<br/>retrieve_stats()"]
    end
    VALIDATOR_MAIN --> CONFIG_INIT
    VALIDATOR_MAIN --> POG_MAIN
    VALIDATOR_MAIN --> SCORE_SYNC
    POG_MAIN --> TEST_MINER
    TEST_MINER --> GPU_BENCHMARKS
    TEST_MINER --> SCRIPT_EXECUTION
    SCORE_SYNC --> CALC_SCORE
    CALC_SCORE --> RELIABILITY_SCORE
    RELIABILITY_SCORE --> WEIGHT_SETTING
    VALIDATOR_MAIN --> COMPUTE_DB_OPS
    VALIDATOR_MAIN --> WANDB_INTEGRATION
    SCORE_SYNC --> MINER_STATS
The validator operates on a continuous cycle, performing hardware verification every 360 blocks and updating scores based on GPU performance benchmarks and challenge response reliability.
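The 360-block cadence described above can be pictured as a simple interval check. The following is a minimal sketch under stated assumptions, not the validator's actual code: the function name `is_verification_due`, the constant name, and the "run immediately on first start" behaviour are all hypothetical; only the 360-block interval comes from the text.

```python
from typing import Optional

# Interval stated in the text; the constant name itself is hypothetical.
POG_INTERVAL_BLOCKS = 360


def is_verification_due(
    current_block: int,
    last_run_block: Optional[int],
    interval: int = POG_INTERVAL_BLOCKS,
) -> bool:
    """Return True when at least `interval` blocks have passed since the last run."""
    if last_run_block is None:
        # Assumption: a validator that has never verified does so right away.
        return True
    return current_block - last_run_block >= interval
```

A validator loop built on this idea would poll the current block height, call the check, and record the block at which verification last ran.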
Sources: neurons/validator.py:70-200, neurons/Validator/pog.py, neurons/Validator/calculate_pow_score.py, neurons/Validator/database/
Miner System
The Miner class provides GPU compute resources to the network and handles allocation requests from validators and clients.
graph TB
    subgraph "Miner Core"
        MINER_MAIN["Miner.__init__()<br/>neurons/miner.py:117"]
        AXON_INIT["init_axon()<br/>neurons/miner.py:222"]
        SYNC_STATUS["sync_status()<br/>neurons/miner.py:304"]
    end
    subgraph "Request Handlers"
        ALLOCATE_HANDLER["allocate()<br/>neurons/miner.py:419"]
        CHALLENGE_HANDLER["challenge()<br/>neurons/miner.py:491"]
        BLACKLIST_LOGIC["base_blacklist()<br/>neurons/miner.py:330"]
    end
    subgraph "Container Management"
        REGISTER_ALLOC["register_allocation()<br/>neurons/Miner/allocate.py"]
        CONTAINER_OPS["Container Operations<br/>neurons/Miner/container.py"]
        DOCKER_LIFECYCLE["Docker Lifecycle<br/>build_sample_container()"]
    end
    subgraph "Resource Monitoring"
        ALLOCATION_STATUS["Allocation Status Tracking<br/>self.allocation_status"]
        WANDB_UPDATES["WandB State Updates<br/>update_allocated()"]
        HEALTH_CHECKS["Health Check Responses<br/>check_allocation()"]
    end
    MINER_MAIN --> AXON_INIT
    MINER_MAIN --> SYNC_STATUS
    AXON_INIT --> ALLOCATE_HANDLER
    AXON_INIT --> CHALLENGE_HANDLER
    ALLOCATE_HANDLER --> BLACKLIST_LOGIC
    ALLOCATE_HANDLER --> REGISTER_ALLOC
    REGISTER_ALLOC --> CONTAINER_OPS
    CONTAINER_OPS --> DOCKER_LIFECYCLE
    ALLOCATE_HANDLER --> ALLOCATION_STATUS
    ALLOCATION_STATUS --> WANDB_UPDATES
    ALLOCATE_HANDLER --> HEALTH_CHECKS
The miner continuously monitors for allocation opportunities while maintaining containerized environments for client workloads.
Sources: neurons/miner.py:79-200, neurons/Miner/allocate.py, neurons/Miner/container.py, compute/wandb/wandb.py
Resource Allocation API
The RegisterAPI class exposes REST endpoints for external clients to allocate and manage GPU resources.
graph TB
    subgraph "API Layer"
        REGISTER_API["RegisterAPI.__init__()<br/>neurons/register_api.py:230"]
        FASTAPI_APP["FastAPI Application<br/>self.app"]
        ROUTE_SETUP["_setup_routes()<br/>neurons/register_api.py:344"]
    end
    subgraph "Allocation Endpoints"
        ALLOCATE_SPEC["allocate_spec()<br/>/service/allocate_spec"]
        ALLOCATE_HOTKEY["allocate_hotkey()<br/>/service/allocate_hotkey"]
        DEALLOCATE["deallocate()<br/>/service/deallocate"]
        CHECK_STATUS["check_miner_status()<br/>/service/check_miner_status"]
    end
    subgraph "Resource Management"
        ALLOCATE_CONTAINER["_allocate_container()<br/>Resource Discovery"]
        ALLOCATION_DB["update_allocation_db()<br/>State Persistence"]
        HEALTH_MONITORING["_check_allocation()<br/>Timeout Management"]
    end
    subgraph "External Integrations"
        DENDRITE_CLIENT["bt.dendrite<br/>Miner Communication"]
        WANDB_SYNC["_update_allocation_wandb()<br/>Distributed State"]
        WEBHOOK_NOTIFY["_notify_allocation_status()<br/>External Callbacks"]
    end
    REGISTER_API --> FASTAPI_APP
    REGISTER_API --> ROUTE_SETUP
    ROUTE_SETUP --> ALLOCATE_SPEC
    ROUTE_SETUP --> ALLOCATE_HOTKEY
    ROUTE_SETUP --> DEALLOCATE
    ROUTE_SETUP --> CHECK_STATUS
    ALLOCATE_SPEC --> ALLOCATE_CONTAINER
    ALLOCATE_HOTKEY --> ALLOCATE_CONTAINER
    ALLOCATE_CONTAINER --> ALLOCATION_DB
    REGISTER_API --> HEALTH_MONITORING
    HEALTH_MONITORING --> ALLOCATION_DB
    ALLOCATE_CONTAINER --> DENDRITE_CLIENT
    ALLOCATION_DB --> WANDB_SYNC
    DEALLOCATE --> WEBHOOK_NOTIFY
The API maintains allocation state in both a local SQLite database and distributed WandB storage, which allows allocation state to be synchronized across validators.
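The dual-write pattern described above can be sketched as follows. This is an illustration under stated assumptions rather than the RegisterAPI implementation: the `allocation` table columns, the `record_allocation` helper, and the in-memory dictionary that stands in for WandB are all hypothetical.

```python
import sqlite3
from typing import Dict, Set


def record_allocation(
    conn: sqlite3.Connection,
    shared_state: Dict[str, Set[str]],
    hotkey: str,
    details: str,
) -> None:
    """Persist an allocation locally, then mirror the hotkey to shared state."""
    with conn:  # commits on success, rolls back if the insert raises
        conn.execute(
            "INSERT OR REPLACE INTO allocation (hotkey, details) VALUES (?, ?)",
            (hotkey, details),
        )
    # Hypothetical stand-in for publishing the allocated hotkey to WandB.
    shared_state.setdefault("allocated_hotkeys", set()).add(hotkey)


# Hypothetical schema; the real table layout is not given in this document.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE allocation (hotkey TEXT PRIMARY KEY, details TEXT)")

shared: Dict[str, Set[str]] = {}
record_allocation(conn, shared, "hotkey-abc", '{"gpu": "example"}')
```

Writing to the local database first means a failed local write never leaves a hotkey advertised as allocated in the shared state.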
Sources: neurons/register_api.py:229-350, neurons/register_api.py:407-850, neurons/Validator/database/allocate.py
Communication Architecture
The system implements a hybrid communication model combining Bittensor’s peer-to-peer protocols with traditional REST APIs.
graph TB
    subgraph "Protocol Layer"
        SPECS_PROTOCOL["Specs Protocol<br/>compute/protocol.py"]
        ALLOCATE_PROTOCOL["Allocate Protocol<br/>compute/protocol.py"]
        CHALLENGE_PROTOCOL["Challenge Protocol<br/>compute/protocol.py"]
    end
    subgraph "Bittensor Network Layer"
        COMPUTE_AXON["ComputeSubnetAxon<br/>compute/axon.py"]
        COMPUTE_SUBTENSOR["ComputeSubnetSubtensor<br/>compute/axon.py"]
        DENDRITE_CLIENT["bt.dendrite<br/>RPC Client"]
    end
    subgraph "REST API Layer"
        FASTAPI_ROUTES["FastAPI Routes<br/>HTTP/HTTPS"]
        WEBSOCKET_CONN["WebSocket Connection<br/>/connect"]
        API_MIDDLEWARE["IPWhitelistMiddleware<br/>Security Layer"]
    end
    subgraph "Communication Flows"
        V_TO_M["Validator → Miner<br/>Specs/Challenge Queries"]
        API_TO_M["RegisterAPI → Miner<br/>Allocation Requests"]
        CLIENT_TO_API["External Client → API<br/>Resource Requests"]
    end
    SPECS_PROTOCOL --> COMPUTE_AXON
    ALLOCATE_PROTOCOL --> COMPUTE_AXON
    CHALLENGE_PROTOCOL --> COMPUTE_AXON
    COMPUTE_AXON --> COMPUTE_SUBTENSOR
    COMPUTE_AXON --> DENDRITE_CLIENT
    FASTAPI_ROUTES --> API_MIDDLEWARE
    WEBSOCKET_CONN --> FASTAPI_ROUTES
    V_TO_M --> SPECS_PROTOCOL
    V_TO_M --> CHALLENGE_PROTOCOL
    API_TO_M --> ALLOCATE_PROTOCOL
    API_TO_M --> DENDRITE_CLIENT
    CLIENT_TO_API --> FASTAPI_ROUTES
Protocol Message Flow:
- Specs Query: Validator requests hardware specifications from miners
- Allocation Request: RegisterAPI or Validator requests resource allocation
- Challenge Response: Validator sends proof-of-work challenges to miners
- Health Check: Periodic status verification of allocated resources
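As a rough illustration of the four flows listed above, the sketch below routes a message to a handler name based on its kind and checks that the sender is one the list permits. It uses plain dataclasses rather than the project's protocol classes in compute/protocol.py, so every name here (`Message`, `ALLOWED_SENDERS`, `route`, the handler names) is hypothetical. The senders for health checks are an assumption, since the list does not say who issues them.

```python
from dataclasses import dataclass

# Senders permitted for each message kind, following the list above.
# The "health" entry is an assumption: the text does not name its senders.
ALLOWED_SENDERS = {
    "specs": {"validator"},
    "allocate": {"validator", "register_api"},
    "challenge": {"validator"},
    "health": {"validator", "register_api"},
}


@dataclass(frozen=True)
class Message:
    kind: str    # one of the keys in ALLOWED_SENDERS
    sender: str  # role of the caller, e.g. "validator"


def route(message: Message) -> str:
    """Return the handler name for a message, rejecting unknown kinds or senders."""
    allowed = ALLOWED_SENDERS.get(message.kind)
    if allowed is None:
        raise ValueError(f"unknown message kind: {message.kind}")
    if message.sender not in allowed:
        raise PermissionError(f"{message.sender} may not send {message.kind}")
    return f"handle_{message.kind}"
```

For example, `route(Message("allocate", "register_api"))` yields the allocation handler name, while a challenge sent by the RegisterAPI role is rejected.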
Sources: compute/protocol.py, compute/axon.py, neurons/register_api.py:344-406, neurons/validator.py:594-662
Data Architecture
The system uses a multi-tier data storage approach combining local SQLite databases with distributed state management.
graph TB
    subgraph "Local Data Storage"
        COMPUTE_DB[("ComputeDb<br/>SQLite Database")]
        MINER_TABLE[("miner table<br/>uid, ss58_address")]
        POG_STATS[("pog_stats table<br/>GPU performance data")]
        ALLOCATION_TABLE[("allocation table<br/>active reservations")]
        CHALLENGE_DETAILS[("challenge_details table<br/>success metrics")]
    end
    subgraph "Distributed State"
        WANDB_RUNS[("WandB Validator Runs<br/>Aggregated metrics")]
        WANDB_MINERS[("WandB Miner Runs<br/>Hardware specifications")]
        WANDB_ALLOCATED[("Allocated Hotkeys<br/>Resource status")]
        WANDB_PENALIZED[("Penalized Hotkeys<br/>Blacklist data")]
    end
    subgraph "Configuration Data"
        CONFIG_YAML[("config.yaml<br/>GPU performance benchmarks")]
        ENV_CONFIG[("Environment Variables<br/>API keys, endpoints")]
        PROTOCOL_SCHEMAS[("Protocol Definitions<br/>Message validation")]
    end
    subgraph "Data Access Layer"
        DB_OPERATIONS["Database Operations<br/>compute/utils/db.py"]
        WANDB_CLIENT["ComputeWandb<br/>compute/wandb/wandb.py"]
        CONFIG_LOADER["Configuration Loader<br/>load_yaml_config()"]
    end
    COMPUTE_DB --> MINER_TABLE
    COMPUTE_DB --> POG_STATS
    COMPUTE_DB --> ALLOCATION_TABLE
    COMPUTE_DB --> CHALLENGE_DETAILS
    WANDB_RUNS --> WANDB_MINERS
    WANDB_RUNS --> WANDB_ALLOCATED
    WANDB_RUNS --> WANDB_PENALIZED
    CONFIG_YAML --> ENV_CONFIG
    CONFIG_YAML --> PROTOCOL_SCHEMAS
    DB_OPERATIONS --> COMPUTE_DB
    WANDB_CLIENT --> WANDB_RUNS
    CONFIG_LOADER --> CONFIG_YAML
Data Synchronization Patterns:
- Local database stores operational state and query results
- WandB provides cross-validator state synchronization
- Configuration files define GPU performance baselines and system parameters
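To make the local storage role concrete, the sketch below builds a tiny in-memory database and derives a per-miner challenge success rate from it. It is an assumption-laden illustration, not the project's schema: the `miner` columns `uid` and `ss58_address` come from the diagram above, while the `challenge_details` columns, the sample rows, and the `success_rates` helper are hypothetical.

```python
import sqlite3
from typing import Dict

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE miner (uid INTEGER PRIMARY KEY, ss58_address TEXT UNIQUE);
    -- Hypothetical layout: one row per challenge, success is 1 (passed) or 0 (failed).
    CREATE TABLE challenge_details (
        uid INTEGER REFERENCES miner(uid),
        success INTEGER
    );
    """
)
conn.executemany("INSERT INTO miner VALUES (?, ?)", [(1, "addr-one"), (2, "addr-two")])
conn.executemany(
    "INSERT INTO challenge_details VALUES (?, ?)",
    [(1, 1), (1, 1), (1, 0), (2, 0)],
)


def success_rates(db: sqlite3.Connection) -> Dict[str, float]:
    """Map each miner address to the fraction of challenges it passed."""
    rows = db.execute(
        """
        SELECT m.ss58_address, AVG(c.success)
        FROM miner AS m
        JOIN challenge_details AS c ON c.uid = m.uid
        GROUP BY m.uid, m.ss58_address
        """
    ).fetchall()
    return {address: rate for address, rate in rows}
```

Keeping raw challenge outcomes in the local database and computing rates on demand is one way to let scoring logic change without rewriting stored data.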
Sources: compute/utils/db.py, compute/wandb/wandb.py, neurons/Validator/database/, config.yaml, neurons/validator.py:178-181
Deployment Architecture
The system supports distributed deployment across multiple validator and miner nodes with centralized API services.
graph TB
    subgraph "Validator Nodes"
        V1["Validator Instance 1<br/>neurons/validator.py"]
        V2["Validator Instance 2<br/>neurons/validator.py"]
        VN["Validator Instance N<br/>neurons/validator.py"]
    end
    subgraph "Miner Nodes"
        M1["Miner Instance 1<br/>neurons/miner.py + Docker"]
        M2["Miner Instance 2<br/>neurons/miner.py + Docker"]
        MN["Miner Instance N<br/>neurons/miner.py + Docker"]
    end
    subgraph "API Services"
        API1["RegisterAPI Instance 1<br/>neurons/register_api.py"]
        API2["RegisterAPI Instance 2<br/>neurons/register_api.py"]
        LB["Load Balancer<br/>Optional"]
    end
    subgraph "Infrastructure Services"
        BT_NETWORK["Bittensor Network<br/>Subtensor/Metagraph"]
        WANDB_SERVICE["WandB Service<br/>Distributed State"]
        MONITORING["System Monitoring<br/>PM2/Prometheus"]
    end
    subgraph "Network Configuration"
        FIREWALL["UFW Firewall<br/>Ports 4444, 8091"]
        SSH_ACCESS["SSH Access<br/>Container Management"]
        DOCKER_RUNTIME["Docker + NVIDIA Runtime<br/>GPU Containers"]
    end
    V1 --> BT_NETWORK
    V2 --> BT_NETWORK
    VN --> BT_NETWORK
    M1 --> BT_NETWORK
    M2 --> BT_NETWORK
    MN --> BT_NETWORK
    API1 --> BT_NETWORK
    API2 --> BT_NETWORK
    LB --> API1
    LB --> API2
    V1 --> WANDB_SERVICE
    V2 --> WANDB_SERVICE
    API1 --> WANDB_SERVICE
    M1 --> DOCKER_RUNTIME
    M2 --> DOCKER_RUNTIME
    MN --> DOCKER_RUNTIME
    FIREWALL --> SSH_ACCESS
    SSH_ACCESS --> DOCKER_RUNTIME
    MONITORING --> V1
    MONITORING --> M1
    MONITORING --> API1
Deployment Requirements:
- Validators: Require access to Subtensor endpoint and sufficient computational resources for GPU benchmarking
- Miners: Need NVIDIA GPUs, Docker runtime, and open ports (4444 for SSH, 8091 for axon)
- RegisterAPI: Can run on dedicated servers with database persistence and WandB integration
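The miner port requirement above can be checked from another machine with a plain TCP probe. This is a minimal sketch using only the Python standard library; the helper names `port_is_open` and `check_miner_ports` are hypothetical, and the port numbers are the ones listed above. A successful connection only shows that something is listening, not that the service behind it is healthy.

```python
import socket
from typing import Dict

# Port numbers from the deployment requirements above.
MINER_PORTS = {"ssh": 4444, "axon": 8091}


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_miner_ports(host: str, timeout: float = 2.0) -> Dict[str, bool]:
    """Probe each required miner port and report whether it accepts connections."""
    return {name: port_is_open(host, port, timeout) for name, port in MINER_PORTS.items()}
```

Calling `check_miner_ports` with a miner's public address returns a dictionary keyed by `"ssh"` and `"axon"`, which makes a firewall misconfiguration easy to spot before registering the miner.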
Sources: README.md:110-340, compute/utils/parser.py:159-165, neurons/miner.py:154-167, neurons/register_api.py:86-95