WandB Integration
The WandB Integration system provides distributed state management and experiment tracking across the NI Compute Subnet using Weights & Biases (WandB) as a centralized data store. This system enables validators and miners to share critical network state including hardware specifications, allocation status, performance metrics, and penalty information in a verifiable and tamper-resistant manner.
This document covers the technical implementation of WandB integration for network-wide data synchronization. For Prometheus-based local metrics collection, see Prometheus Metrics.
Architecture Overview
The WandB integration operates as a distributed state management layer that sits between local database storage and network-wide coordination. Each validator and miner maintains its own WandB run that serves as both a data publication mechanism and a verification layer through cryptographic signatures.
graph TB
subgraph "Local Systems"
V1["Validator Instance"]
V2["Validator Instance"]
M1["Miner Instance"]
M2["Miner Instance"]
DB1["ComputeDb (Local)"]
DB2["ComputeDb (Local)"]
end
subgraph "WandB Cloud Platform"
PROJECT["opencompute Project"]
VRUN1["validator-{hotkey} Run"]
VRUN2["validator-{hotkey} Run"]
MRUN1["miner-{hotkey} Run"]
MRUN2["miner-{hotkey} Run"]
end
subgraph "Shared Network State"
ALLOCATED["allocated_hotkeys"]
PENALIZED["penalized_hotkeys"]
SPECS["miner_specs"]
STATS["validator_stats"]
end
V1 -->|"ComputeWandb.update_allocated_hotkeys()"| VRUN1
V2 -->|"ComputeWandb.update_stats()"| VRUN2
M1 -->|"ComputeWandb.update_specs()"| MRUN1
M2 -->|"ComputeWandb.update_allocated()"| MRUN2
V1 -->|"write_stats()"| DB1
M1 -->|"save_run_id()"| DB2
VRUN1 --> ALLOCATED
VRUN2 --> STATS
MRUN1 --> SPECS
MRUN2 --> ALLOCATED
V1 -.->|"get_allocated_hotkeys()"| ALLOCATED
V1 -.->|"get_miner_specs()"| SPECS
V2 -.->|"get_stats_allocated()"| STATS
Sources: compute/wandb/wandb.py:1-648
Core Components
ComputeWandb Class
The ComputeWandb class serves as the primary interface for all WandB operations within the compute subnet. It manages authentication, run lifecycle, data synchronization, and cryptographic verification.
classDiagram
class ComputeWandb {
+run: wandb.Run
+config: bt.config
+wallet: bt.wallet
+hotkey: str
+role: str
+db: ComputeDb
+api: wandb.Api
+run_id: str
+__init__(config, wallet, role)
+update_config()
+save_run_id(hotkey, run_id)
+get_run_id(hotkey)
+update_specs()
+log_chain_data(data)
+update_allocated(allocated)
+update_stats(stats)
+update_allocated_hotkeys(hotkey_list)
+update_penalized_hotkeys(hotkey_list)
+get_allocated_hotkeys(valid_validators, flag)
+get_stats_allocated(valid_validators, flag)
+get_miner_specs(queryable_uids)
+sign_run()
+verify_run(run)
+sync_allocated(hotkey)
}
class ComputeDb {
+get_cursor()
+conn: Connection
}
class wandb_Api {
+runs()
+project()
+flush()
}
ComputeWandb --> ComputeDb
ComputeWandb --> wandb_Api
Sources: compute/wandb/wandb.py:19-648
Authentication and Run Management
The system manages WandB authentication through API keys and run persistence through local database storage. Each hotkey maintains a single persistent run across restarts.
| Configuration Parameter | Value | Purpose |
|---|---|---|
| PUBLIC_WANDB_NAME | "opencompute" | Project name |
| PUBLIC_WANDB_ENTITY | "neuralinternet" | Organization entity |
| Run naming pattern | "{role}-{hotkey}" | Unique run identification |
The authentication flow handles multiple scenarios:
flowchart TD
START["ComputeWandb.__init__()"]
CHECK_KEY{"WANDB_API_KEY exists?"}
CHECK_NETRC{"~/.netrc exists?"}
ERROR["Raise ValueError"]
GET_RUN_ID["get_run_id(hotkey)"]
RUN_EXISTS{"run_id found?"}
QUERY_WANDB["Query WandB for existing runs"]
RUNS_FOUND{"runs.length >= 1?"}
CREATE_RUN["wandb.init() new run"]
SAVE_RUN["save_run_id()"]
RESUME_RUN["wandb.init(id=run_id, resume='allow')"]
UPDATE_CONFIG["update_config()"]
SIGN["sign_run()"]
START --> CHECK_KEY
CHECK_KEY -->|No| CHECK_NETRC
CHECK_KEY -->|Yes| GET_RUN_ID
CHECK_NETRC -->|No| ERROR
CHECK_NETRC -->|Yes| GET_RUN_ID
GET_RUN_ID --> RUN_EXISTS
RUN_EXISTS -->|No| QUERY_WANDB
RUN_EXISTS -->|Yes| RESUME_RUN
QUERY_WANDB --> RUNS_FOUND
RUNS_FOUND -->|No| CREATE_RUN
RUNS_FOUND -->|Yes| SAVE_RUN
CREATE_RUN --> SAVE_RUN
SAVE_RUN --> RESUME_RUN
RESUME_RUN --> UPDATE_CONFIG
UPDATE_CONFIG --> SIGN
Sources: compute/wandb/wandb.py:22-88, compute/wandb/wandb.py:109-138
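
The run-resolution branches of the flowchart above can be sketched as a small function. This is a minimal illustration, not the code in compute/wandb/wandb.py: the function name `resolve_run_id` and the four injected callables are hypothetical stand-ins for the local database lookup, the WandB query, run creation, and `save_run_id()`. Reusing the first remote run when several exist is also an assumption.

```python
from typing import Callable, Dict, List, Optional


def resolve_run_id(
    hotkey: str,
    role: str,
    get_local_run_id: Callable[[str], Optional[str]],
    find_remote_run_ids: Callable[[str], List[str]],
    create_run: Callable[[str], str],
    save_run_id: Callable[[str, str], None],
) -> str:
    """Return the run ID to resume, following the flowchart's branches."""
    run_id = get_local_run_id(hotkey)
    if run_id is not None:
        return run_id  # "run_id found?" -> Yes: resume directly

    run_name = f"{role}-{hotkey}"  # naming pattern from the table above
    remote_ids = find_remote_run_ids(run_name)
    if remote_ids:
        run_id = remote_ids[0]  # assumption: adopt the first existing run
    else:
        run_id = create_run(run_name)  # no run anywhere: create one
    save_run_id(hotkey, run_id)  # persist so the next start skips the query
    return run_id


# In-memory stand-ins so the sketch runs without WandB or a database.
local_store: Dict[str, str] = {}
created_names: List[str] = []


def fake_create_run(run_name: str) -> str:
    created_names.append(run_name)
    return f"run-{len(created_names)}"


first = resolve_run_id(
    "5Fabc", "miner", local_store.get, lambda name: [], fake_create_run,
    local_store.__setitem__,
)
second = resolve_run_id(
    "5Fabc", "miner", local_store.get, lambda name: [], fake_create_run,
    local_store.__setitem__,
)
print(first, second, created_names)
```

Passing the storage and API operations in as callables keeps the branching logic testable on its own; in the real class these would presumably be methods on ComputeWandb.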
Data Synchronization Patterns
Validator Data Flow
Validators publish aggregated network statistics and maintain lists of allocated and penalized hotkeys. The synchronization ensures consistency between local database state and distributed WandB state.
sequenceDiagram
participant VDB as "ComputeDb (Validator)"
participant CW as "ComputeWandb"
participant WB as "WandB API"
participant NET as "Network State"
Note over VDB,NET: Stats Update Cycle
VDB->>CW: retrieve_stats(db)
CW->>CW: update_allocated_hotkeys(hotkey_list)
CW->>VDB: write_stats(db, updated_stats)
CW->>WB: run.config.update(allocated_hotkeys, stats)
CW->>WB: run.log(allocated_hotkeys)
CW->>CW: sign_run()
Note over VDB,NET: Network Query Cycle
CW->>WB: api.runs(filters=validator_runs)
WB-->>CW: validator_run_configs
CW->>CW: verify_run(run) for each
CW-->>NET: aggregated_allocated_hotkeys
CW-->>NET: aggregated_stats
Sources: compute/wandb/wandb.py:198-230, compute/wandb/wandb.py:291-332, compute/wandb/wandb.py:334-450
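
The network query cycle above amounts to merging the `allocated_hotkeys` lists of every validator run that passes verification. The sketch below illustrates that idea with plain dictionaries in place of WandB run configs. The function name, the `verify` callable, and the choice to return a sorted, de-duplicated list are assumptions for illustration; the `flag` parameter of `get_allocated_hotkeys()` is not modelled because the source does not describe it.

```python
from typing import Callable, Dict, Iterable, List, Set


def aggregate_allocated_hotkeys(
    run_configs: Iterable[Dict],
    valid_validator_hotkeys: Set[str],
    verify: Callable[[Dict], bool],
) -> List[str]:
    """Union the allocated hotkeys published by trusted, verified validators."""
    allocated: Set[str] = set()
    for config in run_configs:
        if config.get("hotkey") not in valid_validator_hotkeys:
            continue  # publisher is not an authorised validator
        if not verify(config):
            continue  # signature check failed
        allocated.update(config.get("allocated_hotkeys", []))
    return sorted(allocated)


# Hypothetical run configs; "signature" is a placeholder, not a real signature.
configs = [
    {"hotkey": "val-A", "allocated_hotkeys": ["m1", "m2"], "signature": "ok"},
    {"hotkey": "val-B", "allocated_hotkeys": ["m2", "m3"], "signature": "ok"},
    {"hotkey": "val-X", "allocated_hotkeys": ["m9"], "signature": "ok"},
    {"hotkey": "val-A", "allocated_hotkeys": ["m8"], "signature": "bad"},
]

result = aggregate_allocated_hotkeys(
    configs,
    valid_validator_hotkeys={"val-A", "val-B"},
    verify=lambda config: config.get("signature") == "ok",
)
print(result)
```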
Miner Data Flow
Miners publish hardware specifications and allocation status, enabling validators to discover available resources and verify capabilities.
sequenceDiagram
participant M as "Miner Process"
participant CW as "ComputeWandb"
participant WB as "WandB API"
participant V as "Validator Query"
Note over M,V: Specs Publication
M->>CW: update_specs()
CW->>CW: get_perf_info(encrypted=False)
CW->>WB: run.config.update(specs)
CW->>CW: sign_run()
Note over M,V: Allocation Status Update
M->>CW: update_allocated(validator_hotkey)
CW->>WB: run.config.update(allocated)
CW->>CW: sign_run()
Note over M,V: Validator Discovery
V->>WB: api.runs(filters=miner_runs)
WB-->>V: miner_run_configs
V->>V: verify_run(run) for each
V-->>V: hotkey_to_specs_mapping
Sources: compute/wandb/wandb.py:140-159, compute/wandb/wandb.py:168-184, compute/wandb/wandb.py:540-574
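
The validator discovery step above ends with a hotkey-to-specs mapping. A minimal sketch of building that mapping from miner run configs follows. It keys the filter on hotkeys rather than the UIDs that `get_miner_specs(queryable_uids)` accepts, and the function name, the `verify` callable, and the sample spec fields are all hypothetical.

```python
from typing import Callable, Dict, Iterable, Set


def build_specs_mapping(
    run_configs: Iterable[Dict],
    queryable_hotkeys: Set[str],
    verify: Callable[[Dict], bool],
) -> Dict[str, Dict]:
    """Map each queryable miner hotkey to the specs from its verified run."""
    mapping: Dict[str, Dict] = {}
    for config in run_configs:
        hotkey = config.get("hotkey")
        specs = config.get("specs")
        if config.get("role") != "miner" or hotkey not in queryable_hotkeys:
            continue  # not a miner run we were asked about
        if specs is None or not verify(config):
            continue  # nothing published yet, or signature check failed
        mapping[hotkey] = specs
    return mapping


# Hypothetical configs; the spec fields are illustrative only.
miner_configs = [
    {"hotkey": "m1", "role": "miner", "specs": {"gpu": "A100"}, "signature": "ok"},
    {"hotkey": "m2", "role": "miner", "signature": "ok"},
    {"hotkey": "m3", "role": "miner", "specs": {"gpu": "H100"}, "signature": "bad"},
    {"hotkey": "v1", "role": "validator", "specs": {"gpu": "none"}, "signature": "ok"},
]

specs_by_hotkey = build_specs_mapping(
    miner_configs,
    queryable_hotkeys={"m1", "m2", "m3", "v1"},
    verify=lambda config: config.get("signature") == "ok",
)
print(specs_by_hotkey)
```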
Security and Verification
Cryptographic Signature System
All WandB runs are signed using the participant’s hotkey to prevent data tampering and ensure authenticity. The signature covers the SHA-256 hash of the run ID, so a signature copied from one run does not verify on another.
flowchart LR
subgraph "Signing Process"
RUN_ID["run.id"]
HASH["SHA-256 Hash"]
SIGN["wallet.hotkey.sign()"]
STORE["run.config.signature"]
end
subgraph "Verification Process"
RETRIEVE["run.config.signature"]
RECREATE["SHA-256(run.id)"]
VERIFY["bt.Keypair.verify()"]
RESULT["True/False"]
end
RUN_ID --> HASH
HASH --> SIGN
SIGN --> STORE
RETRIEVE --> VERIFY
RECREATE --> VERIFY
VERIFY --> RESULT
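
The signing and verification paths in the diagram can be sketched with the standard library alone. The diagram shows `wallet.hotkey.sign()` and `bt.Keypair.verify()` as the real primitives. The `ToyKeypair` class below is only a stand-in built on HMAC (a symmetric scheme, unlike a hotkey's public-key signature) so that the example runs without Bittensor installed. Whether the real code signs the raw digest or its hex form is not stated in this document, so signing the raw digest is also an assumption.

```python
import hashlib
import hmac


class ToyKeypair:
    """Stand-in for a hotkey keypair. NOT the real signature scheme."""

    def __init__(self, secret: bytes) -> None:
        self._secret = secret

    def sign(self, message: bytes) -> bytes:
        return hmac.new(self._secret, message, hashlib.sha256).digest()

    def verify(self, message: bytes, signature: bytes) -> bool:
        return hmac.compare_digest(self.sign(message), signature)


def run_id_digest(run_id: str) -> bytes:
    """SHA-256 of the run ID, the value that gets signed."""
    return hashlib.sha256(run_id.encode("utf-8")).digest()


def sign_run(run_id: str, keypair: ToyKeypair) -> str:
    """Return the hex signature that would be stored in run.config.signature."""
    return keypair.sign(run_id_digest(run_id)).hex()


def verify_run(run_id: str, signature_hex: str, keypair: ToyKeypair) -> bool:
    """Recreate the digest from the run ID and check the stored signature."""
    try:
        signature = bytes.fromhex(signature_hex)
    except ValueError:
        return False  # malformed hex cannot be a valid signature
    return keypair.verify(run_id_digest(run_id), signature)


keypair = ToyKeypair(b"demo-secret")
signature = sign_run("run-abc", keypair)
print(verify_run("run-abc", signature, keypair))  # same run ID
print(verify_run("run-xyz", signature, keypair))  # signature moved to another run
```

Because the digest is recomputed from the run being checked, a signature lifted from a different run fails verification.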
The verification process validates both signature authenticity and validator authorization:
| Verification Check | Implementation | Purpose |
|---|---|---|
| Signature validity | bt.Keypair(ss58_address=hotkey).verify() | Prevents data tampering |
| Validator authorization | hotkey in valid_validator_hotkeys | Prevents unauthorized updates |
| Data existence | Config field presence checks | Ensures required data |
Sources: compute/wandb/wandb.py:576-616
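
The three checks in the table above can be combined into a single gate that also reports why a run was rejected. This is a sketch under assumptions: the function name, the default required fields, the reason strings, and the order of the checks (cheapest first) are illustrative choices, and `signature_ok` stands in for the real signature verification.

```python
from typing import Callable, Mapping, Optional, Sequence, Set


def rejection_reason(
    config: Mapping,
    valid_validator_hotkeys: Set[str],
    signature_ok: Callable[[Mapping], bool],
    required_fields: Sequence[str] = ("hotkey", "signature", "allocated_hotkeys"),
) -> Optional[str]:
    """Return None if the run config passes all checks, else a short reason."""
    # Data existence: every required config field must be present.
    for field in required_fields:
        if field not in config:
            return f"missing field: {field}"
    # Validator authorization: the publisher must be a known validator.
    if config["hotkey"] not in valid_validator_hotkeys:
        return "unauthorized hotkey"
    # Signature validity: delegated to the caller-supplied verifier.
    if not signature_ok(config):
        return "invalid signature"
    return None


trusted = {"val-A"}


def placeholder_check(config: Mapping) -> bool:
    # Placeholder comparison only; a real check would verify a signature.
    return config["signature"] == "ok"


good = {"hotkey": "val-A", "signature": "ok", "allocated_hotkeys": []}
print(rejection_reason(good, trusted, placeholder_check))
print(rejection_reason({"hotkey": "val-A"}, trusted, placeholder_check))
```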
Configuration and Setup
Environment Requirements
| Requirement | Configuration Method | Purpose |
|---|---|---|
| WandB API Key | WANDB_API_KEY environment variable | Authentication |
| WandB Login | wandb login command | Alternative authentication |
| netrc file | ~/.netrc | Credential storage used by wandb login |
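
A startup check for these requirements might look like the sketch below. It mirrors the two conditions named in the authentication flowchart (a `WANDB_API_KEY` variable, or an existing `~/.netrc`) and raises `ValueError` when neither holds. The function names and the error message are hypothetical, and the check only tests that the file exists, not that it contains a WandB entry.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def has_wandb_credentials(env: Mapping[str, str], netrc_path: Path) -> bool:
    """True if an API key is set or a netrc file exists."""
    return bool(env.get("WANDB_API_KEY")) or netrc_path.is_file()


def require_wandb_credentials(
    env: Optional[Mapping[str, str]] = None,
    netrc_path: Optional[Path] = None,
) -> None:
    """Raise ValueError when no credential source is available."""
    env = os.environ if env is None else env
    netrc_path = Path.home() / ".netrc" if netrc_path is None else netrc_path
    if not has_wandb_credentials(env, netrc_path):
        raise ValueError(
            "No WandB credentials found: set WANDB_API_KEY or run `wandb login`."
        )


# Example with explicit inputs so the result does not depend on this machine.
missing_netrc = Path("/nonexistent-dir-for-demo/.netrc")
print(has_wandb_credentials({"WANDB_API_KEY": "example-key"}, missing_netrc))
print(has_wandb_credentials({}, missing_netrc))
```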
Run Configuration Schema
The system maintains a standardized configuration schema across all runs:

    {
      "hotkey": "ss58_address",
      "role": "validator|miner",
      "config": "bt.config_object",
      "version": "version_integer",
      "specs": "hardware_specifications",
      "allocated": "boolean_or_hotkey",
      "allocated_hotkeys": ["hotkey_list"],
      "penalized_hotkeys": ["hotkey_list"],
      "stats": "uid_to_stats_mapping",
      "signature": "hex_signature"
    }

Database Integration
The system maintains local state persistence through the wandb_runs table in ComputeDb:
| Column | Type | Purpose |
|---|---|---|
| hotkey | TEXT | Participant identifier |
| run_id | TEXT | WandB run identifier |
Sources: compute/wandb/wandb.py:15-52, compute/wandb/wandb.py:90-108, compute/wandb/wandb.py:109-138
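
To make the table concrete, here is a minimal sketch of the two-column `wandb_runs` table with save and lookup helpers. It assumes SQLite as the backing store, which this document does not confirm. Making `hotkey` the primary key is also an assumption, chosen to match the statement that each hotkey keeps a single persistent run. The helper names echo `save_run_id()` and `get_run_id()` but take a connection explicitly, unlike the methods in the class diagram.

```python
import sqlite3
from typing import Optional


def create_wandb_runs_table(conn: sqlite3.Connection) -> None:
    """Create the table if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS wandb_runs ("
        "hotkey TEXT PRIMARY KEY, run_id TEXT NOT NULL)"
    )


def save_run_id(conn: sqlite3.Connection, hotkey: str, run_id: str) -> None:
    """Insert or overwrite the run ID stored for a hotkey."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO wandb_runs (hotkey, run_id) VALUES (?, ?)",
            (hotkey, run_id),
        )


def get_run_id(conn: sqlite3.Connection, hotkey: str) -> Optional[str]:
    """Return the stored run ID, or None if the hotkey has no run yet."""
    row = conn.execute(
        "SELECT run_id FROM wandb_runs WHERE hotkey = ?", (hotkey,)
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")  # in-memory database for the example
create_wandb_runs_table(conn)
save_run_id(conn, "5Fabc", "run-1")
save_run_id(conn, "5Fabc", "run-2")  # replaces the earlier entry
print(get_run_id(conn, "5Fabc"), get_run_id(conn, "unknown"))
```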