
GPU Performance Configuration


This document details the GPU performance configuration system used for Proof-of-GPU validation in the NI Compute subnet. It covers the performance benchmarks, tolerance settings, identification logic, and Merkle proof parameters that enable validators to verify miner GPU capabilities. For overall system configuration options, see Command-line Arguments.

The GPU performance configuration consists of several key components:

  • Performance benchmark data (TFLOPS, AVRAM) for GPU identification
  • Tolerance pairs for handling equivalent GPU models
  • Merkle proof parameters for cryptographic verification
  • Benchmarking timeouts and retry limits

Sources: config.yaml:1-104 , compute/__init__.py:37-48

The system maintains comprehensive performance data for GPU models in config.yaml under the gpu_performance section. This data enables accurate GPU identification and performance verification through three key metrics:

FP16 TFLOPS Configuration

GPU_TFLOPS_FP16:
  NVIDIA B200: 1205
  NVIDIA H200: 610
  NVIDIA H100 80GB HBM3: 570
  NVIDIA A100-SXM4-80GB: 238.8

FP32 TFLOPS Configuration

GPU_TFLOPS_FP32:
  NVIDIA B200: 67.2
  NVIDIA H200: 49.6
  NVIDIA H100 80GB HBM3: 49.0
  NVIDIA A100-SXM4-80GB: 18.2

VRAM Configuration

GPU_AVRAM:
  NVIDIA B200: 68.72
  NVIDIA H200: 68.72
  NVIDIA H100 80GB HBM3: 34.36
  NVIDIA A100-SXM4-80GB: 34.36
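These theoretical figures are compared against values measured on the miner. Assuming the benchmark times a dense n×n matrix multiplication (which performs roughly 2·n³ floating-point operations), achieved throughput can be estimated as follows. This is a sketch of the general calculation, not the exact code in neurons/Validator/pog.py:

```python
def estimate_tflops(n: int, seconds: float, num_iters: int = 1) -> float:
    """Estimate achieved TFLOPS for an n x n dense matrix multiplication.

    A matmul of two n x n matrices costs about 2 * n**3 floating-point
    operations (n multiplies and n-1 adds per output element).
    """
    flops = 2.0 * (n ** 3) * num_iters
    return flops / seconds / 1e12

# Example: an 8192 x 8192 matmul finishing in 2 ms
print(round(estimate_tflops(8192, 0.002), 1))  # -> 549.8
```

A value close to a model's theoretical TFLOPS (e.g. 610 for the H200 in FP16) yields a low deviation score during identification.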

The gpu_scores section assigns relative performance values used by the scoring system:

| GPU Model | Performance Score |
| --- | --- |
| NVIDIA B200 | 5.00 |
| NVIDIA H200 | 4.00 |
| NVIDIA H100 80GB HBM3 | 3.30 |
| NVIDIA H100 | 2.80 |
| NVIDIA A100-SXM4-80GB | 1.90 |
| NVIDIA L40s | 0.90 |
| NVIDIA RTX 6000 Ada Generation | 0.83 |
| NVIDIA RTX 4090 | 0.68 |

Sources: config.yaml:1-94

The system handles functionally equivalent GPU models through tolerance pairs that prevent false negatives during GPU identification. This mechanism accounts for naming variations and similar performance characteristics.

graph LR
    subgraph "gpu_tolerance_pairs Configuration"
        L40["NVIDIA L40"] <--> RTX6000["NVIDIA RTX 6000 Ada Generation"]
        A100PCIe["NVIDIA A100 80GB PCIe"] <--> A100SXM["NVIDIA A100-SXM4-80GB"] 
        H100_80GB["NVIDIA H100 80GB HBM3"] <--> H100["NVIDIA H100"]
        A40["NVIDIA A40"] <--> RTXA6000["NVIDIA RTX A6000"]
        RTXA5000["NVIDIA RTX A5000"] <--> RTX4000["NVIDIA RTX 4000 Ada Generation"]
    end

The identify_gpu function in neurons/Validator/pog.py applies tolerance logic during GPU identification:

# Check if identified GPU matches the tolerance pair
if identified_gpu in tolerance_pairs and reported_name == tolerance_pairs.get(identified_gpu):
    identified_gpu = reported_name
# Check reverse mapping
elif reported_name in tolerance_pairs and identified_gpu == tolerance_pairs.get(reported_name):
    identified_gpu = reported_name

This allows miners with equivalent hardware to receive consistent identification regardless of minor naming differences.
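For reuse or testing, the same logic can be wrapped in a standalone helper. This is a sketch based on the snippet above; the actual code in pog.py operates in place rather than through a function:

```python
def apply_tolerance(identified_gpu: str, reported_name: str,
                    tolerance_pairs: dict) -> str:
    """Return the reported name when it is a configured equivalent of the
    identified model; otherwise keep the identified model."""
    # Forward mapping: identified model lists the reported name as its pair
    if tolerance_pairs.get(identified_gpu) == reported_name:
        return reported_name
    # Reverse mapping: reported name lists the identified model as its pair
    if tolerance_pairs.get(reported_name) == identified_gpu:
        return reported_name
    return identified_gpu

pairs = {"NVIDIA A100 80GB PCIe": "NVIDIA A100-SXM4-80GB"}
print(apply_tolerance("NVIDIA A100 80GB PCIe", "NVIDIA A100-SXM4-80GB", pairs))
# -> NVIDIA A100-SXM4-80GB
```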

Sources: config.yaml:63-73 , neurons/Validator/pog.py:27-73

The Proof-of-GPU system uses Merkle tree verification to cryptographically validate GPU computations. The merkle_proof configuration parameters control this verification process.

merkle_proof:
  miner_script_path: "neurons/Validator/miner_script_m_merkletree.py"
  time_tolerance: 5
  submatrix_size: 512
  hash_algorithm: 'sha256'
  pog_retry_limit: 22
  pog_retry_interval: 60  # seconds
  max_workers: 64
  max_random_delay: 900  # seconds

flowchart TD
    subgraph "Merkle Proof Verification Process"
        validator["Validator"] -->|"send_script_and_request_hash"| ssh["SSH Connection"]
        ssh -->|"execute_script_on_miner"| script["miner_script_m_merkletree.py"]
        script -->|"generate_matrix_torch"| matrices["Matrix Generation"]
        matrices -->|"build_merkle_tree_rows"| tree["Merkle Tree"]
        tree -->|"get_merkle_proof_row"| proof["Merkle Proofs"]
        proof -->|"verify_merkle_proof_row"| validator
        validator -->|"verify_responses"| result["Verification Result"]
    end

The Merkle proof system involves several key functions:

  • send_script_and_request_hash(): Transfers and verifies the benchmark script
  • execute_script_on_miner(): Runs computation modes (benchmark/compute/proof)
  • build_merkle_tree_rows(): Constructs Merkle trees from computation results
  • verify_merkle_proof_row(): Validates individual proof elements
  • verify_responses(): Performs overall verification with failure tolerance
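The configured hash_algorithm ('sha256') drives tree construction. The following is a minimal, generic illustration of folding row hashes into a Merkle root; it sketches the technique only, and the actual build_merkle_tree_rows in miner_script_m_merkletree.py may differ in layout and padding rules:

```python
import hashlib

def _h(data: bytes) -> bytes:
    """SHA-256 digest, matching the configured hash_algorithm."""
    return hashlib.sha256(data).digest()

def merkle_root(leaves: list) -> bytes:
    """Hash each leaf, then fold pairs level by level until one root remains."""
    level = [_h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:                 # duplicate the last node on odd levels
            level.append(level[-1])
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

rows = [b"row-0", b"row-1", b"row-2", b"row-3"]
print(merkle_root(rows).hex())
```

A validator holding only the root can then check any single row via its proof path (sibling hashes up the tree), which is what verify_merkle_proof_row does per row.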

Sources: config.yaml:95-104 , neurons/Validator/pog.py:75-340

The system uses several timeout and retry parameters to ensure reliable GPU performance validation while handling network and hardware variations.

From compute/__init__.py:

# Proof of GPU settings
pog_retry_limit = 30
pog_retry_interval = 80 # seconds
specs_timeout = 60 # Time before specs requests timeout
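These settings suggest a retry loop of the following shape. This is a sketch only; run_pog_check is a hypothetical stand-in for the validator's actual PoG routine:

```python
import time

def run_with_retries(run_pog_check, pog_retry_limit: int = 30,
                     pog_retry_interval: int = 80):
    """Retry a PoG check up to pog_retry_limit times, sleeping
    pog_retry_interval seconds between attempts."""
    last_error = None
    for attempt in range(1, pog_retry_limit + 1):
        try:
            return run_pog_check()
        except Exception as exc:
            last_error = exc
            if attempt < pog_retry_limit:
                time.sleep(pog_retry_interval)
    raise RuntimeError(
        f"PoG failed after {pog_retry_limit} attempts"
    ) from last_error
```

Note that config.yaml's merkle_proof section carries its own pog_retry_limit (22) and pog_retry_interval (60), distinct from the compute/__init__.py defaults shown above.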

flowchart TD
    subgraph "GPU Benchmarking Process"
        start["Validator initiates PoG"] --> send["send_script_and_request_hash()"]
        send --> verify["Verify script hash"]
        verify --> benchmark["execute_script_on_miner(mode:'benchmark')"]
        benchmark --> parse["parse_benchmark_output()"]
        parse --> compute["execute_script_on_miner(mode:'compute')"]
        compute --> merkle["parse_merkle_output()"]
        merkle --> proof["execute_script_on_miner(mode:'proof')"]
        proof --> validate["verify_responses()"]
        validate --> result["GPU identification & scoring"]
    end

The parse_benchmark_output function processes miner responses:

num_gpus, vram, size_fp16, time_fp16, size_fp32, time_fp32 = parse_benchmark_output(output)

This extracts:

  • GPU count
  • Available VRAM
  • FP16 matrix size and execution time
  • FP32 matrix size and execution time

These values are then used by identify_gpu() to match against the performance database.
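Assuming the miner prints those six values whitespace-separated on its final output line, a parser could look like the following. The real field layout lives in neurons/Validator/pog.py; this single-line format is an assumption for illustration:

```python
def parse_benchmark_output(output: str):
    """Parse 'num_gpus vram size_fp16 time_fp16 size_fp32 time_fp32'
    from the last non-empty line of the miner's benchmark output.

    NOTE: the single-line whitespace-separated format is assumed here;
    the actual format is defined by miner_script_m_merkletree.py.
    """
    line = [l for l in output.strip().splitlines() if l.strip()][-1]
    fields = line.split()
    num_gpus = int(fields[0])
    vram = float(fields[1])            # available VRAM in GB
    size_fp16, time_fp16 = int(fields[2]), float(fields[3])
    size_fp32, time_fp32 = int(fields[4]), float(fields[5])
    return num_gpus, vram, size_fp16, time_fp16, size_fp32, time_fp32

print(parse_benchmark_output("8 68.72 8192 0.002 4096 0.004"))
```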

Sources: compute/__init__.py:37-48 , neurons/Validator/pog.py:101-146

The core GPU identification process combines performance benchmarking with tolerance-aware matching to accurately identify miner hardware capabilities.

flowchart TD
    subgraph "identify_gpu Function Flow"
        input["Input: fp16_tflops, fp32_tflops, estimated_avram, reported_name"] 
        input --> calculate["Calculate combined_scores for all GPU models"]
        calculate --> deviation["(fp16_deviation + fp32_deviation + avram_deviation) / 3"]
        deviation --> sort["Sort by lowest deviation score"]
        sort --> identify["identified_gpu : best_match"]
        identify --> tolerance{"Check tolerance_pairs"}
        tolerance -->|"Match found"| adjust["Apply tolerance adjustment"]
        tolerance -->|"No match"| return["Return identified_gpu"]
        adjust --> return
    end

The identify_gpu function calculates deviation scores for each GPU model:

fp16_deviation = abs(fp16_tflops - fp16_theoretical) / fp16_theoretical
fp32_deviation = abs(fp32_tflops - fp32_theoretical) / fp32_theoretical
avram_deviation = abs(estimated_avram - avram_theoretical) / avram_theoretical
combined_score = (fp16_deviation + fp32_deviation + avram_deviation) / 3

The GPU with the lowest combined deviation score is selected as the identified model.
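Putting the formulas together, the scoring step can be sketched as a self-contained function using the config values quoted earlier (the real identify_gpu also takes the reported name and applies tolerance pairs afterward):

```python
# Theoretical values taken from the gpu_performance section of config.yaml
GPU_TFLOPS_FP16 = {"NVIDIA H200": 610, "NVIDIA H100 80GB HBM3": 570}
GPU_TFLOPS_FP32 = {"NVIDIA H200": 49.6, "NVIDIA H100 80GB HBM3": 49.0}
GPU_AVRAM = {"NVIDIA H200": 68.72, "NVIDIA H100 80GB HBM3": 34.36}

def identify_gpu(fp16_tflops, fp32_tflops, estimated_avram):
    """Return the model with the lowest mean relative deviation
    across the three measured metrics."""
    scores = {}
    for gpu in GPU_TFLOPS_FP16:
        fp16_dev = abs(fp16_tflops - GPU_TFLOPS_FP16[gpu]) / GPU_TFLOPS_FP16[gpu]
        fp32_dev = abs(fp32_tflops - GPU_TFLOPS_FP32[gpu]) / GPU_TFLOPS_FP32[gpu]
        avram_dev = abs(estimated_avram - GPU_AVRAM[gpu]) / GPU_AVRAM[gpu]
        scores[gpu] = (fp16_dev + fp32_dev + avram_dev) / 3
    return min(scores, key=scores.get)

print(identify_gpu(600, 49.5, 68.0))  # -> NVIDIA H200
```

Because the deviations are relative, the three metrics contribute comparably even though their absolute magnitudes differ by an order of magnitude.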

The miner_script_m_merkletree.py script provides multiple execution modes:

  • benchmark: Matrix multiplication performance testing
  • compute: Merkle tree computation with PRNG matrices
  • proof: Generate cryptographic proofs for verification
  • gpu_info: Basic GPU detection and enumeration

Sources: neurons/Validator/pog.py:27-73 , neurons/Validator/miner_script_m_merkletree.py:21-388

The GPU performance configuration is loaded and validated through the YAML configuration system with error handling for missing or malformed data.

flowchart TD
    subgraph "load_yaml_config Function"
        start["load_yaml_config(file_path)"] --> open["Open config.yaml"]
        open --> parse["yaml.safe_load(data)"]
        parse --> validate["Validate gpu_performance section"]
        validate --> return["Return configuration dict"]
        
        open -->|"FileNotFoundError"| error1["Raise FileNotFoundError"]
        parse -->|"YAMLError"| error2["Raise ValueError"]
    end
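A sketch of load_yaml_config consistent with the flow above, assuming PyYAML; the real function in pog.py may perform additional validation:

```python
import yaml  # PyYAML (assumed dependency)

def load_yaml_config(file_path: str) -> dict:
    """Load config.yaml and ensure the gpu_performance section exists."""
    try:
        with open(file_path, "r") as f:
            data = yaml.safe_load(f)
    except FileNotFoundError:
        raise FileNotFoundError(f"Configuration file not found: {file_path}")
    except yaml.YAMLError as exc:
        raise ValueError(f"Error parsing YAML configuration: {exc}")
    if not data or "gpu_performance" not in data:
        raise ValueError("Missing 'gpu_performance' section in configuration")
    return data
```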

The loaded configuration provides access to all GPU performance data:

gpu_data = load_yaml_config("config.yaml")
GPU_TFLOPS_FP16 = gpu_data["gpu_performance"]["GPU_TFLOPS_FP16"]
GPU_TFLOPS_FP32 = gpu_data["gpu_performance"]["GPU_TFLOPS_FP32"]
GPU_AVRAM = gpu_data["gpu_performance"]["GPU_AVRAM"]
tolerance_pairs = gpu_data["gpu_performance"]["gpu_tolerance_pairs"]

GPU configuration data is persisted using database functions:

  • update_pog_stats(): Stores GPU name and count for miners
  • get_pog_specs(): Retrieves most recent GPU specifications
  • write_stats(): Stores comprehensive performance data with JSON serialization

Sources: neurons/Validator/pog.py:14-26 , neurons/Validator/database/pog.py:24-98

The GPU performance system integrates with several related components:

  1. GPU Performance Configuration: Defined in config.yaml
  2. Score Calculation Logic: Implemented in neurons/Validator/calculate_pow_score.py
  3. GPU Data Storage: Managed by functions in neurons/Validator/database/pog.py
  4. Mathematical Utilities: Provided in compute/utils/math.py

GPU specifications are stored in the database using JSON serialization:

# Convert dict to JSON string for storage
if isinstance(raw_specs, dict):
    gpu_specs = json.dumps(raw_specs)
else:
    gpu_specs = raw_specs

# When retrieving
raw_gpu_specs = row[2]
if raw_gpu_specs:
    try:
        gpu_specs = json.loads(raw_gpu_specs)  # Convert from JSON -> dict
    except Exception:
        gpu_specs = None

This allows flexible storage of different GPU configurations while maintaining a structured database schema.
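The round trip can be exercised in isolation. This sketch wraps the snippet above in two standard-library helpers (the function names to_storage/from_storage are illustrative, not the actual database API):

```python
import json

def to_storage(raw_specs):
    """Serialize dict specs to a JSON string for the database column;
    pass strings through unchanged."""
    return json.dumps(raw_specs) if isinstance(raw_specs, dict) else raw_specs

def from_storage(stored):
    """Deserialize the stored column back to a dict, or None when the
    value is empty or not valid JSON."""
    if not stored:
        return None
    try:
        return json.loads(stored)
    except (TypeError, ValueError):
        return None

specs = {"gpu_name": "NVIDIA H200", "num_gpus": 8}
print(from_storage(to_storage(specs)) == specs)  # -> True
```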

Sources: neurons/Validator/database/pog.py:100-186