.spectral[z] / .spectralz4 wire format reference

Canonical user-facing spec for the binary container that holds chess-spectral encoder output. Covers all four shipped versions (v2 / v3 / v4 / v5) and how readers dispatch between them.

Looking for the design rationale? That lives in docs/adr/wire_format/ADR-001-v5-unified-encoding-modes.md. This file is the user-facing format spec; the ADR is the why.

Versions at a glance

| Version | Magic | Shipped in | Encoder | Frame body | File ext | Status |
|---|---|---|---|---|---|---|
| v2 | LARTPSEC | v0.x | 2D | 640-dim dense | .spectral / .spectralz | reader-only |
| v3 | LARTPSEC | v1.0 | 4D | 40 960-dim dense | .spectralz4 | reader-only (legacy) |
| v4 | LARTPSEC | v1.1.1 | 4D | 45 056-dim dense | .spectralz4 | reader-only (legacy) |
| v5 | LARTPSEC | v1.6 | 2D or 4D, three encoding modes | dense / per-channel / xor-stream | .spectral[z] or .spectralz4 | default for new writes |

The magic bytes (LARTPSEC, ASCII, little-endian = 0x434553505452414C) are identical across all versions. Readers detect the version by reading the first 12 bytes (magic + u32 version), then dispatch to the right parser.
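
The 12-byte probe can be sketched in a few lines of Python. This is an illustration only, not the project's reader: the function name `peek_magic_and_version` is hypothetical, and the sketch assumes the input is already uncompressed and that the u32 version field is little-endian in every version.

```python
import struct

MAGIC = b"LARTPSEC"

def peek_magic_and_version(prefix: bytes) -> int:
    """Return the u32 version stored right after the 8-byte magic.

    Hypothetical helper: expects at least the first 12 bytes of an
    uncompressed container and assumes a little-endian version field.
    """
    if len(prefix) < 12:
        raise ValueError("need at least 12 bytes (magic + u32 version)")
    magic, version = struct.unpack_from("<8sI", prefix, 0)
    if magic != MAGIC:
        raise ValueError(f"bad magic: {magic!r}")
    return version

# Reading the same 8 ASCII bytes as a little-endian u64 gives the
# constant quoted in the text.
assert int.from_bytes(MAGIC, "little") == 0x434553505452414C
```

A caller would branch on the returned integer (2, 3, 4 or 5) and hand the stream to the matching parser.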

Compression

Any version may be transparently gzipped (RFC 1952); the z suffix in .spectralz / .spectralz4 indicates gzip compression. Readers detect gzip by peeking at the first two bytes (0x1F 0x8B) and decompress before parsing the LARTPSEC header. The internal layout is identical between gzipped and uncompressed forms; gzip is only a transport wrapper.
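
A minimal sketch of the gzip check, using only the Python standard library. The helper name `unwrap_gzip` is hypothetical, and the sketch works on a whole file held in memory, which a real reader may not want for large 4D files.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any RFC 1952 stream

def unwrap_gzip(raw: bytes) -> bytes:
    """Return container bytes, decompressing first if raw is gzipped.

    Hypothetical helper: uncompressed input is passed through unchanged,
    so the caller can parse the LARTPSEC header either way.
    """
    if raw[:2] == GZIP_MAGIC:
        return gzip.decompress(raw)
    return raw
```

Because the uncompressed magic starts with `L` (0x4C), it cannot be mistaken for the gzip signature.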

The C writer always emits gzip with mtime=0 for deterministic output (byte-for-byte reproducibility across runs). The Python writer matches.
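
The effect of a zeroed mtime can be illustrated with Python's `gzip.GzipFile`, which accepts an `mtime` argument. This is a sketch of the general technique, not the project's writer: the function name is hypothetical, and the compression level is left at the library default, so the output is not claimed to match the C writer's bytes.

```python
import gzip
import io

def gzip_deterministic(payload: bytes) -> bytes:
    """Gzip payload with the header timestamp forced to zero.

    Hypothetical helper: with mtime=0 and no file name in the header,
    the output depends only on the payload and the compressor settings.
    """
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb", mtime=0) as gz:
        gz.write(payload)
    return buf.getvalue()
```

Bytes 4 to 7 of a gzip stream hold the MTIME field, so two runs over the same payload produce identical output instead of differing by the wall-clock time.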

v5 — current format (v1.6+)

A single header serves both 2D and 4D files, parameterised by n_dimensions. The encoding_mode field selects one of three encoding modes for the frame body.

Header (256 bytes, little-endian)

typedef struct {
    char     magic[8];       // "LARTPSEC"
    uint32_t version;        // 5
    uint32_t encoding_dim;   // 640 (2D) or 45056 (4D)
    uint32_t frame_bytes;    // dense-equivalent frame size
    uint32_t n_plies;        // number of frames that follow
    uint32_t board_dim_side; // 8 (always)
    uint32_t n_dimensions;   // 2 or 4 — the explicit dim flag
    uint8_t  encoding_mode;  // 0=dense, 1=per-channel, 2=xor-stream
    uint8_t  reserved[223];  // zero-filled
} spectralz_v5_header_t;

Total: 8 + 6×4 + 1 + 223 = 256 bytes — matches the v2/v4 header geometry exactly, so v5 files have the same on-disk header size as their predecessors.
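
The same layout can be expressed as a Python `struct` format string, which also checks the 256-byte total. This is a sketch under the field order shown in the C struct; the names `V5_HEADER`, `V5Header` and `parse_v5_header` are hypothetical and are not taken from the reference reader.

```python
import struct
from typing import NamedTuple

# "<" = little-endian, no padding: 8s magic, 6 x u32, u8 mode, 223 reserved.
V5_HEADER = struct.Struct("<8s6IB223s")
assert V5_HEADER.size == 256

class V5Header(NamedTuple):
    magic: bytes
    version: int
    encoding_dim: int
    frame_bytes: int
    n_plies: int
    board_dim_side: int
    n_dimensions: int
    encoding_mode: int
    reserved: bytes

def parse_v5_header(buf: bytes) -> V5Header:
    """Unpack and sanity-check the first 256 bytes of a v5 container."""
    hdr = V5Header._make(V5_HEADER.unpack_from(buf, 0))
    if hdr.magic != b"LARTPSEC" or hdr.version != 5:
        raise ValueError("not a v5 container")
    if hdr.n_dimensions not in (2, 4):
        raise ValueError(f"unexpected n_dimensions: {hdr.n_dimensions}")
    if hdr.encoding_mode not in (0, 1, 2):
        raise ValueError(f"unexpected encoding_mode: {hdr.encoding_mode}")
    return hdr
```

The input is assumed to be already decompressed; a gzipped file has to be unwrapped before the header is parsed.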

Encoding modes

The encoding_mode byte selects one of three frame-body layouts. Each layout is independently optimal for a different workload; ADR-001 has the empirical compression numbers (4D XOR-stream measured 7.23× compression vs dense gzipped on a 50-ply knight-tour fixture).

Mode 0 — dense

Frame body = float32 encoding[encoding_dim] followed by move metadata. The body layout is the same as in v2 (2D) and v3 / v4 (4D); only the header differs. The CLI selects this mode with --encoding=full.

| Component | 2D bytes | 4D bytes |
|---|---|---|
| encoding[encoding_dim] | 640 × 4 = 2560 | 45 056 × 4 = 180 224 |
| ply (u32) | 4 | 4 |
| move-from coordinates | 1 (u8) | 4 (u8 × 4) |
| move-to coordinates | 1 (u8) | 4 (u8 × 4) |
| promo (u8) | 1 | 1 |
| flags (u8) | 1 | 1 |
| total | 2568 | 180 238 |
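
The totals in the table can be reproduced with a short calculation. The sketch below uses hypothetical helper names, and `frame_offset` additionally assumes that fixed-size frames are stored back to back immediately after the 256-byte header with no padding, which the table itself does not state.

```python
HEADER_BYTES = 256

def dense_frame_bytes(encoding_dim: int, n_dimensions: int) -> int:
    """Size in bytes of one dense frame body (payload plus metadata tail)."""
    if n_dimensions not in (2, 4):
        raise ValueError("n_dimensions must be 2 or 4")
    payload = encoding_dim * 4                 # float32 encoding[encoding_dim]
    coords = 1 if n_dimensions == 2 else 4     # bytes per move-from / move-to
    tail = 4 + coords + coords + 1 + 1         # ply, from, to, promo, flags
    return payload + tail

def frame_offset(index: int, frame_bytes: int) -> int:
    """Byte offset of frame `index` in an uncompressed fixed-size file.

    Assumption: frames follow the header contiguously, without padding.
    """
    return HEADER_BYTES + index * frame_bytes

assert dense_frame_bytes(640, 2) == 2568
assert dense_frame_bytes(45056, 4) == 180238
```

The metadata tail alone is 8 bytes in 2D and 14 bytes in 4D, which is the difference between each total and its payload size.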

Mode 1 — per-channel replacement

Frame body has variable size. Layout per ply:

u32 body_size_bytes        // length of body (excluding own size field)
u8  flags                  // bit 0 = PC_FLAG_FULL (independent frame)
u8  n_channels_present     // 0..N_channels
[u8 channel_idx, u8 reserved, float32 buffer[channel_dim]] × n_channels_present
<move-metadata tail>       // 8 B (2D) or 14 B (4D), same as mode 0

The encoder compares each frame against the previous reconstructed one and emits only the channels whose float32 bit pattern changed. Channel layout: 2D = 10 channels × 64 modes; 4D = 11 channels × 4096 modes. The first frame is always emitted with the FULL flag set (independent baseline).
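
The change-detection step can be sketched as follows. This is an illustration under stated assumptions rather than the shipped encoder: the function name is hypothetical, a FULL frame is assumed to carry every channel, and `body_size_bytes` is assumed to count everything after the size field (flags, channel count, channel entries and the metadata tail).

```python
import struct
from typing import Optional

PC_FLAG_FULL = 0x01  # bit 0: independent frame

def encode_per_channel(curr: bytes, prev: Optional[bytes],
                       n_channels: int, tail: bytes) -> bytes:
    """Build one mode-1 frame from raw float32 payload bytes.

    curr / prev are the full encoding[] payloads as bytes; prev=None marks
    the first frame. Channels are compared byte-for-byte, i.e. by float32
    bit pattern rather than by numeric value.
    """
    if len(curr) % n_channels or not 0 < n_channels < 256:
        raise ValueError("payload does not split evenly into channels")
    channel_bytes = len(curr) // n_channels
    entries = []
    for idx in range(n_channels):
        lo, hi = idx * channel_bytes, (idx + 1) * channel_bytes
        if prev is None or curr[lo:hi] != prev[lo:hi]:
            # u8 channel_idx, u8 reserved (zero), then the channel buffer
            entries.append(struct.pack("<BB", idx, 0) + curr[lo:hi])
    flags = PC_FLAG_FULL if prev is None else 0
    body = struct.pack("<BB", flags, len(entries)) + b"".join(entries) + tail
    return struct.pack("<I", len(body)) + body
```

For a 2D file, `n_channels` would be 10 and each channel buffer 64 × 4 = 256 bytes, so a ply that touches a single channel costs a few hundred bytes instead of the full 2568.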

Mode 2 — XOR-stream

The frame body is fixed-size, with the same layout as mode 0. The difference is that each frame's encoding[] payload stores the bitwise XOR of the real encoding with the previous reconstructed frame's encoding (both treated as uint32 arrays). Frame 0 is XOR'd with zeros, so it is stored verbatim.

real_frame[N] = stored[N] XOR real_frame[N-1]    // cumulative reconstruction

The mode is bit-exact and lossless. The gain comes from gzip: chess hypervectors change little from one ply to the next, so the XOR yields long zero-byte runs that gzip compresses essentially for free. This is the leanest of the three modes: it keeps the fixed frame body size of mode 0 and adds no per-frame overhead.
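
A round-trip sketch of the XOR chain, using only the standard library. The function names are hypothetical, and the sketch covers the encoding[] payload bytes only; the move-metadata tail is assumed to be stored unchanged. XOR-ing the raw bytes is equivalent to XOR-ing the same data viewed as uint32 arrays.

```python
from typing import List

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("payloads must have the same length")
    x = int.from_bytes(a, "little") ^ int.from_bytes(b, "little")
    return x.to_bytes(len(a), "little")

def xor_encode(frames: List[bytes]) -> List[bytes]:
    """stored[N] = real[N] XOR real[N-1], with real[-1] taken as all zeros."""
    out = []
    prev = bytes(len(frames[0])) if frames else b""
    for frame in frames:
        out.append(xor_bytes(frame, prev))
        prev = frame
    return out

def xor_decode(stored: List[bytes]) -> List[bytes]:
    """real[N] = stored[N] XOR real[N-1], reconstructed cumulatively."""
    out = []
    prev = bytes(len(stored[0])) if stored else b""
    for delta in stored:
        prev = xor_bytes(delta, prev)
        out.append(prev)
    return out
```

Because decoding frame N needs every earlier frame, this layout suits sequential playback; jumping to an arbitrary ply means replaying the chain from frame 0.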

Reader dispatch

A correct reader handles all four versions. The Python reference reader (chess_spectral.frame_v5.peek_version()) reads only the first 12 bytes:

from chess_spectral.frame_v5 import peek_version

v = peek_version("game.spectralz")  # transparent over gzip
if v == 2:
    # 2D legacy → chess_spectral.frame.read_all()
    ...
elif v in (3, 4):
    # 4D legacy → chess_spectral.frame_4d.read_all()
    ...
elif v == 5:
    # unified → chess_spectral.frame_v5.read_v5_header()
    #          + dispatch on encoding_mode + n_dimensions
    ...

Implementation: python/chess_spectral/frame_v5.py.

Backward compatibility guarantees

  • v2/v3/v4 readers stay forever. Files already on disk keep working.
  • Magic is unchanged across versions. Any reader that checks the magic before the version field will still recognise the file.
  • v5 dense-mode frame body bytes are byte-identical to v2/v4. A v5 file in mode 0 is a v2/v4 file with a different header. Tools that only process the encoding payload (e.g., the chess-maths-viewer's frame iterator) will work unchanged on v5 dense files.
  • Default for new writes (v1.6+). --encoding=xor is the default CLI mode; users opt out with --encoding=full (mode 0) for byte-for-byte compatibility with prior tools that haven't learned the v5 modes.

See also