Chess channel rationale + hyper-wrap parallel-ops — findings (2026-05-11)
Spike: User question 2026-05-11: "have them look at chess2d and chess4d to find out why flat board with x,y coords is 10 channel dims and chess4d with w,x,y,z is 11 channel dims. what if we somehow had a reason to wrap it inside a hyper object such that we could perform parallel operations?" Two sub-questions: (A) what does the EXTRA 11th channel in 4D encode? (B) could a hyper-wrap of the channel dim enable parallel ops?
Method: Concertmaster role, MPM-discipline. Read encoder source code (qm_2d.py, qm_4d.py, encoder.py, encoder_4d.py, qm_2d_dynamics.py, qm_4d_dynamics.py). Build synthetic chess-scale Laplacian benchmarks comparing three propagator paths.
Dispatch: Conductor → concertmaster, 2026-05-11. Single dispatch.
Bottom line: (A) Chess-2D's 10 channels are NOT one-per-piece: they are 5 D₄ irrep projections + 3 symmetric-fiber + 1 FA-pawn + 1 FD-diagonal. Chess-4D's 11 channels are 1 A₁ orbit + 4 STD4-axis residuals (X/Y/Z/W) + 3 symmetric-fiber + 2 FA-pawn-axis (W/Y per Oana-Chiru) + 1 FD-diagonal. Two independent splits separate the layouts, but only one changes the count. STD4 replaces 2D's four non-trivial D₄ irrep channels (A₂, B₁, B₂, E) with four axis-residual channels (one per spatial axis of the 4D base), so the irrep side stays at 5 (A₁ + 4). FA splits into two axis-specific pawn channels per Oana-Chiru Definition 11 (pawns oriented on the Y or W axis, never Z), which adds 1. Fiber-sym (3 in both) and FD (1 in both) contribute +0. Net: 5 + 3 + 1 + 1 = 10 (2D) vs 5 + 3 + 2 + 1 = 11 (4D); the "extra" 11th channel is the pawn-axis split. Math reconciles.

(B) The chess H₀ free-particle propagator is strictly block-diagonal across channels with bit-identical blocks (H_full = I_{N_CHANNELS} ⊗ H₀, off-block nnz = 0, max block-diff = 0.0, machine-exact, in both 2D and 4D). The "hyper-wrap" for parallel ops is real: a single batched-einsum eigenbasis propagator gives a ~196× speedup at chess-2D scale (10.4 ms → 0.053 ms per call). At chess-4D scale the speedup VANISHES: the 4096-dim dense eigendecomposition costs 124 s one-shot and the per-call cost is 722 ms (~12× SLOWER than the current 61.3 ms sparse path), so amortization never breaks even. Honest finding, math-doesn't-lie: per-channel SIMD via einsum is a 2D-scale win, not a 4D-scale win.
Reproducible: `python docs/srmech/notes/chess-channels-and-hyper-parallel-script.py`. Runtime ~3 min on a commodity workstation (the 4D eigendecomposition dominates). Deterministic seed 20260511.
1 — Channel rationale: 10-vs-11 decoded
1.1 — chess-2D — 10 channels × 64 dims = 640 total
Source: encoder.py:91-95, qm_2d.py:38-41.
| Channel | Slice | Semantic role | Group-theoretic anchor |
|---|---|---|---|
| A1 | 0:64 | D₄ trivial irrep (totally symmetric signal) | 1-dim D₄ irrep |
| A2 | 64:128 | D₄ sign-of-rotation irrep | 1-dim D₄ irrep |
| B1 | 128:192 | D₄ diagonal-flip irrep | 1-dim D₄ irrep |
| B2 | 192:256 | D₄ anti-diagonal-flip irrep | 1-dim D₄ irrep |
| E | 256:320 | D₄ 2-dim standard irrep | 2-dim D₄ irrep |
| F1, F2, F3 | 320:512 | Symmetric-fiber: per-piece local Laplacians projected onto 3D fiber basis (knight, bishop=queen, king contribute 3 independent directions) | Cross-piece SVD rank-3 basis |
| FA | 512:576 | Antisymmetric pawn fiber: `(A_white_pawn - A_white_pawn.T)/2` in grid eigenbasis | Antisymmetric / pawn-direction |
| FD | 576:640 | Diagonal deviation: `diag(EVECS.T @ L_piece @ EVECS) - EVALS_GRID`, signed-piece-weighted | Rook-shadow / per-piece eigenvalue split |
5 D₄ irreps + 3 fiber-sym + 1 FA + 1 FD = 10. The channels are NOT one-per-piece — they are spectral / group-theoretic projections of the signed-piece-value scalar field over the 8×8 base. Piece identity enters only through the fiber-sym (3 directions of cross-piece SVD), the FA (pawns only), and the FD (per-piece-type DIAG_DEV row).
1.2 — chess-4D — 11 channels × 4096 dims = 45 056 total
Source: encoder_4d.py:3-14, 115-127, qm_4d.py:38-41.
| Channel | Slice | Semantic role | Group-theoretic anchor |
|---|---|---|---|
| A1 | 0:4096 | B₄ trivial irrep (`P_A1 @ sig`) | 1-dim B₄ irrep |
| STD4_X | 4096:8192 | Centered x-coord residual × signal | std-4D rep, axis 0 |
| STD4_Y | 8192:12288 | Centered y-coord residual × signal | std-4D rep, axis 1 |
| STD4_Z | 12288:16384 | Centered z-coord residual × signal | std-4D rep, axis 2 |
| STD4_W | 16384:20480 | Centered w-coord residual × signal | std-4D rep, axis 3 |
| FIB_SYM_1/2/3 | 20480:32768 | Symmetric fiber (cross-piece SVD basis, knight/bishop=queen/king, 4D rook drops out) | Cross-piece SVD rank-3 |
| FA_PAWN_W | 32768:36864 | Antisymmetric pawn fiber on W-axis (`I⊗I⊗I⊗W_ANTI_DCT`) | Oana-Chiru Def. 11 |
| FA_PAWN_Y | 36864:40960 | Antisymmetric pawn fiber on Y-axis (`I⊗Y_ANTI_DCT⊗I⊗I`) | Oana-Chiru Def. 11 |
| FD_DIAG | 40960:45056 | Diagonal deviation, rook-shadow, all pieces | Per-piece DIAG_DEV row |
1.3 — Reconciliation: why 11 not 10
Two independent splits convert 2D's 10 into 4D's 11:
- D₄ irreps (2D, 5) → A1 + STD4 axis residuals (4D, 1 + 4 = 5). The 2D code uses 5 character-table-projected channels (A1, A2, B1, B2, E). The 4D code keeps just one B₄-orbit projector (A1, the trivial irrep) and replaces the other four with four axis-residual channels STD4_{X,Y,Z,W}. Net: 5 ↔ 5 (no change in count, but a deep semantic reframing: irrep projection → per-axis residual). The count matches because the std-4D rep is 4-dimensional and D₄ has four non-trivial irreps {A2, B1, B2, E}; since E is itself 2-dimensional, the correspondence is channel-for-channel, not dimension-for-dimension.
- FA pawn (2D, 1) → FA_PAWN axis-split (4D, 2). The 2D code has one antisymmetric pawn channel. The 4D code splits it into FA_PAWN_W and FA_PAWN_Y per Oana-Chiru Definition 11 (4D pawns are anchored on the Y or W axis, never Z; vendored at `hoodoos/rinaldi-unciuleanu-chiru-2026.xml`). Net: 1 → 2 (+1 channel).
- Fiber-sym (2D, 3) → Fiber-sym (4D, 3). No change.
- FD (2D, 1) → FD_DIAG (4D, 1). No change.
Tally: 5 + 1 + 3 + 1 = 10 (2D); 5 + 2 + 3 + 1 = 11 (4D). The +1 channel in 4D is from the pawn-axis split mandated by the chess4D-OC ruleset, NOT from the dimensional bump per se. (One might have naively expected +6 or +8 channels from B₄'s richer irrep structure; the encoder design instead uses the 4D coordinate frame as the natural splitter and keeps only one B₄ orbit projector.)
This is a load-bearing design choice — the 4D encoder bypasses B₄'s 20-irrep table in favor of the natural-rep coordinate axes for channels 1-4. A future encoder revision could expand the B₄ irrep coverage (analogue of D₄'s {A2, B1, B2, E}) at the cost of more channels.
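The tally and the slice boundaries in the two tables can be cross-checked mechanically. The following is a minimal sketch, not the encoder's own code: the group names, `LAYOUT_2D`, `LAYOUT_4D`, and `channel_slices` are hypothetical helpers introduced here, and the only inputs taken from the tables are the per-group channel counts and the base sizes (8² and 8⁴).

```python
# Channel groups in slice order; counts are read off the two layout tables.
LAYOUT_2D = [("A1", 1), ("A2", 1), ("B1", 1), ("B2", 1), ("E", 1),
             ("F_SYM", 3), ("FA", 1), ("FD", 1)]
LAYOUT_4D = [("A1", 1), ("STD4", 4), ("FIB_SYM", 3),
             ("FA_PAWN", 2), ("FD_DIAG", 1)]


def channel_slices(layout, n_base):
    """Map each group name to its slice of the flat encoding; also return the total length."""
    slices, start = {}, 0
    for name, n_channels in layout:
        stop = start + n_channels * n_base
        slices[name] = slice(start, stop)
        start = stop
    return slices, start


slices_2d, total_2d = channel_slices(LAYOUT_2D, 8 ** 2)
slices_4d, total_4d = channel_slices(LAYOUT_4D, 8 ** 4)
n_channels_2d = sum(count for _, count in LAYOUT_2D)
n_channels_4d = sum(count for _, count in LAYOUT_4D)

print(n_channels_2d, total_2d)   # 10 640
print(n_channels_4d, total_4d)   # 11 45056
print(slices_4d["FA_PAWN"])      # slice(32768, 40960, None)
```

Grouping by role rather than by individual channel makes the reconciliation visible at a glance: the only group whose count differs in a way that changes the total is the pawn fiber (1 vs 2).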
2 — Hyper-wrap for parallel operations: structural check
2.1 — H₀ block-diagonal structure
Both qm_2d_dynamics.py:159-172, 261-271 and qm_4d_dynamics.py:2598-2604 document: H_full = I_{N_CHANNELS} ⊗ H₀. The current code path is already a per-channel loop: N_CHANNELS separate expm_multiply calls, one per (H₀, ψ_block) pair.
Direct measurement (synthetic chess-scale Laplacian):
| Scale | Off-block coupling nnz | Strictly block-diagonal? | Max block-vs-block diff | Blocks identical? |
|---|---|---|---|---|
| chess-2D (I_10 ⊗ H_0) | 0 | YES | 0.0 (machine-exact) | YES |
| chess-4D (I_11 ⊗ H_0) | 0 | YES | 0.0 (machine-exact) | YES |
The structure is exactly what the code claims. The "hyper-wrap" representation already exists implicitly; the question is whether materializing it (as a single big propagator) or batching the per-channel calls more aggressively gives a speedup.
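A structural check of this kind can be sketched as follows. This is an illustration under stated assumptions, not the spike's benchmark script: the stand-in H₀ is the graph Laplacian of an 8-per-side grid (path graph along each axis), and `grid_laplacian` and `block_structure` are hypothetical helper names. Only `scipy.sparse.diags`, `identity`, `kron`, and `csr_matrix` are used from SciPy.

```python
import numpy as np
import scipy.sparse as sp


def grid_laplacian(side, ndim):
    """Graph Laplacian of a side**ndim grid (path graph along each axis)."""
    degree = np.full(side, 2.0)
    degree[[0, -1]] = 1.0                       # end nodes have one neighbour
    off = -np.ones(side - 1)
    l1 = sp.diags([off, degree, off], [-1, 0, 1], format="csr")
    eye = sp.identity(side, format="csr")
    total = sp.csr_matrix((side ** ndim, side ** ndim))
    for axis in range(ndim):
        term = sp.identity(1, format="csr")     # 1x1 seed for the Kronecker chain
        for k in range(ndim):
            term = sp.kron(term, l1 if k == axis else eye, format="csr")
        total = total + term
    return total.tocsr()


def block_structure(h_full, n_channels, n):
    """Return (nonzeros outside the diagonal blocks, max abs diff between blocks)."""
    coo = h_full.tocoo()
    outside = (coo.row // n) != (coo.col // n)
    off_block_nnz = int(np.count_nonzero(outside & (coo.data != 0)))
    csr = h_full.tocsr()
    first = csr[:n, :n]
    max_diff = 0.0
    for c in range(1, n_channels):
        diff = csr[c * n:(c + 1) * n, c * n:(c + 1) * n] - first
        max_diff = max(max_diff, float(np.abs(diff.data).max(initial=0.0)))
    return off_block_nnz, max_diff


for n_channels, ndim in [(10, 2), (11, 4)]:
    h0 = grid_laplacian(8, ndim)
    h_full = sp.kron(sp.identity(n_channels), h0, format="csr")
    off_block, max_diff = block_structure(h_full, n_channels, h0.shape[0])
    print(f"{ndim}D: shape={h_full.shape} h0.nnz={h0.nnz} "
          f"off_block_nnz={off_block} max_block_diff={max_diff}")
```

Because the full operator is built as a Kronecker product with the identity, the check is exact by construction here; against the real subsystem the same two numbers would be the thing to measure.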
2.2 — Three propagator paths benchmarked
Three implementations, same numerical result (validated):
- Per-channel sparse loop (current code): `N_CHANNELS` separate `expm_multiply` calls, each on a single `(H₀, ψ_block)` pair.
- Eigenbasis + einsum batched: pre-compute `eigh(H₀_dense)` once, then per call apply `V @ diag(exp(-iλt)) @ V^H @ ψ` to all `N_CHANNELS` blocks at once via a `(N_CHANNELS, n) @ (n, n)` einsum. This is the literal "hyper-wrap for parallel ops" interpretation.
- Full-dim sparse: materialize `I_N ⊗ H₀` and call `expm_multiply` once on the `N·n`-dim state. The "naive hyper" form.
| Scale | Per-channel sparse | Eig + einsum batched | Full-dim sparse | Eig speedup vs per-channel | Cross-check max_abs |
|---|---|---|---|---|---|
| chess-2D (10×64=640) | 10.4 ms | 0.053 ms | 1.22 ms | ~196× | 1.8e-16 (machine-exact) |
| chess-4D (11×4096=45056) | 61.3 ms | 722 ms (post-decomp; 124s decomp) | 69.0 ms | 0.085× (SLOWER) | 1.3e-16 (machine-exact) |
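The three paths can be compared for numerical agreement at chess-2D scale with a short script. This is a sketch under assumptions, not the benchmark itself: H₀ is a stand-in 8×8 grid Laplacian, the time step `t = 0.1` and the random state are arbitrary, the function names are hypothetical, and the seed is reused from the spike only for determinism. No timings are reproduced here.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import expm_multiply

N_CHANNELS, SIDE = 10, 8
rng = np.random.default_rng(20260511)

# Stand-in H0: 8x8 grid graph Laplacian (assumption, not the project operator).
degree = np.full(SIDE, 2.0)
degree[[0, -1]] = 1.0
off = -np.ones(SIDE - 1)
l1 = sp.diags([off, degree, off], [-1, 0, 1], format="csr")
h0 = sp.kronsum(l1, l1, format="csr")
n = h0.shape[0]
psi = rng.standard_normal((N_CHANNELS, n)) + 1j * rng.standard_normal((N_CHANNELS, n))
t = 0.1


def evolve_per_channel(h0, psi, t):
    """Path 1: one expm_multiply call per channel block."""
    a = -1j * t * h0
    return np.stack([expm_multiply(a, row) for row in psi])


def evolve_eig_batched(evals, evecs, psi, t):
    """Path 2: apply V diag(exp(-i lambda t)) V^H to every channel row at once."""
    coeffs = psi @ evecs.conj()                  # row c holds (V^H psi_c)^T
    coeffs = coeffs * np.exp(-1j * evals * t)    # phase per eigen-index
    return coeffs @ evecs.T                      # back to the site basis


def evolve_full_dim(h_full, psi, t):
    """Path 3: one expm_multiply call on the flattened N*n state."""
    flat = expm_multiply(-1j * t * h_full, psi.ravel())
    return flat.reshape(psi.shape)


out_loop = evolve_per_channel(h0, psi, t)
evals, evecs = np.linalg.eigh(h0.toarray())      # one-shot setup for path 2
out_eig = evolve_eig_batched(evals, evecs, psi, t)
h_full = sp.kron(sp.identity(N_CHANNELS), h0, format="csr")
out_full = evolve_full_dim(h_full, psi, t)

max_dev = max(np.abs(out_loop - out_eig).max(), np.abs(out_loop - out_full).max())
print(f"max abs deviation across paths: {max_dev:.1e}")
```

Storing the state as an `(N_CHANNELS, n)` array with one channel per row is what lets path 2 be two plain matrix products; `psi.ravel()` in C order gives the channel-major flat layout that `I_N ⊗ H₀` expects.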
2.3 — Math-doesn't-lie verdict
The hyper-wrap-for-parallel-ops idea is a real 2D-scale win and a real 4D-scale loss. Honest accounting:
- chess-2D: the 196× speedup is genuine. Eigenbasis diagonalization of the 64×64 H₀ costs ~0.5 ms one-shot; subsequent per-call cost is 0.053 ms vs 10.4 ms for the current sparse loop. Recommendation: ship as the ADR-002 §6.1 optimization for the 2D side. The eigendecomp is amortized across move boundaries during M14.x animation; per-frame cost drops by ~196×.
- chess-4D: 12× slowdown plus prohibitive setup cost. Eigendecomp of the 4096×4096 dense H₀ costs 124 seconds one-shot, and per-call eigenbasis-batched is 722 ms (vs 61 ms sparse). Amortization breakeven is infinite: the per-call cost is already higher, so no number of subsequent calls recoups the upfront cost. Recommendation: do NOT pursue eigenbasis optimization for the 4D side. The 4096×4096 dense matmul (V^H @ ψ across 11 channels) is fundamentally slower than `scipy.sparse.linalg.expm_multiply` on the sparse H₀ (32 768 nnz), which needs only sparse matrix-vector products (SciPy implements the Al-Mohy-Higham scaled truncated-Taylor algorithm, not a Krylov iteration).
- The full-dim sparse path (interpretation 3) beats the per-channel loop at 2D scale (1.22 ms vs 10.4 ms) and is roughly tied with it at 4D scale (69 ms vs 61 ms). This is because `scipy.sparse.linalg.expm_multiply` has per-call overhead that dominates at small `n`; one big call beats 10 small calls. Anomaly-chase: the current per-channel loop in `qm_2d_dynamics.evolve_under_h0` is suboptimal at 2D scale by ~8× even without going to eigenbasis. Materializing `I_10 ⊗ H₀` (once, cached) and calling `expm_multiply` once on the full 640-dim state is strictly better than the per-channel loop on every metric tested.
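The breakeven argument in the bullets above reduces to one line of arithmetic: a one-shot setup cost pays off only if each subsequent call is cheaper than the path it replaces. A minimal sketch, using the timings quoted in this section as inputs (`breakeven_calls` is a hypothetical helper, not part of the project code):

```python
import math


def breakeven_calls(setup_ms, per_call_new_ms, per_call_old_ms):
    """Smallest call count k with setup + k*new <= k*old; math.inf if it never happens."""
    saving_per_call = per_call_old_ms - per_call_new_ms
    if saving_per_call <= 0:
        return math.inf                       # new path is slower on every call
    return max(1, math.ceil(setup_ms / saving_per_call))


# chess-2D: ~0.5 ms eigendecomp, 0.053 ms per call vs 10.4 ms sparse loop.
print(breakeven_calls(0.5, 0.053, 10.4))         # 1
# chess-4D: 124 s eigendecomp, 722 ms per call vs 61.3 ms sparse loop.
print(breakeven_calls(124_000.0, 722.0, 61.3))   # inf
```

At 2D scale the setup is recouped on the very first call; at 4D scale the per-call saving is negative, so the result is infinite regardless of how large the setup cost is.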
2.4 — Connection to "hyper" terminology
The user's question proposed a fourth potential sense of "hyper" — channel-parallelization. Recommendation: do NOT add a fourth sense to the canon. What we measured is just structured numpy broadcasting (einsum('cij,cj->ci', ...) or psi.reshape(N, n) @ V), well within the existing algebraic-hyperdimensional (§3.5.1) sense — HDC-style batched-vector operations on a fiber-bundle structure. The fiber-bundle operator algebra (§3.5.4) already documents total = base × fiber (tensor product) and says explicitly "the graph-Laplacian + eigenphase-torus lift applies to the base factor; HDC bind/bundle operations apply to the fiber factor". What this spike confirms is that operators that respect the fiber structure (factor as I_fiber ⊗ O_base) admit per-fiber batched evaluation — this is structural, not new terminology.
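The factorization behind per-fiber batched evaluation can be written out in a few lines. This is the standard Kronecker-product identity stated in the spike's notation; the symbol Ψ (an N × n array holding one channel per row) and the eigendecomposition H₀ = VΛV† are introduced here for illustration.

```latex
\begin{aligned}
(I_N \otimes H_0)^k &= I_N \otimes H_0^{\,k} \\
e^{-it\,(I_N \otimes H_0)}
  &= \sum_{k \ge 0} \frac{(-it)^k}{k!}\,\bigl(I_N \otimes H_0^{\,k}\bigr)
   = I_N \otimes e^{-itH_0} \\
e^{-itH_0} &= V\,e^{-i\Lambda t}\,V^{\dagger},
  \qquad H_0 = V \Lambda V^{\dagger} \\
\Psi(t) &= \Psi(0)\,\bigl(V\,e^{-i\Lambda t}\,V^{\dagger}\bigr)^{\mathsf T},
  \qquad \Psi \in \mathbb{C}^{N \times n}
\end{aligned}
```

The last line is the batched form: every channel row is multiplied by the same n × n propagator, which is what the einsum or `psi.reshape(N, n) @ V` expressions evaluate.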
2.5 — Anomaly chase: the chess-4D 94 ms from project-workflow-a2a-quickbench
The project-workflow-A2A quick-bench measured chess qm_4d_dynamics at 94 ms per evolve_under_h0 call. Our synthetic measurement is 61 ms — within 1.5× and consistent with the quick-bench's "within constant factors of the real subsystem" caveat. (The quick-bench used N_TRIALS=10 at the full scale, and the synthetic Laplacian has ~32 768 nnz vs the real 4D H₀ which has the same structure.) This is convergence — our measurement validates the prior quick-bench. The bigger anomaly is that the project-internal 4D dynamics call is already near-optimal: the per-channel loop and full-dim sparse paths are tied at 4D scale, and eigenbasis would slow it down. There is no further speedup to chase at the 4D scale via channel-parallelization.
3 — Three concrete recommendations to the conductor
- chess-2D dynamics: enable ADR-002 §6.1 eigenbasis-diagonal optimization. A 196× speedup at zero numerical cost (max_abs deviation 1.8e-16) is a real M14.x animation win at the 2D scale. The 64×64 H₀ eigendecomp is trivial (<1 ms). Implementation is ~10 lines in `qm_2d_dynamics.evolve_under_h0`.
- chess-4D dynamics: do NOT pursue eigenbasis optimization. Honest negative result. Document the breakeven calculation in ADR-002 §6.1 so the deferral has a measured justification (not just "deferred").
- chess-2D dynamics (alternative): drop the per-channel sparse loop in favor of pre-cached `I_10 ⊗ H₀`. 8× speedup with no math change, no eigendecomposition. If the eigenbasis path is judged too invasive, this is the lower-risk floor improvement.
Channel rationale: the chess-spectral notebook should gain a §16.1 (or similar) channel rationale subsection explaining the 10 vs 11 split. The current qm_2d.py and encoder_4d.py docstrings document the layouts independently; no shared narrative explains the +1 (it's the pawn-axis split per Oana-Chiru, not the dimensional bump per se).
§3.5.4 fiber-bundle row update (srmech notebook): add a sub-property after the existing operator-orthogonality paragraph: "Operators of the form I_fiber ⊗ O_base (free-particle Hamiltonian H₀, channel-independent observables, group-action lifts U_g) factor naturally as per-fiber batched evaluations. The chess-spectral H₀ is the load-bearing project instance: synthetic measurement at chess-2D scale shows a 196× speedup from per-channel sparse expm_multiply to batched eigenbasis-diagonal einsum; at chess-4D scale the dense-eigendecomposition cost is prohibitive (124 s) and the speedup inverts to 12× slowdown."
Files
- `chess-channels-and-hyper-parallel-2026-05-11.md`: this findings markdown
- `chess-channels-and-hyper-parallel-per-finding-2026-05-11.ndjson`: structural-check + bench records, one per line
- `chess-channels-and-hyper-parallel-script.py`: reproducible benchmark, deterministic seed `20260511`