chess-spectral 2D eigenbasis optimization — implementation report (2026-05-11)¶
Branch: feat/chess-spectral-2d-eigenbasis-optimization
Lineage: surfaced_from: chess_channels_hyper_parallel_research_2026-05-11 (commit 7e98a7e, merged via PR #329)
Spec: ADR-002 §6.1, Open Question / Future Work item 1 ("eigenbasis-diagonal optimization for evolve_until / evolve_under_h0; deferred until profiling justifies the LOC"). The 2026-05-11 honest-negative spike provided the measured justification at chess-2D scale (and the measured deferral justification at chess-4D scale).
Bottom line¶
The new path replaces the per-channel scipy.sparse.linalg.expm_multiply loop in chess_spectral.qm_2d_dynamics.evolve_under_h0 with the closed-form eigenbasis-diagonal expression U(t) = V @ diag(exp(-i λ t)) @ V^H where (λ, V) = eigh(H_FREE_2D.toarray()) is cached in a module-level singleton. All 28 pre-existing tests pass without modification; 51 new tests in tests/test_qm_2d_dynamics_eigenbasis.py verify parity, caching invariants, and the spec-required mathematical properties. Measured speedup on the real H_FREE_2D (not the spike's synthetic Laplacian) is 189.7× with max-abs deviation 1.54e-16 vs the sparse reference.
Implementation summary¶
- Cache: module-level
_H0_EIGENBASIS_2D: Optional[Tuple[np.ndarray, np.ndarray]](lazy first-call via_get_h0_eigenbasis_2d()). Built vianumpy.linalg.eigh(H_FREE_2D.toarray()); returns(eigvals: float64[64], eigvecs: complex128[64, 64]). The eigvecs are stored complex128 even though the symmetric H_0 admits a real-orthogonal V, so the downstream phase multiplication stays in one dtype. evolve_under_h0three paths:- Single 64-block:
coefs = V^H @ ψ; coefs *= phases; out = V @ coefs - Full 640-dim broadcast:
psi_blocks = ψ.reshape(10, 64); coefs = psi_blocks @ V.conj(); coefs *= phases; out_blocks = coefs @ V.T; out = out_blocks.ravel() - Single-channel-only mode: same as single-block path applied to the targeted slice; other 9 blocks bit-identical to (complex128-cast) input.
phases = np.exp(-1j * eigvals * t)computed once per call (shared across all 10 channels).H_FREE_2D()and_get_h_free_2d()unchanged. Public API: signature, return shape, exceptions all identical.H_FREE_2Dcontinues to be the SSOT for the sparse matrix; the eigenbasis cache is keyed off it implicitly (built once, the H_0 sparse matrix is also a module-level singleton).- Unused import dropped:
scipy.sparse.linalg.expm_multiplyis no longer needed (was used only by the previous per-channel loop).
Parity numbers (real H_FREE_2D, N_TRIALS=20, warmup=3, t=0.1, seed 20260511)¶
| Path | Median ms | IQR ms |
|---|---|---|
| New: eigenbasis-diagonal, batched | 0.057 | 0.030 |
Old: per-channel sparse expm_multiply |
10.75 | 2.77 |
| Speedup | 189.7× | — |
- Max-abs deviation (new vs sparse reference): 1.54e-16 (machine-precision bit-equivalent; same scale as the spike's 1.8e-16 on the synthetic Laplacian).
- One-shot eigendecomposition cost: ~6 ms on commodity workstation; amortized over the first call (lazy build).
The 189.7× vs the spike's 196× is well within bench variance (~3% delta). Both are "math-doesn't-lie wins" at chess-2D scale.
Test coverage¶
- Pre-existing
tests/test_qm_2d_dynamics.py: 28/28 pass unchanged. The optimization preserves the public-API contract exactly. - New
tests/test_qm_2d_dynamics_eigenbasis.py: 51 tests pass. Cover: - Eigenbasis cache structural invariants (5)
V @ diag(λ) @ V^Hreconstructs H_0 to machine precision (1)- V is unitary to 1e-12 (1)
- Eigvals lie in [-8, 0] (1)
- Parity vs sparse reference, single-block path, t ∈ {1e-6, 1e-3, 0.05, 0.1, 0.5, 1.0, 2.5}, tol 1e-14 (7)
- Parity vs per-channel sparse loop, full-640 batched path, same t sweep (7)
- Parity vs sparse, per-channel-only mode, all 10 channels (10)
- Norm preservation, single-block + full-640 across t sweep (14)
- Energy ⟨ψ|H_0|ψ⟩ preservation across t = {0.05, 0.1, 0.5, 1.0} (4)
- Determinism: same t → bit-identical; different t → different; U(-t)∘U(t) = I; U(t1+t2) = U(t2)∘U(t1) (4)
tests/test_qm_4d_dynamics_b2.py: 17/17 pass — verifies the 4D side is unaffected (the optimization deliberately does NOT touchqm_4d_dynamics.evolve_under_h0per the spike's honest negative).tests/test_qm_2d.py+tests/test_qm_2d_bridge.py: 73 + 29 = 102 tests pass — wider 2D ecosystem unaffected.
Total verified surface: 169 tests, all pass.
ADR conformance¶
ADR-002 §6 future-work item 1 stated:
Per Pre-flight 3, H_0 = -Δ is diagonal in the encoder basis. A v1.6+ optimization could replace expm_multiply with direct phase multiplication. Deferred until profiling shows evolve_until is a hot path.
Pre-flight 3 framed the encoder as a simultaneous eigenbasis of (Δ, B_4 commutant). The spike measured that the encoder-channels are bit-identical blocks under H_0 (because H_full = I_N_CHANNELS ⊗ H_0), which means the eigenbasis is the per-block eigh(H_0) — we don't need to identify the encoder-aligned eigenbasis explicitly; numpy.linalg.eigh finds it (up to within-eigenspace mixing, which is irrelevant for the diagonal-phase application since exp(-iλt) acts identically on degenerate eigenvectors). This matches the §4.1 design intent.
ADR-002 §4.5 ("Performance") explicitly estimated:
~10x speedup; deferred until profiling justifies the LOC.
The measured 189.7× is roughly 19× better than the ADR's own estimate. This is because the ADR was estimating against a single big expm_multiply call (the "1 matvec + 1 elementwise" framing), whereas the current code path uses scipy.sparse.linalg.expm_multiply per channel — overhead-dominated at the 64×64 scale. The "8× speedup from going to full-dim sparse" intermediate path from the spike's anomaly-chase explains the remaining gap.
C-side state of evolve_under_h0 parity¶
Verified via Grep("evolve_under_h0", path="docs/chess-maths/chess-spectral/src") and path="docs/chess-maths/chess-spectral/include") — no matches. evolve_under_h0 is Python-only QM dynamics; the C port covers encoders (encode_*, including the 2D pure-phase port in 1.19.0+), not the QM dynamics layer. No C/Python parity action required.
Anomalies and fermata¶
None. The implementation flows cleanly from the spike. Two minor design choices worth flagging for review (not blockers):
eigvecsstored as complex128 even though H_0 is real-symmetric. I chose dtype consistency over the float64 micro-saving (the matmul accumulator would have to handle promotion anyway). The 64×64 cache is ~64 KB complex128 vs ~32 KB float64 — negligible at module scope.eigvalsstays float64.np.exp(-1j * eigvals * t)correctly promotes to complex128 at the elementwise level. This is the natural / standard form; keeping eigvals real also costs no information.
If reviewer prefers float64 V for tighter memory or pure-real intermediate matmul, the change is a one-line edit (eigvecs = np.ascontiguousarray(eigvecs, dtype=np.float64)) — I left it complex128 for the reason above.
Files touched¶
docs/chess-maths/chess-spectral/python/chess_spectral/qm_2d_dynamics.py(modified) — eigenbasis cache + rewrittenevolve_under_h0body + docstring updates.docs/chess-maths/chess-spectral/python/tests/test_qm_2d_dynamics_eigenbasis.py(new) — 51-test parity regression layer.docs/chess-maths/chess-spectral/python/CHANGELOG.md(modified) —[Unreleased]section entry.
Reproducibility¶
cd docs/chess-maths/chess-spectral/python
python -m pytest tests/test_qm_2d_dynamics.py tests/test_qm_2d_dynamics_eigenbasis.py -v
# 79/79 PASSED in ~3 s
For the speedup measurement re-run, the inline benchmark in the spike script (commit 7e98a7e, docs/srmech/notes/chess-channels-and-hyper-parallel-script.py) covers the synthetic-Laplacian case at the chess-2D scale; the real-H_FREE_2D numbers in this report were measured by a one-shot verification script (not committed; trivially reconstructed from the spike script by replacing h_free_2d() with chess_spectral.qm_2d_dynamics.H_FREE_2D()).