Phase 1e findings — notebook §2e draft
Status: All five sub-phases complete. Date: 2026-04-23
Follow-up to §1d.c (twin-channel A₁(s²) / E(s²) headline) and §2c.2 (Shannon×A₁⁻ in-book ρ = +0.213). Five sub-phases:
- 1e.1 — edax d=20 on 2587 tasklist, for Reading A vs B separation.
- 1e.2 — multivariate OLS + rotated sweep on A₁(s²) + E(s²).
- 1e.3 — Shannon info × full D₄ occupation battery (generalises §2c.2).
- 1e.4 — accuracy-100 archive re-aggregation.
- 1e.5 — L7b-style predictive sheaf validation on 30-seed trajectories.
1e.2 — Multivariate A₁(s²) + E(s²) joint predictor — CONFIRMED
Runner: research/phase1e_multivariate.py.
Input: results/phase1d_spectral_vs_perfectplay.csv.
Output: results/phase1e_multivariate.json.
Univariate reproduces §2d.c exactly at N = 2587:
| channel | raw ρ | partial ρ |
|---|---|---|
| D₄-A₁(s²) | −0.498 | −0.331 |
| D₄-E(s²) | +0.484 | +0.304 |
| D₄-A₂(s²) | +0.151 | +0.120 |
| D₄-B₁(s²) | +0.148 | +0.059 |
| D₄-B₂(s²) | +0.034 | −0.013 |
Joint (A₁+E) OLS:
- Raw R² = 0.155
- Partial-on-|disc_diff| R² = 0.100
Full D₄ 5-channel (A₁+A₂+B₁+B₂+E):
- Raw R² = 0.160 (only ~0.005 gain over A₁+E)
- Partial R² = 0.112 (~0.012 gain)
This confirms that A₁+E soak up essentially all of the D₄-occupation signal. A₂ contributes a tiny residual (univariate partial ρ = +0.120, per the table above), and B₁/B₂ contribute effectively nothing.
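The "partial ρ" column can be illustrated with a first-order partial Spearman correlation that controls for |disc_diff|. The following is a minimal stdlib-only sketch under assumptions: the function names and the toy inputs are hypothetical, and research/phase1e_multivariate.py may residualise differently (for example on raw values rather than on ranks).

```python
def ranks(xs):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    out = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out


def pearson(a, b):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5


def spearman(a, b):
    """Spearman rho = Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))


def partial_spearman(x, y, z):
    """Spearman rho of x and y after controlling for z (first-order formula)."""
    rxy, rxz, ryz = spearman(x, y), spearman(x, z), spearman(y, z)
    return (rxy - rxz * ryz) / ((1 - rxz ** 2) * (1 - ryz ** 2)) ** 0.5


# Hypothetical toy inputs: channel value, target, and control (|disc_diff|).
channel = [1, 2, 3, 4]
target = [2, 1, 4, 3]
disc_diff = [1, 2, 2, 1]
print(spearman(channel, target), partial_spearman(channel, target, disc_diff))
```

When the control is rank-uncorrelated with both variables, as in the toy data, the partial value equals the raw value; a confounded control pulls the two apart.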
Rotated single-dim sweep (standardised coordinates, θ on [0°, 180°]):
- Best raw: ρ = +0.515 at θ = 151.75° (≈ 0.47·E − 0.88·A₁, taking the direction as cos θ·A₁ + sin θ·E)
- Best partial: ρ = +0.340 at θ = 156.25°
Canonical directions:
| direction | raw ρ | partial ρ |
|---|---|---|
| a1 only (θ=0°) | −0.498 | −0.331 |
| a1 + e (θ=45°) | −0.049 | −0.038 |
| e only (θ=90°) | +0.437 | +0.299 |
| a1 − e (θ=135°) | +0.510 | +0.337 |
The (A₁ + E) direction is near-null (ρ ≈ 0), confirming the §2d.c Plancherel/mirror reading: A₁ and E trade off against each other. The (A₁ − E) combination is slightly stronger than either channel individually (partial +0.337 vs A₁ alone −0.331). The gain is modest (≈ 2 %), so the rotated observable is a clean univariate summary without being a dramatic improvement.
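The rotated sweep can be sketched as follows, assuming the direction at angle θ is cos θ·A₁ + sin θ·E in standardised coordinates (consistent with θ=0° being "a1 only" and θ=90° being "e only"). The function names and the 0.25° grid step are illustrative assumptions, not the runner's actual interface.

```python
import math


def ranks(xs):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    out = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def spearman(a, b):
    """Spearman rho = Pearson correlation of the ranks."""
    ra, rb = ranks(a), ranks(b)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    saa = sum((x - ma) ** 2 for x in ra)
    sbb = sum((y - mb) ** 2 for y in rb)
    return sab / (saa * sbb) ** 0.5


def standardise(xs):
    """Zero mean, unit (population) standard deviation."""
    n = len(xs)
    m = sum(xs) / n
    sd = math.sqrt(sum((x - m) ** 2 for x in xs) / n)
    return [(x - m) / sd for x in xs]


def rotated_sweep(a1, e, y, step_deg=0.25):
    """Return (theta_deg, rho) maximising Spearman(cos(t)*a1 + sin(t)*e, y) on [0, 180]."""
    a1s, es = standardise(a1), standardise(e)
    best_theta, best_rho = 0.0, -2.0
    for k in range(int(round(180.0 / step_deg)) + 1):
        theta = k * step_deg
        c, s = math.cos(math.radians(theta)), math.sin(math.radians(theta))
        u = [c * a + s * b for a, b in zip(a1s, es)]
        rho = spearman(u, y)
        if rho > best_rho:  # strict: keeps the first angle that reaches the maximum
            best_theta, best_rho = theta, rho
    return best_theta, best_rho
```

Because Spearman is rank-based, a whole interval of angles can tie for the best ρ on small samples; the sketch reports the first such angle.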
Saturated reference: 10 spectral channels + |disc_diff| → R² = 0.462,
meaning the spectral battery plus disc count explains almost half of
archive_mean_lb variance. |disc_diff| alone (from residualising Y
on DD) explains R² = 0.288 — so the spectral 10-channel component
adds ~0.17 incremental R² on top of disc count. Most of that
incremental content is in the D₄-A₁/E pair.
1e.3 — Shannon-info × full spectral battery — STRONGER THAN §2c.2
Runner: research/phase1e_shannon_observables.py.
Output: results/phase1e_shannon_observables.json.
N = 2099 played plies over 35 Barcelona EGP 2026 games.
Headline: the §2c.2 in-book correlation (ρ = +0.213 for A₁⁻ magnetisation) more than doubles when the observable is swapped for the Z₂-invariant occupation projection.
Per-observable Spearman(I_move, observable) by phase:
| observable | all plies | in-book | out-of-book |
|---|---|---|---|
| a1_minus (§2c.2 baseline) | −0.065 | +0.213 | −0.465 |
| e_minus | −0.011 | +0.426 | −0.508 |
| d4_a1_occ | −0.115 | +0.465 | −0.755 |
| d4_a2_occ | +0.366 | +0.124 | +0.335 |
| d4_b1_occ | +0.403 | +0.345 | +0.327 |
| d4_b2_occ | +0.244 | +0.237 | +0.183 |
| d4_e_occ | +0.508 | +0.422 | +0.438 |
Control ρ(I_move, n_legal_moves) = +0.814 (matches §2c.2).
Readings:
- D₄-A₁(s²) in-book ρ = +0.465 (vs §2c.2's +0.213 for A₁⁻), more than 2× the effect size. The §2c.2 A₁⁻ correlation captured the D₄ part of the A₁⁻ = D₄ · Z₂⁻ projection but was weakened by the Z₂-odd component (which the single-disc-type play doesn't modulate strongly in-book). Moving to the Z₂-invariant D₄-A₁(s²) projection lets the full D₄-symmetric occupation structure track Shannon-info.
- D₄-A₁(s²) sign-flips between in-book (+0.465) and out-of-book (−0.755): a very strong effect with opposite signs. In-book: positions where the chosen move "diverges" more from WTHOR empirical also have higher D₄-symmetric occupation ("more evenly spread" stones). Out-of-book: the spectral-info relationship inverts sharply; deeper / less-surveyed positions with high D₄-symmetric occupation have LOWER I_move bits. Likely reading: Shannon info in the out-of-book tail is dominated by the log₂|M| term (ρ(I_move, n_legal) = +0.81), and n_legal is itself negatively correlated with D₄-symmetric fill in late-game (when few legal moves exist, the board is dense and occupation is more symmetric).
- D₄-E(s²) all-plies ρ = +0.508. Strongest single-observable correlation across the full corpus. Matches the §2d.c twin-channel story: the "oriented anisotropy" component of occupation tracks Shannon-info in the same direction across book and out-of-book phases. Unlike A₁, E doesn't sign-flip.
- D₄-B₁(s²) all-plies ρ = +0.403. Unexpectedly strong, well above its rank among the "moderate" channels in §2d.c. In Takizawa archive_mean_lb correlation B₁ was a "ghost signal" (partial 0.074). Against Shannon info it becomes a substantive correlation. Probably reflects the edge/corner vs centre occupation asymmetry that Takizawa proved-bounds didn't care about but tournament move-choice does.
- D₄-A₂(s²) all-plies ρ = +0.366. A₂ is the pure rotation channel (transforms as R_z); it captures "spin" of the occupation pattern. Also substantive against Shannon info, another novel result vs §2c.2.
1e.5 — Predictive sheaf spectrum (L7b analog) — POSITIVE, contra logo L7b
Runner: research/phase2_predictive_sheaf.py.
30 random-play trajectories (seeds 100–129), mean length 60.4 moves,
~1750 pairs per Δ.
R² summary (multivariate OLS, all 8 sheaf features → target):
| Δ | target | R²(pred t→t+Δ) | R²(snap t+Δ) | R²(pers target t→target t+Δ) | gain_vs_pers |
|---|---|---|---|---|---|
| 1 | n_legal | 0.562 | 0.577 | 0.325 | +0.237 |
| 3 | n_legal | 0.551 | 0.559 | 0.250 | +0.301 |
| 5 | n_legal | 0.544 | 0.551 | 0.167 | +0.377 |
| 10 | n_legal | 0.561 | 0.562 | 0.070 | +0.490 |
| 10 | ρ | 0.972 | 0.954 | 1.000 | −0.028 |
| 10 | empties | 0.972 | 0.954 | 1.000 | −0.028 |
Key observations:
- Sheaf at time t predicts n_legal_moves at t + Δ almost as well as the sheaf at t + Δ itself. R²(pred) ≈ R²(snap) across all Δ: the gap is at most 0.015 (at Δ=1) and shrinks to 0.001 at Δ=10. This is the cleanest positive predictive result in the notebook so far.
- Gain vs persistence baseline grows with Δ: +0.237 at Δ=1 rising to +0.490 at Δ=10. At Δ=10 the sheaf features explain R² = 0.56 of n_legal_moves(t+10) variance while knowing just n_legal_moves(t) explains R² = 0.07. The sheaf carries substantial forward-predictive content for legal-move count beyond simple temporal persistence.
- Rho and empty_count have negligible predictive gain. They are near-monotone with move number (persistence R² ≈ 1.0), so the test is uninformative for those targets. A more interesting target would be ΔA₁⁻(t → t+Δ) or an emerging-flanking indicator, left for sequel work.
- Direct contrast with logo L7b.5 (gain = −0.855 vs snapshot): the Othello sheaf extrapolation gain is near 0 vs snapshot, but dramatically positive vs persistence. Why the divergence?
  - Logo L7b used a COMPLEX program's fiber built from its full trace at step N, asked to predict geometry at step N+Δ. The fiber was built from "this specific program's trace so far"; the target was that SAME program's geometry. A fiber summarising a specific program's partial trace has no predictive lever over just using the current snapshot.
  - Othello sheaf is built from CURRENT BOARD STATE only (no trajectory memory). The sheaf implicitly encodes geometric connectivity of the current flank structure, which constrains the set of reachable positions Δ moves ahead. Predictive content comes from current-state structure that bounds the near-future, not from trajectory memory.
  - The two systems differ structurally: logo L7b tested trajectory-based fiber prediction; 1e.5 tested state-based sheaf constraint propagation. They gave different verdicts because they are different experiments in spirit.
L7b caveat upgrade: the "does snapshot extrapolate forward?" question has a different answer in Othello sheaves than in logo fibers. §3 should note that the sheaf λ₂ at move t usefully predicts legal-move count at move t + 10. Open question: does this generalise to tournament-play trajectories (where the move choice correlates with future trajectory) or is it a random-play artifact?
Caveats.
- The sheaf kernel_dim is constant (128) across every position (§3.4), so kernel_dim contributes nothing as an OLS feature (its correlations show up as NaN warnings in the log).
- Targets ρ and empty_count are trivially predictable, included only as controls.
- 30 trajectories may be too few to detect seed-variance; a re-run at M=100 would tighten the effect size estimate.
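The three R² columns of the summary table can be illustrated on a single trajectory. This is a simplified sketch, not the runner's method: it uses one hypothetical feature series instead of the 8 sheaf features (so R² reduces to a squared Pearson correlation), and it does not pool pairs across trajectories.

```python
def pearson(a, b):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5


def r2(x, y):
    """R^2 of a univariate OLS fit of y on x (equals squared Pearson r)."""
    return pearson(x, y) ** 2


def predictive_r2(feature, target, delta):
    """Compare predictive, snapshot and persistence fits for one trajectory."""
    n = len(target) - delta
    f_now, f_future = feature[:n], feature[delta:delta + n]
    y_now, y_future = target[:n], target[delta:delta + n]
    pred = r2(f_now, y_future)     # feature at t      -> target at t + delta
    snap = r2(f_future, y_future)  # feature at t+delta -> target at t + delta
    pers = r2(y_now, y_future)     # target at t       -> target at t + delta
    return {
        "pred": pred,
        "snap": snap,
        "pers": pers,
        "gain_vs_pers": pred - pers,
        "gain_vs_snap": pred - snap,
    }
```

With this convention a positive gain_vs_pers means the feature at t forecasts the target better than the target's own past value does.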
1e.1 — edax d=20 Reading A vs B verdict — CONFIRMED, Reading B
Runner: research/phase1e_edax_d20_tasklist.py.
Analyzer: research/phase1e_edax_d20_correlations.py.
Walltime: 986 s ≈ 16.4 min (vs 4 h worst-case estimate — edax at
level 20 on 50-empty positions is faster than expected because
many positions resolve as full WLD-level endgame proofs).
Status: complete, N = 2587 (2587/2587 parse_ok).
Final headline Spearmans:
| channel | raw vs preproof | raw vs edax_d20 | raw vs archive_mean_lb |
|---|---|---|---|
| D₄-A₁(s²) | −0.349 | −0.342 | −0.498 |
| D₄-E(s²) | +0.346 | +0.341 | +0.484 |
| D₄-A₂(s²) | +0.119 | +0.118 | +0.151 |
| channel | partial vs preproof | partial vs edax_d20 | partial vs archive_mean_lb |
|---|---|---|---|
| D₄-A₁(s²) | −0.284 | −0.277 | −0.331 |
| D₄-E(s²) | +0.254 | +0.247 | +0.304 |
| D₄-A₂(s²) | +0.129 | +0.129 | +0.120 |
The d=20 correlation is indistinguishable from the pre-proof correlation — they differ by 0.007 raw and 0.007 partial on the headline channel, well within sampling noise at N = 2587. Going from pre-proof to d=20 does not bridge ANY of the 43 % gap to archive_mean_lb.
a_position metric (0 = aligned with pre-proof, 1 = aligned with archive bounds):
| scope | pre-proof | d=20 | archive_mean_lb | a_position |
|---|---|---|---|---|
| raw | −0.349 | −0.342 | −0.498 | −0.043 |
| partial | −0.284 | −0.277 | −0.331 | −0.142 |
a_position ≈ 0 means d=20 sits right at pre-proof; negative means d=20 is even SLIGHTLY further from archive_mean_lb than the pre-proof value is (a 0.007 drift in ρ, within sampling noise).
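One natural reading of the a_position metric is a linear interpolation coordinate between the pre-proof and archive correlations. The formula below is an assumption inferred from the "0 = pre-proof, 1 = archive" description, not taken from the analyzer script. Applied to the rounded table values it gives about −0.047 (raw) and −0.149 (partial), close to the tabulated −0.043 and −0.142; the small difference would be consistent with the analyzer using unrounded correlations.

```python
def a_position(rho_pre, rho_d20, rho_archive):
    """Position of the d=20 correlation on the pre-proof (0) -> archive (1) axis.

    Assumed definition: plain linear interpolation coordinate.
    """
    return (rho_d20 - rho_pre) / (rho_archive - rho_pre)


# Rounded values from the D4-A1(s^2) rows of the tables above.
raw = a_position(-0.349, -0.342, -0.498)
partial = a_position(-0.284, -0.277, -0.331)
print(f"raw a_position = {raw:.3f}, partial a_position = {partial:.3f}")
```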
Verdict: Reading B is load-bearing. The spectral D₄-A₁(s²) channel carries ground-truth-aligned information that even a strong deep-search engine at d=20 does not capture. Reading A (noise floor on y-variable) explains essentially 0 % of the gain.
Interpretation caveat. A Reading-C is worth stating: the archive bounds aggregate over 300 k–600 k 36-empty sub-problems per 50-empty parent. Each sub-problem is 14 ply deeper than the 50-empty parent, so the archive effectively integrates a much deeper search volume than any single-position d=20 call could reach (edax at d=20 evaluates leaves heuristically; a 50-empty position would require ~50 ply to truly solve). So "alignment" here may mean "the spectral channel captures structural content that only emerges at full-solve depth, not at d=20 heuristic-leaf depth." That is still Reading B in spirit (spectral > engine heuristic) but frames the mechanism as "structural truth emerges at game-theoretic resolution" rather than "spectral magically aligns with ground truth."
1e.4 — Accuracy-100 archive re-aggregation — NULL (as predicted)
Runner: research/takizawa_archive_loader.py
with the new --min-accuracy 100 flag.
Outputs:
- results/phase1d_archive_summary_exact100.csv
- results/phase1e_correlations_exact100.json
- results/phase1e_spectral_vs_perfectplay_exact100.csv
Archive walltime: 4959 s ≈ 83 min (matches §2d.b's ~80 min estimate almost exactly). Rate 0.52 files/s.
Side-by-side with the §2d.b unfiltered baseline (N = 2587):
| correlation | unfiltered (§2d.b) | exact100 (1e.4) | Δ |
|---|---|---|---|
| D₄-A₁(s²) raw vs archive_mean_lb | −0.4984 | −0.5017 | −0.003 |
| D₄-A₁(s²) partial | −0.3185 | −0.3196 | −0.001 |
| D₄-E(s²) raw | +0.4839 | +0.4864 | +0.003 |
| D₄-E(s²) partial | +0.3101 | +0.3109 | +0.001 |
| D₄-A₂(s²) raw | +0.1506 | +0.1513 | +0.001 |
| D₄-B₁(s²) raw | +0.1482 | +0.1494 | +0.001 |
All deltas are under 1 % relative and all tighten in the
predicted direction. The §2d.b signal is robust to the
accuracy=99 residual. The archive was already ~99.3 % exact by
child-row per §2d.b (exact_fraction median 0.994), and filtering
out the remaining ~0.7 % shifts correlations by at most 0.003.
Re-analysis protocol (reproducible):
    mkdir -p /tmp/h9_exact100
    python research/h9_strict_runner.py \
        --archive-summary ../results/phase1d_archive_summary_exact100.csv \
        --results-dir /tmp/h9_exact100
    cp /tmp/h9_exact100/phase1d_spectral_vs_perfectplay.csv \
        ../results/phase1e_spectral_vs_perfectplay_exact100.csv
    cp /tmp/h9_exact100/phase1d_correlations.json \
        ../results/phase1e_correlations_exact100.json
(The --results-dir /tmp/... redirect is important — h9_strict_runner.py
hardcodes its output filenames as phase1d_*, which would clobber
§2d.b's originals if pointed at ../results/.)
Headline: §2d.b's D₄-A₁(s²) and D₄-E(s²) correlations are NOT an accuracy-99 artefact. They survive the accuracy=100 restriction with negligible drift.
Phase 1e summary
Five sub-phases, all landing clean numerics:
- 1e.1 — Reading B confirmed at N = 2587. edax at d=20 gives ρ = −0.342 (raw) / −0.277 (partial) for D₄-A₁(s²), essentially identical to pre-proof edax_score (−0.349 / −0.284, difference 0.007). Archive_mean_lb gives −0.498 / −0.319. a_position metric = −0.04 raw, −0.14 partial: d=20 sits at pre-proof, NOT between pre-proof and archive. The 43 % gain in §2d.b is not explained by noise averaging. The spectral channel carries ground-truth-aligned content beyond deep-search heuristic eval.
- 1e.2 — A₁ and E are a Plancherel-locked pair. (A₁ + E) direction is near-null (partial ρ = −0.038), (A₁ − E) direction slightly stronger than either alone (partial ρ = +0.337 vs A₁ alone −0.331). Joint (A₁ + E) R² = 0.100 partial. Full 5-channel D₄ battery adds only +0.012 over A₁+E, confirming A₁/E soak up essentially all D₄-occupation signal.
- 1e.3 — Shannon-info correlations strengthen dramatically. §2c.2 in-book A₁⁻ ρ = +0.213 becomes D₄-A₁(s²) ρ = +0.465, and D₄-E(s²) all-plies ρ = +0.508 is the strongest Shannon-info correlation in the notebook. D₄-A₁(s²) sign-flips between in-book (+0.465) and out-of-book (−0.755) — novel, open for interpretation.
- 1e.4 — accuracy=99 residual is not the story. Restricting to accuracy=100 children shifts all D₄-occupation correlations by about 0.003 or less. §2d.b stands.
- 1e.5 — Othello sheaf extrapolation is positive, contra logo L7b.5. Sheaf at t predicts n_legal_moves at t+10 with R² = 0.56, essentially matching sheaf at t+10 (R² = 0.56). Gain vs persistence baseline +0.24 (Δ=1) → +0.49 (Δ=10). Opposite sign from logo L7b.5; interpreted as "state-based sheaf constraint propagation" vs logo's "trajectory-fiber extrapolation".
Single-line headline for the notebook §2e: the D₄-A₁(s²) and D₄-E(s²) channels are ground-truth-aligned in a Reading-B sense (edax at d=20 does not capture them), their "A₁ − E" rotated combination is the marginally strongest univariate Othello summary, the accuracy=99 archive residual is a non-issue, the Shannon-info coupling to these channels is 2× stronger than §2c.2 reported, and the Othello sheaf spectrum carries non-trivial forward-predictive content for legal-move count.
1e.5b — Predictive sheaf on tournament trajectories with spectral targets — POSITIVE with crossover
Runner: research/phase1e_predictive_sheaf_pgn.py.
Output: results/phase1e_predictive_sheaf_pgn_liveothello_2025_all.json.
Corpus: liveothello_2025_all.pgn (2178 tournament games, mean
trajectory length 61.7 plies). Walltime 50.5 min. Pair counts
112 k–132 k per Δ.
Extends 1e.5 to (a) real tournament play instead of random-play and (b) spectral targets (A₁⁻, D₄-A₁(s²), D₄-E(s²)) in addition to the near-monotone state targets (n_legal, ρ).
n_legal_moves — sheaf retains predictive power under tournament play:
| Δ | R²(pred) | R²(pers) | gain_vs_pers |
|---|---|---|---|
| 1 | 0.559 | 0.243 | +0.316 |
| 3 | 0.565 | 0.187 | +0.378 |
| 5 | 0.580 | 0.126 | +0.455 |
| 10 | 0.624 | 0.214 | +0.410 |
Matches 1e.5's random-play result in magnitude (+0.24 → +0.49 there; +0.32 → +0.41 here). Tournament vs random doesn't change the qualitative finding — the sheaf's constraint on future move-count is a state-geometry property, not a play-style artifact.
A₁⁻ magnetisation energy — crossover between persistence and sheaf prediction:
| Δ | R²(pred) | R²(pers) | gain_vs_pers | gain_vs_snap |
|---|---|---|---|---|
| 1 | 0.443 | 0.789 | −0.345 | +0.012 |
| 3 | 0.442 | 0.597 | −0.155 | +0.030 |
| 5 | 0.438 | 0.467 | −0.029 | +0.044 |
| 10 | 0.418 | 0.321 | +0.098 | +0.066 |
Short-Δ regime: persistence dominates. At Δ=1 knowing A₁⁻(t) alone explains 79 % of A₁⁻(t+1) variance. The sheaf features at t explain only 44 % — worse than persistence.
Long-Δ regime: sheaf beats persistence. By Δ=10 persistence R² has decayed to 0.32 while sheaf R² holds at 0.42. The crossover lies between Δ=5 (gain −0.029) and Δ=10 (gain +0.098). At Δ=10 the sheaf features at t carry forward-predictive A₁⁻ information that A₁⁻(t) itself does not.
Additional oddity: gain_vs_snap is consistently POSITIVE for A₁⁻
(+0.01 to +0.07). The sheaf at t predicts A₁⁻(t+Δ) slightly BETTER
than the sheaf at t+Δ does. Most natural reading: the sheaf's
relationship with A₁⁻ is non-stationary across the game, and the
"past" sheaf encodes the trajectory from which A₁⁻(t+Δ) arose more
faithfully than a momentary snapshot of the future sheaf. Open
interpretation.
D₄-A₁(s²) occupation — near-monotone, no sheaf benefit: Persistence R² = 0.986–0.999 across Δ; sheaf slightly negative gain. The occupation channel grows almost deterministically with disc count, so persistence is near-perfect and nothing can improve it. As expected; treated as a control.
D₄-E(s²) occupation — persistence decays, sheaf still flat: Persistence R² drops from 0.95 (Δ=1) to 0.18 (Δ=10), confirming E(s²) is genuinely trajectory-dependent. But R²(pred) is only ~0.11–0.17 across Δ. Sheaf features do not predict E(s²) forward — match persistence at Δ=10 and lose at shorter Δ. Notable that gain_vs_snap rises sharply (+0.07 at Δ=10): the sheaf at t is a better predictor of E(s²)(t+10) than the sheaf at t+10 itself, even though both are weak. Another non-stationary trajectory signature.
Summary — tournament-play predictive sheaf findings:
- n_legal_moves is the sheaf's best predictive target (R²(pred) grows with Δ; the gain vs persistence matches random-play in magnitude).
- A₁⁻ energy shows a persistence-vs-sheaf crossover: persistence wins short-term, sheaf wins long-term (Δ ≥ ~5–7). Novel.
- D₄-A₁(s²) is a negative control (persistence saturates).
- D₄-E(s²) persistence decays fast but sheaf doesn't pick up the slack; an open question whether a richer fiber model would.
- The gain_vs_snap > 0 oddity for A₁⁻ and E(s²) (sheaf at t beats sheaf at t+Δ) is a non-stationary sheaf signature worth investigating separately.
The original 1e.5 retraction of the logo L7b MISS stands and strengthens: state-based sheaf structure does carry forward-predictive content, now demonstrated on both random play (state targets) and tournament play (spectral targets).
1e.6 — Sign-flip decomposition — Simpson's paradox identified
Runner: research/phase1e_signflip_decomposition.py.
Output: results/phase1e_signflip_decomposition.json.
Uses per-move CSV from 1e.3.
The §1e.3 in-book (+0.465) vs out-of-book (−0.755) split is dominated by Simpson's paradox, not by book coverage. Sign flip is a game-phase effect, not a book-cliff effect.
Within-phase Spearman(I_move, D₄-A₁(s²)) by empties range:
| range | N | all | in-book | out-of-book |
|---|---|---|---|---|
| opening (60–53 empties) | 280 | +0.245 | +0.245 | (all in-book) |
| early midgame (52–45) | 280 | +0.117 | +0.104 | +0.480 |
| late midgame (44–35) | 350 | +0.283 | +0.126 | +0.321 |
| endgame entry (34–25) | 350 | +0.065 | +0.100 | +0.033 |
| deep endgame (24–10) | 525 | −0.425 | (no in-book) | −0.440 |
| terminal (9–0) | 314 | −0.616 | (no in-book) | −0.616 |
Within-phase in-book correlations are small (+0.10–0.25), far from the +0.465 aggregate. The aggregate amplification comes from Simpson-style between-phase structure — D₄-A₁(s²) and I_move both grow with filling density, so the between-phase trend boosts rank correlation on the pooled in-book subset.
Out-of-book covers mostly post-book deep endgame (30–0 empties). Within deep endgame and terminal, within-phase ρ is strongly negative (−0.44, −0.62). The aggregate out-of-book −0.755 is compatible with this floor plus between-phase amplification in the opposite direction.
Sign flip occurs at ~24 empties, not ~20. The WTHOR book coverage cliff is at empties ≈ 20 (see §2c.2), but the spectral sign flip happens 4 plies earlier. The two are nearby but not coincident — confirms flip is phase-driven, not coverage-driven.
D₄-E(s²) has a double sign flip: +0.128 (opening) → +0.263 (early midgame) → −0.264 (endgame entry) → +0.525 (terminal). A more complex phase trajectory than A₁, motivates a longitudinal analysis rather than a single aggregate ρ.
A₁⁻ magnetisation is consistently NEGATIVE within every phase (opening in-book −0.274, late midgame in-book −0.419). The §2c.2 reported aggregate of +0.213 is a pure Simpson's paradox artifact of between-phase structure; within any single phase A₁⁻ correlates NEGATIVELY with Shannon info. This is a material retraction of §2c.2's framing: A₁⁻ does NOT track strategic divergence from tournament-empirical policy; it tracks game phase.
Takeaway. For trajectory-based claims (Shannon info, move-by-move analyses) the notebook's previous in-book / out-of-book split is misleading as a between-group comparison. Within-phase decomposition is the cleaner analysis. The static-50-empty results (§2d / §1e.1) are unaffected because they are single-phase.
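The within-phase versus pooled contrast can be reproduced on toy data. This is an illustrative sketch only: the phase labels, values and function names are hypothetical, and it is not the logic of research/phase1e_signflip_decomposition.py. Each phase has a perfectly negative rank correlation, yet the pooled correlation is positive because both variables rise between phases.

```python
def ranks(xs):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    out = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def spearman(a, b):
    """Spearman rho = Pearson correlation of the ranks."""
    ra, rb = ranks(a), ranks(b)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    saa = sum((x - ma) ** 2 for x in ra)
    sbb = sum((y - mb) ** 2 for y in rb)
    return sab / (saa * sbb) ** 0.5


def within_and_pooled(groups):
    """groups: dict phase -> (x values, y values). Returns (per-phase rho, pooled rho)."""
    within = {name: spearman(x, y) for name, (x, y) in groups.items()}
    all_x = [v for x, _ in groups.values() for v in x]
    all_y = [v for _, y in groups.values() for v in y]
    return within, spearman(all_x, all_y)


# Hypothetical toy phases: negative within each phase, both variables higher in "late".
toy = {
    "early": ([1, 2, 3], [3, 2, 1]),
    "late": ([11, 12, 13], [13, 12, 11]),
}
within, pooled = within_and_pooled(toy)
print(within, pooled)
```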
1e.7 — Open item follow-ups (Z₂ holonomy, T4 T_eff, T5 chains, faithful-sheaf gain, chess A1/E, engine dispatch)
Seven open items closed in a single session (commits 0f7a74f …
345a50e …0ace888), plus the encoder engine-dispatch wiring.
1e.7.1 — §2.H5 holonomy characterisation at corpus scale
Runner: research/phase1e_holonomy_plaquettes.py.
Output: results/phase1e_holonomy_plaquettes.{csv,json}.
Enumerated 1192 loops across 7 shape families (unit plaquettes, 2×2 and 3×3 squares, m×1 / 1×m bars, corner triangles, long-jump rectangles). Clean structural rule:
Z₂ holonomy exists iff the loop is a long-jump W×1 rectangle with W ≥ 3 (105/105 such loops non-trivial).
All other 1087 loops are trivial (cos = +1). The §2.H5 hand-picked loop (rectangle (0,0)-(0,3)-(1,3)-(1,0)-(0,0), i.e. one of the 35 lj_rect_3x1 instances) is one example of this class. The effect is path-orientation-dependent, not homotopy-invariant: lj_rect_1x3 (transpose) is trivial. Connection form's curvature concentrates on horizontal long-jumps of width ≥ 3.
1e.7.2 — T4 (T_eff, D_eff) thermodynamic trajectory
Runner: research/phase1e_t4_thermodynamic_trajectory.py.
Robust variant: research/phase1e_t4_robust.py.
Outputs: results/phase1e_t4_thermodynamic_*.json, results/phase1e_t4_robust_*.json.
Per-ply T_eff = dE_tot / dS_eff where E_tot = ||enc||² and S_eff = Shannon entropy of the 12-channel energy distribution:
- Barcelona (N=2184 plies, finite-diff): T_eff median = −11 k
- WC 2005 (N=1418 plies, finite-diff): T_eff median = −12 k
- Barcelona (windowed OLS, window=8, r²≥0.3): T_eff median = −22 k, IQR [−35 k, −14 k]
Negative T_eff is robust across both 20-year-apart corpora and both estimation methods. Reading: spectral-energy ANTI-correlates with Shannon entropy over the channel distribution — as total energy grows (disc count rises), channel distribution concentrates (fewer effective channels carry the norm). Structurally consistent with the A₁/E Plancherel-budget story (§2d.c).
The T_eff distribution is heavy-tailed (outliers with |dS|→0 dominate the mean), so the median is the honest summary.
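The finite-difference estimate T_eff = dE_tot / dS_eff can be sketched as below. Assumptions, none confirmed by the runner: per-ply channel energies are non-negative and sum to E_tot, entropy uses the natural logarithm, and plies with dS = 0 are skipped. The input format and function names are hypothetical.

```python
import math
from statistics import median


def shannon_entropy(energies):
    """Entropy (nats) of the normalised channel-energy distribution."""
    total = sum(energies)
    return -sum((e / total) * math.log(e / total) for e in energies if e > 0)


def t_eff_series(channel_energies):
    """Finite-difference dE_tot / dS_eff between consecutive plies.

    channel_energies: one list of per-channel energies per ply.
    Plies where the entropy does not change are skipped (assumed handling).
    """
    e_tot = [sum(ch) for ch in channel_energies]
    s_eff = [shannon_entropy(ch) for ch in channel_energies]
    out = []
    for t in range(len(e_tot) - 1):
        d_s = s_eff[t + 1] - s_eff[t]
        if d_s == 0:
            continue
        out.append((e_tot[t + 1] - e_tot[t]) / d_s)
    return out


# Toy two-channel trajectory: energy grows while the distribution concentrates.
toy = [[1.0, 1.0], [3.0, 1.0], [6.0, 1.0]]
print(median(t_eff_series(toy)))
```

In the toy trajectory total energy rises while entropy falls, so every finite difference is negative, which is the sign pattern described above.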
1e.7.3 — T5 FK-BC flank-cluster size distribution
Runner: research/phase1e_t5_cluster_distribution.py.
Output: results/phase1e_t5_cluster_distribution.json.
Extracts maximal same-colour chain lengths (≥ 2) along every ray from every ply; fits exponential vs power-law.
| corpus | N chains | mean | max | tau | lambda | preferred |
|---|---|---|---|---|---|---|
| Barcelona_EGP_2026 | 141 584 | 2.94 | 8 | 2.67 | 0.72 | exponential |
| World_Championships_2005 | 93 950 | 2.86 | 8 | 2.73 | 0.77 | exponential |
| liveothello-2026-APR | 1 646 532 | 2.95 | 8 | 2.67 | 0.72 | exponential |
| liveothello_2025_all | 8 754 052 | 2.94 | 8 | 2.68 | 0.73 | exponential |
All 4 corpora prefer exponential by ΔAIC of 3 k–221 k. §10.10 T5's critical FK-BC power-law is rejected. Remarkable cross-corpus stability (mean 2.86–2.95, tau 2.67–2.73) across 20 years. Echoes T1's per-move flip-count result (also exponential).
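An exponential-versus-power-law comparison on integer chain lengths can be sketched with one-parameter maximum-likelihood fits and AIC. The model choices here are assumptions, not the runner's documented method: a geometric law on k ≥ 2 for the exponential, and a power law truncated to 2..8 with the exponent found by grid search. As a consistency check, the geometric MLE maps a mean of 2.94 to λ = −ln(0.94/1.94) ≈ 0.72, in line with the lambda column.

```python
import math


def fit_geometric(counts, kmin=2):
    """MLE for P(k) = (1 - q) * q**(k - kmin), k >= kmin. Returns (lambda, loglik).

    counts: dict chain length -> number of chains. Requires mean > kmin.
    """
    n = sum(counts.values())
    mean = sum(k * c for k, c in counts.items()) / n
    q = (mean - kmin) / (mean - kmin + 1)
    loglik = sum(
        c * (math.log(1 - q) + (k - kmin) * math.log(q)) for k, c in counts.items()
    )
    return -math.log(q), loglik


def fit_power_law(counts, kmin=2, kmax=8):
    """Grid-search MLE for P(k) ~ k**(-tau) on kmin..kmax. Returns (tau, loglik)."""
    n = sum(counts.values())
    sum_log_k = sum(c * math.log(k) for k, c in counts.items())
    best_tau, best_ll = None, -math.inf
    for step in range(101, 601):  # tau from 1.01 to 6.00 in steps of 0.01
        tau = step / 100
        z = sum(j ** (-tau) for j in range(kmin, kmax + 1))
        ll = -tau * sum_log_k - n * math.log(z)
        if ll > best_ll:
            best_tau, best_ll = tau, ll
    return best_tau, best_ll


def aic(loglik, n_params=1):
    """Akaike information criterion; lower is preferred."""
    return 2 * n_params - 2 * loglik


# Hypothetical histogram whose counts halve with each extra disc in the chain.
toy_counts = {2: 512, 3: 256, 4: 128, 5: 64, 6: 32, 7: 16, 8: 8}
lam, ll_exp = fit_geometric(toy_counts)
tau, ll_pl = fit_power_law(toy_counts)
print(lam, tau, aic(ll_pl) - aic(ll_exp))
```

The geometric model is untruncated while the power law is normalised on 2..8, so the comparison slightly favours the power law; a positive AIC difference in the last line still indicates that the exponential form is preferred.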
1e.7.4 — Faithful sheaf restriction maps + predictive validation
Runners: research/faithful_sheaf.py,
research/phase1e_predictive_sheaf_faithful.py,
research/phase1e_faithful_gain_investigation.py.
Replaces the endpoint-only crude restriction maps with bracket-aware ones. Edges only exist across active flanks (R2 stable or R3 pending); restriction maps have α = 1.0 (pending) or α = 0.5 (stable). Breaks §3.4's constant-128 kernel dim artifact: faithful kernel varies 141–188 with state on the seed=123 trajectory. λ₂ differs from crude by mean +0.29.
Predictive-sheaf re-run on Barcelona (see §1e.5) now with faithful
restrictions. gain_vs_snap > 0 is ubiquitous:
| target | Δ=10 gain_vs_snap |
|---|---|
| d4_e_occupation_energy | +0.086 |
| rho / empty_count | +0.060 |
| d4_a1_occupation_energy | +0.050 |
| a1_minus_energy | +0.028 |
Investigation of the gain signature (Probe 1–3 in phase1e_faithful_gain_investigation.py):
- Δ sweep: gain_vs_snap GROWS monotonically with Δ on persistence-saturated targets (rho 0.045 @ Δ=7 → 0.104 @ Δ=20).
- Train/test split (17/18 games): out-of-sample gain is LARGER than in-sample for every target, which rules out overfit / survivorship (H2):
  | target | OOS gain_vs_snap | in-sample gain_vs_snap |
  |---|---|---|
  | a1_minus_energy | +0.244 | +0.028 |
  | d4_e_occupation | +0.105 | +0.086 |
  | rho | +0.105 | +0.060 |
- Per-feature ablation: EVERY feature has delta_gain < 0 (removing any one feature makes predictive lose less than snapshot). No single feature drives the gain; it's a joint feature decorrelation effect.
Mechanistic reading (H1, confirmed): at time t, the 8 faithful-sheaf features encode diverse trajectory-relevant info: λ₂ (connectivity), entropy (spectral spread), kernel_dim (bracket-graph sparsity), λ_max (largest mode). These features are more INDEPENDENT at t, so the multivariate regression picks up unique predictive content per feature. At time t + Δ, the same features collapse onto "current state summary": they become redundant with each other, and with features of the target. Hence predictive regression at t outperforms snapshot regression at t + Δ.
This is a non-trivial signature of temporal structure in the faithful sheaf. The crude sheaf's constant kernel dim (§3.4) hid it; the faithful sheaf's position-dependent kernel dim exposes it.
1e.7.5 — Chess A1/E pair check + Simpson's paradox
Runners: research/phase1e_chess_a1_e_pair.py,
research/phase1e_chess_a1_e_bootstrap.py,
research/phase1e_othello_a1_e_per_game.py.
Outputs: results/phase1e_chess_a1_e_{pair,bootstrap}.json,
results/phase1e_othello_a1_e_per_game.json.
Chess pooled Pearson(E_A1, E_E) = +0.0414 (p = 2e-5) on N=10 729 plies across 3 corpora. Looks like the Othello -0.834 mirror is Othello-specific. But per-game analysis reveals Simpson's paradox:
| chess corpus | pooled | per-game median |
|---|---|---|
| ashchess_N50 | +0.011 | −0.178 |
| drnykterstein_N10 | −0.201 | −0.582 |
| hf_N50 | +0.116 | −0.171 |
Chess DOES have within-game anticorrelation (A1/E mirror). It's hidden by between-game variation when pooled. Bootstrap: drnykterstein sits at 3.3rd percentile of ashchess 10-game resamples — unusual but within the 95 % null.
Othello per-game (N = 2177 games, 2025 corpus):
| pair | per-game median | pooled |
|---|---|---|
| magn_A1⁻ vs magn_E⁻ | +0.481 | +0.330 |
| occ_A1⁺ vs occ_E⁺ | −0.046 | −0.072 |
| magn_A1⁻ vs magn_B2⁻ | +0.310 | |
| occ_A1⁺ vs occ_B2⁺ | +0.093 | |
| fiber_ortho_s vs fiber_diag_s | +0.799 | |
Chess and Othello have OPPOSITE-signed per-game A1/E structure. Chess (piece-value signal) is −0.17 to −0.58 median (anti-correlation). Othello magnetisation (Z₂-odd) is +0.48 median (co-correlation). Othello occupation (Z₂-even) is weakly negative (−0.05).
The Othello 50-empty static −0.834 is a phase-specific structural property (14-disc configuration), not a general trajectory signature. Trajectory-level A1/E structure is game-specific and differently-signed between chess and Othello.
1e.7.4b — Faithful gain investigation at 2025 corpus scale
Runner: research/phase1e_faithful_gain_investigation.py
on the 2178-game 2025 corpus. ~1 h walltime (faithful sheaf
computation dominates). Refines §1e.7.4's Barcelona result at
60× more statistical power.
Headline table — in-sample gain_vs_snap by Δ and target:
| Δ | target | R²_pred | R²_snap | gain |
|---|---|---|---|---|
| 10 | n_legal_moves | +0.529 | +0.517 | +0.012 |
| 10 | a1_minus_energy | +0.287 | +0.328 | −0.041 |
| 10 | d4_a1_occupation_energy | +0.886 | +0.816 | +0.070 |
| 10 | d4_e_occupation_energy | +0.299 | +0.140 | +0.159 |
| 10 | rho | +0.899 | +0.820 | +0.079 |
| 20 | a1_minus_energy | +0.257 | +0.297 | −0.041 |
| 20 | d4_a1_occupation_energy | +0.898 | +0.748 | +0.150 |
| 20 | rho | +0.917 | +0.763 | +0.154 |
Train/test split (1089 train, 1089 test) at Δ=10:
| target | R²_pred_oos | R²_snap_oos | gain |
|---|---|---|---|
| n_legal_moves | +0.515 | +0.507 | +0.008 |
| a1_minus_energy | +0.269 | +0.327 | −0.058 |
| d4_a1_occupation_energy | +0.884 | +0.812 | +0.072 |
| d4_e_occupation_energy | +0.288 | +0.131 | +0.157 |
| rho | +0.895 | +0.814 | +0.081 |
OOS gains track in-sample closely — not an overfit artefact.
Refined H1 interpretation: trajectory memory is target-specific.
- Strong predictive gain (H1 confirmed): Z₂-invariant occupation channels. d4_e_occ especially (+0.159 @ Δ=10, holds at +0.089 @ Δ=20 after peaking mid-range). Rho / empty_count gains grow monotonically with Δ (0.079 → 0.154). d4_a1_occ likewise (0.070 → 0.150).
- No predictive gain (H1 null): n_legal_moves (+0.012 @ Δ=10). Sheaf and snapshot are equivalent here; what predicts future mobility is captured equally well by either.
- Snapshot wins (H1 negated): a1_minus_energy. Gain is consistently NEGATIVE across Δ ∈ {5, 10, 15, 20} (−0.03 to −0.04 in-sample; −0.058 OOS at Δ=10). Consistent with §1e.7.5b's A1-drift decomposition: A1⁻ is phase-sensitive (Z₂-odd magnetisation channel that sign-flips between phases), so the sheaf at time t does NOT reliably encode A1⁻(t + Δ) better than the sheaf at t + Δ itself.
Per-feature ablation at Δ=10 on d4_e_occ: all 8 features have POSITIVE dGain values (removing any feature SHRINKS the gain). Features cooperate in the predictive direction.
Per-feature ablation on a1_minus_energy: all features have NEGATIVE dGain. Removing any feature lowers R²_snap more than R²_pred, which would raise the gain; since the baseline gain is already negative, the full feature set is where the snapshot's advantage is largest. Consistent with "features encode the same thing for A1⁻ in both directions, so snapshot is strictly better equipped."
Mechanistic reading: the faithful sheaf's features carry forward-predictive information about CURRENT-STATE-TYPE signals (occupation density, channel geometry) but NOT about phase-odd magnetisation. The Z₂-even content evolves smoothly and the sheaf captures its trajectory; the Z₂-odd content is too phase-coupled to benefit from trajectory memory.
This reconciles §1e.7.4's Barcelona result (where a1_minus gain was weakly positive, +0.028) with §1e.7.5b's deep-endgame localisation on the crude sheaf. Barcelona N=35 was too small to resolve a1_minus's true ~−0.04 gain; §1e.7.5b already hinted at phase-localisation; 2025 now separates Z₂-even trajectory memory (H1 confirmed) from Z₂-odd phase-sensitivity (H1 negated).
1e.7.5b — A1⁻ drift by phase at 2025 corpus scale
Runner: research/phase1e_a1_drift_by_phase.py.
Output: results/phase1e_a1_drift_by_phase_2025.json.
Corpus: 2178 games, 132 k pairs at Δ=1. Walltime ~55 min.
Barcelona (35 games) showed a messy picture: small N per bucket couldn't cleanly localise the gain_vs_snap > 0 signature. 2025 resolves it:
gain_vs_snap (crude sheaf → A1⁻ energy) at Δ=10, by phase:
| phase | N_pairs | R²_pred | R²_snap | R²_pers | gain_snap |
|---|---|---|---|---|---|
| opening | 17 403 | +0.024 | +0.084 | +0.006 | −0.060 |
| early_midgame | 17 374 | +0.024 | +0.086 | +0.010 | −0.062 |
| late_midgame | 21 661 | +0.034 | +0.050 | +0.045 | −0.016 |
| endgame_entry | 21 609 | +0.039 | +0.062 | +0.090 | −0.023 |
| deep_endgame | 32 308 | +0.120 | +0.074 | +0.109 | +0.047 |
| terminal | 2 173 | +0.146 | +0.390 | +0.028 | −0.244 |
The non-stationary sheaf signature is concentrated in deep endgame (24–10 empties). Opening, midgame, and endgame entry all have gain_vs_snap < 0 (snapshot beats predictive, the standard expectation). Deep endgame is the only phase that reverses it.
Interpretation: deep endgame is where the board is densely populated with latent bracket structures. The sheaf at time t captures the full bracket network; by t + 10 many brackets have resolved (flipped into stable flanks or disappeared). The "which brackets were pending at t" information is gone at t + Δ, leaving the snapshot sheaf's features only describing current state.
Consistent with §1e.7.4's faithful-sheaf finding: H1 (trajectory memory) holds on the crude sheaf too, but only in the phase where bracket-structure dynamics dominate the A1⁻ signal. At opening, A1⁻ is dominated by the initial 4-disc configuration's D₄×Z₂ structure, which has no trajectory memory to encode.
Terminal (N=2173) is a small-sample outlier; games that reach Δ+10 past terminal are a small self-selected subset.
This finding tightens §1e.5b's (+0.066) pooled 2025 result: the effect is phase-localised, not diffuse. Future predictive-sheaf work should segment analyses by n_empties.
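As a rough illustration of how the gain_snap column relates to the R² columns, the sketch below fits ordinary least squares three ways and takes gain_snap = R²_pred − R²_snap, which matches the table arithmetic. The function names, the in-sample OLS choice, and the synthetic data are assumptions for illustration and are not taken from phase1e_a1_drift_by_phase.py; in particular, R²_pers is read here as a persistence baseline (target at t predicting target at t + Δ).

```python
import numpy as np

def ols_r2(X, y):
    """In-sample R² of an ordinary least-squares fit with an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot

def gain_vs_snap(feat_t, feat_t_delta, target_t, target_t_delta):
    """Compare predictive, snapshot, and persistence fits of the target at t + delta.

    Assumed readings (hypothetical, for illustration):
      predictive  : sheaf features at t         -> target at t + delta
      snapshot    : sheaf features at t + delta -> target at t + delta
      persistence : target at t                 -> target at t + delta
    """
    r2_pred = ols_r2(feat_t, target_t_delta)
    r2_snap = ols_r2(feat_t_delta, target_t_delta)
    r2_pers = ols_r2(target_t.reshape(-1, 1), target_t_delta)
    return {
        "r2_pred": r2_pred,
        "r2_snap": r2_snap,
        "r2_pers": r2_pers,
        "gain_snap": r2_pred - r2_snap,
    }

# Synthetic pairs in which the target is driven by the features at t + delta,
# so the snapshot fit should beat the predictive fit (gain_snap < 0).
rng = np.random.default_rng(0)
n = 2000
w = np.array([1.0, -0.5, 0.25])
feat_t = rng.normal(size=(n, 3))
feat_td = 0.5 * feat_t + rng.normal(size=(n, 3))
target_t = feat_t @ w + 0.1 * rng.normal(size=n)
target_td = feat_td @ w + 0.1 * rng.normal(size=n)

demo = gain_vs_snap(feat_t, feat_td, target_t, target_td)
print(demo)
```

Running the same function separately on the pairs in each n_empties bucket would give a per-phase table of the shape shown above; the bucket boundaries themselves are not specified here.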
1e.7.7 — Simpson's paradox mechanism (chess vs Othello)¶
Runner: research/phase1e_simpson_mechanism.py.
Output: results/phase1e_simpson_mechanism.json.
Law-of-total-covariance decomposition of pooled Pearson into within-game and between-game components:
| corpus | pooled | within | between |
|---|---|---|---|
| chess/ashchess_N50 | +0.011 | −0.117 | +0.667 |
| chess/drnykterstein_N10 | −0.201 | −0.315 | +0.605 |
| chess/hf_N50 | +0.116 | +0.022 | +0.663 |
| othello/Barcelona magn_A1⁻ × E⁻ | +0.353 | +0.386 | −0.431 |
| othello/2025 magn_A1⁻ × E⁻ | +0.330 | +0.385 | −0.479 |
| othello/2025 occ_A1⁺ × E⁺ | −0.072 | −0.051 | −0.592 |
Clean mechanism: the between-game component (correlation of per-game means of A1 and E) is tightly +0.60 to +0.67 for all three chess corpora. Chess games with high mean A1 also have high mean E: game-to-game baseline variation co-varies strongly. Within chess games, A1 and E mildly anti-correlate on average (mean −0.15). Pooled Pearson reflects both: near-zero, because the positive between and negative within partially cancel.
Othello magnetisation reverses the sign structure: between-game is strongly NEGATIVE (−0.48 on 2025), within-game is strongly POSITIVE (+0.39). Games whose mean A1⁻ is high tend to have low mean E⁻, the opposite direction from chess. Pooled is positive because the within-game positive is the larger contribution here.
Othello occupation: both within and between are negative (weak −0.05 and strong −0.59). Pooled weakly negative because within-game variance dominates the occupation signal.
The chess "mirror" that §1e.7.5 identified at per-game median (−0.18 to −0.58) reflects the within-game component with medium-range noise. The between-game +0.67 is the larger statistical effect when pooling.
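The decomposition can be sketched as follows, assuming one row per position with a game identifier. The law of total covariance gives Cov(x, y) = within-game covariance + covariance of per-game means, and the sketch also reports a pooled, a within-game (game-demeaned), and a between-game (per-game means) Pearson coefficient. The function name, the exact within-game definition, and the synthetic chess-like data are assumptions for illustration; phase1e_simpson_mechanism.py may define the components differently.

```python
import numpy as np

def simpson_decomposition(x, y, game_id):
    """Split the pooled covariance of x and y into within- and between-game parts."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    _, inv = np.unique(np.asarray(game_id), return_inverse=True)
    inv = inv.ravel()
    counts = np.bincount(inv)
    mean_x = np.bincount(inv, weights=x) / counts  # per-game mean of x
    mean_y = np.bincount(inv, weights=y) / counts  # per-game mean of y

    dx, dy = x - mean_x[inv], y - mean_y[inv]                # within-game deviations
    bx, by = mean_x[inv] - x.mean(), mean_y[inv] - y.mean()  # between-game deviations

    return {
        "cov_total": float(np.mean((x - x.mean()) * (y - y.mean()))),
        "cov_within": float(np.mean(dx * dy)),
        "cov_between": float(np.mean(bx * by)),
        "r_pooled": float(np.corrcoef(x, y)[0, 1]),
        "r_within": float(np.corrcoef(dx, dy)[0, 1]),
        "r_between": float(np.corrcoef(mean_x, mean_y)[0, 1]),
    }

# Synthetic chess-like corpus: per-game baselines move together (positive
# between-game), while within each game x and y move in opposite directions.
rng = np.random.default_rng(1)
n_games, n_plies = 50, 40
base = rng.normal(size=n_games)
z = rng.normal(size=(n_games, n_plies))
x = (base[:, None] + z).ravel()
y = (base[:, None] - 0.5 * z + 0.5 * rng.normal(size=(n_games, n_plies))).ravel()
game_id = np.repeat(np.arange(n_games), n_plies)

demo = simpson_decomposition(x, y, game_id)
print(demo)
```

The two covariance parts sum exactly to the pooled covariance, whereas the three correlation coefficients do not add; that is why a pooled value can sit near zero while the within and between values have opposite signs.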
1e.7.8 — T4 log-transform robust variant¶
Runner: research/phase1e_t4_logtransform.py.
Output: results/phase1e_t4_logtransform_*.json.
Three variants compared on Barcelona:
| variant | slope_all median | slope_hi_r2 median | IQR (hi_r2) |
|---|---|---|---|
| raw | −14 726 | −22 188 | [−34 803, −14 008] |
| log_E | −4.33 | −6.04 | [−8.48, −4.28] |
| log_both | −3.40 | −4.85 | [−6.70, −3.37] |
log_E (regress log(E_tot) on S_eff) compresses the raw slope's heavy tail while preserving the negative sign. Interpretation: a unit increase in Shannon entropy multiplies total spectral energy by a factor of exp(−6.04) ≈ 0.002 (high-r² windows).
log_both (log-log / elasticity): a 1 % change in S_eff corresponds to a −4.85 % change in E_tot. Same sign, similar magnitude. Either log variant is a dramatically more stable estimator than the raw slope, without changing the qualitative conclusion (negative T_eff).
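A minimal sketch of the three slope variants on a single trajectory, assuming strictly positive S_eff and E_tot arrays. The helper names and the synthetic data are hypothetical; the per-window selection by r² and the median/IQR aggregation used in phase1e_t4_logtransform.py are not reproduced here.

```python
import numpy as np

def fit_slope(x, y):
    """Least-squares slope of y on x (degree-1 polynomial fit)."""
    slope, _intercept = np.polyfit(x, y, 1)
    return float(slope)

def t4_slopes(s_eff, e_tot):
    """Return the raw, log_E, and log_both slopes for one trajectory."""
    s_eff = np.asarray(s_eff, dtype=float)
    e_tot = np.asarray(e_tot, dtype=float)
    if np.any(s_eff <= 0) or np.any(e_tot <= 0):
        raise ValueError("log variants need strictly positive S_eff and E_tot")
    return {
        "raw": fit_slope(s_eff, e_tot),                       # E_tot on S_eff
        "log_E": fit_slope(s_eff, np.log(e_tot)),             # log(E_tot) on S_eff
        "log_both": fit_slope(np.log(s_eff), np.log(e_tot)),  # elasticity
    }

# Synthetic trajectory with an exponential energy-entropy relation and
# multiplicative noise; the generating log_E slope is -6.04 by construction.
rng = np.random.default_rng(2)
s_eff = rng.uniform(1.0, 3.0, size=200)
e_tot = 1e8 * np.exp(-6.04 * s_eff) * np.exp(0.1 * rng.normal(size=200))

slopes = t4_slopes(s_eff, e_tot)
# Multiplicative change in E_tot per unit increase in S_eff.
factor_per_unit_entropy = float(np.exp(slopes["log_E"]))
print(slopes, factor_per_unit_entropy)
```

The log_E slope is read multiplicatively (exp of the slope per unit of S_eff), while the log_both slope is read as a percentage change in E_tot per percentage change in S_eff.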
1e.7.9 — C encoder ctypes path¶
Scripts: research/othello_spectral/runtime.py gains find_c_dll(), _load_dll(), encode_768_c_ctypes(), encode_768_c_ctypes_batch().
The subprocess-based C path was 7.6 s on APR 2026 (25 k states) vs Python's 2.7 s — the stdio round-trip dominated single-threaded encoding.
Building othello_spectral.dll and calling via ctypes bypasses
the subprocess entirely. Benchmarks:
| path | encode time (25k states) | wall total |
|---|---|---|
| python | 3.0 s | 18.4 s |
| c (ctypes DLL) | 3.1 s | 17.8 s |
| c (subprocess exe) | 7.6 s | 21.7 s |
ctypes is 2.5× faster than the subprocess path and matches
Python on this corpus. Bit-identical .spectralz body SHA
matches between py and ctypes (dd8f68cc20c2f65a on APR 2026).
Parity verified at float64 precision on 8 random states via
tests/test_c_py_parity.py::test_ctypes_dll_parity_if_present.
CLI engine dispatch prefers ctypes when the DLL is present and
falls back to the subprocess binary when only the exe is built.
Both paths respect the VERSION string match + smoke test in
auto mode.
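The dispatch preference described above (ctypes DLL first, then the subprocess binary, then Python) might look roughly like the sketch below. The function names, return labels, and search logic are hypothetical and are not the actual find_c_dll() / _load_dll() implementation; the VERSION check and smoke test are deliberately omitted.

```python
import ctypes
from pathlib import Path

def find_dll(search_dirs, name="othello_spectral.dll"):
    """Return the first existing DLL path under search_dirs, or None."""
    for directory in search_dirs:
        candidate = Path(directory) / name
        if candidate.is_file():
            return candidate
    return None

def choose_engine(search_dirs, exe_path=None):
    """Pick an engine label: 'c-ctypes', 'c-subprocess', or 'py'.

    Hypothetical policy: prefer a loadable DLL, fall back to the subprocess
    executable if one exists, otherwise use the pure-Python encoder.
    """
    dll = find_dll(search_dirs)
    if dll is not None:
        try:
            ctypes.CDLL(str(dll))  # raises OSError if the file is not loadable
            return "c-ctypes"
        except OSError:
            pass  # silent fallback, mirroring the auto-mode behaviour
    if exe_path is not None and Path(exe_path).is_file():
        return "c-subprocess"
    return "py"

print(choose_engine([]))
```

A real auto mode would also compare the library's version string and run the starting-position smoke test before committing to a C path.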
Build: clang -std=c17 -O2 -shared -I c_encoder/include
c_encoder/src/othello_spectral.c -o c_encoder/othello_spectral.dll.
Windows exports are gated by __declspec(dllexport) in
othello_spectral.h (via the OTHELLO_API macro).
1e.7.6 — C encoder body + engine dispatch¶
Scripts: research/othello_spectral/c_encoder/src/othello_spectral.c,
research/othello_spectral/runtime.py.
- ANSI C17 encoder body filled in, bit-identical to Python at float32 precision on 9 test states (starting + 8 random seeds).
- Codegen brace-count bug fixed: emit_c_tables was generating one extra brace level; clang silently truncated each row to its first value (all-zero channels). After fix, clang -Wall builds with zero warnings.
- Engine dispatch: --engine {py, c, auto} on the encode-pgn CLI; env vars OTHELLO_SPECTRAL_BIN (path), OTHELLO_SPECTRAL_ENGINE (default mode). auto mode verifies the C binary (version string + byte-identical starting-position encoding) before selection; silent fallback to py on verification failure.
- End-to-end parity: Barcelona 35-game corpus encoded via py and c gives byte-identical .spectralz bodies (SHA256 match on all 2184 frames).
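A per-frame parity check of this kind can be sketched with hashlib, assuming each encoded frame is available as a bytes object. The helper names are hypothetical, and the actual .spectralz body layout is not specified here.

```python
import hashlib

def frame_digests(frames):
    """SHA-256 hex digest of each encoded frame (an iterable of bytes)."""
    return [hashlib.sha256(frame).hexdigest() for frame in frames]

def first_mismatch(frames_py, frames_c):
    """Index of the first frame whose digests differ, or None if all match.

    A length difference is reported at the index where the shorter list ends.
    """
    digests_py = frame_digests(frames_py)
    digests_c = frame_digests(frames_c)
    for i, (a, b) in enumerate(zip(digests_py, digests_c)):
        if a != b:
            return i
    if len(digests_py) != len(digests_c):
        return min(len(digests_py), len(digests_c))
    return None

# Toy frames standing in for py-encoded and c-encoded bodies.
py_frames = [b"\x00\x01", b"\x02\x03", b"\x04\x05"]
c_frames = [b"\x00\x01", b"\x02\x03", b"\x04\x05"]
print(first_mismatch(py_frames, c_frames))
```

Reporting the first mismatching index, and not only a pass/fail flag, makes it easier to locate the offending position when the engines disagree.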
Perf note: Python 2.7 s vs C 7.6 s on 25 k state encodes (APR 2026). C subprocess stdio round-trip dominates single-threaded encoding; C would win with direct memory sharing or multi-worker orchestration. Filed as perf improvement for a later revision.
Revised open items / sequel work¶
- 1e.5 multivariate predictive gain on richer targets. See 1e.7.4 — this is substantially closed.
- Chess-side A₁/E pair check. See 1e.7.5 — CLOSED.
- Harden h9_strict_runner against filename collision. Fixed: --out-prefix flag landed in commit fd6246e.
- A1-drift on 2025 corpus (queued, running as of this write). Barcelona result (§1e.5b's A1⁻ drift signature decomposed by phase bucket, 2099 plies, N=35 games) was statistically thin. 2025 rerun at N=2178 games promises meaningful per-bucket effect sizes.
- T4 outlier robustification — windowed variant (phase1e_t4_robust.py) now lands median −22 k with IQR [−35 k, −14 k]. Still heavy-tailed if windows are too short; worth a log-transform follow-up.
- Faithful sheaf gain investigation on 2025 corpus — §1e.7.4's Barcelona result is qualitatively consistent with §1e.5b's 2025 result; a 2025 rerun of the OOS probe would confirm the joint-feature-decorrelation interpretation at statistical power 1–2 orders of magnitude higher.
- C encoder batch perf. Current subprocess pipe is single-threaded and stdio-bound. Shared-memory or direct-PGN ingest in the C layer would make the C path actually faster than Python for large corpora.
- Chess Simpson's paradox mechanism. Why are chess games' A1/E per-game correlations tightly negative (−0.18 to −0.58 median) while pooling flattens them? Likely between-game baseline variation (game-to-game shifts in mean A1 and mean E) dominates within-game anticorrelation. Worth factoring with a mixed-effects model.
Files¶
Scripts added (this phase):
- phase1e_multivariate.py
- phase1e_shannon_observables.py
- phase1e_edax_d20_tasklist.py
- phase1e_edax_d20_correlations.py
- phase2_predictive_sheaf.py
- phase1e_signflip_decomposition.py
- phase1e_predictive_sheaf_pgn.py
- phase1e_holonomy_plaquettes.py (Open-4)
- phase1e_a1_drift_by_phase.py (Open-1)
- phase1e_t5_cluster_distribution.py (Open-3)
- phase1e_predictive_sheaf_faithful.py (Open-5)
- phase1e_faithful_gain_investigation.py (iii MAIN EVENT)
- faithful_sheaf.py
- phase1e_t4_thermodynamic_trajectory.py (Open-2)
- phase1e_t4_robust.py (ii)
- phase1e_chess_a1_e_pair.py (Open-7)
- phase1e_chess_a1_e_bootstrap.py (iv)
- phase1e_othello_a1_e_per_game.py (iv+)
- phase1e_replay_from_features.py
- phase1e_flipcount_distribution.py (T1)
- phase1e_wthor_scale_retest.py (1c.3 retest)
Encoder package:
- othello_spectral/: v0.2.0 with py/c/auto engine dispatch, bit-identical C17 reference encoder (Open-6 + engine wiring)
Result files (1e.5b, 1e.6):
- phase1e_signflip_decomposition.json
- phase1e_predictive_sheaf_pgn_liveothello_Barcelona_EGP_2026.json
- phase1e_predictive_sheaf_pgn_liveothello_2025_all.json
Scripts modified:
- takizawa_archive_loader.py: --min-accuracy flag + per-row flush.
Result files:
- phase1e_multivariate.json
- phase1e_shannon_observables.json
- phase1e_shannon_per_move_observables.csv
- phase1e_predictive_sheaf.json
- phase1e_predictive_sheaf_pairs.csv
- phase1e_edax_d20.csv
- phase1e_edax_d20_correlations.json
- phase1d_archive_summary_exact100.csv
- phase1e_correlations_exact100.json
- phase1e_spectral_vs_perfectplay_exact100.csv
Per-seed cache for sheaf trajectories (30 files, ~60 KB each) at results/phase1e_sheaf_cache/.