LOGO HDC — Research Notebook¶
"Can't stop the signal, Mal. Everything goes somewhere, and I go everywhere." — Mr. Universe, Serenity (Joss Whedon, 2005)
Signature epigraph of the spectral-research collection. The body of work — validated results and rigorous falsifications alike — was offered through conventional channels and dismissed as foolery. The math stands independently. The discipline since: ship every result, falsifications included, with full reproducibility and per-row provenance (the Mathematical Provenance Method). A corpus that publishes its own invalidations is harder to dismiss than one that doesn't, and propagates through every channel that ingests open research. The signal is in the world; it goes everywhere now.
Status: Iteration 3 complete (Phases L0–L7). Iteration-1 L1 reversibility claim retracted; see §"Phase L1 — RETRACTION" below. L7a held-out generalization PASSES (L6d's D4 character claim survives); L7b partial-trace fibers MOSTLY MISS — fiber drifts with the trace (L7b.3 PASS) but does NOT extrapolate forward (L7b.½/⅘ MISS), which reshapes the trajectory hypothesis for chess. Started: 2026-04-14 (generalizing the chess-spectral split-object pattern to a non-chess domain) Iteration 2 added: 2026-04-14 — self-containment (drop chess_spectral sys.path hack), F₁ HDC retrieval fix, fiber-as-transport-map experiments (L6a–e), three-body topology (L6f–h), and an honest retraction of the iteration-1 L1 reversibility-via-binding claim. Iteration 3 added: 2026-04-14 — held-out generalization (L7a, gating) and partial-trace fibers (L7b, trajectory probe). See §L7 for numbers and interpretation.
Living document. Mirrors the structure of
docs/chess-maths/chess_spectral_research_notebook.md: prediction → experiment → numbers → interpretation → what it means for the broader pattern.Project navigation + state-pointer¶
ReadTheDocs landing — https://mlehaptics.readthedocs.io/en/latest/ — is the canonical pointer to the current state across all sister notebooks in this project.
This notebook is a snapshot. Future framework additions will not be back-ported into it; the RTD landing tells you whether new sister notebooks or downstream developments are available.
Brief since-summary (as of 2026-05-08): - The chess-spectral split-object pattern this notebook generalises was further instantiated on antikythera-spectral, doom-spectral (v1.0.0; first end-to-end Rosetta Stone existence proof), ephemerides-spectral (matured through v0.26.0; PyPI: https://pypi.org/project/ephemerides-spectral/), and othello-spectral. - The Mathematical Provenance Method (MPM) formalised as the project's discipline (ephemerides notebook §0.0); instrument-first physics critique in ephemerides §20. - Inkscape contribution shipped on the
spectral-faithfulbranch — three new SVG filter primitives (feSpectralBilateral,feSpectralDistance,feSpectralNoise) using the same eigenbasis substrate. - mfo-spectral sister-notebook added (May 2026) — Metric Field Ontology, one candidate foundational-ontology framing hosted in the project (cavity-instrument analogy; fractal metric field; ~11D structure motivated independently of string theory). Not the project's endorsed answer over alternatives; ephemerides §20 cites MFO as a worked example without picking a spatial-structure side. Future MPM target. - Ephemerides §21 — Tool-rejection as MPM-screening failure (symmetric counterpart to §20). Names the evaluator-side screening failure: rejecting work by which tool was used to make it, not by what it claims. Anchored on the historical orbital-mechanics chain DE441 traces back to (Copernicus / Bruno / Galileo / Kepler). §21.3 names the disability-accommodation dimension explicitly: categorical tool bans function as participation barriers for contributors with aphantasia, ADHD, dyslexia, motor disabilities, and many other variations. MPM is tool-agnostic by design.
Framing — Why LOGO¶
Chess-spectral established a reusable design pattern — a split-object-with-fiber-matrix: two HDC subspaces (piece identity, positional coupling) bound through an explicit coupling matrix whose rank / singular values / null-space carry the interesting structural information. The rest of the chess work is an instantiation of that pattern on a D4-symmetric 8×8 lattice.
LOGO is the first test of whether the pattern generalizes. It has:
- A tiny command set (12 atoms) — so Part A is fully enumerable.
- A small grammar (~20 productions) — Part B is fully enumerable.
- Geometry as output — Part C reuses the chess spectral basis directly (LOGO canvas ≡ chess board after rasterization).
If the pattern is generic, we should recover:
- F₁ rank reveals "kinds of coupling" between atoms and productions.
- F₂ SVD surfaces symmetry modes of the generated geometry.
- Structurally inert rules show up in F₁'s null space.
Phase L0 — Parser / AST / Interpreter¶
Deliverables: logo_ast.py, logo_parser.py, logo_interp.py.
Sanity checks (passed):
REPEAT 4 [FORWARD 100 RIGHT 90]closes the square within|bbox residual| < 1e-13.REPEAT 360 [FORWARD 1 RIGHT 1]producesbbox aspect ≈ 1.0— a near-circle.- Recursive procedure:
TO poly :n :d REPEAT :n [FORWARD :d RIGHT (360/:n)] END; poly(7 30)runs and closes.
Every AST node carries a rule_name field; the interpreter emits both
a turtle trace and a Counter of rule firings, which together feed
Parts B/C and both fibers.
Phase L1 — HDC primitives + structured atoms¶
Deliverables: logo_hdc/hd.py, logo_hdc/quantum_atoms.py, logo_hdc/codebook.py.
Key decisions:
- MAP-B bipolar at D=10000 — self-inverse bind, cosine sim linear in bit-agreement.
- Atoms structured from quantum numbers —
a_c = bundle_sum( bind(role_i, val_i) )over(category, arity, argtype, reverse_axis, block_opener). - bundle_sum vs bundle: we use unbinarized sums for atom and
rule composition. Majority-vote
bundledestroys the linearity that per-role flip arithmetic relies on — the reversibility-pair test failed withbundle, passed cleanly withbundle_sum.
Gotcha caught here: Initial quantum-number assignment had
LEFT/RIGHT and FORWARD/BACK sharing identical 5-tuples modulo the
reverse-axis coordinate. The rev_axis tag was just FWD/REV, which
collided across pair boundaries. Renamed the tags to
TRANS_FWD/TRANS_REV (translation) and ROT_FWD/ROT_REV (rotation)
so each reversibility pair has its own axis namespace.
⚠ RETRACTION (iteration 2). The original L1 went on to claim "with that fix,
bind(a_FORWARD, flip_axis) ≈ a_BACKunder cleanup and not toa_LEFT." This is wrong. Re-measured at D=10000:
probe cos(bind(FORWARD, flip_axis), probe) BACK −0.0037 LEFT −0.0073 RIGHT +0.0050 All three are statistical noise. A single-vector "flip axis" cannot algebraically swap one role inside a
bundle_sumof five role-value bindings — the bind hits all five terms and four of them become uncorrelated noise. The high FORWARD↔BACK structural similarity (~0.80) that the iteration-1 author noticed is real but comes from shared quantum numbers (⅘ roles match between FORWARD and BACK), not from any flip operation.The correct primitive is role-level, against the role and value atoms in the codebook:
unbind_role(cmd_hv, "role_reverse", cb) = cmd_hv ⊙ cb["role_reverse"]which extracts the value-HV bound to
role_reverse(the four other role bindings become uncorrelated noise of magnitude ≈ 1/√5). Measured at D=10000:cos(unbind_role(FORWARD, "role_reverse"), cb["rev_TRANS_FWD"]) = +0.4486 cos(unbind_role(FORWARD, "role_reverse"), cb["rev_TRANS_REV"]) ≈ 0.0 cos(unbind_role(FORWARD, "role_reverse"), cb["rev_ROT_FWD"]) ≈ 0.0Clean reversibility swap uses unbind→subtract→add on the unbinarized accumulator:
swap_role_value(FORWARD, "role_reverse", "rev_TRANS_FWD", "rev_TRANS_REV", cb) = a_FORWARD - bind(role_reverse, rev_TRANS_FWD) + bind(role_reverse, rev_TRANS_REV) ≈ a_BACK (cos = 1.0000 exactly — algebraically equal)So the
bundle_sumchoice was right, but for the wrong reason: it preserves linearity for role swaps, not for whole-atom "flip-binding".flip_axisandREVERSE_OFwere deleted in iteration 2; see logo_hdc/quantum_atoms.py for the corrected API. Lesson: structural similarity of atoms is not the same as reversibility-via-binding; the conflation in iteration 1 is the cautionary tale that motivated this notebook's "report actual numbers" rule.
Bundle-size sanity: D=10000 recovers 40 random atoms at >95% cleanup accuracy. Cliff around D/250, consistent with MAP-B theory.
Phase L2 — Vocab + syntax encoders with independent inferability¶
Deliverables: logo_encode_vocab.py, logo_encode_syntax.py.
Prediction: Each side must be probeable independently — sim(vocab_lex_hv,
a_FORWARD) reveals FORWARD's presence without touching Part B, and
vice versa. This is the "split" part of split-object.
Results:
- Lex/vocab: top-6 command retrieval on mixed programs — 83% hit-rate (target ≥ 80% from plan).
- Context/vocab:
unbind(vocab_ctx_hv, a_FORWARD)cleans to the correct slot position within the top-3. - Syntax side: all 5
MOTION_NUMpositions in aREPEAT 5 [FORWARD ...]program recovered in the top-5.
Small fix here: rule_sequence filter initially used
endswith("_NUM") to drop literal-terminal rules; this wrongly also
dropped r_STMT_is_MOTION_NUM. Narrowed to an explicit skip-list.
Phase L3 — F₁: vocab × syntax fiber¶
Deliverables: logo_hdc/fiber.py, logo_hdc/split.py, logo_build_splits.py.
Prediction (from plan): Rank 3-4 with modes aligned to spatial-transform-with-arg, state-toggle, block-opening, variable-binding. Null space: structurally-inert productions.
Result — EXCEEDED prediction:
F1 matrix shape=(12, 12) rank=7
sigma[0] = 30.50 FORWARD <-> r_STMT_is_MOTION_NUM
sigma[1] = 22.00 REPEAT <-> r_STMT_is_REPEAT_NUM_BLOCK
sigma[2] = 13.00 THING <-> r_NUM_is_VAR
sigma[3] = 9.00 MAKE <-> r_STMT_is_MAKE_NAME_EXPR
sigma[4] = 7.07 {PENUP, PENDOWN} <-> r_STMT_is_PEN
sigma[5] = 2.83 {TO, END} <-> r_STMT_is_TO_NAME_ARGS_BLOCK_END
sigma[6] = 1.00 IF <-> r_STMT_is_IF_COND_BLOCK
null rows: [BACK] (never used in the corpus — legitimate null)
null cols: [r_PROG, r_BLOCK, r_CALL, r_NUM_is_literal, r_NUM_is_BINOP]
(containers / leaves — no direct command coupling)
Seven crisp modes, each semantically interpretable. BACK in the null space because no corpus program uses it (interpreter supports it fine). The container / leaf productions sit in the column null space — they don't directly fire commands, they dispatch to sub-rules that do.
Retrieval check (iteration-1 baseline): unbind(F₁_hdc, a_FORWARD)
cleans to r_STMT_is_MOTION_NUM at cosine 0.545 (next best 0.08).
Retrieval works for FORWARD because its true coupling has α=23 — the
single largest weight in the matrix.
L3 — iteration 2: F₁ HDC retrieval for low-energy couplings¶
Symptom (iteration 1): PENUP, PENDOWN, IF, MAKE, THING all
retrieve r_STMT_is_MOTION_NUM (the heaviest rule) instead of their
true couplings. The matrix form is exact; the HDC form is dynamic-range
limited — heavy rules dominate every probe in the bundled
superposition.
Diagnosis: D=10000 is plenty (chess uses the same dimension); the
issue is purely dynamic-range. PENUP's true coupling weight is 5;
MOTION_NUM's is 23. In the bundled fiber F = Σ α(c,r) bind(c, r),
unbinding by PENUP recovers PENUP's row but the contribution from
every other row leaks through proportional to α — and α=23 wins.
Fix: per-row L1-normalization in fiber_to_hdc:
F = Σ_{c,r} (α(c,r) / Σ_r' α(c,r')) · bind(c, r)
Each command contributes equal total mass; cross-row leakage is suppressed.
Result:
| variant | top-1 retrieval (corpus) |
|---|---|
| iteration 1 (raw) | 5/11 (45%) |
| log-scale weights | 7/11 (64%) |
| per-row L1 norm | 10/11 (91%) ✓ |
The remaining miss is THING (its quantum atom is structurally close
enough to FORWARD that residual MOTION_NUM cross-talk still wins).
That's a structured-atom limitation, not a fiber-construction one.
Per-row normalization is now the default in
logo_hdc/fiber.py (normalize_rows=True).
Phase L4 — F₂: syntax × geometry fiber, symmetry recovery¶
Deliverables: logo_encode_geometry.py, logo_corpus.py, logo_fiber2.py.
This is the framework-validates-outside-chess moment.
L4a — Geometry encoder¶
First attempt rasterized the trace into an 8×8 ink grid, quantile-binned
cell intensities into piece-chars (P/N/B/R/Q by bin, globally-max cell →
K), and fed the resulting pos-dict through chess_spectral.encode_640.
This failed to discriminate polygons. Post-encoding cosine similarity between all polygon pairs was ≥ 0.98. Diagnosis: quantile binning normalizes intensity distribution, so all polygons (which all fill a square bounding box uniformly) produce near-identical pos-dicts, which collapse to near-identical 640-dim HVs.
Fix: bypass piece-char quantization entirely. Only the D4-irrep
channels (dims 0-319 of the chess encoder) are meaningful for LOGO —
the fiber channels (F1..FD) are piece-type-specific and don't apply.
We feed the raw 64-dim intensity grid directly into project_irrep
for each of {A1, A2, B1, B2, E}, yielding a 320-dim HV that preserves
angular structure.
Result — polygon cosine after the fix:
triangle square pentagon hexagon octagon dodecagon circle
triangle 1.000 0.485 0.492 0.389 0.321 0.308 0.300
square 0.485 1.000 0.503 0.598 0.663 0.613 0.596
pentagon 0.492 0.503 1.000 0.786 0.657 0.669 0.657
hexagon 0.389 0.598 0.786 1.000 0.910 0.923 0.917
octagon 0.321 0.663 0.657 0.910 1.000 0.992 0.991
dodecagon 0.308 0.613 0.669 0.923 0.992 1.000 0.999
circle 0.300 0.596 0.657 0.917 0.991 0.999 1.000
A real continuum of "approach to circle": triangle is most orthogonal to the circular shapes; octagon/dodecagon/circle cluster tightly (D₈, D₁₂, SO(2) all project onto A1 in D4). Pentagon sits between, its D5 splitting A1+E in the D4 basis.
CSV confirms the D4-irrep picture cleanly:
name E_A1 E_A2 E_B1 E_B2 E_E
triangle 1075 0.513 211 211 949 ← D3: splits everywhere
square 2184 0 0 0 0 ← D4: pure A1
pentagon 2124 0.543 89 89 535 ← D5: A1 + E
hexagon 2230 0.099 90 90 5.27 ← D6: mostly A1
octagon 2738 0 0.07 0.07 0.27 ← D8: pure A1
dodecagon 3023 0 0 0 0 ← D12: pure A1
circle 25216 0 0 0 0 ← SO(2): pure A1
Pure A1 for every Dₙ with n multiple of 4, Pythagorean split for D3 and D5. This is a textbook character-table calculation falling out of a turtle-trace encoder reusing the chess spectral basis.
L4b — F₂ matrix, SVD, symmetry modes¶
Feature axis: ALL_RULES + value-enriched repeat_n=N fingerprints
(needed because polygon ASTs differ only in the REPEAT count literal).
Matrix normalization: both per-program count L1 and per-program geometry L2 — without this, the circle program (REPEAT 360) dominates every row it participates in.
SVD:
F2 matrix shape=(30, 320) rank=16
sigma[0] = 5.493 dominant feat: r_NUM_is_literal (-0.76) [DC / generic]
sigma[1] = 0.233 dominant feat: r_STMT_is_PEN (+0.63)
sigma[2] = 0.203 dominant feat: r_STMT_is_PEN (-0.56)
sigma[3] = 0.158 dominant feat: r_NUM_is_VAR (+0.39)
sigma[4] = 0.143 dominant feat: repeat_n=5 (+0.63) ← polygon-symmetry mode
sigma[5] = 0.102 dominant feat: repeat_n=4 (-0.46) ← polygon-symmetry mode
sigma[6] = 0.089 dominant feat: repeat_n=4 (-0.61)
sigma[7] = 0.075 dominant feat: repeat_n=3 (-0.66) ← polygon-symmetry mode
Mode 0 is the "generic LOGO program" DC — loaded on r_NUM_is_literal,
r_STMT_is_MOTION_NUM, r_BLOCK, r_REPEAT, the features every program
shares. Its singular value (5.5) is >20× the next mode (0.23) —
this is expected: most of the matrix-mass is "these programs all
look like LOGO programs."
Modes 4-7 are the polygon-specific symmetry modes. They load on
repeat_n=3/4/5 — the polygon symmetry number. Without the value-
fingerprint feature, these modes would not exist (rule-type counts
alone don't distinguish REPEAT 3 from REPEAT 5).
L4c — Symmetry recovery per-program¶
Prediction from plan: project a program's syntax-feature vector through F₂ and compare predicted geometry to actual geometry HVs. Expect > 0.7 self-match for polygons.
Raw projection fails (self-match buried by the DC mode — every polygon's prediction looks mostly like the mean geometry). After deflating the dominant mode, the picture is clean:
Deflated self-similarity (pred vs actual, mode-0 removed):
triangle +0.426 top match: triangle
square +0.481 top: nested_squares (also D4!) square 2nd
pentagon +0.426 top: pentagon
hexagon +0.428 top: hexagon
octagon +0.276 top-3 all ~circular
dodecagon +0.556 top: circle (D12 → SO(2) at this resolution)
circle +0.688 top: circle
The "wrong" matches are semantically correct: square's top match
is nested_squares (another D4 figure); dodecagon's top is circle
(both project onto A1 alone); octagon's top-3 are all circle-approaching.
F₂ is surfacing symmetry class, not program identity. That's the
prediction.
L4 verdict¶
F₂ recovers output symmetry from program structure. The split-object pattern generalizes outside chess. The pattern's "matrix rank and SVD modes are the interesting part" claim holds — the subdominant modes carry the category-discriminating information; the dominant mode is a "baseline program" DC that needs to be factored out for cleanest class separation.
Phase L5 — CSV export + CLI¶
Deliverables: logo_csv_export.py, logo_py.py, samples/corpus.csv.
Verbs mirror chess-spectral's spectral_py.py:
python logo_py.py parse SRC
python logo_py.py trace SRC
python logo_py.py encode SRC [--dump]
python logo_py.py fiber1
python logo_py.py fiber2
python logo_py.py csv [-o OUT]
python logo_py.py version
SRC accepts a filesystem path, a corpus entry name (e.g. square),
or - for stdin.
CSV columns:
name, family, symmetry,
n_stmts, n_rules_static, n_cmds_static,
exec_steps, n_segments, n_pendown,
bbox_w, bbox_h, aspect,
f2_self_sim_raw, f2_self_sim_deflated,
E_A1, E_A2, E_B1, E_B2, E_E,
geom_norm
Number formatting is copy-pasted from chess-spectral's fmt_csv_num so
the two pipelines' outputs share a house style.
Phase L6 — Fiber-as-transport-map (iteration 2)¶
Question: Are F₁/F₂ static co-occurrence summaries, or do they
function as predictive transport maps? If Δg_pred = M₂ᵀ · Δv
agrees with Δg_actual for syntax-feature deltas Δv, the split-object
pattern picks up a generative interpretation, not just a descriptive
one.
All numbers below come from
logo_transport.py; reproduce with
python logo_py.py transport --pairs --chain --characterize --modes --topology.
L6a — Polygon-pair single-fiber transport¶
For each polygon pair (a, b), compute Δv = featvec(b) - featvec(a),
predict Δg_pred = M₂ᵀ · Δv, compare to actual normalized geometry
delta. Both raw F₂ and DC-deflated F₂.
pair cos_raw cos_def D4 split (A1/A2/B1/B2/E)
square -> triangle +0.852 +0.891 46% / 0% / 8% / 8% / 38%
square -> hexagon +1.000 +0.978 90% / 0% / 5% / 5% / 0%
triangle -> pentagon +0.723 +0.747 49% / 0% / 8% / 8% / 35%
hexagon -> octagon +0.478 +0.473 57% / 0% / 21% / 21% / 1%
pentagon -> octagon +0.685 +0.701 63% / 0% / 5% / 5% / 28%
square -> dodecagon +0.682 +0.897 100% / 0% / 0% / 0% / 0%
L6a (cos_deflated > 0.30 for adjacent polygons): ✓ PASS — 6/6 pairs above 0.30 (worst: hexagon→octagon at 0.473). The fiber acts as a working transport map; the prediction lives in the right neighbourhood of the actual delta in 320-dim irrep space.
L6b (deflated ≥ raw for ≥4/6 pairs): ✓ PASS — 4/6 pairs improve under deflation (square→triangle, triangle→pentagon, pentagon→octagon, square→dodecagon). For square↔hexagon and hexagon→octagon, raw is already so high (≥0.978) that DC removal can only nibble.
The D4-channel breakdown is the textbook signature: pair-changes
involving a n=3 polygon spread mass to E (rotational doublet);
square↔dodecagon — both pure-A1 — leaves the entire delta in A1.
L6c — Chain transport (vocab → syntax → geometry)¶
Square ↦ spiral_square chains a vocab-count delta through F₁ᵀ to a syntax-feature delta, then through M₂ᵀ to a geometry delta:
cos(syntax_pred, syntax_actual) [via F₁] = +0.629
cos(geom_chain, geom_actual) [F₁ → F₂] = +0.516
cos(geom_direct, geom_actual) [F₂ only ] = +0.597
L6c (chain cos > 0.10): ✓ PASS at 0.516. The chain-via-F₁ pays a ~15% relative penalty against direct featvec transport (0.597 → 0.516) — the price of going through two approximations and a zero-fill of the value-fingerprint dimensions that don't live in F₁'s rule axis. The signal still propagates: composing the two fibers gives a meaningful prediction without ever touching the geometry of the target program.
L6d — Per-feature transport(N) — D4 character recovery¶
transport(N) = M_hi.T · e_N for one-hot indicator at repeat_n=N,
decomposed into D4 channel energies (deflated F₂):
feature |t| A1 A2 B1 B2 E
repeat_n=3 0.091 33% 2% 9% 9% 47%
repeat_n=4 0.092 86% 0% 4% 4% 7%
repeat_n=5 0.116 46% 0% 7% 7% 40%
repeat_n=6 0.057 27% 0% 25% 25% 23%
repeat_n=8 0.085 60% 2% 11% 11% 15%
repeat_n=12 0.063 93% 0% 2% 2% 4%
L6d (A1-pure for n∈{4,8,12}; A1+E for n=5; B1+B2+E spread for n=3): ✓ MOSTLY PASS.
transport(4)86% A1,transport(12)93% A1,transport(5)46% A1 + 40% E — match the prediction.transport(3)is 47% E + 33% A1 with B1+B2 contributing — matches the "spread to multiple irreps" prediction (D3 splits across A1, B1+B2, E in D4).transport(8)is messier (60% A1 + 26% B+E) than predicted "pure A1." Likely a small-corpus artefact: only one D8 program participates, so the deflated subspace can't cleanly separate it from neighbouring symmetry classes.transport(6)is the most fragmented. With three D6 programs in the corpus (hexagon, rosette_6, proc_two_polys-second-half), the deflated subspace splits across all four non-trivial irreps roughly equally. Honest reading: not every n recovers cleanly; the strong predictions (4, 5, 12, 3) do.
This is the strongest "fiber knows representation theory" evidence:
F₂ has never seen a D4 character table, but transport(N) for the
distinguished n's lands in the irrep classes that the n-fold rotation
group projects onto inside D4. The fiber is doing more than memorizing
program geometries.
L6e — Per-mode SVD attribution (square → triangle)¶
Decompose Δg_pred = Σ_k σ_k · (u_k · Δv) · v_k:
k sigma cos(term, Δg) pred share cum cos
1 0.233 0.112 7.7% 0.112
2 0.203 0.249 0.0% 0.123
3 0.158 0.411 21.1% 0.415
4 0.143 0.283 3.4% 0.485
5 0.102 0.008 1.4% 0.477
6 0.089 0.469 34.7% 0.668
7 0.075 0.441 20.5% 0.798
8 0.063 0.371 11.2% 0.876
L6e (modes 4-7 carry > 80% of polygon transport energy): ✗ MISS. Modes 4-7 carry 59.9% of the prediction's energy for square→triangle (not 80%). The polygon-symmetry signal is real but spread across modes 1, 3, 6, 7, 8 rather than concentrated in 4-7.
Cross-checked on two more pairs: - square→pentagon: 4-7 = 59.4% (also miss) - square→hexagon: 4-7 = 47.3% (also miss)
The cumulative cos still climbs to 0.876 by mode 8, so the prediction
overall fits — just not via the modes the L4b SVD report named
"polygon-symmetry modes." Honest reading: L4b's modal labels were
right about which modes load on repeat_n=N, but the transport
contribution doesn't follow the loading pattern. A single mode's
σ × |u·Δv| can be large or small independently of which feature
loads it. Lesson for the pattern: SVD interpretability tells you
which features participate in a mode, not how much that mode
contributes to a specific transport.
L6f — Chain collapse: does syntax mediate vocab → geometry?¶
Build F3_direct (vocab × geometry) and F3_chain = M_AB · M_BC. Test
two B-widths: rules-only (12 cols) vs. extended (30 cols including
repeat_n=N value buckets):
narrow B (rules only, dim 12): cos(F3_chain, F3_direct) = +0.993
wide B (extended, dim 30): cos(F3_chain, F3_direct) = +0.993
widening gain (wide - narrow) = 0.000
Result: Iteration-2 prediction was 0.95 narrow → 0.99 wide. Reality is 0.99 → 0.99: rules alone already mediate ~99% of the vocab→geometry correlation; widening to value-fingerprints adds nothing detectable here. Interpretation: the rule axis already captures the entire mediation budget at this corpus size — value features matter for intra-class discrimination (polygon n) but not for the cross-modal correlation envelope. The "syntax mediates geometry" hypothesis lands cleanly.
L6g — Cycle holonomy (round-trip A→B→C→A vs direct A↔A)¶
‖M_AA‖_F = 4.890
‖C_fwd‖_F = 140.015
cos(C_fwd, M_AA) [normalized] = +0.942
holonomy ‖Δ‖_F = +0.340
cycle SVD (top singular values, normalized to s₀):
s[0] = 140.015 (100.0%)
s[1] = 0.057 ( 0.0%)
s[2] = 0.038 ( 0.0%)
s[3..]≈ 0
mode-0 dominance: s_1 / s_0 = 0.000
Result: Iteration-2 prediction was cos ≈ 0.93, holonomy ≈ 0.37 — measured 0.942 / 0.340 nails both. The cycle SVD is extreme: the round-trip operator is essentially rank-1, with everything crushed into the leading direction. This is the predicted "geometry-visible commands survive the round trip; geometry-invisible commands like MAKE/PENUP get distorted toward FORWARD/REPEAT." The corpus mostly lives in a thin slice of vocab-space (motion-dominated programs); the round trip projects onto that slice and discards everything else.
L6h — Chirality (commutator)¶
cos(C_fwd, C_rev = C_fwd^T) = +0.997
commutator ‖normalized Δ‖_F = +0.082
Verdict: chiral (non-commutative)
Caveat noted: With only 21 corpus programs, an 0.082 commutator is small but nonzero. Mathematically C_rev = C_fwd^T, so the commutator is the antisymmetric part of C_fwd; an 0.082 norm against unit-norm pieces means the symmetric (torus-like) part dominates ~12× over the antisymmetric (chiral) part. Honest reading: structure is near-symmetric with a small chiral signal that is probably corpus bias. A distinguishing experiment would need a corpus designed to enforce or break ABC-cycle symmetry.
Phase L7 — Generalization + Trajectory (Iteration 3)¶
Motivation. Everything in L6 was measured on the same 21 programs that built F₂. "The fiber learned D4 character theory" (L6d) is only meaningful if the fiber retains that structure after dropping the programs whose character it's supposedly predicting. L7 closes the memorization-vs-learning gap before the pattern is extended to chess (where the payoff is trajectory/branching dynamics, not static structure).
Two experiments, L7b gated on L7a. Both use deflated-fiber numerics (matches L6 convention).
L7a — Held-out generalization (PASS)¶
Command. python logo_py.py transport --holdout pentagon,octagon
(see logo_transport.py::holdout_transport_table
and print_holdout_report).
Method. Rebuild F₂ from the 19 programs remaining after dropping
pentagon and octagon. Recompute:
- transport(5) and transport(8) — the D4-character decomposition
claim from L6d — on the reduced fiber.
- Δg_pred = M_ho_hi.T · (v_b − v_a) for polygon pairs that straddle
the hold-out — triangle→pentagon, hexagon→octagon — against the
ground-truth G_hi delta from the full-corpus payload
(pentagon/octagon geometries don't move when we drop them from F₂;
the question is whether the reduced fiber still aligns with them).
Raw numbers (2026-04-14).
| feat | scope | ‖t‖ | A1 | A2 | B1 | B2 | E |
|---|---|---|---|---|---|---|---|
| repeat_n=5 | full | 0.116 | 46% | 0% | 7% | 7% | 40% |
| repeat_n=5 | ho | 0.085 | 45% | 0% | 7% | 7% | 40% |
| repeat_n=8 | full | 0.085 | 60% | 2% | 11% | 11% | 15% |
| repeat_n=8 | ho | 0.092 | 70% | 2% | 10% | 10% | 8% |
Pair transport under the held-out fiber (vs full-corpus Δg):
| pair | cos_full | cos_ho |
|---|---|---|
| triangle → pentagon | +0.747 | +0.568 |
| hexagon → octagon | +0.473 | +0.170 |
Predictions vs measurement.
| ID | Prediction | Measured | Verdict |
|---|---|---|---|
| L7a.1 | transport(5) A1+E share drifts ≤ 20 pp under hold-out |
0.1 pp (86% → 85%) | PASS |
| L7a.2 | transport(8) A1 share stays ≥ 50% under hold-out |
69.7% | PASS |
| L7a.3 | |transport(5)|, |transport(8)| norms don't collapse (≥30% of full) |
0.727, 1.077 | PASS |
| L7a.4 | Best pair cos_ho > 0.25 | +0.568 (tri→pent) | PASS |
Interpretation. L6d's D4 character claim survives hold-out
without qualitative change — the reduced fiber places transport(5) in
A1+E and transport(8) in A1 essentially unchanged. This is the
strongest evidence so far that the split-object pattern compresses
structure, not corpus identity. Interestingly, |transport(8)|
grows under hold-out (ratio 1.077) — dropping octagon from the
corpus makes the repeat_n=8 column more committed to A1-energy, not
less. Consistent with octagon being the only program enforcing the
repeat_n=8 → A1 coupling, and its geometry being a "clean" anchor
for it.
Pair transport weakens on hexagon→octagon (0.473 → 0.170) — the
held-out fiber has less information about octagon's specific geometric
signature — but triangle→pentagon barely moves (0.747 → 0.568), so
the D4-transport predictions retain their shape. Generalization holds.
L7b — Partial-trace fibers (MOSTLY MISS; headline finding)¶
Motivation. L7a validated static structure learning. L7b asks whether the split-object pattern can also carry trajectory information: build F₂ from the first N trace segments instead of the final trace, then ask whether the fiber built at step N predicts geometry at step N+Δ.
Method. No interpreter changes — we just truncate Trace.segments
and re-encode. Helpers:
- logo_transport.py::partial_geom_for(entry, N)
- build_partial_context(n_segments)
- partial_trace_transport_table(n_build, n_target)
- fiber_drift(n_early, n_late)
- spiral_extrapolation_probe(name, n_early, n_late)
Polygons finish in 3–12 FORWARDs so their rows are time-invariant
after their natural end (anchor N > len → same geometry at all N > len).
Spirals and fractals are the actual trajectory probe. The degenerate
only_pen and only_bind produce zero-norm geometry and cluster at
cos ≈ 0 by construction — noise floor for per-program cosines.
Raw numbers.
transport --partial 30,60 (fiber at N=30, target at N=60):
| family | n | mean cos(pred, target) |
|---|---|---|
| polygon | 7 | +0.307 |
| spiral | 3 | +0.085 |
| fractal | 4 | +0.433 |
| freestyle | 3 | +0.805 |
| procedures | 2 | +0.647 |
| degenerate | 2 | −0.013 |
transport --partial 10,30 (build earlier, target closer):
| family | n | mean cos(pred, target) |
|---|---|---|
| polygon | 7 | +0.377 |
| spiral | 3 | −0.033 |
| fractal | 4 | +0.333 |
| freestyle | 3 | +0.695 |
| procedures | 2 | +0.539 |
transport --drift 10,60:
- Frobenius cos(M_early, M_late) = +0.971
‖M_l − M_e‖_F / ‖M_e‖_F= 0.250- Per-irrep cos: A1 +0.984, A2 +0.021, B1 +0.876, B2 +0.876, E +0.840
- Top-5 drift rows:
r_NUM_is_literal(0.932),r_STMT_is_MOTION_NUM(0.543),r_BLOCK_is_STMT_star(0.297),r_STMT_is_REPEAT_NUM_BLOCK(0.297),r_STMT_is_MAKE_NAME_EXPR(0.263)
transport --spiral-extrapolate 10,60 (the headline probe):
| quantity | value |
|---|---|
| cos(fiber_pred(M_e.T·featvec), g_late) | −0.125 |
| cos(g_early, g_late) [snapshot baseline] | +0.730 |
| gain = fiber − snapshot | −0.855 |
Predictions vs measurement.
| ID | Prediction | Measured | Verdict |
|---|---|---|---|
| L7b.1 | Polygon mean cos > 0.40 at (30, 60) | +0.307 | MISS |
| L7b.2 | Spiral mean cos > 0.20 at (30, 60) | +0.085 | MISS |
| L7b.3 | ‖M_l − M_e‖_F / ‖M_e‖_F > 0.10 between N=10 and N=60 |
0.250 | PASS |
| L7b.4 | Spiral-marker feats (MAKE, repeat_n=60) dominate top-5 drift | hits=0 | MISS |
| L7b.5 | spiral_square fiber beats snapshot baseline by > 0.10 | gain = −0.855 | MISS |
Interpretation.
-
Fibers are time-dependent (L7b.3 PASS). The Frobenius cos between fibers at N=10 and N=60 is 0.971 (high, but not 1), and the relative Frobenius delta is 25%. Fibers do shift with the trace — A2's per-irrep cosine of +0.021 is especially striking: A2 basically flips sign between early and late snapshots. But this drift is a snapshot signal, not a predictor.
-
Drift is generic, not spiral-specific (L7b.4 MISS). The top-5 drifting feature rows are generic structural productions (
r_NUM_is_literal,r_STMT_is_MOTION_NUM,r_BLOCK_is_STMT_star) — rules that fire in every program. The naive hypothesis — that spiral-marker features (MAKE, repeat_n=60) would dominate drift — was wrong. Drift tracks how much trace each program has accumulated by step N, not what makes a spiral different from a polygon. -
The fiber does not extrapolate forward (L7b.5 MISS, badly). cos(fiber_pred at N=10, g_late at N=60) = −0.125. The snapshot baseline — literally just "use the geometry at N=10 as if nothing changed" — scores +0.730. The fiber-predicted geometry is actively worse than assuming no change at all. The −0.855 gain is the most emphatic negative result in this spike.
Why: M_e.T @ featvec(p) is a projection of the full-program
syntax feature into the fiber space built from partial-trace
geometry. It averages across programs that all happen to be at
their "step-10 snapshot" and can't distinguish "the fiber that
built this featvec at N=10" from "the fiber that will build it at
N=60." The partial-trace F₂ summarizes the current snapshot, not
the trajectory from N to N+Δ.
- Family ordering is revealing. Freestyle (+0.805) and procedures (+0.647) predict best, then fractals (+0.433), polygons (+0.307), spirals (+0.085). The programs that look most "typical" for their feature vector get predicted best; the programs whose geometry diverges through execution — spirals — get predicted worst. This is a descriptive structure, not a predictive one.
What this means for chess. The chess trajectory question originally asked: does a snapshot fiber, interrogated at a later move, predict future eval? L7b.5 says no, not with this construction. The chess trajectory fiber will need a different lever — per-move fiber sequences (each move is a fiber delta, composed over a game), or recurrent/difference formulations, or fibers keyed on current move index with an explicit time coordinate. A snapshot F₂ built mid-game and asked "what does the position look like 10 moves later" won't buy trajectory prediction for free.
This is the right moment to find that out — on a deterministic toy where every state is inspectable — rather than after committing to a chess fiber construction.
L7 status summary¶
- L7a: 4/4 PASS — L6d's D4 character claim generalizes.
- L7b: ⅕ PASS (drift exists; prediction doesn't).
- Iteration-2 structural claims (L6a–d, L6f–h) stand unretracted.
- The trajectory hypothesis for chess needs reformulating before the chess instantiation begins.
What this spike validated¶
-
Split-object pattern generalizes. F₁ factored vocab × syntax into 7 interpretable modes with the predicted null structure. F₂ factored syntax × geometry and surfaced Dₙ symmetry classes in its subdominant modes. Neither result required any chess-specific machinery beyond
project_irrep. -
Structured atoms beat random atoms when the codebook has a symmetry that the downstream task cares about. Reversibility pair recovery (
a_FORWARD · flip ≈ a_BACK) only worked with quantum-numbered atoms composed throughbundle_sum. Random atoms would have forced F₁ to rediscover this from the coincidence stream instead of getting it for free. -
bundle_sum≠bundle. Majority-vote binarization destroys flip linearity. For compositional codebooks, keep the accumulator real-valued until cleanup. -
Dominant-mode deflation matters for class recovery. F₂'s top mode is "generic program structure" and carries no class signal. The symmetry modes sit below it in the spectrum. Anyone instantiating this pattern on a new domain should check subdominant modes, not just σ₀.
-
The chess spectral basis reused cleanly.
project_irrepon a raw rasterized grid gives the correct D4-irrep decomposition for a turtle trace. The piece-char intermediate was harmful for LOGO; bypassing it (but keeping the irrep machinery) gave a clean encoder. The chess fiber channels (F1..FD) are piece-type-specific and don't generalize — that's fine; they weren't needed. Iteration 2: the projection itself is now inlined in logo_irrep.py; no sibling-package sys.path hack. -
Fibers are predictive transport maps, not just summaries (iteration 2). L6a–c shows
M₂ᵀ · Δvpredicts polygon-to-polygon geometry deltas at cos > 0.47 (deflated, all 6 pairs); chains through F₁ then F₂ propagate vocab deltas to geometry deltas at cos 0.516 with a ~15% relative penalty vs direct featvec transport. -
The fiber knows representation theory (iteration 2). L6d:
transport(N) = M_hi.T · e_Nfor one-hotrepeat_n=Nreproduces the qualitative D4 character class for the distinguished n's (4/8/12 → A1; 5 → A1+E; 3 → spread). The fiber was never told the D4 character table — it inferred it from program-geometry coincidences. -
Three-body topology is near-symmetric, near-rank-1 (iteration 2). L6f–h: syntax mediates ~99% of vocab↔geometry correlation, the round-trip cycle has cos 0.94 to direct A↔A and a holonomy norm of 0.34, the cycle SVD is essentially rank-1 (s_1/s_0 ≈ 0), and the chirality commutator is small (0.082, probably corpus bias). The split-object pattern composes; the cycle isn't a torus but isn't far from one either at this corpus size.
-
Honest retraction matters. L1's "
bind(a_FORWARD, flip_axis) ≈ a_BACK" claim was wrong — measured cos was −0.004. Iteration 2 replaced it withunbind_roleandswap_role_value, which work algebraically on the unbinarized bundle (cos 1.0 exactly for the swap). The right-shape-wrong-reason mistake — confusing structural similarity of atoms with reversibility-via-binding — is now documented in L1's RETRACTION block as a cautionary tale.
Out of scope (deferred)¶
- C17 port — the Python math held; nothing to port prematurely.
- Full LOGO (lists, infix precedence, RUN/IFELSE/OUTPUT) — research subset is enough for the pattern proof.
- Learned codebooks — the structured-atom approach was enough.
- Multi-turtle
SPAWN(the stretch goal L6) — deferred.
Open questions¶
- D sweep: we ran everything at D=10000. Phase L3's F₁ rank-7 signal should survive well below that; quick sweep would characterize the cliff for this pattern.
- Deflation is a knob: how many leading modes should a consumer of this pattern deflate before reading class signal? For LOGO it's exactly σ₀ — one mode. Chess-spectral has a richer top-mode structure; this might be a per-domain tuning parameter.
- BACK in the null space: structurally legitimate (no corpus program uses it) but it would be worth adding a BACK-using program to confirm its row activates symmetrically with FORWARD.
How to cite this notebook¶
BibTeX:
@misc{kirkland_logo_2026,
author = {Kirkland, Steven},
title = {LOGO HDC --- Research Notebook},
year = 2026,
howpublished = {\url{https://github.com/lemonforest/mlehaptics/blob/main/docs/logo-maths/logo_research_notebook.md}},
note = {Part of \emph{mlehaptics: Spectral-Research Portfolio}; LOGO turtle-graphics cyclic-group encoder. Project-level citation metadata at \url{https://github.com/lemonforest/mlehaptics/blob/main/CITATION.cff}. Co-authored with Claude Opus 4.7 (Anthropic, 1M-context configuration) per project memory \texttt{feedback\_orchestration\_metaphor}. Framing is one candidate within the project's research portfolio per \texttt{feedback\_no\_lineage\_claims\_in\_notebook}.}
}
Plain text: Kirkland, S. (2026). LOGO HDC — Research Notebook. mlehaptics Spectral-Research Portfolio. https://github.com/lemonforest/mlehaptics/blob/main/docs/logo-maths/logo_research_notebook.md
Per-result citation discipline. Specific technical claims cite their canonical sources directly (LOGO language history, HDC bind/bundle/permute primitives, etc., PDF-verified per [[feedback_pdf_extraction_citation_discipline]]). When citing a specific result, prefer citing both this notebook AND the underlying canonical source. Framings presented here are candidate methodological readings per [[feedback_no_lineage_claims_in_notebook]], not endorsed over alternatives without explicit empirical convergence.
Project-level citation. See CITATION.cff at the repo root for the project-as-a-whole citation form.