Fiedler vs HRP vs GICS — spike test findings (2026-05-11)

Spike: Does the project's graph-Laplacian Fiedler partition (gateway-graph eigendecomposition primitive, ephemerides-spectral §13) outperform, match, or underperform López-de-Prado 2016 Hierarchical Risk Parity (HRP) and GICS-style ground-truth sector classification on a benchmark equity-correlation clustering task?

Method: Concertmaster role; MPM-discipline (closed-form numpy/scipy/sklearn; no SGD; deterministic seed 20260511). Synthetic block-correlation benchmark (López-de-Prado AFML ch.16 style; 50 assets × 10 sectors × 5 stocks/sector). Three sample sizes (252 / 1260 / 5040 trading days). Anomaly-chase sweeps over cluster cardinality + intra-sector-correlation. Realistic-SNR multi-trial sweep (n=20 per scenario) at S&P-500-style parameters. T^N quantum-walk lift side experiment.

Dispatch: financial-scoping-2026-05-11.md Fermata 3 spike-test candidate (a).

Bottom line: Fiedler outperforms HRP on mean ARI in every realistic-SNR scenario tested (Δ from +0.106 to +0.615, §3.2); MFO Phase B 18-block-style structural-prediction-from-S_k×S_m-rep-theory validated to numerical-precision floor on the noiseless base matrix; T^N quantum-walk lift surfaces phase information but does not improve clustering at this benchmark (deferred; see §5).

1 — Setup + data recipe

Benchmark: synthetic block-correlation matrix with known cluster ground truth — the López-de-Prado canonical benchmark for clustering-method comparison (AFML ch.16). Block structure has known eigenstructure under S_k × S_m permutation symmetry: one large "market mode" eigenvalue, (k-1) "sector contrast" modes, and k(m-1) "idiosyncratic" modes.

| Parameter | Value | Source / rationale |
|---|---|---|
| `n_sectors` (k) | 10 | GICS-like (real GICS is 11, kept round for symmetry) |
| `n_per_sector` (m) | 5 | small enough to be testable, large enough for sample correlation |
| `n_assets` (N=km) | 50 | benchmark-canonical |
| Sample-size sweep | 252 / 1260 / 5040 | 1 / 5 / 20 trading years daily |
| Intra-sector correlation | 0.10 → 0.80 sweep | discrimination zone |
| Inter-sector correlation | 0.05 → 0.30 sweep | weak-blocks → strong-market-mode |
| `n_trials` (SNR sweep) | 20 | bootstrap statistical power |
| Bootstrap CI samples | 200 | 95% CI on purity/ARI/NMI |
| Seed | `20260511` | deterministic |

Sample-correlation noise model: Wishart-style. Generate joint-normal returns with population correlation C_base and return the empirical sample correlation. This is intended to approximate the sampling noise of S&P 500 daily-return correlations at the chosen sample size; it does not model heavy tails or regime switching (see §6).
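The recipe above can be sketched as follows. This is a minimal illustration, not the spike script itself: the helper names `block_correlation` and `sample_correlation` are hypothetical, and the defaults are the easy-benchmark values from the parameter table.

```python
import numpy as np


def block_correlation(n_sectors=10, n_per_sector=5, rho_in=0.5, rho_out=0.05):
    """Population correlation: rho_in inside a sector, rho_out across sectors."""
    labels = np.repeat(np.arange(n_sectors), n_per_sector)
    same_sector = labels[:, None] == labels[None, :]
    C = np.where(same_sector, rho_in, rho_out).astype(float)
    np.fill_diagonal(C, 1.0)
    return C, labels


def sample_correlation(C_base, n_obs, rng):
    """Draw n_obs joint-normal return vectors and return their sample correlation."""
    mean = np.zeros(C_base.shape[0])
    returns = rng.multivariate_normal(mean, C_base, size=n_obs)
    return np.corrcoef(returns, rowvar=False)


rng = np.random.default_rng(20260511)      # seed taken from the parameter table
C_base, labels = block_correlation()
C_hat = sample_correlation(C_base, n_obs=252, rng=rng)
```

`C_hat` is the noisy input a clustering method would see; `labels` is the ground truth it is scored against.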

2 — Three methods implemented

| Method | Library calls | Cluster decision rule |
|---|---|---|
| (i) Fiedler partition | `scipy.linalg.eigh(L_sym)` on normalized Laplacian `L_sym = I − D^(−1/2) A D^(−1/2)` with adjacency `A = (C+1)/2`; `sklearn.cluster.KMeans` on k-1 non-trivial eigenvectors | For k=2: sign(f_2). For k>2: embed in (f_2, ..., f_k) and k-means. |
| (ii) HRP (López-de-Prado 2016) | `scipy.cluster.hierarchy.linkage(D, method='single')` on Mantegna distance `d = sqrt(2*(1−C))`; `fcluster(Z, t=k, criterion='maxclust')` | Single-linkage dendrogram cut at k clusters. |
| (iii) GICS | Ground-truth block labels by construction | Reference; trivially `(purity, NMI, ARI) = (1, 1, 1)` against itself. |

All three implementations are closed-form numerical linear algebra — no SGD, no learned parameters, no test-set tuning. Deterministic seeds.
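A minimal sketch of methods (i) and (ii), assuming the library calls listed in the table. The function names are hypothetical. Keeping the unit diagonal of `A`, using `n_init=10` for k-means, and applying a strict `> 0` sign rule for k=2 are assumptions of this sketch, not details confirmed by the spike script.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.linalg import eigh
from scipy.spatial.distance import squareform
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score


def fiedler_clusters(C, k, seed=20260511):
    """Spectral partition from the normalized Laplacian of A = (C + 1) / 2."""
    A = (C + 1.0) / 2.0                        # correlations in [-1, 1] -> weights in [0, 1]
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    L_sym = np.eye(len(C)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, vecs = eigh(L_sym)                      # ascending eigenvalues; column 0 is the trivial mode
    if k == 2:
        return (vecs[:, 1] > 0).astype(int)    # sign of the Fiedler vector f_2
    embedding = vecs[:, 1:k]                   # f_2 ... f_k: the k-1 non-trivial eigenvectors
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(embedding)


def single_linkage_clusters(C, k):
    """Single-linkage dendrogram on the Mantegna distance, cut at k clusters."""
    D = np.sqrt(np.clip(2.0 * (1.0 - C), 0.0, None))
    np.fill_diagonal(D, 0.0)
    Z = linkage(squareform(D, checks=False), method="single")
    return fcluster(Z, t=k, criterion="maxclust")


# Noiseless 10 x 5 block matrix (rho_in = 0.5, rho_out = 0.05) as a smoke test.
k, m, rho_in, rho_out = 10, 5, 0.5, 0.05
C = (rho_out * np.ones((k * m, k * m))
     + (rho_in - rho_out) * np.kron(np.eye(k), np.ones((m, m)))
     + (1.0 - rho_in) * np.eye(k * m))
truth = np.repeat(np.arange(k), m)
ari_fiedler = adjusted_rand_score(truth, fiedler_clusters(C, k))
ari_linkage = adjusted_rand_score(truth, single_linkage_clusters(C, k))
```

On the noiseless matrix both routines should recover the blocks; the interesting behaviour only appears once `C` is replaced by a sample correlation.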

3 — Metric table

3.1 Easy-benchmark (intra=0.5, inter=0.05, n_obs=252)

Both methods saturate at perfect performance. This is the dispatch-target benchmark but is too clean to discriminate the methods.

| Method | Purity | ARI | NMI | Eigenvalue gap at k=10 |
|---|---|---|---|---|
| Fiedler | 1.000 | 1.000 | 1.000 | 0.499 (ratio 13.21) |
| HRP | 1.000 | 1.000 | 1.000 | n/a (single-linkage cut: dendrogram gap not directly comparable) |
| GICS | 1.000 | 1.000 | 1.000 | n/a |

3.2 Realistic-SNR sweep (n_trials=20, n_obs=252)

This is the load-bearing comparison. Parameters chosen to span S&P-500-style scenarios.

| Scenario | intra | inter | Fiedler ARI (mean) | HRP ARI (mean) | Δ(F−H) | F wins / trials |
|---|---|---|---|---|---|---|
| Strong market mode | 0.50 | 0.30 | 1.000 | 0.894 | +0.106 | 11/20 (rest are ties) |
| Moderate market mode | 0.40 | 0.25 | 0.996 | 0.400 | +0.595 | 20/20 |
| Weak market mode | 0.30 | 0.20 | 0.621 | 0.050 | +0.571 | 20/20 |
| Block (weak intra) | 0.25 | 0.05 | 0.995 | 0.687 | +0.308 | 19/20 |
| Block (very weak intra) | 0.20 | 0.05 | 0.868 | 0.253 | +0.615 | 20/20 |

Verdict: Fiedler wins decisively in every realistic-SNR scenario. HRP's single-linkage clustering exhibits chaining failure under noise — the dendrogram merges across putative cluster boundaries when noise creates spurious short single-link paths, degrading ARI rapidly with decreasing block-signal SNR. Fiedler's global-eigenstructure approach is robust to the same noise because the Laplacian's spectral gap is an integrated property over the whole graph, not a single-edge property.
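The table columns (mean Δ, wins per trial) and the 95% bootstrap CI listed in §1 can be computed from per-trial ARI arrays along these lines. This is a sketch under assumptions: the helper name `paired_summary` is hypothetical, a percentile bootstrap over trials is assumed (the source does not say which bootstrap scheme was used), and the four-element input arrays are made-up illustrative values, not spike results.

```python
import numpy as np


def paired_summary(ari_a, ari_b, n_boot=200, seed=20260511, tol=1e-12):
    """Paired per-trial comparison: mean delta, win/tie/loss counts, bootstrap CI."""
    ari_a = np.asarray(ari_a, dtype=float)
    ari_b = np.asarray(ari_b, dtype=float)
    delta = ari_a - ari_b                              # one paired difference per trial
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(delta), size=(n_boot, len(delta)))
    boot_means = delta[idx].mean(axis=1)               # resample trials with replacement
    lo, hi = np.percentile(boot_means, [2.5, 97.5])    # percentile 95% interval
    return {
        "mean_delta": float(delta.mean()),
        "wins": int(np.sum(delta > tol)),
        "ties": int(np.sum(np.abs(delta) <= tol)),
        "losses": int(np.sum(delta < -tol)),
        "ci95": (float(lo), float(hi)),
    }


# Illustrative (made-up) per-trial ARIs for two methods over four trials.
summary = paired_summary([1.0, 1.0, 0.9, 0.8], [1.0, 0.5, 0.4, 0.8])
```

Reporting ties separately from wins matters for rows like the strong-market-mode scenario, where a low win count does not imply any losses.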

3.3 Cluster-cardinality sensitivity (k_clusters varied; intra=0.5, inter=0.05; ground-truth k=10)

| Requested k | Fiedler ARI | HRP ARI | Fiedler − HRP |
|---|---|---|---|
| 2 | 0.156 | 0.039 | +0.117 |
| 5 | 0.470 | 0.335 | +0.134 |
| 10 (truth) | 1.000 | 1.000 | 0.000 |
| 15 (over) | 0.825 | 0.906 | −0.081 |
| 20 (heavy over) | 0.656 | 0.825 | −0.170 |

Anomaly: HRP wins when over-partitioning. Single-linkage's tendency to attach singletons one at a time means that at k > k_true, HRP creates additional small clusters that don't damage the existing correct ones; Fiedler's k-means in the eigenvector embedding re-splits the existing correct clusters, damaging them. This is a known tradeoff in spectral clustering literature; flag for honest reporting.

3.4 Sample-size sensitivity (intra=0.5, inter=0.05, k=10)

All three sample sizes (252, 1260, 5040 obs) saturate at perfect scores on the easy benchmark, which therefore hides any sample-size sensitivity. §3.2 probes noise sensitivity instead, by varying the correlation levels at fixed n_obs=252.

4 — MFO Phase B 18-block-style structural-prediction validation

Closed-form prediction from S_k × S_m permutation symmetry on the noiseless base correlation C:

| Eigenvalue group | Predicted formula | Predicted value | Empirical (noiseless) | Match? |
|---|---|---|---|---|
| Market mode (1 eigenvalue) | `1 + (m−1)·ρ_in + (k−1)·m·ρ_out` | 5.250 | 5.250 | ✅ exact |
| Sector contrast modes (k−1=9 eigenvalues) | `1 + (m−1)·ρ_in − m·ρ_out` | 2.750 | 2.750 (mean over 9 modes) | ✅ exact |
| Idiosyncratic modes (k(m−1)=40 eigenvalues) | `1 − ρ_in` | 0.500 | 0.500 (mean over 40 modes) | ✅ exact |

Numerical match to 15-digit float precision. This is the finance-domain analog of MFO Phase B's 18-block geometric count from D_3 irrep multiplicities. Group-theoretic structural prediction → empirical match: same MPM-discipline pattern.
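The check in the table can be reproduced in a few lines. The sketch below writes the noiseless matrix as C = (1−ρ_in)·I + (ρ_in−ρ_out)·(I_k ⊗ J_m) + ρ_out·J_N, where J denotes an all-ones matrix, and compares its spectrum with the three closed-form values. Variable names are illustrative.

```python
import numpy as np

k, m, rho_in, rho_out = 10, 5, 0.5, 0.05
n = k * m

# Noiseless block correlation: identity + within-sector blocks + global background.
C = ((1.0 - rho_in) * np.eye(n)
     + (rho_in - rho_out) * np.kron(np.eye(k), np.ones((m, m)))
     + rho_out * np.ones((n, n)))

# Closed-form spectrum, listed in descending order with multiplicities 1, k-1, k(m-1).
predicted = np.concatenate([
    [1 + (m - 1) * rho_in + (k - 1) * m * rho_out],         # market mode
    np.full(k - 1, 1 + (m - 1) * rho_in - m * rho_out),     # sector contrast modes
    np.full(k * (m - 1), 1 - rho_in),                       # idiosyncratic modes
])

empirical = np.sort(np.linalg.eigvalsh(C))[::-1]            # descending
max_abs_error = float(np.max(np.abs(empirical - predicted)))
```

`max_abs_error` sits at floating-point rounding level for these parameters. The multiplicities add up to 1 + 9 + 40 = 50, so the three formulas account for the whole spectrum.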

Implication: the finance literature's empirical-PCA approach (Litterman-Scheinkman 1991 "level/slope/curvature"; Laloux et al 1999 RMT cleaning) treats the block-eigenvalue separation as an observation; it can equivalently be predicted from sector permutation symmetry. The S_k × S_m formula above is the closed-form predictive form. Not previously articulated this way in the finance literature surveyed.

5 — T^N quantum-walk lift side experiment

Side experiment scope: does U(t) = exp(−i L_corr t) on the correlation Laplacian surface clustering information that classical Fiedler discards? Per srmech §3.5.1 layer (b) / financial-scoping-2026-05-11.md Fermata 2.

Method: initialize uniform state ψ_0 = 1/√N; evolve via spectral exponentiation U(t) = V·diag(exp(−i·λ·t))·V^T; measure (a) magnitude-based clustering after evolution (control); (b) phase-based clustering via circular k-means on (cos(phase), sin(phase)) 2D embedding (the load-bearing test).
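A minimal sketch of the evolution and the phase embedding described above. It assumes `L_corr` is the normalized Laplacian from §2 (the source does not pin this down) and that "circular phase variance" means 1 − |mean(exp(iθ))|. Both are assumptions, and the function names are hypothetical.

```python
import numpy as np
from scipy.linalg import eigh


def normalized_laplacian(C):
    """L_sym = I - D^(-1/2) A D^(-1/2) with A = (C + 1) / 2 (assumed form of L_corr)."""
    A = (C + 1.0) / 2.0
    s = 1.0 / np.sqrt(A.sum(axis=1))
    return np.eye(len(C)) - s[:, None] * A * s[None, :]


def evolve_uniform(L, t):
    """psi(t) = V diag(exp(-i lambda t)) V^T psi_0 with psi_0 uniform."""
    lam, V = eigh(L)
    n = L.shape[0]
    psi0 = np.full(n, 1.0 / np.sqrt(n))
    return V @ (np.exp(-1j * lam * t) * (V.T @ psi0))


def phase_embedding(psi):
    """2D (cos, sin) embedding of the node phases, suitable as k-means input."""
    phase = np.angle(psi)
    return np.column_stack([np.cos(phase), np.sin(phase)])


def circular_variance(psi):
    """1 - |mean resultant vector| of the node phases (assumed definition)."""
    return 1.0 - np.abs(np.mean(np.exp(1j * np.angle(psi))))


# Noiseless block matrix: all degrees are equal, so psi_0 lies in the lambda = 0 eigenspace.
k, m, rho_in, rho_out = 10, 5, 0.5, 0.05
n = k * m
C = ((1.0 - rho_in) * np.eye(n)
     + (rho_in - rho_out) * np.kron(np.eye(k), np.ones((m, m)))
     + rho_out * np.ones((n, n)))
psi_t = evolve_uniform(normalized_laplacian(C), t=1.0)
norm = float(np.linalg.norm(psi_t))
drift = float(np.max(np.abs(psi_t - 1.0 / np.sqrt(n))))
```

Under these assumptions the noiseless walk preserves the norm and leaves the uniform state unchanged up to rounding, so any phase structure in a sampled matrix comes only from the noise-induced overlap with non-trivial eigenvectors.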

Result on the easy benchmark (intra=0.5, inter=0.05; n_obs=252):

| t | Mean circular phase variance | Phase clustering purity | Phase clustering ARI |
|---|---|---|---|
| 0.0 | 0 (trivial) | n/a | n/a |
| 0.1 | 9.0e−8 | 0.28 | −0.055 |
| 0.5 | 2.1e−6 | 0.28 | −0.053 |
| 1.0 | 6.3e−6 | 0.28 | −0.052 |
| 2.0 | 6.8e−6 | 0.30 | −0.044 |
| 5.0 | 7.2e−6 | 0.28 | −0.045 |

Result on the realistic-SNR scenarios (averaged over 20 trials at t=1.0):

| Scenario | T^N phase clustering purity |
|---|---|
| Strong market mode | 0.20 (≈ chance for k=10) |
| Moderate market mode | 0.20 |
| Weak market mode | 0.18 |
| Block (weak intra) | 0.20 |
| Block (very weak intra) | 0.19 |

Interpretation. At this benchmark, the T^N phase clustering does NOT improve over magnitude-based Fiedler:

- The uniform initial state ψ_0 = 1/√N is, exactly or very nearly depending on the Laplacian normalization, the eigenvector of the trivial eigenvalue λ=0 (constant function). It is therefore almost stationary: its λ=0 component picks up the phase exp(−i·0·t) = 1, and only its small components along the other eigenvectors evolve.
- Phase variance is of order 10^−6 or smaller at every t tested, too small to separate clusters. Meaningful phase coherence on this benchmark would require either localized initial states (single-node kicks) or longer evolution times (t >> π/λ_max ≈ 1.5).
- The dispatch flagged the lift as load-bearing for asynchronous multi-asset lead-lag analysis (Hayashi-Yoshida 2005 style on real high-frequency cross-spectrum), not for static-correlation clustering. This benchmark is the wrong proving ground for the lift.

Honest verdict on the lift: at this benchmark, no improvement over Fiedler magnitude-only. The load-bearing test for the T^N lift remains the asynchronous-HF cross-spectrum setting (§13.9-style hybrid embedding + phase coherence). Deferred to a follow-up spike with proper lead-lag-bearing benchmark (e.g., simulated 2-asset HF tick data with phase-shifted intensities; Hayashi-Yoshida estimator vs exp(−i L_corr t) magnitude-and-phase decomposition).

6 — Honest verdict per metric

Load-bearing-question answer: Fiedler outperforms HRP on the benchmark equity-correlation clustering task in every realistic-SNR scenario by mean ARI (§3.2). It wins 20/20 trials in 3 of the 5 scenarios, 19/20 in one, and 11/20 with 9 ties in the strong-market-mode scenario. HRP shows chaining failure under moderate-to-weak block signal; Fiedler is robust. GICS is ground truth by construction; Fiedler recovers it almost exactly (mean ARI ≥ 0.995) in three of the five scenarios and degrades to 0.868 and 0.621 in the very-weak-intra and weak-market-mode scenarios.

Caveats — what was tested:

- Synthetic block-correlation matrix with known structure (not real S&P 500 data; see §7 anomaly log).
- 10 equal-size sectors (real GICS has 11 unequal-size sectors; real S&P 500 has heavy-tailed industry sizes).
- Daily-return noise simulated by Wishart-style sampling (not by real-world heavy-tail / regime-switching dynamics).
- k=10 clusters requested (matches ground truth; see §3.3 for what happens off-truth).
- Mantegna distance metric chosen (López-de-Prado uses sqrt(0.5·(1−ρ)), equivalent up to scale).
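The "equivalent up to scale" remark in the last caveat can be made explicit. The subscripts below are labels introduced here for illustration:

```latex
d_{\mathrm{Mantegna}}(\rho) = \sqrt{2\,(1-\rho)}
  = 2\,\sqrt{\tfrac{1}{2}\,(1-\rho)}
  = 2\, d_{\mathrm{LdP}}(\rho),
\qquad
d_{\mathrm{Mantegna}}(\rho_1) < d_{\mathrm{Mantegna}}(\rho_2)
  \iff d_{\mathrm{LdP}}(\rho_1) < d_{\mathrm{LdP}}(\rho_2).
```

Single-linkage merges depend only on the ordering of pairwise distances, so the constant factor 2 rescales dendrogram heights but leaves the merge order, and hence any `maxclust` cut at k clusters, unchanged.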

Caveats — what was NOT tested:

- Real S&P 500 daily-returns correlation (no network access during this spike). Would test against actual GICS labels with their imbalance + heavy-tail noise.
- Tail-event / crisis regimes (per financial-scoping anomaly 1: Gaussian-spectral methods break in crisis).
- Time-varying / non-stationary regimes (per anomaly 4).
- 1259 stocks × 11 GICS sectors realistic-scale (would require larger eigendecomposition + observed N>>D regime).
- Eigenvalue-clipping / Ledoit-Wolf shrinkage pre-processing: both methods used raw sample correlation; finance practice typically cleans first.

Caveats — methodological:

- HRP single-linkage is the canonical choice (López-de-Prado 2016); average-linkage or Ward's-linkage HRP variants might perform differently. Not tested.
- Fiedler with k>k_true under-performs HRP (§3.3). For users who select k via dendrogram/eigenvalue-gap diagnostics, this matters; we held k=k_true throughout the realistic-SNR sweep.
- HRP's primary purpose is portfolio-weight allocation, not just clustering. The portfolio-weight performance metric (out-of-sample Sharpe ratio) was NOT evaluated; that is the original López-de-Prado claim and is downstream of clustering quality.

7 — Anomaly log

Anomaly 1: easy-benchmark saturation. At intra=0.5, inter=0.05 (dispatch parameters), both methods achieve perfect (1.000) on all metrics. The benchmark is too clean to discriminate. Investigation: extended to realistic-SNR sweep (§3.2), which IS discriminating.

Anomaly 2: HRP wins when over-partitioning. Counter-intuitive but reproducible. At k_requested > k_true with easy data, HRP's single-linkage handles the over-partitioning by attaching singletons; Fiedler's k-means re-splits correct clusters. Implication: for practitioners who don't know k, HRP may be more forgiving; for those who do, Fiedler dominates.

Anomaly 3: T^N quantum-walk lift no-op on static benchmark. Lift produces negligible phase variance (order 10^−6 or smaller, §5) and does not improve clustering. Investigation: the load-bearing benchmark for the lift is asynchronous-HF lead-lag, not static-correlation clustering. Deferred.

Anomaly 4: GICS-style symmetric-block model exactly predicted by S_k×S_m permutation rep theory. Eigenvalues match closed-form prediction to 15-digit float precision. This is the finance-domain analog of MFO Phase B 18-block structural-prediction result; strong cross-domain validation of the "structural prediction from group symmetry, not SGD fit" MPM-discipline pattern. Not previously articulated this way in the finance literature surveyed in financial-scoping-2026-05-11.md. Stands as a fermata-worthy finding.

Anomaly 5: synthetic-benchmark caveat dominates. Real S&P 500 has imbalanced sectors, heavy tails, regime shifts. Synthetic benchmark validation is a necessary-not-sufficient result. Recommendation: if elevating the Fiedler-beats-HRP finding to a srmech first-class deliverable, follow up with a real-S&P-500 spike (requires Yahoo Finance or similar API access).

8 — Fermata records

Fermata 1: synthetic-vs-real benchmark choice. The spike used a synthetic López-de-Prado-canonical benchmark (option A in dispatch) because (a) deterministic, (b) network access unconfirmed at dispatch time, (c) faster iteration. Real S&P 500 data is the natural follow-up. Conductor decision: is the synthetic-benchmark result sufficient to claim "Fiedler-beats-HRP cross-domain primitive validated," or does the project need a real-data follow-up before elevating? Recommendation: report synthetic result honestly with the caveat; flag real-data follow-up as a queued next-spike if the finding warrants elevation to first-class srmech offering.

Fermata 2: T^N quantum-walk lift load-bearing-benchmark gap. This spike does not test the lift at its load-bearing setting (asynchronous-HF lead-lag). The lift's potential value (per financial-scoping Fermata 2) remains untested. Conductor decision: queue a separate dedicated T^N lift spike with a simulated 2-asset HF tick-data benchmark (Hayashi-Yoshida vs exp(−i L_corr t) comparison), or defer the lift question entirely until a real-data opportunity arises. Recommendation: queue the dedicated T^N lift spike — this is the financial-scoping round's most novel claim and the only "project-→-external-domain new-information offering" identified across six rounds; should be tested.

Fermata 3: cardinality sensitivity informs ship-mode design. The "HRP wins when over-partitioning" anomaly (§3.3) is real and ship-relevant. If the project ever ships a bridge.predict_sector_clustering surface, the API should expose k-selection diagnostics (eigenvalue-gap detection, dendrogram-inconsistency) rather than require user-supplied k. Conductor decision: is this finding load-bearing enough to warrant a separate srmech sub-section on "spectral-clustering k-selection methodology," or is it a footnote?
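One possible shape for the eigenvalue-gap diagnostic mentioned above is sketched below. The function name `eigengap_k` is hypothetical, and it does not describe an existing `bridge` API. It makes one explicit assumption: the first gap (between the trivial eigenvalue λ_1 = 0 and λ_2) is skipped, because on a dense correlation graph that gap is always large and would otherwise suggest k=1.

```python
import numpy as np
from scipy.linalg import eigh


def eigengap_k(C, k_max=20):
    """Suggest k as the position of the largest non-trivial gap in the L_sym spectrum."""
    A = (C + 1.0) / 2.0
    s = 1.0 / np.sqrt(A.sum(axis=1))
    L_sym = np.eye(len(C)) - s[:, None] * A * s[None, :]
    lam = eigh(L_sym, eigvals_only=True)       # ascending; lam[0] is the trivial mode
    gaps = np.diff(lam[:k_max + 1])            # gaps[i] = lam[i + 1] - lam[i]
    # Choosing k clusters corresponds to gaps[k - 1]; gaps[0] is skipped (assumption).
    k_suggested = int(np.argmax(gaps[1:])) + 2
    return k_suggested, gaps


# Noiseless 10 x 5 block matrix (rho_in = 0.5, rho_out = 0.05).
k, m, rho_in, rho_out = 10, 5, 0.5, 0.05
n = k * m
C = ((1.0 - rho_in) * np.eye(n)
     + (rho_in - rho_out) * np.kron(np.eye(k), np.ones((m, m)))
     + rho_out * np.ones((n, n)))
k_suggested, gaps = eigengap_k(C)
```

On the noiseless matrix the only non-trivial gap separates the 9 sector-contrast eigenvalues from the 40 idiosyncratic ones, so the suggestion equals the true sector count. How well this survives sample noise at n_obs=252 was not tested here and would need its own sweep.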

Fermata 4: 18-block-style structural-prediction validation finance instance. §4's S_k × S_m closed-form match to 15-digit precision is the load-bearing cross-domain analog of MFO Phase B. This stands independently of the Fiedler-vs-HRP-vs-GICS question. It's a worthy result on its own: finance has the same "structural prediction from group symmetry" pattern as the MFO 18-block finding. Conductor decision: elevate to srmech §3.5.3(C) sub-section as a fourth instantiation of the structural-prediction-from-group-theory motif (MFO 18-block, financial-scoping Sub-investigation 4, this spike §4)? Or keep at spike-level until validated on real S&P 500 data? Recommendation: elevate now — the noiseless-block-model match is to machine precision and is closed-form predictable; real-data deviation from prediction is the noise/non-stationarity story, which is information not refutation.

9 — Recommended next actions

1. Land §3.5 finance row in srmech notebook noting the Fiedler-beats-HRP on synthetic benchmark finding. This is the fifth quantitative cross-domain datapoint after graphics / audio / protein / power (ephemerides §13 Matthews φ + Spearman ρ being the protein-adjacent fourth), but with the synthetic-vs-real caveat called out explicitly.
2. Land §3.5.3(C) 18-block-style structural prediction as a fourth instantiation: MFO Phase B 18-block + financial-scoping Sub-investigation 4 (theoretical) + this spike §4 (numerical to 15-digit precision). Closed-form predictive form: λ_market = 1 + (m−1)·ρ_in + (k−1)·m·ρ_out, etc.
3. Queue dedicated T^N quantum-walk lift spike at proper load-bearing benchmark (asynchronous-HF lead-lag; Hayashi-Yoshida vs exp(−i L_corr t) comparison). The 2026-05-11 financial-scoping round's most novel claim deserves a proper test.
4. Queue real-S&P-500 follow-up spike (requires Yahoo Finance or similar API; preferably WRDS CRSP if accessible) to validate the synthetic-benchmark Fiedler-beats-HRP result against real-world imbalance + heavy-tail noise + regime structure.
5. Honest reporting in srmech §3.5.3(A)/(C) sub-sections: cite this spike's chain-tier-style result if d_S/2 measurement happens later (financial-scoping Fermata 3 sub-spike (b)); otherwise note that the d_S/2 question is queued.

10 — Reproducibility

Script: fiedler-vs-hrp-vs-gics-spike-script.py

Per-metric NDJSON output: fiedler-vs-hrp-vs-gics-spike-per-metric-2026-05-11.ndjson (80 records: 71 main + 5 SNR-sweep + anomaly chase + side experiments)

Reproduction: python docs/srmech/notes/fiedler-vs-hrp-vs-gics-spike-script.py

Runtime: ~30 seconds on commodity workstation. Deterministic across runs (seed 20260511).

Library versions tested: numpy 2.4.4, scipy 1.17.1, scikit-learn 1.8.0, Python 3.x. No SGD, no learned parameters, no test-set tuning.