Fiedler vs HRP vs GICS — spike test findings (2026-05-11)

Spike: Does the project's graph-Laplacian Fiedler partition (gateway-graph eigendecomposition primitive, ephemerides-spectral §13) outperform, match, or underperform López-de-Prado 2016 Hierarchical Risk Parity (HRP) and GICS-style ground-truth sector classification on a benchmark equity-correlation clustering task?

Method: Concertmaster role; MPM-discipline (closed-form numpy/scipy/sklearn; no SGD; deterministic seed 20260511). Synthetic block-correlation benchmark (López-de-Prado AFML ch.16 style; 50 assets = 10 sectors × 5 stocks/sector). Three sample sizes (252 / 1260 / 5040 trading days). Anomaly-chase sweeps over cluster cardinality + intra-sector-correlation. Realistic-SNR multi-trial sweep (n=20 per scenario) at S&P-500-style parameters. T^N quantum-walk lift side experiment.

Dispatch: financial-scoping-2026-05-11.md Fermata 3 spike-test candidate (a).

Bottom line: Fiedler decisively outperforms HRP at every realistic-SNR scenario tested; MFO Phase B 18-block-style structural-prediction-from-S_k×S_m-rep-theory validated to numerical-precision floor on the noiseless base matrix; T^N quantum-walk lift surfaces phase information but does not improve clustering at this benchmark (deferred — see §5).


1 — Setup + data recipe

Benchmark: synthetic block-correlation matrix with known cluster ground truth — the López-de-Prado canonical benchmark for clustering-method comparison (AFML ch.16). Block structure has known eigenstructure under S_k × S_m permutation symmetry: one large "market mode" eigenvalue, (k-1) "sector contrast" modes, and k(m-1) "idiosyncratic" modes.
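The mode counts above follow from writing the block matrix as a sum of three commuting pieces. The derivation below is a short sketch, using ρ_in and ρ_out for the intra- and inter-sector correlations (the symbols of §4) and J_n for the n×n all-ones matrix; the notation is introduced here for illustration only.

```latex
C = (1-\rho_{\mathrm{in}})\, I_N
  + (\rho_{\mathrm{in}}-\rho_{\mathrm{out}})\,(I_k \otimes J_m)
  + \rho_{\mathrm{out}}\, J_N , \qquad N = km .

\begin{aligned}
\text{market mode } (\mathbf{1}_N):\quad
  \lambda &= (1-\rho_{\mathrm{in}}) + m(\rho_{\mathrm{in}}-\rho_{\mathrm{out}}) + km\,\rho_{\mathrm{out}}
           = 1 + (m-1)\rho_{\mathrm{in}} + (k-1)m\,\rho_{\mathrm{out}} \\
\text{sector contrasts } (k-1 \text{ vectors}):\quad
  \lambda &= (1-\rho_{\mathrm{in}}) + m(\rho_{\mathrm{in}}-\rho_{\mathrm{out}})
           = 1 + (m-1)\rho_{\mathrm{in}} - m\,\rho_{\mathrm{out}} \\
\text{idiosyncratic } (k(m-1) \text{ vectors}):\quad
  \lambda &= 1-\rho_{\mathrm{in}}
\end{aligned}
```

Sector contrasts are constant within each sector and sum to zero across sectors, so J_N annihilates them while I_k ⊗ J_m multiplies them by m. Idiosyncratic vectors sum to zero within every sector, so both all-ones terms annihilate them.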

| Parameter | Value | Source / rationale |
|---|---|---|
| n_sectors (k) | 10 | GICS-like (real GICS is 11, kept round for symmetry) |
| n_per_sector (m) | 5 | small enough to be testable, large enough for sample correlation |
| n_assets (N=km) | 50 | benchmark-canonical |
| Sample-size sweep | 252 / 1260 / 5040 | 1 / 5 / 20 trading years daily |
| Intra-sector correlation | 0.10 → 0.80 sweep | discrimination zone |
| Inter-sector correlation | 0.05 → 0.30 sweep | weak-blocks → strong-market-mode |
| n_trials (SNR sweep) | 20 | bootstrap statistical power |
| Bootstrap CI samples | 200 | 95% CI on purity/ARI/NMI |
| Seed | 20260511 | deterministic |

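Sample-correlation noise model: Wishart-style. Generate joint-normal returns with population correlation C_base and return the empirical sample correlation. This approximates the sampling noise of real S&P 500 daily returns at the chosen sample size only under Gaussian, stationary assumptions; heavy tails and regime switching are not modelled (see §6).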
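A minimal sketch of this data recipe follows. The helper names `block_corr` and `sample_corr` are illustrative assumptions, not the names used in the spike script, and drawing returns with `Generator.multivariate_normal` is one plausible way to realise the joint-normal sampling described above.

```python
import numpy as np

def block_corr(k, m, rho_in, rho_out):
    """Population block correlation: rho_in inside a sector, rho_out across sectors."""
    labels = np.repeat(np.arange(k), m)              # ground-truth sector of each asset
    same = labels[:, None] == labels[None, :]
    C = np.where(same, rho_in, rho_out).astype(float)
    np.fill_diagonal(C, 1.0)
    return C, labels

def sample_corr(C_base, n_obs, rng):
    """Draw n_obs joint-normal return vectors and return their sample correlation."""
    n = C_base.shape[0]
    X = rng.multivariate_normal(np.zeros(n), C_base, size=n_obs)
    return np.corrcoef(X, rowvar=False)

rng = np.random.default_rng(20260511)                # seed quoted in the setup table
C_base, labels = block_corr(k=10, m=5, rho_in=0.5, rho_out=0.05)
C_hat = sample_corr(C_base, n_obs=252, rng=rng)      # one noisy "trading year"
```

Passing a single `Generator` through every call keeps the whole run reproducible from one seed.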


2 — Three methods implemented

| Method | Library calls | Cluster decision rule |
|---|---|---|
| (i) Fiedler partition | scipy.linalg.eigh(L_sym) on normalized Laplacian L_sym = I − D^(−1/2) A D^(−1/2) with adjacency A = (C+1)/2; sklearn.cluster.KMeans on k-1 non-trivial eigenvectors | For k=2: sign(f_2). For k>2: embed in (f_2, ..., f_k) and k-means. |
| (ii) HRP (López-de-Prado 2016) | scipy.cluster.hierarchy.linkage(D, method='single') on Mantegna distance d = sqrt(2*(1−C)); fcluster(Z, t=k, criterion='maxclust') | Single-linkage dendrogram cut at k clusters. |
| (iii) GICS | Ground-truth block labels by construction | Reference; trivially (purity, NMI, ARI) = (1, 1, 1) against itself. |

All three implementations are closed-form numerical linear algebra — no SGD, no learned parameters, no test-set tuning. Deterministic seeds.
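The Fiedler row of the table can be sketched as below. This is an illustration under stated assumptions rather than the spike script itself: the function names are hypothetical, A = (C+1)/2 is used literally (so self-loops of weight 1 are kept), and `n_init=10` for k-means is a choice made here.

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

def block_corr(k, m, rho_in, rho_out):
    labels = np.repeat(np.arange(k), m)
    C = np.where(labels[:, None] == labels[None, :], rho_in, rho_out).astype(float)
    np.fill_diagonal(C, 1.0)
    return C, labels

def normalized_laplacian(C):
    A = (C + 1.0) / 2.0                        # map correlations in [-1, 1] to weights in [0, 1]
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))  # D^(-1/2) as a vector
    return np.eye(C.shape[0]) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

def fiedler_partition(C, k, seed=20260511):
    _, V = eigh(normalized_laplacian(C))       # eigenvalues ascending, V[:, 0] is trivial
    if k == 2:
        return (V[:, 1] > 0).astype(int)       # sign of the Fiedler vector f_2
    emb = V[:, 1:k]                            # embed each asset in (f_2, ..., f_k)
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(emb)

# Noiseless base matrix: every sector maps to a single point in the embedding.
C_base, labels = block_corr(10, 5, 0.5, 0.05)
pred = fiedler_partition(C_base, k=10)
ari_noiseless = adjusted_rand_score(labels, pred)
```

A sampled correlation matrix can be passed to `fiedler_partition` in exactly the same way.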


3 — Metric table

3.1 Easy-benchmark (intra=0.5, inter=0.05, n_obs=252)

Both methods saturate at perfect performance. This is the dispatch-target benchmark but is too clean to discriminate the methods.

| Method | Purity | ARI | NMI | Eigenvalue gap at k=10 |
|---|---|---|---|---|
| Fiedler | 1.000 | 1.000 | 1.000 | 0.499 (ratio 13.21) |
| HRP | 1.000 | 1.000 | 1.000 | n/a (single-linkage cut: dendrogram gap not directly comparable) |
| GICS | 1.000 | 1.000 | 1.000 | n/a |
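For reference, the three scores can be computed as sketched here. ARI and NMI come straight from scikit-learn; scikit-learn has no purity function, so the `purity` helper below assumes the usual majority-class definition, which may differ in detail from what the spike script uses.

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score
from sklearn.metrics.cluster import contingency_matrix

def purity(y_true, y_pred):
    """Share of points that belong to the majority true class of their predicted cluster."""
    M = contingency_matrix(y_true, y_pred)    # rows: true classes, columns: predicted clusters
    return M.max(axis=0).sum() / M.sum()

y_true = np.repeat(np.arange(10), 5)          # 10 sectors x 5 assets
y_perm = (y_true + 3) % 10                    # same partition with relabelled clusters
y_merged = y_true // 2                        # under-partition: sectors merged in pairs

scores_perm = (
    purity(y_true, y_perm),
    adjusted_rand_score(y_true, y_perm),
    normalized_mutual_info_score(y_true, y_perm),
)
purity_merged = purity(y_true, y_merged)
```

All three scores are invariant to relabelling, which is why the ground-truth row scores (1, 1, 1) against itself.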

3.2 Realistic-SNR sweep (n_trials=20, n_obs=252)

This is the load-bearing comparison. Parameters chosen to span S&P-500-style scenarios.

| Scenario | intra | inter | Fiedler ARI (mean) | HRP ARI (mean) | Δ(F−H) | F wins / trials |
|---|---|---|---|---|---|---|
| Strong market mode | 0.50 | 0.30 | 1.000 | 0.894 | +0.106 | 11/20 (rest are ties) |
| Moderate market mode | 0.40 | 0.25 | 0.996 | 0.400 | +0.595 | 20/20 |
| Weak market mode | 0.30 | 0.20 | 0.621 | 0.050 | +0.571 | 20/20 |
| Block (weak intra) | 0.25 | 0.05 | 0.995 | 0.687 | +0.308 | 19/20 |
| Block (very weak intra) | 0.20 | 0.05 | 0.868 | 0.253 | +0.615 | 20/20 |

Verdict: Fiedler has the higher mean ARI in every realistic-SNR scenario, by +0.106 to +0.615. HRP's single-linkage clustering exhibits chaining failure under noise: the dendrogram merges across putative cluster boundaries when noise creates spurious short single-link paths, so ARI degrades rapidly as block-signal SNR falls. Fiedler's global-eigenstructure approach is more robust to the same noise because the Laplacian's spectral gap is an integrated property of the whole graph, not a single-edge property, although it also degrades in the weak-market-mode scenario (mean ARI 0.621).
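The single-linkage step can be sketched as follows so that chaining is easy to inspect on one noisy draw. This is an illustration, not the spike script: `single_linkage_clusters` is a hypothetical name, and the distance matrix is passed to `linkage` in condensed form, which is an assumption about how D was supplied.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform
from sklearn.metrics import adjusted_rand_score

def block_corr(k, m, rho_in, rho_out):
    labels = np.repeat(np.arange(k), m)
    C = np.where(labels[:, None] == labels[None, :], rho_in, rho_out).astype(float)
    np.fill_diagonal(C, 1.0)
    return C, labels

def single_linkage_clusters(C, k):
    D = np.sqrt(np.clip(2.0 * (1.0 - C), 0.0, None))            # Mantegna distance
    np.fill_diagonal(D, 0.0)
    Z = linkage(squareform(D, checks=False), method="single")   # condensed distances in
    return fcluster(Z, t=k, criterion="maxclust")               # labels in 1..k

rng = np.random.default_rng(20260511)
C_base, labels = block_corr(10, 5, 0.30, 0.20)      # "weak market mode" parameters
X = rng.multivariate_normal(np.zeros(50), C_base, size=252)
C_hat = np.corrcoef(X, rowvar=False)

pred = single_linkage_clusters(C_hat, k=10)
sizes = np.sort(np.bincount(pred))[::-1]
ari = adjusted_rand_score(labels, pred)
print("cluster sizes:", sizes[sizes > 0], "ARI:", round(ari, 3))
```

When chaining occurs it shows up in the printed sizes as one large cluster plus several very small ones, rather than ten clusters of five.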

3.3 Cluster-cardinality sensitivity (k_clusters varied; intra=0.5, inter=0.05; ground-truth k=10)

| Requested k | Fiedler ARI | HRP ARI | Fiedler − HRP |
|---|---|---|---|
| 2 | 0.156 | 0.039 | +0.117 |
| 5 | 0.470 | 0.335 | +0.134 |
| 10 (truth) | 1.000 | 1.000 | 0.000 |
| 15 (over) | 0.825 | 0.906 | −0.081 |
| 20 (heavy over) | 0.656 | 0.825 | −0.170 |

Anomaly: HRP wins when over-partitioning. Single-linkage's tendency to attach singletons one at a time means that at k > k_true, HRP creates additional small clusters that don't damage the existing correct ones; Fiedler's k-means in the eigenvector embedding re-splits the existing correct clusters, damaging them. This is a known tradeoff in spectral clustering literature; flag for honest reporting.

3.4 Sample-size sensitivity (intra=0.5, inter=0.05, k=10)

All three sample sizes (252, 1260, 5040 obs) saturate at perfect scores on the easy benchmark, so this benchmark hides any sample-size sensitivity. §3.2 captures the sensitivity to noise, but only at n_obs=252.


4 — MFO Phase B 18-block-style structural-prediction validation

Closed-form prediction from S_k × S_m permutation symmetry on the noiseless base correlation C:

| Eigenvalue group | Predicted formula | Predicted value | Empirical (noiseless) | Match? |
|---|---|---|---|---|
| Market mode (1 eigenvalue) | 1 + (m−1)·ρ_in + (k−1)·m·ρ_out | 5.250 | 5.250 | ✅ exact |
| Sector contrast modes (k−1=9 eigenvalues) | 1 + (m−1)·ρ_in − m·ρ_out | 2.750 | 2.750 (mean over 9 modes) | ✅ exact |
| Idiosyncratic modes (k(m−1)=40 eigenvalues) | 1 − ρ_in | 0.500 | 0.500 (mean over 40 modes) | ✅ exact |

Numerical match to 15-digit float precision. This is the finance-domain analog of MFO Phase B's 18-block geometric count from D_3 irrep multiplicities. Group-theoretic structural prediction → empirical match: same MPM-discipline pattern.
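The comparison in the table can be reproduced in a few lines. This is a sketch with illustrative variable names, using the easy-benchmark parameters (k=10, m=5, ρ_in=0.5, ρ_out=0.05).

```python
import numpy as np

k, m, rho_in, rho_out = 10, 5, 0.5, 0.05

# Noiseless block correlation matrix.
labels = np.repeat(np.arange(k), m)
C = np.where(labels[:, None] == labels[None, :], rho_in, rho_out).astype(float)
np.fill_diagonal(C, 1.0)

# Closed-form eigenvalues from the S_k x S_m symmetry.
lam_market = 1 + (m - 1) * rho_in + (k - 1) * m * rho_out
lam_sector = 1 + (m - 1) * rho_in - m * rho_out
lam_idio = 1 - rho_in
predicted = np.concatenate([
    [lam_market],                       # 1 market mode
    np.full(k - 1, lam_sector),         # k-1 sector contrast modes
    np.full(k * (m - 1), lam_idio),     # k(m-1) idiosyncratic modes
])

# Empirical spectrum, sorted descending to line up with the prediction.
eig = np.sort(np.linalg.eigvalsh(C))[::-1]
max_err = np.max(np.abs(eig - predicted))
print(f"max |empirical - predicted| = {max_err:.2e}")
```

Because the three predicted values are well separated, sorting both spectra in descending order is enough to pair each empirical eigenvalue with its group.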

Implication: the finance literature's empirical-PCA approach (Litterman-Scheinkman 1991 "level/slope/curvature"; Laloux et al 1999 RMT cleaning) treats the block-eigenvalue separation as an observation; it can equivalently be predicted from sector permutation symmetry. The S_k × S_m formula above is the closed-form predictive form. Not previously articulated this way in the finance literature surveyed.


5 — T^N quantum-walk lift side experiment

Side experiment scope: does U(t) = exp(−i L_corr t) on the correlation Laplacian surface clustering information that classical Fiedler discards? Per srmech §3.5.1 layer (b) / financial-scoping-2026-05-11.md Fermata 2.

Method: initialize uniform state ψ_0 = 1/√N; evolve via spectral exponentiation U(t) = V·diag(exp(−i·λ·t))·V^T; measure (a) magnitude-based clustering after evolution (control); (b) phase-based clustering via circular k-means on (cos(phase), sin(phase)) 2D embedding (the load-bearing test).
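A sketch of this procedure is given below. Several pieces are assumptions made for illustration: L_corr is taken to be the normalized Laplacian of §2, circular variance is defined as 1 − |mean(exp(iθ))|, and "circular k-means" is implemented as ordinary k-means on the (cos, sin) embedding. Function names are hypothetical.

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def block_corr(k, m, rho_in, rho_out):
    labels = np.repeat(np.arange(k), m)
    C = np.where(labels[:, None] == labels[None, :], rho_in, rho_out).astype(float)
    np.fill_diagonal(C, 1.0)
    return C, labels

def normalized_laplacian(C):
    A = (C + 1.0) / 2.0
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return np.eye(C.shape[0]) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

def walk_state(L, t):
    """psi(t) = V diag(exp(-i lambda t)) V^T psi_0 with uniform psi_0."""
    w, V = eigh(L)
    n = L.shape[0]
    psi0 = np.full(n, 1.0 / np.sqrt(n))
    return V @ (np.exp(-1j * w * t) * (V.T @ psi0))

def circular_variance(phases):
    return 1.0 - np.abs(np.mean(np.exp(1j * phases)))

rng = np.random.default_rng(20260511)
C_base, labels = block_corr(10, 5, 0.5, 0.05)
X = rng.multivariate_normal(np.zeros(50), C_base, size=252)
C_hat = np.corrcoef(X, rowvar=False)

psi_t = walk_state(normalized_laplacian(C_hat), t=1.0)
phases = np.angle(psi_t)
phase_var = circular_variance(phases)

# Phase-based clustering on the unit-circle embedding of each node's phase.
emb = np.column_stack([np.cos(phases), np.sin(phases)])
phase_labels = KMeans(n_clusters=10, n_init=10, random_state=20260511).fit_predict(emb)
```

On the noiseless base matrix all node degrees are equal, so the uniform state is an exact eigenvector with eigenvalue 0 and the evolution leaves it unchanged; any phase spread on sampled data comes from the small degree differences that sampling noise introduces.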

Result on the easy benchmark (intra=0.5, inter=0.05; n_obs=252):

| t | Mean circular phase variance | Phase clustering purity | Phase clustering ARI |
|---|---|---|---|
| 0.0 | 0 (trivial) | | |
| 0.1 | 9.0e−8 | 0.28 | −0.055 |
| 0.5 | 2.1e−6 | 0.28 | −0.053 |
| 1.0 | 6.3e−6 | 0.28 | −0.052 |
| 2.0 | 6.8e−6 | 0.30 | −0.044 |
| 5.0 | 7.2e−6 | 0.28 | −0.045 |

Result on the realistic-SNR scenarios (averaged over 20 trials at t=1.0):

| Scenario | T^N phase clustering purity |
|---|---|
| Strong market mode | 0.20 (≈ chance for k=10) |
| Moderate market mode | 0.20 |
| Weak market mode | 0.18 |
| Block (weak intra) | 0.20 |
| Block (very weak intra) | 0.19 |

Interpretation. At this benchmark, the T^N phase clustering does NOT improve over magnitude-based Fiedler:

  1. The uniform initial state ψ_0 = 1/√N is, up to sampling noise in the node degrees, the eigenvector of the trivial eigenvalue λ=0 (constant function). Almost all of its weight therefore picks up the factor exp(−iλt) ≈ 1, and evolution barely perturbs it.
  2. Phase variance is of order 10^−6, far too small to carry cluster structure. Getting meaningful phase coherence on this benchmark would require either localized initial states (single-node kicks) or longer evolution times (t >> π/λ_max ≈ 1.5).
  3. The dispatch flagged the lift as load-bearing for asynchronous multi-asset lead-lag analysis (Hayashi-Yoshida 2005 style on real high-frequency cross-spectrum), not for static-correlation clustering. This benchmark is the wrong proving ground for the lift.

Honest verdict on the lift: at this benchmark, no improvement over Fiedler magnitude-only. The load-bearing test for the T^N lift remains the asynchronous-HF cross-spectrum setting (§13.9-style hybrid embedding + phase coherence). Deferred to a follow-up spike with proper lead-lag-bearing benchmark (e.g., simulated 2-asset HF tick data with phase-shifted intensities; Hayashi-Yoshida estimator vs exp(−i L_corr t) magnitude-and-phase decomposition).


6 — Honest verdict per metric

Load-bearing-question answer: Fiedler outperforms HRP on the benchmark equity-correlation clustering task at every realistic-SNR scenario (3 of 5 with Fiedler winning 20/20 trials; 1 of 5 with 19/20; 1 of 5 with 11/20 + 9 ties, never losing). HRP shows chaining failure under moderate-to-weak block signal; Fiedler is more robust. GICS is ground truth by construction; Fiedler recovers it almost exactly at moderate noise (mean ARI ≥ 0.995 in three of the five scenarios).

Caveats — what was tested:

  • Synthetic block-correlation matrix with known structure (not real S&P 500 data — see §7 anomaly log).
  • 10 equal-size sectors (real GICS has 11 unequal-size sectors; real S&P 500 has heavy-tailed industry sizes).
  • Daily-return noise simulated by Wishart-style sampling (not by real-world heavy-tail / regime-switching dynamics).
  • k=10 clusters requested (matches ground truth — see §3.3 for what happens off-truth).
  • Mantegna distance metric chosen (López-de-Prado uses sqrt(0.5·(1−ρ)), equivalent up to scale).

Caveats — what was NOT tested:

  • Real S&P 500 daily-returns correlation (no network access during this spike). Would test against actual GICS labels with their imbalance + heavy-tail noise.
  • Tail-event / crisis regimes (per financial-scoping anomaly 1: Gaussian-spectral methods break in crisis).
  • Time-varying / non-stationary regimes (per anomaly 4).
  • 1259 stocks × 11 GICS sectors realistic-scale (would require larger eigendecomposition + observed N>>D regime).
  • Eigenvalue-clipping / Ledoit-Wolf shrinkage pre-processing — both methods used raw sample correlation; finance practice typically cleans first.

Caveats — methodological:

  • HRP single-linkage is the canonical choice (López-de-Prado 2016); average-linkage or Ward's-linkage HRP variants might perform differently. Not tested.
  • Fiedler with k>k_true under-performs HRP (§3.3). For users who select k via dendrogram/eigenvalue-gap diagnostics, this matters; we held k=k_true throughout the realistic-SNR sweep.
  • HRP's primary purpose is portfolio-weight allocation, not just clustering. The portfolio-weight performance metric (out-of-sample Sharpe ratio) was NOT evaluated — that is the original López-de-Prado claim and is downstream of clustering quality.

7 — Anomaly log

Anomaly 1: easy-benchmark saturation. At intra=0.5, inter=0.05 (dispatch parameters), both methods achieve perfect (1.000) on all metrics. The benchmark is too clean to discriminate. Investigation: extended to realistic-SNR sweep (§3.2), which IS discriminating.

Anomaly 2: HRP wins when over-partitioning. Counter-intuitive but reproducible. At k_requested > k_true with easy data, HRP's single-linkage handles the over-partitioning by attaching singletons; Fiedler's k-means re-splits correct clusters. Implication: for practitioners who don't know k, HRP may be more forgiving; for those who do, Fiedler dominates.

Anomaly 3: T^N quantum-walk lift no-op on static benchmark. Lift produces phase variance of order 10^−6, which is negligible (though well above float64 machine precision); does not improve clustering. Investigation: the load-bearing benchmark for the lift is asynchronous-HF lead-lag, not static-correlation clustering. Deferred.

Anomaly 4: GICS-style symmetric-block model exactly predicted by S_k×S_m permutation rep theory. Eigenvalues match closed-form prediction to 15-digit float precision. This is the finance-domain analog of MFO Phase B 18-block structural-prediction result; strong cross-domain validation of the "structural prediction from group symmetry, not SGD fit" MPM-discipline pattern. Not previously articulated this way in the finance literature surveyed in financial-scoping-2026-05-11.md. Stands as a fermata-worthy finding.

Anomaly 5: synthetic-benchmark caveat dominates. Real S&P 500 has imbalanced sectors, heavy tails, regime shifts. Synthetic benchmark validation is a necessary-not-sufficient result. Recommendation: if elevating the Fiedler-beats-HRP finding to a srmech first-class deliverable, follow up with a real-S&P-500 spike (requires Yahoo Finance or similar API access).


8 — Fermata records

Fermata 1: synthetic-vs-real benchmark choice. The spike used a synthetic López-de-Prado-canonical benchmark (option A in dispatch) because (a) deterministic, (b) network access unconfirmed at dispatch time, (c) faster iteration. Real S&P 500 data is the natural follow-up. Conductor decision: is the synthetic-benchmark result sufficient to claim "Fiedler-beats-HRP cross-domain primitive validated," or does the project need a real-data follow-up before elevating? Recommendation: report synthetic result honestly with the caveat; flag real-data follow-up as a queued next-spike if the finding warrants elevation to first-class srmech offering.

Fermata 2: T^N quantum-walk lift load-bearing-benchmark gap. This spike does not test the lift at its load-bearing setting (asynchronous-HF lead-lag). The lift's potential value (per financial-scoping Fermata 2) remains untested. Conductor decision: queue a separate dedicated T^N lift spike with a simulated 2-asset HF tick-data benchmark (Hayashi-Yoshida vs exp(−i L_corr t) comparison), or defer the lift question entirely until a real-data opportunity arises. Recommendation: queue the dedicated T^N lift spike — this is the financial-scoping round's most novel claim and the only "project-→-external-domain new-information offering" identified across six rounds; should be tested.

Fermata 3: cardinality sensitivity informs ship-mode design. The "HRP wins when over-partitioning" anomaly (§3.3) is real and ship-relevant. If the project ever ships a bridge.predict_sector_clustering surface, the API should expose k-selection diagnostics (eigenvalue-gap detection, dendrogram-inconsistency) rather than require user-supplied k. Conductor decision: is this finding load-bearing enough to warrant a separate srmech sub-section on "spectral-clustering k-selection methodology," or is it a footnote?
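One possible shape for the eigenvalue-gap diagnostic mentioned above is sketched here. `suggest_k_by_eigengap` is a hypothetical helper, not an existing project API, and skipping the first gap is a design assumption: the gap above the trivial eigenvalue reflects overall connectivity rather than cluster count.

```python
import numpy as np
from scipy.linalg import eigh

def block_corr(k, m, rho_in, rho_out):
    labels = np.repeat(np.arange(k), m)
    C = np.where(labels[:, None] == labels[None, :], rho_in, rho_out).astype(float)
    np.fill_diagonal(C, 1.0)
    return C, labels

def normalized_laplacian(C):
    A = (C + 1.0) / 2.0
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return np.eye(C.shape[0]) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

def suggest_k_by_eigengap(C, k_max=25):
    """Return the k in 2..k_max with the largest gap lambda_{k+1} - lambda_k."""
    w = eigh(normalized_laplacian(C), eigvals_only=True)   # ascending eigenvalues
    gaps = np.diff(w)                  # gaps[j-1] = lambda_{j+1} - lambda_j (1-indexed j)
    # Candidates j = 2..k_max correspond to gaps[1:k_max]; j = 1 is skipped on purpose.
    return int(np.argmax(gaps[1:k_max])) + 2

C_base, _ = block_corr(10, 5, 0.5, 0.05)
k_hat = suggest_k_by_eigengap(C_base)   # noiseless base matrix
```

On the noiseless base matrix the only non-trivial gap sits between the sector-contrast and idiosyncratic groups, so the heuristic returns the true k. On sampled matrices the gap blurs as SNR falls, so the suggestion is better treated as a diagnostic than as a decision rule.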

Fermata 4: 18-block-style structural-prediction validation finance instance. §4's S_k × S_m closed-form match to 15-digit precision is the load-bearing cross-domain analog of MFO Phase B. This stands independently of the Fiedler-vs-HRP-vs-GICS question. It's a worthy result on its own: finance has the same "structural prediction from group symmetry" pattern as the MFO 18-block finding. Conductor decision: elevate to srmech §3.5.3(C) sub-section as a fourth instantiation of the structural-prediction-from-group-theory motif (MFO 18-block, financial-scoping Sub-investigation 4, this spike §4)? Or keep at spike-level until validated on real S&P 500 data? Recommendation: elevate now — the noiseless-block-model match is to machine precision and is closed-form predictable; real-data deviation from prediction is the noise/non-stationarity story, which is information not refutation.


9 — Recommendations

  1. Land §3.5 finance row in srmech notebook noting the Fiedler-beats-HRP finding on the synthetic benchmark. This is the fifth quantitative cross-domain datapoint after graphics / audio / protein / power (ephemerides §13 Matthews φ + Spearman ρ being the protein-adjacent fourth), but with the synthetic-vs-real caveat called out explicitly.

  2. Land §3.5.3(C) 18-block-style structural prediction as a fourth instantiation: MFO Phase B 18-block + financial-scoping Sub-investigation 4 (theoretical) + this spike §4 (numerical to 15-digit precision). Closed-form predictive form: λ_market = 1 + (m−1)·ρ_in + (k−1)·m·ρ_out, etc.

  3. Queue dedicated T^N quantum-walk lift spike at proper load-bearing benchmark (asynchronous-HF lead-lag; Hayashi-Yoshida vs exp(−i L_corr t) comparison). The 2026-05-11 financial-scoping round's most novel claim deserves a proper test.

  4. Queue real-S&P-500 follow-up spike (requires Yahoo Finance or similar API; preferably WRDS CRSP if accessible) to validate the synthetic-benchmark Fiedler-beats-HRP result against real-world imbalance + heavy-tail noise + regime structure.

  5. Honest reporting in srmech §3.5.3(A)/(C) sub-sections: cite this spike's chain-tier-style result if d_S/2 measurement happens later (financial-scoping Fermata 3 sub-spike (b)); otherwise note that the d_S/2 question is queued.


10 — Reproducibility

Script: fiedler-vs-hrp-vs-gics-spike-script.py

Per-metric NDJSON output: fiedler-vs-hrp-vs-gics-spike-per-metric-2026-05-11.ndjson (80 records: 71 main + 5 SNR-sweep + anomaly chase + side experiments)

Reproduction: python docs/srmech/notes/fiedler-vs-hrp-vs-gics-spike-script.py

Runtime: ~30 seconds on commodity workstation. Deterministic across runs (seed 20260511).

Library versions tested: numpy 2.4.4, scipy 1.17.1, scikit-learn 1.8.0, Python 3.x. No SGD, no learned parameters, no test-set tuning.