# Audio scoping for srmech: 2026-05-09 cross-domain absorption round
- Round: Audio (DSP / music / speech / spatial / EMDR-bilateral)
- Date: 2026-05-09
- Method: Dual-agent research pattern (feedback_dual_agent_research_pattern.md)
## Headline findings
- Audio instantiates every row of the §3.5 cross-manifold table. Euclidean grid → spectrogram (STFT); sphere S² → HRTF / ambisonics; flat torus T² → periodic loops; triangle mesh → 3D acoustic-cavity meshes; general graph → microphone-array beamforming + Tonnetz key-relationship graph. Strongest evidence to date that the cross-manifold framing is load-bearing.
- Spectrogram is a 2D image → all graphics-domain primitives (heat-kernel blur, Perona-Malik bilateral, DoG, Varadhan SDF, anisotropic, Helmholtz wave, power-spectrum noise, reaction-diffusion) port directly with only an `eigenvalues:` field swap. Heat-kernel blur on spectrogram = noise reduction; Perona-Malik on spectrogram = harmonic-percussive separation; DoG on spectrogram = onset detection.
- Music theory IS cyclic-group theory. Z₁₂ chromatic scale = direct sibling of chess Z_640 / ephemerides Z_{2³²}. D₁₂ key-and-transposition = direct sibling of chess D₄ board symmetry. Tonnetz key-relationship graph = direct sibling of ephemerides 52-body resonance graph. `AudioPhase12BIP` is the audio-domain cousin of `SkPhase9BIP`: same architecture, different alphabet.
- Bilateral audio is directly load-bearing for the EMDR project's mission. Audio becomes a peer modality alongside motor + LED, under the same UTLP-coordinated catalogue + bilateral-coordination machinery. The required operators (alternating tones, binaural beats, isochronic tones, music-driven panning, cardiac-coherence pacing at 0.1 Hz) are all closed-form `g(λ)` catalogue entries. Hardware cost: ~$2–5 BOM (PCM5102 / MAX98357 / UDA1334 I²S DAC). Potentially the shortest-path proof-of-concept for srmech on the project's actual mission.
- AMSC `literature_curated` already covers DSP knowledge. RBJ EQ Cookbook → `literature_curated`. ISO 226 equal-loudness, ERB / Bark / A-weighting curves, Moore-Glasberg masking, MPEG psychoacoustic model, codec specifications (MP3 / AAC / Opus tables): all standard `literature_curated` fits. HRTF / RIR archives, speech corpora (TIMIT / LibriSpeech / VCTK / Common Voice), music corpora (FMA / MAESTRO / MUSDB) → `binary_archive`. No new ingestion machinery needed.
- Config-vs-substrate ratio: ~80/20. Most audio DSP is closed-form `g(λ)`: EQ, denoising (Wiener, spectral subtraction, MMSE-LSA), reverb, pitch-shift, ambisonic encoding. Substrate primitives are state-dependent or coupled (compressors / limiters, auto-tune, neural vocoders, source separation, AEC, ANC).
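The spectrogram-as-image port can be sketched with numpy alone. This is a minimal sketch under stated assumptions, not srmech's implementation. `heat_blur` and `dog_onsets` are hypothetical names, and the input is assumed to be a real magnitude spectrogram indexed (frequency, time). Neumann boundaries come from mirror-extending the array before the FFT, which is equivalent to a DCT-II basis. The only operator-specific piece is `g(λ) = exp(-t·λ)`, where heat time t corresponds to σ²/2 of a Gaussian blur, in samples.

```python
import numpy as np


def _periodic_laplacian_eigs(n: int) -> np.ndarray:
    """Eigenvalues of the 1D discrete Laplacian on a periodic grid of n points."""
    k = np.arange(n)
    return 2.0 - 2.0 * np.cos(2.0 * np.pi * k / n)


def heat_blur(spec: np.ndarray, t_f: float, t_t: float) -> np.ndarray:
    """Anisotropic heat-kernel blur of a (freq, time) spectrogram with Neumann BC.

    Mirror-extending both axes makes the array periodic and even. The FFT then
    diagonalises the Laplacian, and the real, even filter g keeps the mirror
    symmetry, which is exactly the reflecting (Neumann) boundary.
    """
    n_f, n_t = spec.shape
    ext = np.concatenate([spec, spec[::-1, :]], axis=0)
    ext = np.concatenate([ext, ext[:, ::-1]], axis=1)
    lam_f = _periodic_laplacian_eigs(2 * n_f)[:, None]
    lam_t = _periodic_laplacian_eigs(2 * n_t)[None, :]
    g = np.exp(-t_f * lam_f - t_t * lam_t)  # g(lambda) = exp(-t * lambda), t set per axis
    blurred = np.fft.ifft2(np.fft.fft2(ext) * g).real
    return blurred[:n_f, :n_t]


def dog_onsets(spec: np.ndarray, t_small: float = 0.5, t_large: float = 4.0) -> np.ndarray:
    """Onset novelty curve: time-only difference of Gaussians, half-wave rectified,
    then summed over frequency. Rising energy yields a positive lobe just after an onset."""
    dog = heat_blur(spec, 0.0, t_small) - heat_blur(spec, 0.0, t_large)
    return np.maximum(dog, 0.0).sum(axis=0)
```

Setting `t_f` and `t_t` independently gives the anisotropic σ_f / σ_t variant listed under the direct ports. Note that the DoG is just two heat kernels with different t, so it needs no new machinery.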
## Operator counts
- Audio manifolds: 13 (main agent) / 24 (sub-agent) — sub-agent broader enumeration; both cover the load-bearing core
- Transforms: ~25 (main) / 27 (sub) — DFT/FFT, MDCT, STFT, CQT, NSGT, wavelets, gammatone, Mel/Bark, modulation spectrogram, cepstrum, MFCC, chroma, spherical harmonics, KLT, etc.
- Closed-form `g(λ)` operators: 50+ (main) / 75+ (sub) across 8–11 thematic groups: EQ family, denoising family, reverb / convolution family, pitch / time family (phase vocoder), Hilbert / analytic-signal family, spatial / spherical-harmonic family, modulation / synthesis family, coding / quantisation family, fingerprinting / chroma family
- Substrate primitives: 25 (main) / 26 (sub), covering compressor / limiter / expander, auto-tune, adaptive filtering (LMS / NLMS / RLS / Kalman), AEC, ANC, RNNoise / DeepNoise, onset / beat detection, pitch detection, source separation (NMF / ICA / RPCA / Demucs / Spleeter), Karplus-Strong, waveguide synthesis, modal synthesis with feedback, WaveNet / SampleRNN / diffusion-on-spectrogram, Griffin-Lim, MSM / DTW, etc.
- HDC cyclic groups: 9 (main) / 12 (sub) — Z₁₂ chromatic, Z₂₄ quartertones, Z₇ diatonic, Z₃ circle-of-fifths (Camelot wheel), Z_b beat cycles, Z₁₂ × Z_b chroma×beat, Tonnetz Z₁₂×Z₁₂ torus, Z₁₃₂ keyboard span, vocal-tract phoneme cycle, Shazam-style Z_n × Z_n peak-pair fingerprint, room-mode index (l,m,n)
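The Z₁₂ / D₁₂ framing fits in a few lines of Python. This minimal sketch uses hypothetical function names. Pitch classes are integers mod n, with 0 = the reference pitch class. Transposition is rotation, inversion is reflection, and together they generate the dihedral group acting on Z_n. The `n` parameter covers other equal temperaments (24-EDO, 31-EDO). Just intonation does not fit this modular model.

```python
import math

N_EDO = 12  # 12-EDO chromatic Z_12; pass n=24, 31, ... for other equal temperaments


def transpose(pcs: set[int], k: int, n: int = N_EDO) -> set[int]:
    """T_k: rotation of Z_n, i.e. transposition by k steps."""
    return {(p + k) % n for p in pcs}


def invert(pcs: set[int], axis: int = 0, n: int = N_EDO) -> set[int]:
    """I_axis: reflection of Z_n. T_1 and I_0 together generate the dihedral group."""
    return {(axis - p) % n for p in pcs}


def tonnetz_neighbours(p: int) -> set[int]:
    """Tonnetz edges on Z_12: perfect fifth (7), major third (4), minor third (3)."""
    return {(p + s * step) % 12 for step in (7, 4, 3) for s in (1, -1)}


def pitch_class(freq_hz: float, n: int = N_EDO, f_ref: float = 440.0) -> int:
    """Nearest n-EDO step relative to f_ref, reduced mod n (0 = f_ref's pitch class)."""
    return round(n * math.log2(freq_hz / f_ref)) % n
```

For example, with 0 = C, `transpose({0, 4, 7}, 7)` maps C major to G major ({7, 11, 2}), and `invert({0, 4, 7})` maps it to F minor ({0, 5, 8}).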
## Cross-pollination with already-absorbed srmech primitives
The §3.5 cross-manifold table extends with an audio-instantiation column:
| Manifold | Audio instantiation |
|---|---|
| Euclidean grid + Neumann BC | Spectrogram (2D STFT) |
| Sphere S² | HRTF / ambisonics — Y_l^m basis with l(l+1) eigenvalues |
| Flat torus T² | Periodic-loop sample; periodic-rhythm pattern |
| Triangle mesh | 3D acoustic-wave simulation on irregular geometry |
| General graph | Microphone array; Tonnetz; key-relationship graph |
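For the sphere row, the closed-form `g` is just the spherical harmonics evaluated at the source direction. This minimal first-order sketch assumes the AmbiX convention (ACN channel order, SN3D normalisation). `encode_foa` and `spherical_heat_blur` are hypothetical names. The second function shows the heat-kernel port to S²: each order-l channel is scaled by exp(−t·l(l+1)), which widens the encoded image.

```python
import numpy as np

# Laplacian eigenvalue l(l+1) per ACN channel of a first-order signal:
# ACN 0 = W (l=0), ACN 1..3 = Y, Z, X (l=1).
FOA_ORDERS = np.array([0, 1, 1, 1])
FOA_EIGS = FOA_ORDERS * (FOA_ORDERS + 1)


def encode_foa(signal: np.ndarray, azimuth: float, elevation: float) -> np.ndarray:
    """Encode a mono signal into first-order ambisonics (ACN order, SN3D norm).

    azimuth: radians, counter-clockwise from straight ahead; elevation: radians up.
    The per-channel gains are real spherical harmonics at the source direction.
    """
    gains = np.array([
        1.0,                                  # W: l=0, m=0
        np.sin(azimuth) * np.cos(elevation),  # Y: l=1, m=-1
        np.sin(elevation),                    # Z: l=1, m=0
        np.cos(azimuth) * np.cos(elevation),  # X: l=1, m=1
    ])
    return gains[:, None] * np.asarray(signal, dtype=float)[None, :]


def spherical_heat_blur(foa: np.ndarray, t: float) -> np.ndarray:
    """Heat kernel on S^2: scale each order-l channel by exp(-t * l(l+1))."""
    return np.exp(-t * FOA_EIGS)[:, None] * foa
```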
Direct primitive ports (graphics → spectrogram):
- Heat-kernel blur on spectrogram → noise reduction / envelope estimation (anisotropic σ_t and σ_f independent)
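- Perona-Malik on spectrogram → harmonic-percussive separation (cf. Fitzgerald 2010, which obtains the same split with median filtering along time and frequency rather than Perona-Malik diffusion)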
- DoG on spectrogram → onset detection
- Sharpen / Laplacian on spectrogram → transient enhancement
- Anisotropic-tensor diffusion on spectrogram → smoothing along harmonic ridges (vocal-formant tracking)
- Helmholtz wave on spectrogram → procedural texture generator
- Power-spectrum noise (audio variant) → 1/f pink, 1/f² brown, blue, violet, log-normal, log-banded; direct sibling of graphics §3.1 power-spectrum noise, with the same `P ∈ {1, 1/√λ, 1/λ, √λ}` menu
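Here is a sketch of the power-spectrum-noise port under one stated assumption: λ is read as the rfft bin index, which is proportional to frequency. Under that reading, P = 1/√λ gives power ∝ 1/f (pink) and P = 1/λ gives 1/f² (brown). If λ is instead a Laplacian eigenvalue (∝ f²), the exponents halve. `colored_noise` and `P_MENU` are hypothetical names. Violet (P = λ) is added to cover the list, and the log-normal and log-banded shapes are not covered.

```python
import numpy as np

# Amplitude shaping P(lambda); lambda = rfft bin index (assumed proportional to f).
P_MENU = {
    "white": lambda lam: np.ones_like(lam),
    "pink": lambda lam: 1.0 / np.sqrt(lam),  # power ~ 1/f
    "brown": lambda lam: 1.0 / lam,          # power ~ 1/f^2
    "blue": lambda lam: np.sqrt(lam),        # power ~ f
    "violet": lambda lam: lam,               # power ~ f^2
}


def colored_noise(n: int, kind: str, rng: np.random.Generator) -> np.ndarray:
    """Shape white Gaussian noise by P(lambda) in the rfft domain; unit-RMS output."""
    spectrum = np.fft.rfft(rng.standard_normal(n))
    lam = np.arange(spectrum.size, dtype=float)
    gain = np.zeros_like(lam)
    gain[1:] = P_MENU[kind](lam[1:])  # DC dropped: 1/lambda is undefined at lambda = 0
    x = np.fft.irfft(spectrum * gain, n=n)
    return x / np.sqrt(np.mean(x ** 2))
```

A quick sanity check: pink noise carries roughly equal power per octave, while brown noise carries about four times more in an octave two octaves lower.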
## EMDR-project-specific opportunities (strong direct connection)
Bilateral audio as a peer modality alongside motor + LED:
- Alternating tones (250 Hz–1 kHz, hard-pan L/R, 0.5–2 Hz alternation matching motor frequency range): closed-form (Transform = trivial, λ = `bilateral_phase`, g = `pan(t)`)
- Alternating clicks / pulses: sharper attention than tones for some clients
- Pink-noise burst alternation — spectrally rich, less fatiguing over 20-min sessions
- Music-driven bilateral panning: energy-preserving rotation with power gains `g_L(t) = cos²(π f_alt t)`, `g_R(t) = sin²(π f_alt t)`, so g_L + g_R = 1 (the amplitude gains are |cos(π f_alt t)| and |sin(π f_alt t)|)
- Binaural beats (carrier f_L = 200 Hz, f_R = 204 Hz → 4 Hz beat via frequency-following response)
- Isochronic tones: `(1 + cos(2π f_pulse t)) · cos(2π f_tone t)`; mono-speaker compatible
- Cardiac-coherence pacing tones: 0.1 Hz amplitude-modulated soft pink noise (`EMDR_Slow_BLS_Research_Frontier.md` sub-0.5 Hz frontier)
- Coherent multi-modal stimuli: single phase signal drives motors + audio + LED; polyrhythm via Z₂, Z₃ and Z₆ sub-cycles read off one shared Z₁₂ phase counter (2, 3 and 6 all divide 12, so every stream realigns every 12 steps)
- HRV biofeedback tones — adaptive (substrate primitive); pace adjusts to user's RMSSD
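The closed-form bilateral operators translate almost directly into sample generators. This minimal numpy sketch assumes a sample rate `fs`, and all function names are hypothetical. The isochronic envelope is scaled by 0.5 so it stays in [0, 1], and stereo output is a (2, N) array with row 0 = left. Levels, fades and safety limits are deliberately left out.

```python
import numpy as np


def _time(fs: int, dur_s: float) -> np.ndarray:
    return np.arange(int(round(fs * dur_s))) / fs


def pan_power_gains(t: np.ndarray, f_alt: float) -> tuple[np.ndarray, np.ndarray]:
    """Power gains cos^2 / sin^2 of (pi f_alt t): one full L -> R -> L sweep per 1/f_alt s."""
    theta = np.pi * f_alt * t
    return np.cos(theta) ** 2, np.sin(theta) ** 2


def alternating_tone(fs: int, dur_s: float, f_tone: float, f_alt: float) -> np.ndarray:
    """Mono tone panned at constant power: amplitude gains are sqrt of the power gains."""
    t = _time(fs, dur_s)
    tone = np.sin(2 * np.pi * f_tone * t)
    g_l, g_r = pan_power_gains(t, f_alt)
    return np.stack([np.sqrt(g_l) * tone, np.sqrt(g_r) * tone])


def binaural(fs: int, dur_s: float, f_carrier: float = 200.0, f_beat: float = 4.0) -> np.ndarray:
    """Left at f_carrier, right at f_carrier + f_beat; neither channel contains the beat."""
    t = _time(fs, dur_s)
    return np.stack([np.sin(2 * np.pi * f_carrier * t),
                     np.sin(2 * np.pi * (f_carrier + f_beat) * t)])


def isochronic(fs: int, dur_s: float, f_tone: float, f_pulse: float) -> np.ndarray:
    """Mono: 0.5 * (1 + cos(2 pi f_pulse t)) * cos(2 pi f_tone t)."""
    t = _time(fs, dur_s)
    return 0.5 * (1 + np.cos(2 * np.pi * f_pulse * t)) * np.cos(2 * np.pi * f_tone * t)
```

Because the pan uses square-rooted power gains, L² + R² equals the mono tone² at every sample. That is the "energy-preserving" property in concrete form.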
Disability-accommodation dimension (per feedback_disability_accommodation_dimension.md, sub-agent caught this):
- Aphantasia: bilateral audio gives non-visual cue (user can't visualise moving spot but hears alternation) — direct accommodation
- Visual impairment / blindness: audio becomes primary modality; motor + audio is inclusive design
- Hearing impairment: motor primary, audio adds redundancy; bone-conduction transducers a future-direction accommodation
- Photosensitivity / migraine: audio + motor with no visual flicker
- Cognitive load (ADHD / executive-function differences): audio coordination removes visual-attention demand of tracking moving spot
Trauma-informed audio environment design (per feedback_trauma_informed_defensive_scope.md):
- Soft-onset / soft-offset envelopes (avoid startle response)
- Bandwidth-limited (no aggressive high-frequency content)
- A-weighted loudness limits (A-weighting curve as defined in IEC 61672; the safe level itself is a design threshold, not something the standard sets)
- No abrupt panning (smooth bilateral alternation, not L/R discrete)
- Post-session quiet ramp
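Two of these constraints are easy to make concrete. This minimal sketch uses hypothetical names. It provides a raised-cosine fade for soft onset/offset, plus the IEC 61672 analytic A-weighting magnitude, which is used to compute an A-weighted RMS of a buffer. That RMS is relative to digital full scale, not calibrated dB SPL, and any cap placed on it is a design choice.

```python
import numpy as np


def raised_cosine_envelope(n: int, n_fade: int) -> np.ndarray:
    """Unity envelope with raised-cosine fade-in and fade-out of n_fade samples each."""
    if 2 * n_fade > n:
        raise ValueError("fades longer than the signal")
    env = np.ones(n)
    ramp = 0.5 - 0.5 * np.cos(np.pi * np.arange(n_fade) / n_fade)  # rises from 0 towards 1
    env[:n_fade] = ramp
    env[n - n_fade:] = ramp[::-1]
    return env


def a_weighting_db(f_hz: np.ndarray) -> np.ndarray:
    """A-weighting gain in dB (IEC 61672 analytic form); about 0 dB at 1 kHz. Needs f > 0."""
    f2 = np.asarray(f_hz, dtype=float) ** 2
    r_a = (12194.0 ** 2 * f2 ** 2) / (
        (f2 + 20.6 ** 2)
        * np.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    return 20.0 * np.log10(r_a) + 2.0


def a_weighted_rms(x: np.ndarray, fs: float) -> float:
    """RMS after applying the A-weighting magnitude in the rfft domain (DC removed)."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    weight = np.zeros_like(freqs)
    weight[1:] = 10.0 ** (a_weighting_db(freqs[1:]) / 20.0)
    y = np.fft.irfft(spectrum * weight, n=x.size)
    return float(np.sqrt(np.mean(y ** 2)))
```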
## First-principles cautions (framework-edge)
- Phase matters in audio. `g(λ)` must often be complex-valued (delays, all-pass, comb filters). The (Transform, λ_k, g) decomposition extends naturally (g maps to ℂ rather than ℝ), but the substrate must support this.
- Time-varying `g(ω, t)` for chorus / flanger / phaser. The framework extends, but this is not yet articulated in the §3.0 sketch.
- Real-time vs offline. Offline FFT-of-whole-signal fits config; real-time block-by-block processing needs substrate-primitive scheduling (overlap-add / overlap-save / sliding-DFT).
- Sample-rate dependence. Physical-frequency `g(λ)` parameters re-bind per sample rate: DFT bin k sits at k·f_s/N Hz, so the same physical cutoff or delay lands on different bins when f_s or N changes.
- Perceptual vs physical eigenbasis. Audio's "right" basis is often perceptual (Mel / Bark / ERB / cochlear) rather than linear. Cross-manifold framing handles this; choosing per task is design.
- EMDR sub-100 ms latency budget. Most operators fit in 10 ms blocks; some do not (e.g. CQT with reassignment, whose low-frequency bins need long analysis windows).
- Z₁₂ / D₁₂ caveat for non-12-EDO music. Microtonal (24-EDO, 31-EDO, just intonation) needs different modular arithmetic. Non-pitched percussion lives in rhythm-graph world.
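The phase and sample-rate cautions show up together in the simplest complex-valued operator, a pure delay with g(ω) = e^{−iωτ}. This is a minimal offline sketch, and `fractional_delay` is a hypothetical name. The delay is given in seconds and bound to bins through `fs`, so the same τ maps to different bin phases at different sample rates. Whole-signal FFT makes the delay circular, and a real-time version would need block-wise overlap-add scheduling instead.

```python
import numpy as np


def fractional_delay(x: np.ndarray, delay_s: float, fs: float) -> np.ndarray:
    """Circularly delay x by delay_s seconds using a complex, unit-magnitude g(omega)."""
    spectrum = np.fft.rfft(x)
    # Bin k sits at k * fs / N Hz: the physical delay re-binds whenever fs or N changes.
    omega = 2.0 * np.pi * np.fft.rfftfreq(x.size, d=1.0 / fs)
    g = np.exp(-1j * omega * delay_s)  # all-pass: |g| = 1, only the phase changes
    return np.fft.irfft(spectrum * g, n=x.size)
```

The same 3/8000 s delay is 3 samples at 8 kHz but 6 samples at 16 kHz. Non-integer sample delays fall out of the same expression, since g is defined on physical frequency rather than on sample indices.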
## Comparison: main-agent vs sub-agent
| Dimension | Main-agent (with conversation context) | Sub-agent (independent fresh-read) |
|---|---|---|
| Manifold count | 13 | 24 (broader; added cepstral / wavelet plane / pitch helix / spherical shell / loudspeaker line / non-rectangular plate / auditory-nerve graph / score graph / Q-factor / beam-pattern manifolds) |
| Filter family | Bundled (low-pass / high-pass / band-pass / shelving / peaking) | Explicit (Butterworth Nth, Chebyshev I/II, elliptic / Cauer, Bessel, Linkwitz-Riley) |
| Closed-form ops | 50+ | 75+ |
| Substrate ops | 25 | 26 (added Griffin-Lim explicit, neural vocoder HiFi-GAN / Vocos, Weighted Prediction Error WPE, self-organizing maps, voice conversion) |
| HDC groups | 9 | 12 (added Z₂₄ quartertones, Z₃ Camelot, Z_n×Z_n fingerprint, Z₁₃₂ keyboard, vocal-tract phoneme, room-mode index) |
| Citation discipline | Loose | Strong (RBJ Audio EQ Cookbook 2003 by name) |
| Hardware specifics | "needs care for latency" | I²S DACs (PCM5102 / MAX98357 / UDA1334), ~$2–5 BOM, 512-sample real-time DCT on ESP32-C6 |
| Disability-accommodation memory | Missed | Applied |
| Trauma-informed memory | Missed | Applied |
| Phase-mattering caveat | Caught | Implicit only |
| Time-varying `g(ω, t)` framework extension | Caught | Listed but not articulated as gap |
| Real-time vs offline scheduling | Caught | Light treatment |
| Sample-rate re-binding | Caught | Not surfaced |
| Z₁₂ caveat for non-12-EDO | Caught | Mentioned but not flagged |
| `AudioPhase12BIP` naming | Implied | Named explicitly |
| AMSC mode taxonomy | Flat list | Structured 7 mode classes |
Convergent core (both reached independently): all 6 headline findings above. The differences lie at the margins: the sub-agent is broader in enumeration and more specific in citation, while the main agent is stronger on framework-edge cautions.
## Takeaways landed in master srmech notebook
- §3.5 cross-manifold table: audio instantiation column added
- §4.2 calibration: audio profile is ~80/20 (similar to graphics)
- §5.2 absorption-round subsection: headline findings + link to this file
- §1.5 future-notebook candidates: audio row added (project-mission relevance — direct EMDR fit)