
0028: Command-and-Control with Synchronized Fallback Architecture

Date: 2025-11-11
Phase: 1b
Status: Accepted (Motor Control) | Superseded by AD041 (Bilateral Alternation)
Type: Architecture


Summary (Y-Statement)

In the context of dual-device bilateral stimulation with BLE coordination, facing a trade-off between time-sync drift that causes safety violations and immediate fallback that gives poor UX, we decided for command-and-control with synchronized fallback (2-minute grace period), and neglected time-synchronized independent operation and immediate fallback, to achieve guaranteed non-overlapping stimulation with seamless handling of brief disconnections, accepting a two-stage fallback: Phase 1 (0-2 min) synchronized fallback, then Phase 2 (2+ min) fixed-role fallback.


Problem Statement

Initial dual-device architecture (AD026) specified immediate fallback to single-device mode on BLE disconnection. User requested analysis of time-synchronized independent operation as an alternative. Mathematical analysis revealed time-sync would cause safety violations (overlapping stimulation) after 15-20 minutes due to crystal drift and FreeRTOS jitter.

Crystal Drift Analysis:

  • ESP32-C6 crystal: ±10 PPM typical tolerance
  • FreeRTOS jitter: ±100ms typical per task cycle
  • Over 20 minutes: ±1944ms accumulated drift
  • Result: Overlapping stimulation violates FR002 safety requirement
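The crystal term of this analysis can be checked on its own with a few lines of arithmetic. The sketch below is illustrative only: `crystal_drift_us` is a hypothetical helper, not part of the project code, and it models nothing but the ±10 PPM tolerance. It gives ±1.2ms over 2 minutes and ±12ms over 20 minutes. The ±1944ms figure also includes the FreeRTOS jitter term, whose accumulation model is not reproduced in this record.

```c
#include <stdint.h>

// Worst-case clock error, in microseconds, that a crystal with the given
// tolerance (parts per million) accumulates over elapsed_ms milliseconds.
// 1 PPM over 1 ms is exactly 1 nanosecond, so the product below is in ns.
static uint64_t crystal_drift_us(uint32_t tolerance_ppm, uint32_t elapsed_ms)
{
    uint64_t drift_ns = (uint64_t)tolerance_ppm * elapsed_ms;
    return drift_ns / 1000U;  // ns -> us
}
```

For example, `crystal_drift_us(10, 120000)` covers the 2-minute grace period and `crystal_drift_us(10, 1200000)` covers a 20-minute session.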


Context

Safety Requirements:

  • FR002: Non-overlapping bilateral stimulation (devices must not stimulate simultaneously)
  • Therapeutic efficacy: Bilateral alternation required for EMDR
  • Session duration: 20-90 minutes (long enough for drift to matter)

User Experience Requirements:

  • Brief BLE disconnections (< 2 min) shouldn't interrupt therapy
  • Permanent failures (battery death, 2.4GHz interference) need graceful degradation
  • No manual intervention during session

Technical Constraints:

  • BLE latency: 50-100ms (therapeutically insignificant)
  • Crystal drift: ±10 PPM (±1944ms over 20 minutes when combined with FreeRTOS jitter)
  • FreeRTOS jitter: ±100ms per task cycle
  • Human perception threshold: timing differences below 100-200ms are imperceptible


Decision

We will adopt Command-and-Control with Synchronized Fallback architecture for dual-device bilateral stimulation.

Architecture:

Normal Operation (BLE Connected):
Server Device                Client Device
Check messages →             Receive BLE command
Send BLE "FORWARD" →         Process command
Forward active (125ms)       Wait for next command
Send BLE "COAST" →          Process command
Coast (375ms)               Coast (375ms)
Send BLE "REVERSE" →        Process command
Coast continues             Reverse active (125ms)
                           Coast (375ms)
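
The normal-operation timeline above can be read as a fixed table of steps that the server walks through. The following is a minimal sketch under stated assumptions: the `bilateral_cmd_t` identifiers, the `cycle_step_t` type, and the helper names are hypothetical (this record names the commands but not their encoding), and the closing COAST step is assumed from the client column of the diagram.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical command identifiers; the record does not define an encoding.
typedef enum { CMD_FORWARD, CMD_COAST, CMD_REVERSE } bilateral_cmd_t;

typedef struct {
    bilateral_cmd_t cmd;   // Command the server sends at the start of the step
    uint16_t hold_ms;      // Time the server waits before the next step
} cycle_step_t;

static const cycle_step_t CYCLE[] = {
    { CMD_FORWARD, 125 },  // Server motor active
    { CMD_COAST,   375 },  // Both devices coast
    { CMD_REVERSE, 125 },  // Client motor active
    { CMD_COAST,   375 },  // Assumed closing coast before the cycle repeats
};

#define CYCLE_STEPS (sizeof(CYCLE) / sizeof(CYCLE[0]))

// Step for a monotonically increasing step counter; wraps around the table.
static const cycle_step_t *cycle_step(size_t step_index)
{
    return &CYCLE[step_index % CYCLE_STEPS];
}

// Length of one full bilateral cycle in milliseconds.
static uint32_t cycle_period_ms(void)
{
    uint32_t total = 0;
    for (size_t i = 0; i < CYCLE_STEPS; i++) {
        total += CYCLE[i].hold_ms;
    }
    return total;
}
```

In a server task, each iteration would send `cycle_step(i)->cmd` over BLE and then block with `vTaskDelay(pdMS_TO_TICKS(cycle_step(i)->hold_ms))`. Because the client only acts on received commands, the two active windows cannot overlap while the link is up.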

Synchronized Fallback Phase 1 (0-2 minutes after disconnect):
Server Device                Client Device
Detect BLE loss →           Detect BLE loss
Continue rhythm (SERVER)    Continue rhythm (CLIENT)
Forward → Coast →           Coast → Reverse →
Use last timing ref         Use last timing ref

Fallback Phase 2 (2+ minutes, remainder of session):
Server Device                Client Device
Forward only (125ms on)     Reverse only (125ms on)
Coast (375ms)               Coast (375ms)
Repeat assigned role        Repeat assigned role
No alternation              No alternation
Reconnect attempt/5min      Reconnect attempt/5min

Session Complete (60-90 minutes):
Both devices → Deep Sleep

Key Features:

  1. Command-and-Control During Normal Operation:
       • Server controls all timing decisions
       • Client executes commands immediately upon receipt
       • Guarantees non-overlapping stimulation (FR002 safety requirement)
       • 50-100ms BLE latency is therapeutically insignificant

  2. Synchronized Fallback Phase 1 (0-2 minutes):
       • Continue established bilateral rhythm using last timing reference
       • Maximum drift over 2 minutes: ±1.2ms (negligible)
       • Provides seamless therapy during brief disconnections
       • Both devices maintain alternating pattern

  3. Fallback Phase 2 (2+ minutes to session end):
       • Server continues forward-only stimulation (assigned role)
       • Client continues reverse-only stimulation (assigned role)
       • No alternation within each device - just repeat assigned role
       • Handles both battery death and 2.4GHz interference scenarios
       • Non-blocking reconnection attempt every 5 minutes
       • If reconnection succeeds, resume command-and-control seamlessly

  4. Session Completion:
       • Both devices enter deep sleep after 60-90 minute session
       • Ensures predictable battery management
       • Clear session boundaries for therapeutic practice

Implementation:

// Fallback state management
typedef struct {
    uint32_t disconnect_time;        // When BLE disconnected
    uint32_t last_command_time;      // Timestamp of last server command
    uint32_t last_reconnect_attempt; // Last reconnection attempt
    uint16_t established_cycle_ms;   // Current cycle period (e.g., 500ms)
    uint16_t established_duty_ms;    // Current duty cycle (e.g., 125ms)
    motor_role_t fallback_role;      // MOTOR_ROLE_SERVER or MOTOR_ROLE_CLIENT
    bool phase1_sync;                 // True during 2-minute sync phase
} fallback_state_t;

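// Fallback phase management: call once per motor-task cycle while BLE is
// disconnected. The function and constant names here are illustrative.
#define FALLBACK_PHASE1_MS     120000U  // 2-minute synchronized grace period
#define RECONNECT_INTERVAL_MS  300000U  // 5 minutes between reconnection attempts

static fallback_state_t fallback_state;  // Statically allocated (JPL Rule #1)

static void fallback_update(void)
{
    TickType_t now = xTaskGetTickCount();
    // Unsigned subtraction stays correct across tick-counter wraparound
    TickType_t disconnect_duration = now - fallback_state.disconnect_time;

    if (disconnect_duration < pdMS_TO_TICKS(FALLBACK_PHASE1_MS)) {
        // Phase 1: Maintain synchronized bilateral pattern
        continue_bilateral_rhythm();
        return;
    }

    // Phase 2: Continue in assigned role only
    fallback_state.phase1_sync = false;
    if (fallback_state.fallback_role == MOTOR_ROLE_SERVER) {
        motor_forward_only();  // No reverse
    } else {
        motor_reverse_only();  // No forward
    }

    // Periodic reconnection attempts (non-blocking)
    if ((now - fallback_state.last_reconnect_attempt) > pdMS_TO_TICKS(RECONNECT_INTERVAL_MS)) {
        ble_attempt_reconnect_nonblocking();
        fallback_state.last_reconnect_attempt = now;
    }
}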
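
The body of `continue_bilateral_rhythm()` is not shown in this record. One plausible way to continue the rhythm from the last timing reference is to derive each device's active window from the elapsed time alone, as in the hedged sketch below. It assumes that both devices share a reference instant marking the start of a server forward pulse, and that `established_cycle_ms` (500ms) is the per-device period, so a full bilateral cycle is twice that. `phase1_motor_active` is a hypothetical name, and `motor_role_t` is redefined here only to keep the example self-contained.

```c
#include <stdbool.h>
#include <stdint.h>

// Stand-in for the project's motor_role_t, repeated for self-containment.
typedef enum { MOTOR_ROLE_SERVER, MOTOR_ROLE_CLIENT } motor_role_t;

// Returns true when this device's motor should be active elapsed_ms after
// the shared timing reference. The server owns the first half of each
// bilateral cycle and the client owns the second half.
static bool phase1_motor_active(motor_role_t role, uint32_t elapsed_ms,
                                uint16_t cycle_ms, uint16_t duty_ms)
{
    if (cycle_ms == 0U) {
        return false;  // Defensive: no established rhythm to continue
    }
    uint32_t bilateral_ms = 2U * (uint32_t)cycle_ms;
    uint32_t pos = elapsed_ms % bilateral_ms;
    uint32_t start = (role == MOTOR_ROLE_SERVER) ? 0U : (uint32_t)cycle_ms;
    return (pos >= start) && (pos < start + duty_ms);
}
```

With a 500ms period and 125ms on-time, the server window is 0-125ms and the client window is 500-625ms of each 1000ms cycle. That leaves 375ms of margin on either side, which is why a drift of ±1.2ms during Phase 1 cannot produce overlap.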

Consequences

Benefits

  • No overlap risk: Command-and-control guarantees sequential operation
  • Minimal drift during fallback: ±1.2ms over 2 minutes is imperceptible
  • Automatic recovery: Falls back to safe single-device mode after 2 minutes
  • Seamless brief disconnections: 0-2 minute window handles transient issues
  • User notification: LED/haptic feedback indicates mode changes
  • Therapeutic continuity: Session continues despite connection issues
  • Safety guaranteed: FR002 non-overlapping requirement preserved

Drawbacks

  • Complex state machine: Three operational modes (normal, phase 1, phase 2)
  • 2-minute grace period: Accumulates small drift during Phase 1
  • Reduced bilateral alternation: Phase 2 loses alternation (acceptable for emergency fallback)
  • Periodic reconnection overhead: 5-minute attempts consume power

Options Considered

Option A: Time-Synchronized Independent Operation

Pros:

  • No command-and-control complexity
  • Devices operate independently

Cons:

  • Crystal drift (±10 PPM) + FreeRTOS jitter = ±1944ms over 20 minutes
  • Would cause overlapping stimulation (safety violation)
  • Complex NTP-style time sync adds unnecessary complexity
  • Drift correction requires continuous BLE communication

Selected: NO
Rationale: Mathematical analysis shows this option becomes unsafe after 15-20 minutes (violates FR002)

Option B: Immediate Fallback (AD026)

Pros:

  • Simple state machine
  • No drift accumulation

Cons:

  • Interrupts therapy on any BLE glitch
  • Poor user experience during brief disconnections
  • No grace period for transient issues

Selected: NO
Rationale: User experience testing revealed that brief disconnections are common, making immediate fallback too aggressive

Option C: Command-and-Control with Synchronized Fallback (CHOSEN)

Pros:

  • Guaranteed non-overlapping (command-driven)
  • 2-minute grace period handles transient disconnections
  • Minimal drift during Phase 1 (±1.2ms)
  • Automatic recovery to safe fallback mode
  • Periodic reconnection attempts

Cons:

  • More complex state machine
  • Phase 2 loses bilateral alternation (acceptable emergency fallback)

Selected: YES
Rationale: Best balance of safety, user experience, and therapeutic continuity


Supersedes

  • AD026: BLE Automatic Role Recovery - Immediate fallback behavior replaced with synchronized fallback phases

Superseded By

  • AD045: Synchronized Independent Operation - Motor control uses epoch-based calculation without corrections. Both devices calculate transitions independently from synchronized motor_epoch (like Bluetooth audio). Command-and-control retained for mode changes (two-phase commit protocol) and emergency features only.

Related Decisions

  • AD029: Relaxed Timing Specification - ±100ms tolerance enables command-and-control architecture
  • AD030: BLE Bilateral Control Service - Bilateral Command characteristic implements command-and-control
  • AD035: Battery-Based Role Assignment - Determines SERVER vs CLIENT role for fallback behavior

Implementation Notes

Code References

  • Motor Task: src/motor_task.c (fallback state machine)
  • BLE Task: src/ble_task.c (command transmission/reception)
  • BLE Manager: src/ble_manager.c (Bilateral Command characteristic)

Build Environment

  • Environment Name: xiao_esp32c6
  • Configuration File: sdkconfig.xiao_esp32c6
  • Phase: Phase 1b foundation, Phase 2 full implementation

Testing & Verification

Phase 1b Implementation Status (November 14, 2025):

Peer Discovery (Implemented):

  • Both devices advertise Bilateral Control Service UUID
  • Both devices scan for peer advertising same service
  • First device to discover peer initiates connection
  • Race condition handled per AD010 (ACL error 523 gracefully handled)
  • Connection type identification (ble_get_connection_type_str() returns "Peer" vs "App")

Battery Exchange (Implemented):

  • Bilateral Battery characteristic updating every 60 seconds
  • ble_update_bilateral_battery_level() called by motor_task
  • Motor task battery logs show connection status: Battery: 4.18V [98%] | BLE: Peer

Role Assignment (Phase 1c - Pending):

  • Battery-based role assignment logic (see AD035)
  • Higher battery device becomes SERVER (controller)
  • Lower battery device becomes CLIENT (follower)
  • Tie-breaker: Connection initiator becomes SERVER if batteries equal
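
The pending role-assignment rules can be expressed as a small pure function. This is only a sketch of the stated rules, not the AD035 implementation: `assign_role` and its parameters are hypothetical names, battery level is assumed to be compared as a percentage, and `motor_role_t` is redefined so the example stands alone.

```c
#include <stdbool.h>
#include <stdint.h>

// Stand-in for the project's motor_role_t, repeated for self-containment.
typedef enum { MOTOR_ROLE_SERVER, MOTOR_ROLE_CLIENT } motor_role_t;

// Decide the local device's role from both battery levels.
// local_initiated is true when this device initiated the BLE connection.
static motor_role_t assign_role(uint8_t local_pct, uint8_t peer_pct,
                                bool local_initiated)
{
    if (local_pct > peer_pct) {
        return MOTOR_ROLE_SERVER;  // Higher battery controls
    }
    if (local_pct < peer_pct) {
        return MOTOR_ROLE_CLIENT;  // Lower battery follows
    }
    // Tie-breaker: the connection initiator becomes SERVER
    return local_initiated ? MOTOR_ROLE_SERVER : MOTOR_ROLE_CLIENT;
}
```

If both devices evaluate this with mirrored arguments and agree on the two battery readings, they arrive at complementary roles without further negotiation. Disagreement on the readings (for example, a stale exchanged value) would need separate handling.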

Command-and-Control (Phase 2 - Pending):

  • Bilateral Command characteristic for SERVER→CLIENT commands
  • Device Role characteristic to store assigned role
  • Command types: START/STOP/SYNC/MODE_CHANGE/EMERGENCY/PATTERN
  • Normal operation with BLE commands as described above

Synchronized Fallback (Phase 2 - Pending):

  • Fallback Phase 1 (0-2 minutes): Continue bilateral rhythm
  • Fallback Phase 2 (2+ minutes): Fixed role assignment (no alternation)
  • Periodic reconnection attempts every 5 minutes
  • Seamless resume of command-and-control on reconnection

Testing Evidence (November 14, 2025):

Peer discovery working reliably with ~1-2 second connection time:

11:09:01.749 > Peer discovered: b4:3a:45:89:5c:76
11:09:01.949 > BLE connection established
11:09:01.963 > Peer identified by address match
11:09:27.452 > Battery: 4.18V [98%] | BLE: Peer  ← Correct identification

Devices successfully reconnect after disconnect.


JPL Coding Standards Compliance

  • ✅ Rule #1: No dynamic memory allocation - Static fallback state structure
  • ✅ Rule #2: Fixed loop bounds - Phase transitions have deterministic timing
  • ✅ Rule #3: No recursion - Linear state machine transitions
  • ✅ Rule #5: Return value checking - BLE command transmission checked
  • ✅ Rule #6: No unbounded waits - vTaskDelay() for all timing
  • ✅ Rule #7: Watchdog compliance - Feed during all phases
  • ✅ Rule #8: Defensive logging - ESP_LOGI for phase transitions

Migration Notes

Migrated from docs/architecture_decisions.md on 2025-11-21
Original location: AD028
Git commit: (phase 1b implementation)

Safety Analysis:

  • No Overlap Risk: Command-and-control guarantees sequential operation
  • Minimal Drift During Fallback: ±1.2ms over 2 minutes is imperceptible
  • Automatic Recovery: Falls back to safe single-device mode after 2 minutes
  • User Notification: LED/haptic feedback indicates mode changes

Integration with AD035 (Battery-Based Role Assignment):

Phase 1b provides connection establishment and peer identification. Phase 1c will add role assignment based on battery comparison. Phase 2 will implement the full command-and-control architecture with synchronized fallback as described above.



Phase 6k Update (November 28, 2025)

Status Change: Motor control architecture superseded by AD041 (Predictive Bilateral Synchronization).

What Changed:

  • Motor Control: Now uses AD041 (predictive sync with drift-rate compensation) instead of command-and-control
  • Emergency Features: Still uses command-and-control (shutdown, mode sync)
  • Rationale: AD041 validates Option A by solving the drift problem that caused rejection

Coexistence:

Motor Control:         AD041 (Predictive Sync) ← Changed from AD028
Emergency Shutdown:    AD028 (Command-and-Control) ← Retained
Mode Sync:             AD028 (Command-and-Control) ← Retained
Session Management:    AD041 (Predictive Sync) ← Changed from AD028

Result: AD028 Option A (rejected due to drift) is now validated and implemented via AD041's drift-rate prediction. Command-and-control architecture retained for critical safety features only.


Template Version: MADR 4.0.0 (Customized for EMDR Pulser Project)
Last Updated: 2025-11-28