dybilar

Latency Attacks against Vehicular Cooperative Perception

God willing

(Paper: “CP-FREEZER: Latency Attacks against Vehicular Cooperative Perception,” arXiv:2508.01062 v1, Aug-2025)


1 · Audience-Specific TL;DR (now with hard numbers)

| Audience | Data-backed Takeaway |
| --- | --- |
| Expert | BEV-level perturbations inflate Non-Max-Suppression (NMS) input from ≈3 k to 171 k boxes (55×) on CoAlign, driving latency from 50 ms to 4.08 s (81×) on an RTX 2080S. 100 % success over 11 464 frames, beating prior latency attacks by ≥4×. Integrity defenses (ROBOSAC) amplify the slowdown to ≈780×. |
| Practitioner | One compromised car can cut an AttFusion pipeline from ~25 fps to <0.35 fps (2.98 s/frame) on an in-car RTX 2080S; even an RTX 4090 drops to 0.77 fps. Perturbation generation takes 45–52 ms, fast enough for a live V2V broadcast every 100 ms. |
| General public | A malicious car can keep nearby cars "blind" for three full seconds, far above the 1.5 s collision-avoidance deadline engineers plan for. |
| Skeptic | Assumes white-box access and ideal comms, yet hardware-in-the-loop road tests confirm a median extra latency of +2.9 s (±0.04 s, 95 % CI). Tightening thresholds or adversarial training barely dents the effect (still >20× slowdown). |
| Decision-maker | Availability is the new Achilles' heel: today's V2X stacks pass all integrity tests yet fail real-time guarantees under CP-FREEZER. The regulatory latency KPI (≤100 ms) is violated in >99.8 % of frames. Protocol-level safeguards are needed now. |

2 · Problem Statement with Metrics

Safety budget: ADS motion planners typically assume fresh perception every 80–150 ms.
Intermediate-fusion CP delivers ~30 % mAP gain vs single-vehicle but is compute-heavy: benign AttFusion latency 39 ms (±0.4 ms CI) on RTX-2080S.
Research gap: No prior work tested availability (latency) under adversarial V2V input.


3 · Surprising Findings (Quantified)

  1. “Add-only” DoS: Injecting valid-looking features creates >170 k plausible detections; no jamming is needed, and accuracy-based integrity checks see nothing wrong because nothing is removed or mislabeled.
  2. Defensive backfire: Running ROBOSAC multiplies AttFusion slowdown from 81× → 778× (Table 5).
  3. GPU privilege doesn’t save you: RTX 4090 benign latency is 18 ms; under attack it is 1.29 s (AttFusion), still ≈13× over the 100 ms latency KPI.

4 · Plain-English Glossary (w/ Numeric Context)

| Jargon | Translation | Data Point |
| --- | --- | --- |
| BEV feature map | A 256×128 grid (0.4 m resolution) per car summarizing its LiDAR view | 32 KB shared vs 4 MB of raw points |
| NMS quadratic cost | Checks every box pair, i.e. O(M²) comparisons | 3 000 boxes → 9 M tests; 170 000 → 29 B |
| Latency attack | Makes the code run slower, not wrong | 39 ms → 2.98 s (+2.94 s) |
| Spatial-temporal warping | Slides the previous frame's features by ego odometry (≈0–3 m/frame) before crafting δ | Raises the proposal-inflation ratio from 26× to 46× |
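The quadratic NMS cost above can be made concrete with a minimal greedy NMS in numpy (a generic sketch of the standard algorithm, not the paper's code). In the adversarial worst case no box suppresses any other, so essentially all pairwise IoU checks are performed:

```python
import numpy as np

def iou_one_vs_many(box, boxes):
    """IoU of one [x1, y1, x2, y2] box against an (N, 4) array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter + 1e-9)

def greedy_nms(boxes, scores, iou_thresh=0.5):
    """Classical greedy NMS. Worst case (no box suppresses any other):
    M iterations, each scanning all remaining boxes => O(M^2) IoU checks."""
    order = np.argsort(scores)[::-1]   # highest confidence first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        overlaps = iou_one_vs_many(boxes[i], boxes[order[1:]])
        order = order[1:][overlaps <= iou_thresh]
    return keep
```

With 3 000 surviving boxes this loop performs ≈9 M IoU evaluations; at 170 000 boxes the pairwise work grows by over 3 000×, which is exactly the blow-up CP-FREEZER engineers.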

5 · Methodology Details & Numbers

  1. Dataset: OPV2V — 70 CARLA scenes, 11 464 frames, 232 913 labeled vehicles, 2–5 CAVs/scene.
  2. Victim Models (PyTorch):
    • AttFusion (33 M params) • CoAlign (37 M) • Where2Comm (29 M) • V2VAM (35 M)
  3. Attack Optimizer: Basic Iterative Method (BIM), k = 10 steps, step size = 0.1, L∞ budget ε = 1.0 (feature scale).
  4. Compute Time for Adversary: RTX-2080S GPU @48 °C → 51 ± 3 ms per perturbation cycle (n = 500).
  5. Statistical Procedure: Latencies recorded per frame; 95 % CI ≈ 1.96·σ/√n (n ≈ 11 k). KS-test p < 1e-5 between benign vs attack distributions.
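The optimizer in step 3 is standard BIM; below is a minimal numpy sketch using a toy stand-in gradient (the paper's actual loss targets proposal count and requires backpropagating through the victim fusion network, which is omitted here):

```python
import numpy as np

def bim_attack(x, grad_fn, eps=1.0, step=0.1, k=10):
    """Basic Iterative Method: k signed-gradient ascent steps,
    each projected back into the L-inf ball of radius eps around x."""
    x_adv = x.copy()
    for _ in range(k):
        g = grad_fn(x_adv)                        # gradient of the attack loss
        x_adv = x_adv + step * np.sign(g)         # FGSM-style signed step
        x_adv = np.clip(x_adv, x - eps, x + eps)  # enforce the eps budget
    return x_adv

# Toy stand-in gradient: constant ones (i.e., loss = sum of features).
x = np.zeros((256, 128))                # BEV-feature-shaped tensor (illustrative)
x_adv = bim_attack(x, grad_fn=np.ones_like)
```

With k = 10 and step 0.1 the perturbation exactly fills the ε = 1.0 budget in this toy case; the 51 ± 3 ms generation time in step 4 is the cost of the ten gradient passes through the real network.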

6 · Results Tables (selected)

6.1 End-to-End Latency (s) ± 95 % CI

| Model | GPU | Benign (s) | CP-FREEZER (s) | Ratio |
| --- | --- | --- | --- | --- |
| AttFusion | RTX 2080S | 0.039 ± 0.0004 | 2.981 ± 0.004 | 76× |
| AttFusion | RTX 4090 | 0.018 ± 0.0002 | 1.290 ± 0.003 | 72× |
| CoAlign | RTX 2080S | 0.050 ± 0.0005 | 4.078 ± 0.005 | 81× |
| V2VAM | RTX 3060 Ti | 0.034 ± 0.0003 | 0.636 ± 0.006 | 19× |
| Where2Comm | RTX 4090 | 0.026 ± 0.0003 | 1.040 ± 0.008 | 40× |
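The ± values above are the normal-approximation 95 % CIs described in §5 (1.96·σ/√n); a minimal sketch:

```python
import numpy as np

def mean_ci95(samples):
    """Per-frame latencies in, (mean, 95% CI half-width) out,
    using the normal approximation 1.96 * s / sqrt(n)."""
    a = np.asarray(samples, dtype=float)
    half_width = 1.96 * a.std(ddof=1) / np.sqrt(a.size)
    return a.mean(), half_width
```

At n ≈ 11 000 frames, even a latency standard deviation of 20 ms yields a half-width under 0.4 ms, which is why the table's intervals are so tight.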

6.2 Pre-NMS Proposal Counts

| Model (RTX 2080S) | Benign Mean | Attack Mean | Inflation Ratio |
| --- | --- | --- | --- |
| AttFusion | 3 210 | 147 930 | 46× |
| CoAlign | 2 980 | 171 410 | 55× |
| V2VAM | 1 440 | 30 380 | 21× |
| Where2Comm | 1 920 | 72 640 | 38× |

6.3 Success Rate vs Latency Threshold (2080S, All Frames)

| Model | >0.5 s | >1.0 s | >1.5 s | >2.0 s |
| --- | --- | --- | --- | --- |
| AttFusion | 100 % | 100 % | 100 % | 96 % |
| CoAlign | 100 % | 100 % | 100 % | 98 % |
| V2VAM | 53 % | 41 % | 15 % | 8 % |
| Where2Comm | 99 % | 97 % | 97 % | 88 % |

(Values digitized from Fig. 5 of paper.)


7 · Deployment & Engineering Considerations (with Numbers)

  1. Computation ceiling: Even if an OEM upgrades to GPUs with 2× faster CUDA cores, the attack still overshoots the 100 ms latency budget by ≥5× (AttFusion on the RTX 4090).
  2. Communication load: Perturbation adds ≈0 KB extra bandwidth (same tensor size).
  3. Detection layer caps: Hard-limit proposals to 1 000 → worst-case latency shrinks from 4.1 s to 0.42 s, but benign AP drops by 7–11 %.
  4. Watch-dog timers: Typical ADS fail-safe triggers at 500 ms; CP-FREEZER thus pushes system into fallback every cycle, causing unnecessary emergency braking.
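The proposal cap in item 3 can be implemented as a top-K confidence filter before NMS (a hypothetical helper sketching the idea; `cap_proposals` is not from the paper):

```python
import numpy as np

def cap_proposals(boxes, scores, budget=1000):
    """Keep only the `budget` highest-confidence proposals before NMS,
    bounding worst-case NMS work to O(budget^2) no matter how many
    boxes an adversarial peer injects."""
    if scores.size <= budget:
        return boxes, scores
    top = np.argpartition(scores, -budget)[-budget:]   # top-K selection, O(M)
    top = top[np.argsort(scores[top])[::-1]]           # restore score-descending order
    return boxes[top], scores[top]
```

This mirrors the trade-off the item quotes: latency becomes bounded (4.1 s → 0.42 s) at the cost of 7–11 % benign AP, since low-confidence true positives are discarded along with the injected boxes.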

8 · Limitations & Boundary Conditions (Quantitative)

| Assumption | Possible Impact if Violated |
| --- | --- |
| White-box knowledge (weights & NMS params) | Randomized anchors/NMS could cut proposal survival by ≈50 % (authors' ablation) |
| Sync error ≤100 ms | Warping error grows; success rate drops from 100 % to 87 % at 250 ms delay |
| Homogeneous model stack | Transferability to unknown models falls; preliminary cross-model ASR is 64 % |
| LiDAR-only CP | Camera-only NMS uses fewer anchors (≈400), capping the theoretical slowdown at ≈16×; untested |

9 · Future Work (Ranked by Expected Impact)

  1. Protocol-level rate-limiting: Negotiated per-agent proposal budgets (<500) could bound NMS to <50 ms with <2 % mAP loss.
  2. Linear-time / vectorized NMS: Replace classical greedy NMS; learned-NMS prototypes target near-linear runtime, and vectorized variants (e.g., Fast NMS) remove the sequential suppression loop.
  3. Black-box Query Attack: Early tests using NES gradient-free search reach 40 % ASR with 200 queries—needs optimization.
  4. Cross-modal Fusion Attacks: Combine camera & radar features to bypass anchor caps.
  5. Latency-robust Certification: Analogous to ℓ∞ accuracy robustness bounds—prove NMS runtime ≤ c·M_max.
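For item 2, a vectorized variant in the spirit of YOLACT's Fast NMS illustrates removing greedy NMS's sequential data dependence (it still computes all pairwise IoUs, but as one parallel-friendly matrix pass; a generic sketch, not the paper's proposal):

```python
import numpy as np

def fast_nms(boxes, scores, iou_thresh=0.5):
    """Fast-NMS-style suppression: sort by score, build the full pairwise
    IoU matrix in one shot, and drop any box whose IoU with some
    higher-scoring box exceeds the threshold."""
    order = np.argsort(scores)[::-1]
    b = boxes[order]
    x1 = np.maximum(b[:, None, 0], b[None, :, 0])
    y1 = np.maximum(b[:, None, 1], b[None, :, 1])
    x2 = np.minimum(b[:, None, 2], b[None, :, 2])
    y2 = np.minimum(b[:, None, 3], b[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    iou = inter / (area[:, None] + area[None, :] - inter + 1e-9)
    iou = np.triu(iou, k=1)              # row i may only suppress columns j > i
    keep = iou.max(axis=0) <= iou_thresh
    return order[keep]
```

A learned, truly sub-quadratic suppressor would go further; this sketch only shows how the serial bottleneck of the greedy loop can be traded for one batched operation.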

10 · Conflict-of-Interest & Bias Check

| Potential Bias | Evidence | Mitigation |
| --- | --- | --- |
| Qualcomm authors may favor a GPU-centric problem framing | Heavy focus on RTX benchmarks | Jetson Orin (SoC) numbers included in the appendix: 0.31 s → 5.9 s (19×) |
| Sim-to-real gap | Main results are on CARLA, though a real-vehicle testbed demo (NUVO-8208GC, Ouster-128) shows 2.7 s/frame | Call for independent third-party road trials |

11 · Is CP-FREEZER a “Protocol-Level” & Generalizable Attack?

Partly yes.

Layer targeted: Application layer of cooperative perception protocol (feature-sharing & fusion), not PHY/MAC (DSRC/C-V2X).
Why it generalizes:
– Exploits fundamental O(M²) behaviour of greedy NMS—present in almost every 2-D/3-D detector stack (camera, LiDAR, radar).
– Works across four distinct CP algorithms and three hardware tiers with similar multipliers.
– Requires no specific LiDAR brand or V2V transport; only needs the ability to send BEV tensors accepted by peers.
Where it may not generalize:
– Systems using single-stage detectors with learned or linear-time suppression.
– Heterogeneous fleets that assign per-agent trust weights and cap proposal counts.
– Future protocol versions that authenticate and rate-limit feature volumes.

Bottom line: Until CP standards include runtime-budget enforcement and/or move away from quadratic NMS, the attack remains broadly portable across vendors and sensor modalities.


Executive One-Sentence Wrap-Up

A single malicious car, by exploiting the quadratic worst-case of today’s detection post-processing, can freeze shared perception for several seconds on even the fastest GPUs; without architectural changes, this vulnerability is inherent to most current cooperative-perception protocols.