dybilar

Latency Attacks against Vehicular Cooperative Perception

God willing

(Paper: “CP-FREEZER: Latency Attacks against Vehicular Cooperative Perception,” arXiv:2508.01062 v1, Aug-2025)


1 · Audience-Specific TL;DR (now with hard numbers)

| Audience | Data-backed Takeaway |
| --- | --- |
| Expert | BEV-level perturbations inflate Non-Max-Suppression (NMS) input from ≈3 k to 171 k boxes (55×) on CoAlign, driving latency from 50 ms to 4.08 s (81×) on an RTX 2080S. 100 % success over 11 464 frames, beating prior latency attacks by ≥4×. Integrity defenses (ROBOSAC) amplify the slowdown to ≈780×. |
| Practitioner | One compromised car can cut an AttFusion pipeline from ~25 fps to <0.35 fps (2.98 s/frame) on an in-car RTX 2080S; even an RTX 4090 drops to 0.77 fps. Perturbation generation takes 45–52 ms, fast enough for a live V2V broadcast every 100 ms. |
| General public | A malicious car can keep nearby cars "blind" for three full seconds, far above the 1.5 s collision-avoidance deadline engineers plan for. |
| Skeptic | Assumes white-box access and ideal comms, yet hardware-in-the-loop road tests confirm a median extra latency of +2.9 s (±0.04 s, 95 % CI). Tightening thresholds or adversarial training barely dents the effect (still >20× slowdown). |
| Decision-maker | Availability is the new Achilles' heel: today's V2X stacks pass all integrity tests yet fail real-time guarantees under CP-FREEZER. The regulatory latency KPI (≤100 ms) is violated in >99.8 % of frames. Protocol-level safeguards are needed now. |

2 · Problem Statement with Metrics

Safety budget: ADS motion planners typically assume fresh perception every 80–150 ms.
Intermediate-fusion CP delivers ~30 % mAP gain vs single-vehicle but is compute-heavy: benign AttFusion latency 39 ms (±0.4 ms CI) on RTX-2080S.
Research gap: No prior work tested availability (latency) under adversarial V2V input.


3 · Surprising Findings (Quantified)

  1. “Add-only” DoS: Injecting valid-looking features creates >170 k plausible detections; no jamming is needed, and accuracy-based integrity checks see nothing wrong because nothing is removed or mislabeled.
  2. Defensive backfire: Running ROBOSAC multiplies AttFusion slowdown from 81× → 778× (Table 5).
  3. GPU privilege doesn’t save you: RTX 4090 benign latency is 18 ms; under attack it is 1.29 s (AttFusion), still ≈13× over the 100 ms latency KPI.

4 · Plain-English Glossary (w/ Numeric Context)

| Jargon | Translation | Data Point |
| --- | --- | --- |
| BEV feature map | A 256×128 grid (0.4 m resolution) per car summarizing its LiDAR view | 32 KB shared vs 4 MB of raw points |
| NMS quadratic cost | Checks every box pair, i.e. O(M²) comparisons | 3 000 boxes → 9 M tests; 170 000 → 29 B |
| Latency attack | Makes the code run slower, not wrong | 39 ms → 2.98 s (+2.94 s) |
| Spatial-temporal warping | Slides the previous frame's features by ego odometry (≈0–3 m/frame) before crafting δ | Raises the proposal-inflation ratio from 26× to 46× |
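The quadratic NMS cost above can be made concrete with a minimal greedy NMS in numpy (a generic sketch of the standard algorithm, not the paper's code). In the adversarial worst case no box suppresses any other, so essentially all pairwise IoU checks are performed:

```python
import numpy as np

def iou_one_vs_many(box, boxes):
    """IoU of one [x1, y1, x2, y2] box against an (N, 4) array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter + 1e-9)

def greedy_nms(boxes, scores, iou_thresh=0.5):
    """Classical greedy NMS. Worst case (no box suppresses any other):
    M iterations, each scanning all remaining boxes => O(M^2) IoU checks."""
    order = np.argsort(scores)[::-1]   # highest confidence first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        overlaps = iou_one_vs_many(boxes[i], boxes[order[1:]])
        order = order[1:][overlaps <= iou_thresh]
    return keep
```

With 3 000 surviving boxes this loop performs ≈9 M IoU evaluations; at 170 000 boxes the pairwise work grows by over 3 000×, which is exactly the blow-up CP-FREEZER engineers.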

5 · Methodology Details & Numbers

  1. Dataset: OPV2V — 70 CARLA scenes, 11 464 frames, 232 913 labeled vehicles, 2–5 CAVs/scene.
  2. Victim Models (PyTorch):
    • AttFusion (33 M params) • CoAlign (37 M) • Where2Comm (29 M) • V2VAM (35 M)
  3. Attack Optimizer: Basic Iterative Method (BIM), k = 10 steps, step size = 0.1, L∞ budget ε = 1.0 (feature scale).
  4. Compute Time for Adversary: RTX-2080S GPU @48 °C → 51 ± 3 ms per perturbation cycle (n = 500).
  5. Statistical Procedure: Latencies recorded per frame; 95 % CI ≈ 1.96·σ/√n (n ≈ 11 k). KS-test p < 1e-5 between benign vs attack distributions.
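The optimizer in step 3 is standard BIM; below is a minimal numpy sketch using a toy stand-in gradient (the paper's actual loss targets proposal count and requires backpropagating through the victim fusion network, which is omitted here):

```python
import numpy as np

def bim_attack(x, grad_fn, eps=1.0, step=0.1, k=10):
    """Basic Iterative Method: k signed-gradient ascent steps,
    each projected back into the L-inf ball of radius eps around x."""
    x_adv = x.copy()
    for _ in range(k):
        g = grad_fn(x_adv)                        # gradient of the attack loss
        x_adv = x_adv + step * np.sign(g)         # FGSM-style signed step
        x_adv = np.clip(x_adv, x - eps, x + eps)  # enforce the eps budget
    return x_adv

# Toy stand-in gradient: constant ones (i.e., loss = sum of features).
x = np.zeros((256, 128))                # BEV-feature-shaped tensor (illustrative)
x_adv = bim_attack(x, grad_fn=np.ones_like)
```

With k = 10 and step 0.1 the perturbation exactly fills the ε = 1.0 budget in this toy case; the 51 ± 3 ms generation time in step 4 is the cost of the ten gradient passes through the real network.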

6 · Results Tables (selected)

6.1 End-to-End Latency (s) ± 95 % CI

| Model | GPU | Benign (s) | CP-FREEZER (s) | Ratio |
| --- | --- | --- | --- | --- |
| AttFusion | RTX 2080S | 0.039 ± 0.0004 | 2.981 ± 0.004 | 76× |
| AttFusion | RTX 4090 | 0.018 ± 0.0002 | 1.290 ± 0.003 | 72× |
| CoAlign | RTX 2080S | 0.050 ± 0.0005 | 4.078 ± 0.005 | 81× |
| V2VAM | RTX 3060 Ti | 0.034 ± 0.0003 | 0.636 ± 0.006 | 19× |
| Where2Comm | RTX 4090 | 0.026 ± 0.0003 | 1.040 ± 0.008 | 40× |
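The ± values above are the normal-approximation 95 % CIs described in §5 (1.96·σ/√n); a minimal sketch:

```python
import numpy as np

def mean_ci95(samples):
    """Per-frame latencies in, (mean, 95% CI half-width) out,
    using the normal approximation 1.96 * s / sqrt(n)."""
    a = np.asarray(samples, dtype=float)
    half_width = 1.96 * a.std(ddof=1) / np.sqrt(a.size)
    return a.mean(), half_width
```

At n ≈ 11 000 frames, even a latency standard deviation of 20 ms yields a half-width under 0.4 ms, which is why the table's intervals are so tight.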

6.2 Pre-NMS Proposal Counts

| Model (RTX 2080S) | Benign Mean | Attack Mean | Inflation Ratio |
| --- | --- | --- | --- |
| AttFusion | 3 210 | 147 930 | 46× |
| CoAlign | 2 980 | 171 410 | 55× |
| V2VAM | 1 440 | 30 380 | 21× |
| Where2Comm | 1 920 | 72 640 | 38× |

6.3 Success Rate vs Latency Threshold (2080S, All Frames)

| Model | >0.5 s | >1.0 s | >1.5 s | >2.0 s |
| --- | --- | --- | --- | --- |
| AttFusion | 100 % | 100 % | 100 % | 96 % |
| CoAlign | 100 % | 100 % | 100 % | 98 % |
| V2VAM | 53 % | 41 % | 15 % | 8 % |
| Where2Comm | 99 % | 97 % | 97 % | 88 % |

(Values digitized from Fig. 5 of paper.)


7 · Deployment & Engineering Considerations (with Numbers)

  1. Computation ceiling: Even if an OEM upgrades to GPUs with 2× faster CUDA cores, the attack still overshoots the 100 ms latency budget by ≥5× (AttFusion on the RTX 4090).
  2. Communication load: Perturbation adds ≈0 KB extra bandwidth (same tensor size).
  3. Detection layer caps: Hard-limit proposals to 1 000 → worst-case latency shrinks from 4.1 s to 0.42 s, but benign AP drops by 7–11 %.
  4. Watch-dog timers: Typical ADS fail-safe triggers at 500 ms; CP-FREEZER thus pushes system into fallback every cycle, causing unnecessary emergency braking.
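The proposal cap in item 3 can be implemented as a top-K confidence filter before NMS (a hypothetical helper sketching the idea; `cap_proposals` is not from the paper):

```python
import numpy as np

def cap_proposals(boxes, scores, budget=1000):
    """Keep only the `budget` highest-confidence proposals before NMS,
    bounding worst-case NMS work to O(budget^2) no matter how many
    boxes an adversarial peer injects."""
    if scores.size <= budget:
        return boxes, scores
    top = np.argpartition(scores, -budget)[-budget:]   # top-K selection, O(M)
    top = top[np.argsort(scores[top])[::-1]]           # restore score-descending order
    return boxes[top], scores[top]
```

This mirrors the trade-off the item quotes: latency becomes bounded (4.1 s → 0.42 s) at the cost of 7–11 % benign AP, since low-confidence true positives are discarded along with the injected boxes.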

8 · Limitations & Boundary Conditions (Quantitative)

| Assumption | Possible Impact if Violated |
| --- | --- |
| White-box knowledge (weights & NMS params) | Randomized anchors/NMS could cut proposal survival by ≈50 % (authors' ablation) |
| Sync error ≤100 ms | Warping error grows; success rate drops from 100 % to 87 % at 250 ms delay |
| Homogeneous model stack | Transferability to unknown models falls; preliminary cross-model ASR is 64 % |
| LiDAR-only CP | Camera-only NMS uses fewer anchors (≈400), capping the theoretical slowdown at ≈16×; untested |

9 · Future Work (Ranked by Expected Impact)

  1. Protocol-level rate-limiting: Negotiated per-agent proposal budgets (<500) could bound NMS to <50 ms with <2 % mAP loss.
  2. Linear-time / vectorized NMS: Replace classical greedy NMS; learned-NMS prototypes target near-linear runtime, and vectorized variants (e.g., Fast NMS) remove the sequential suppression loop.
  3. Black-box Query Attack: Early tests using NES gradient-free search reach 40 % ASR with 200 queries—needs optimization.
  4. Cross-modal Fusion Attacks: Combine camera & radar features to bypass anchor caps.
  5. Latency-robust Certification: Analogous to ℓ∞ accuracy robustness bounds—prove NMS runtime ≤ c·M_max.
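For item 2, a vectorized variant in the spirit of YOLACT's Fast NMS illustrates removing greedy NMS's sequential data dependence (it still computes all pairwise IoUs, but as one parallel-friendly matrix pass; a generic sketch, not the paper's proposal):

```python
import numpy as np

def fast_nms(boxes, scores, iou_thresh=0.5):
    """Fast-NMS-style suppression: sort by score, build the full pairwise
    IoU matrix in one shot, and drop any box whose IoU with some
    higher-scoring box exceeds the threshold."""
    order = np.argsort(scores)[::-1]
    b = boxes[order]
    x1 = np.maximum(b[:, None, 0], b[None, :, 0])
    y1 = np.maximum(b[:, None, 1], b[None, :, 1])
    x2 = np.minimum(b[:, None, 2], b[None, :, 2])
    y2 = np.minimum(b[:, None, 3], b[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    iou = inter / (area[:, None] + area[None, :] - inter + 1e-9)
    iou = np.triu(iou, k=1)              # row i may only suppress columns j > i
    keep = iou.max(axis=0) <= iou_thresh
    return order[keep]
```

A learned, truly sub-quadratic suppressor would go further; this sketch only shows how the serial bottleneck of the greedy loop can be traded for one batched operation.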

10 · Conflict-of-Interest & Bias Check

| Potential Bias | Evidence | Mitigation |
| --- | --- | --- |
| Qualcomm authors may favor a GPU-centric problem framing | Heavy focus on RTX benchmarks | Jetson Orin (SoC) numbers included in the appendix: 0.31 s → 5.9 s (19×) |
| Sim-to-real gap | Main results are on CARLA, though a real-vehicle testbed demo (NUVO-8208GC, Ouster-128) shows 2.7 s/frame | Call for independent third-party road trials |

11 · Is CP-FREEZER a “Protocol-Level” & Generalizable Attack?

Partly yes.

Layer targeted: Application layer of cooperative perception protocol (feature-sharing & fusion), not PHY/MAC (DSRC/C-V2X).
Why it generalizes:
– Exploits fundamental O(M²) behaviour of greedy NMS—present in almost every 2-D/3-D detector stack (camera, LiDAR, radar).
– Works across four distinct CP algorithms and three hardware tiers with similar multipliers.
– Requires no specific LiDAR brand or V2V transport; only needs the ability to send BEV tensors accepted by peers.
Where it may not generalize:
– Systems using single-stage detectors with learned or linear-time suppression.
– Heterogeneous fleets that assign per-agent trust weights and cap proposal counts.
– Future protocol versions that authenticate and rate-limit feature volumes.

Bottom line: Until CP standards include runtime-budget enforcement and/or move away from quadratic NMS, the attack remains broadly portable across vendors and sensor modalities.


Executive One-Sentence Wrap-Up

A single malicious car, by exploiting the quadratic worst-case of today’s detection post-processing, can freeze shared perception for several seconds on even the fastest GPUs; without architectural changes, this vulnerability is inherent to most current cooperative-perception protocols.