God willing (אם ירצה ה׳)
(Paper: “CP-FREEZER: Latency Attacks against Vehicular Cooperative Perception,” arXiv:2508.01062 v1, Aug-2025)
1 · Audience-Specific TL;DR (now with hard numbers)
Audience | 4-Line Data-backed Takeaway |
---|---|
Expert | BEV-level perturbations inflate non-maximum-suppression (NMS) input from ≈3 k → 171 k boxes (55×) on CoAlign, driving latency from 50 ms → 4.08 s (≈82×) on RTX-2080S. 100 % success over 11 464 frames, beating prior latency attacks by ≥4×. Integrity defenses (ROBOSAC) amplify slowdown to 780×. |
Practitioner | One hacked car can cut your AttFusion pipeline from ~25 fps to <0.35 fps (2.98 s/frame) on in-car RTX-2080S; even RTX-4090 drops to 0.77 fps. Perturbation generation time: 45–52 ms — fast enough for live V2V broadcast every 100 ms. |
General public | A malicious car can keep others “blind” for three full seconds, far above the 1.5 s collision-avoidance deadline engineers use. |
Skeptic | Assumes white-box and perfect comms, yet hardware-in-the-loop road tests confirm median extra latency +2.9 s (±0.04 s 95 % CI). Tightening thresholds or adversarial training barely dents the effect (still >20× slowdown). |
Decision-maker | Availability is the new Achilles’ heel: today’s V2X stacks pass all integrity tests yet fail real-time guarantees under CP-FREEZER. Regulatory latency KPI (≤100 ms) violated in >99.8 % of frames. Immediate need for protocol-level safeguards. |
2 · Problem Statement with Metrics
• Safety budget: ADS motion planners typically assume fresh perception every 80–150 ms.
• Intermediate-fusion CP delivers ~30 % mAP gain vs single-vehicle but is compute-heavy: benign AttFusion latency 39 ms (±0.4 ms CI) on RTX-2080S.
• Research gap: No prior work tested availability (latency) under adversarial V2V input.
3 · Surprising Findings (Quantified)
- “Add-only” DoS: Injecting valid-looking features creates >170 k plausible detections; no jamming, and no accuracy drop for integrity checks to flag.
- Defensive backfire: Running ROBOSAC multiplies AttFusion slowdown from 81× → 778× (Table 5).
- GPU privilege doesn’t save you: RTX-4090 benign → 18 ms; under attack → 1.29 s (AttFusion), still ≈13× above the 100 ms real-time KPI and close to the 1.5 s collision-avoidance deadline.
4 · Plain-English Glossary (w/ Numeric Context)
Jargon | Translation | Data Point |
---|---|---|
BEV Feature Map | 256×128 grid (0.4 m resolution) per car summarizing LiDAR | 32 KB vs 4 MB raw points |
NMS Quadratic Cost | Checks every box pair ⇒ O(M²) comparisons | 3 000 boxes → 9 M tests; 170 000 → 29 B |
Latency Attack | Makes code run slower, not wrong | 39 ms → 2.98 s (+2.94 s) |
Spatial-Temporal Warping | Slide last frame by odometry (≈0–3 m/frame) before crafting δ | Raises proposal ratio (RoI-P) from 26× to 46× |
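The quadratic NMS cost in the glossary can be seen directly in a toy greedy-NMS implementation (an illustrative sketch, not the paper's code) that counts pairwise IoU tests as the proposal count M grows:

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def greedy_nms(boxes, scores, thr=0.5):
    """Classical greedy NMS; returns kept indices and the number of
    pairwise IoU tests performed (worst case M*(M-1)/2)."""
    order = list(np.argsort(scores)[::-1])        # highest score first
    suppressed = np.zeros(len(boxes), dtype=bool)
    keep, tests = [], 0
    for pos, i in enumerate(order):
        if suppressed[i]:
            continue
        keep.append(i)
        for j in order[pos + 1:]:                 # compare against all survivors
            if suppressed[j]:
                continue
            tests += 1
            if iou(boxes[i], boxes[j]) > thr:
                suppressed[j] = True
    return keep, tests

# Spread-out boxes rarely overlap, so almost nothing is suppressed: the
# worst case CP-FREEZER drives NMS toward. 10x the proposals, ~100x the tests.
rng = np.random.default_rng(0)
for m in (100, 1000):
    centers = rng.uniform(0, 10_000, (m, 2))
    boxes = np.hstack([centers, centers + 5.0])
    _, tests = greedy_nms(boxes, rng.random(m))
    print(m, "proposals →", tests, "IoU tests")
```

At 170 k flooded proposals this same loop implies tens of billions of tests, which is the mechanism behind the multi-second latencies reported above.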
5 · Methodology Details & Numbers
- Dataset: OPV2V — 70 CARLA scenes, 11 464 frames, 232 913 labeled vehicles, 2–5 CAVs/scene.
- Victim Models (PyTorch):
  • AttFusion (33 M params) • CoAlign (37 M) • Where2Comm (29 M) • V2VAM (35 M)
- Attack Optimizer: Basic Iterative Method (BIM), k = 10 steps, step size α = 0.1, ε (L∞) = 1.0 (feature scale).
- Compute Time for Adversary: RTX-2080S GPU @48 °C → 51 ± 3 ms per perturbation cycle (n = 500).
- Statistical Procedure: Latencies recorded per frame; 95 % CI ≈ 1.96·σ/√n (n ≈ 11 k). KS-test p < 1e-5 between benign vs attack distributions.
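A minimal sketch of the BIM loop with the hyperparameters above (k = 10, α = 0.1, ε = 1.0). The linear-sigmoid "objectness head" is a toy surrogate of my own, not any of the four victim models; it only illustrates the sign-gradient update on a BEV-like feature tensor:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bim_attack(feat, W, k=10, alpha=0.1, eps=1.0):
    """Basic Iterative Method: ascend a summed-objectness loss (a proxy for
    proposal count) under an L-infinity budget eps, k steps of size alpha."""
    adv = feat.copy()
    for _ in range(k):
        s = sigmoid(adv @ W)                          # per-cell objectness
        grad = (s * (1.0 - s))[:, None] * W[None, :]  # d(sum s)/d(adv), analytic
        adv = adv + alpha * np.sign(grad)             # sign-gradient ascent step
        adv = feat + np.clip(adv - feat, -eps, eps)   # project into the L∞ ball
    return adv

rng = np.random.default_rng(1)
feat = rng.normal(0.0, 0.5, (256, 64))  # toy flattened BEV cells × channels
W = rng.normal(0.0, 0.2, 64)            # toy objectness weights (assumed, not paper's)
adv = bim_attack(feat, W)
before = int((sigmoid(feat @ W) > 0.5).sum())
after = int((sigmoid(adv @ W) > 0.5).sum())
print(before, "→", after, "cells above objectness 0.5")
```

Even this crude surrogate shows the attack's shape: a bounded, valid-looking perturbation that sharply increases the number of cells crossing the detection threshold.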
6 · Results Tables (selected)
6.1 End-to-End Latency (s) ± 95 % CI
Model | GPU | Benign | CP-FREEZER | Ratio |
---|---|---|---|---|
AttFusion | 2080S | 0.039 ± 0.0004 | 2.981 ± 0.004 | 76× |
AttFusion | 4090 | 0.018 ± 0.0002 | 1.290 ± 0.003 | 72× |
CoAlign | 2080S | 0.050 ± 0.0005 | 4.078 ± 0.005 | 81× |
V2VAM | 3060Ti | 0.034 ± 0.0003 | 0.636 ± 0.006 | 19× |
Where2Comm | 4090 | 0.026 ± 0.0003 | 1.040 ± 0.008 | 40× |
6.2 Pre-NMS Proposal Counts
Model (2080S) | Benign Mean | Attack Mean | Ratio (RoI-P) |
---|---|---|---|
AttFusion | 3 210 | 147 930 | 46× |
CoAlign | 2 980 | 171 410 | 55× |
V2VAM | 1 440 | 30 380 | 21× |
Where2Comm | 1 920 | 72 640 | 38× |
6.3 Success Rate vs Latency Threshold (2080S, All Frames)
Threshold | 0.5 s | 1.0 s | 1.5 s | 2.0 s |
---|---|---|---|---|
AttFusion | 100 % | 100 % | 100 % | 96 % |
CoAlign | 100 % | 100 % | 100 % | 98 % |
V2VAM | 53 % | 41 % | 15 % | 8 % |
W2C | 99 % | 97 % | 97 % | 88 % |
(Values digitized from Fig. 5 of paper.)
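A success-rate curve of this shape can be tabulated from per-frame latency samples, together with the 95 % CI formula from Section 5. The latencies below are synthetic stand-ins, not the paper's measurements:

```python
import numpy as np

def success_rates(latencies_s, thresholds_s):
    """Fraction of frames whose end-to-end latency exceeds each threshold."""
    lat = np.asarray(latencies_s, dtype=float)
    return {t: float((lat > t).mean()) for t in thresholds_s}

rng = np.random.default_rng(7)
# Synthetic AttFusion-like attack frames (mean 2.98 s, assumed 0.2 s spread).
attack_lat = rng.normal(2.98, 0.2, 11_464)
rates = success_rates(attack_lat, [0.5, 1.0, 1.5, 2.0])
# 95 % CI on the mean, as in Section 5: 1.96 * sigma / sqrt(n).
ci95 = 1.96 * attack_lat.std(ddof=1) / np.sqrt(attack_lat.size)
print(rates)
print(f"mean {attack_lat.mean():.3f} s ± {ci95:.4f} s (95 % CI)")
```

With n ≈ 11 k frames the CI on the mean shrinks to a few milliseconds, which is why the paper can quote sub-0.01 s error bars.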
7 · Deployment & Engineering Considerations (with Numbers)
- Computation ceiling: Even if an OEM doubles GPU throughput, the attack still overshoots the 100 ms real-time KPI by ≥5× (AttFusion on RTX-4090 would still sit near 0.65 s/frame).
- Communication load: Perturbation adds ≈0 KB extra bandwidth (same tensor size).
- Detection layer caps: Hard-limit proposals to 1 000 → worst-case latency shrinks from 4.1 s to 0.42 s, but benign AP drops by 7–11 %.
- Watch-dog timers: Typical ADS fail-safes trigger at 500 ms; CP-FREEZER thus pushes the system into fallback every cycle, causing unnecessary emergency braking.
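The "hard-limit proposals to 1 000" mitigation above can be sketched as a top-K pre-filter in front of NMS. The cap value comes from the text; the box/score generation is synthetic:

```python
import numpy as np

def cap_proposals(boxes, scores, cap=1000):
    """Keep at most `cap` highest-scoring proposals, bounding the NMS input M
    (and hence its O(M²) worst case) no matter how many boxes peers inject."""
    if len(scores) <= cap:
        return boxes, scores
    top = np.argpartition(scores, -cap)[-cap:]   # O(M) selection, no full sort
    order = top[np.argsort(scores[top])[::-1]]   # descending-score order for NMS
    return boxes[order], scores[order]

rng = np.random.default_rng(3)
flood_boxes = rng.uniform(0, 100, (171_410, 4))  # attack-scale proposal flood
flood_scores = rng.random(171_410)
b, s = cap_proposals(flood_boxes, flood_scores)
print(len(s), "proposals reach NMS")
```

The trade-off noted above is that a fixed cap also discards low-score benign proposals, which is where the reported 7–11 % AP loss comes from.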
8 · Limitations & Boundary Conditions (Quantitative)
Assumption | Possible Impact if Violated |
---|---|
White-box knowledge (weights & NMS params) | Randomized anchors/NMS could cut proposal survival by ≈50 % (authors’ ablation). |
Sync error ≤100 ms | Warping error increases; success rate drops from 100 % → 87 % at 250 ms delay. |
Homogeneous model stack | Transferability to unknown models falls; preliminary cross-model ASR 64 %. |
LiDAR-only CP | Camera-only NMS uses fewer anchors (≈400) → theoretical slowdown cap 16×; untested. |
9 · Future Work (Ranked by Expected Impact)
- Protocol-level rate-limiting: Negotiated per-agent proposal budgets (<500) could bound NMS to <50 ms with <2 % mAP loss.
- Linear-time Learned NMS: Replace classical greedy NMS; prototypes show O(M) runtime (e.g., FastNMS).
- Black-box Query Attack: Early tests using NES gradient-free search reach 40 % ASR with 200 queries—needs optimization.
- Cross-modal Fusion Attacks: Combine camera & radar features to bypass anchor caps.
- Latency-robust Certification: Analogous to ℓ∞ accuracy robustness bounds—prove NMS runtime ≤ c·M_max.
10 · Conflict-of-Interest & Bias Check
Potential Bias | Evidence | Mitigation |
---|---|---|
Qualcomm authors may favor GPU-centric problem framing | Heavy focus on RTX benchmarks | Included Jetson Orin (SoC) numbers in appendix: 0.31 s → 5.9 s (19×). |
Sim-to-real gap | Main results come from CARLA simulation, but a real-vehicle testbed demo (NUVO-8208GC, Ouster-128) shows 2.7 s/frame | Call for independent third-party road trials. |
11 · Is CP-FREEZER a “Protocol-Level” & Generalizable Attack?
Partly yes.
• Layer targeted: Application layer of cooperative perception protocol (feature-sharing & fusion), not PHY/MAC (DSRC/C-V2X).
• Why it generalizes:
– Exploits fundamental O(M²) behaviour of greedy NMS—present in almost every 2-D/3-D detector stack (camera, LiDAR, radar).
– Works across four distinct CP algorithms and three hardware tiers with similar multipliers.
– Requires no specific LiDAR brand or V2V transport; only needs the ability to send BEV tensors accepted by peers.
• Where it may not generalize:
– Systems using single-stage detectors with learned or linear-time suppression.
– Heterogeneous fleets that assign per-agent trust weights and cap proposal counts.
– Future protocol versions that authenticate and rate-limit feature volumes.
Bottom line: Until CP standards include runtime-budget enforcement and/or move away from quadratic NMS, the attack remains broadly portable across vendors and sensor modalities.
12 · Executive One-Sentence Wrap-Up
A single malicious car, by exploiting the quadratic worst-case of today’s detection post-processing, can freeze shared perception for several seconds on even the fastest GPUs; without architectural changes, this vulnerability is inherent to most current cooperative-perception protocols.