Paper
arXiv
Trajectory
Mobility
English Title
How Many Visual Levers Drive Urban Perception? Interventional Counterfactuals via Multiple Localised Edits
Jason Tang, Stephen Law
Published
2026/4/24 06:20:59
Source type
preprint
Language
en
Abstract

English Original

Street-view perception models predict subjective attributes such as safety at scale, but remain correlational: they do not identify which localized visual changes would plausibly shift human judgement for a specific scene. We propose a lever-based interventional counterfactual framework that recasts scene-level explainability as a bounded search over structured counterfactual edits. Each lever specifies a semantic concept, spatial support, intervention direction, and constrained edit template. Candidate edits are generated through prompt-conditioned image editing and retained only if they satisfy validity checks for same-place preservation, locality, realism, and plausibility. In a pilot across 50 scenes from five cities, the framework reveals preliminary proxy-based directional patterns and a practical failure taxonomy under prompt-only editing, with Mobility Infrastructure and Physical Maintenance showing the largest auxiliary safety shifts. Human pairwise judgements remain the ground-truth endpoint for future validation.
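The abstract says that a lever specifies a semantic concept, a spatial support, an intervention direction and a constrained edit template. It also says that candidate edits survive only if they pass four validity checks. It does not say how these are represented or scored. The Python sketch below is a minimal illustration under stated assumptions. It is not the authors' implementation. All names (`Lever`, `CandidateEdit`, `retain_valid`, `enumerate_levers`) are hypothetical. Spatial support is reduced to a named region rather than a mask. The direction is encoded as +1 or -1. The validity scores are assumed to be precomputed values in [0, 1], and the thresholds are made up.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical thresholds: the abstract names the four checks but not how they are scored.
THRESHOLDS: Dict[str, float] = {
    "same_place": 0.8,    # same-place preservation
    "locality": 0.7,      # edit stays within the lever's spatial support
    "realism": 0.6,
    "plausibility": 0.5,
}


@dataclass(frozen=True)
class Lever:
    concept: str    # semantic concept, e.g. "Physical Maintenance"
    region: str     # spatial support, simplified here to a named region (assumption)
    direction: int  # assumed encoding: +1 = improve, -1 = degrade
    template: str   # constrained edit template with {verb} and {region} slots

    def prompt(self) -> str:
        """Fill the edit template to get the prompt passed to an image editor."""
        verb = "improve" if self.direction > 0 else "degrade"
        return self.template.format(verb=verb, region=self.region)


@dataclass
class CandidateEdit:
    lever: Lever
    scores: Dict[str, float]  # validity scores in [0, 1], assumed computed elsewhere


def is_valid(edit: CandidateEdit) -> bool:
    """Keep an edit only if every validity check meets its threshold (missing score = fail)."""
    return all(edit.scores.get(name, 0.0) >= t for name, t in THRESHOLDS.items())


def retain_valid(edits: List[CandidateEdit]) -> List[CandidateEdit]:
    """Filter candidate edits down to those passing all checks."""
    return [e for e in edits if is_valid(e)]


def enumerate_levers(
    concepts_regions: List[Tuple[str, str]], template: str, budget: int
) -> List[Lever]:
    """Bounded search space: each (concept, region) in both directions, capped at `budget`."""
    levers = [
        Lever(concept, region, direction, template)
        for concept, region in concepts_regions
        for direction in (+1, -1)
    ]
    return levers[:budget]


if __name__ == "__main__":
    lever = Lever("Physical Maintenance", "building facade", +1,
                  "{verb} the condition of the {region}; change nothing else")
    good = CandidateEdit(lever, {"same_place": 0.9, "locality": 0.8,
                                 "realism": 0.7, "plausibility": 0.6})
    bad = CandidateEdit(lever, {"same_place": 0.9, "locality": 0.4,
                                "realism": 0.7, "plausibility": 0.6})
    print(lever.prompt())
    print(len(retain_valid([good, bad])))
```

Run as a script, the example prints the filled prompt, then `1`, because the second candidate fails the locality check. The key design choice is to keep levers as small immutable records, which keeps the search space enumerable and bounded. Validity filtering is kept as a separate pass over already-scored candidates.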

Metadata
arXiv: 2604.22103v1
Source: arXiv
Type: Paper
Extraction status: raw
Keywords
Trajectory
Mobility
cs.CY
cs.CV