UrbanComp Lab | Learning Resource Library

Paper

arXiv

Trajectory

Mobility

English Title

How Many Visual Levers Drive Urban Perception? Interventional Counterfactuals via Multiple Localised Edits

Jason Tang, Stephen Law

Published

2026/4/24 06:20:59

Source Type

preprint

Abstract

English Original

Street-view perception models predict subjective attributes such as safety at scale, but remain correlational: they do not identify which localized visual changes would plausibly shift human judgement for a specific scene. We propose a lever-based interventional counterfactual framework that recasts scene-level explainability as a bounded search over structured counterfactual edits. Each lever specifies a semantic concept, spatial support, intervention direction, and constrained edit template. Candidate edits are generated through prompt-conditioned image editing and retained only if they satisfy validity checks for same-place preservation, locality, realism, and plausibility. In a pilot across 50 scenes from five cities, the framework reveals preliminary proxy-based directional patterns and a practical failure taxonomy under prompt-only editing, with Mobility Infrastructure and Physical Maintenance showing the largest auxiliary safety shifts. Human pairwise judgements remain the ground-truth endpoint for future validation.
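
The lever structure and the validity filter described in the abstract can be pictured with a minimal Python sketch. Everything named here is a hypothetical illustration, not the authors' implementation: the `Lever` and `CandidateEdit` classes, the per-check scores in [0, 1], and the 0.5 threshold are all assumptions. The abstract does not say how each check is computed, so the scores are treated as given inputs.

```python
from dataclasses import dataclass
from typing import Dict, List

# The four validity checks named in the abstract (the key names are assumptions).
VALIDITY_CHECKS = ("same_place", "locality", "realism", "plausibility")


@dataclass(frozen=True)
class Lever:
    """One lever: the four components listed in the abstract."""
    concept: str          # semantic concept, e.g. "Physical Maintenance"
    spatial_support: str  # region the edit may touch (hypothetical label)
    direction: int        # +1 to improve/add, -1 to degrade/remove
    template: str         # constrained edit template with named slots

    def prompt(self) -> str:
        # Fill the template to obtain the text prompt for an image editor.
        verb = "improve" if self.direction > 0 else "degrade"
        return self.template.format(
            verb=verb,
            concept=self.concept.lower(),
            region=self.spatial_support,
        )


@dataclass
class CandidateEdit:
    """A generated edit plus placeholder validity scores in [0, 1]."""
    lever: Lever
    scores: Dict[str, float]


def is_valid(edit: CandidateEdit, threshold: float = 0.5) -> bool:
    # An edit is kept only if every check passes; a missing score counts as a failure.
    return all(edit.scores.get(name, 0.0) >= threshold for name in VALIDITY_CHECKS)


def retain_valid(edits: List[CandidateEdit], threshold: float = 0.5) -> List[CandidateEdit]:
    return [edit for edit in edits if is_valid(edit, threshold)]


lever = Lever(
    concept="Physical Maintenance",
    spatial_support="facade",
    direction=+1,
    template="{verb} the {concept} of the {region}",
)
candidates = [
    CandidateEdit(lever, {"same_place": 0.9, "locality": 0.8, "realism": 0.7, "plausibility": 0.9}),
    CandidateEdit(lever, {"same_place": 0.9, "locality": 0.2, "realism": 0.7, "plausibility": 0.9}),
]
kept = retain_valid(candidates)
```

Here the second candidate is discarded because it fails the locality check, which mirrors the "retained only if they satisfy validity checks" rule. In a real pipeline the scores would come from whatever checkers the paper specifies.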

Resource Links

Paper PDF: arxiv.org/pdf/2604.22103v1

Original source page: arxiv.org/abs/2604.22103v1

Metadata

arXiv: 2604.22103v1

Source: arXiv

Type: Paper

Extraction status: raw

Keywords

Trajectory

Mobility

cs.CY

cs.CV