Large language models (LLMs) are increasingly used to describe and evaluate cities, yet the cultural structure of their urban judgments remains understudied. Here we introduce a measurement framework, built on a globally stratified street-view image dataset, for testing whether LLM-based urban perception is culturally neutral. Open-ended descriptions and structured scores generated by three frontier multimodal models consistently show that the ostensibly neutral baseline lies closer to regional framings associated with Europe and North America than to other cultural framings. Comparisons between AI and human urban perception further show that prompting can move AI responses closer to the descriptions given by humans from a specific region. However, prompting fails to recover the richness and diversity of human responses: it flattens observed demographic patterns and introduces a sentiment-based self-favouring bias. These results indicate a systematic risk in treating AI as a neutral tool for urban tasks, especially when model outputs are used to compare, evaluate or represent cities across cultural contexts.