English Title
MONETA: Multimodal Industry Classification through Geographic Information with Multi Agent Systems
Arda Yüksel, Gabriel Thiem, Susanne Walter, Patrick Felka, Gabriela Alves Werb, Ivan Habernal
Published
2026/4/9 16:21:39
Source type
preprint
Language
en
Abstract
English Original

Industry classification schemes are integral parts of public and corporate databases as they classify businesses based on economic activity. Due to the size of the company registers, manual annotation is costly, and fine-tuning models with every update in industry classification schemes requires significant data collection. We replicate the manual expert verification by using existing or easily retrievable multimodal resources for industry classification. We present MONETA, the first multimodal industry classification benchmark with text (Website, Wikipedia, Wikidata) and geospatial sources (OpenStreetMap and satellite imagery). Our dataset enlists 1,000 businesses in Europe with 20 economic activity labels according to EU guidelines (NACE). Our training-free baseline reaches 62.10% and 74.10% with open and closed-source Multimodal Large Language Models (MLLM). We observe an increase of up to 22.80% with the combination of multi-turn design, context enrichment, and classification explanations. We will release our dataset and the enhanced guidelines.

Metadata
arXiv: 2604.07956v1
Source: arXiv
Type: Paper
Extraction status: raw
Keywords
GeoAI
GIS
RemoteSensing
EarthObservation
LLM
Multimodal
GeoMultimodal
Agent
cs.AI