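Structured Papers and Live Tracking

Papers and Intelligence Feed

Aggregates authoritative papers, preprints, and institutional sources, with filtering by topic, source, and keyword, to support the lab's ongoing research tracking and knowledge curation.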

Showing 24 of 798 items
Latest collected items
Paper
Scientific Data
TopJournal
Chromosome-scale Genome Assembly of the Critically Endangered Blue-crowned Laughingthrush (Pterorhinus courtoisi, Leiothrichidae)

Scientific Data, Published online: 20 March 2026; doi:10.1038/s41597-026-06951-8

Yuxuan Ouyang
2026/3/20
Paper
Scientific Data
TopJournal
A dataset of the smart governance index for Chinese cities

Scientific Data, Published online: 20 March 2026; doi:10.1038/s41597-025-06510-7

Lu Song
2026/3/20
Paper
Scientific Data
TopJournal
Dataset of solubility values for organic compounds in binary mixtures of solvents at various temperatures

Scientific Data, Published online: 20 March 2026; doi:10.1038/s41597-026-07047-z

Dmitry Malikov
2026/3/20
Paper
Scientific Data
TopJournal
A high-precision catalogue of landslide events in China based on news text mining with large language model

Scientific Data, Published online: 20 March 2026; doi:10.1038/s41597-026-07066-w

Binru Zhao
2026/3/20
Paper
Scientific Data
TopJournal
High-Resolution Downscaled CMIP6 Projections dataset of Key Climate Variables for Senegal

Scientific Data, Published online: 20 March 2026; doi:10.1038/s41597-026-07059-9

Asse Mbengue
2026/3/20
Paper
Scientific Data
TopJournal
VitalDB Arrhythmia Database: An Anesthesiologist-Validated Large-scale Intraoperative Arrhythmia Dataset with Beat and Rhythm Labels

Scientific Data, Published online: 20 March 2026; doi:10.1038/s41597-026-07076-8

Da-In Eun
2026/3/20
Paper
Scientific Data
TopJournal
Multimodal
A Three-Year Multimodal Holistic Dataset For Horticultural Tomato Cultivation

Scientific Data, Published online: 20 March 2026; doi:10.1038/s41597-026-07074-w

Yu Gong
2026/3/20
Paper
Scientific Data
TopJournal
COVID Diaries, State Response to COVID Vaccination Program, December 2020 to September 2021

Scientific Data, Published online: 20 March 2026; doi:10.1038/s41597-026-06975-0

Avalon S. Moore
2026/3/20
Paper
arXiv
SpatialIntelligence
Trajectory
Not All Features Are Created Equal: A Mechanistic Study of Vision-Language-Action Models

Vision-Language-Action (VLA) models combine perception, language, and motor control in a single architecture, yet how they translate multimodal inputs into actions remains poorly understood. We apply activation injection, sparse autoencoders (SAEs), and linear probes to six models spanning 80M–7B parameters across 394,000+ rollout episodes on four benchmarks. The visual pathway dominates action generation across all architectures: injecting baseline activations into null-prompt episodes recovers near-identical behavior, while cross-task injection steers robots toward source-task positions (99.8% of X-VLA episodes align with the source trajectory), exposing spatially bound motor programs tied to scene coordinates rather than abstract task representations. Language sensitivity depends on task structure, not model design: when visual context uniquely specifies the task, language is ignored; when multiple goals share a scene, language becomes essential (X-VLA libero_goal: 94% → 10% under wrong prompts vs. libero_object: 60–100% regardless). In all three multi-pathway architectures (π0.5, SmolVLA, GR00T), expert pathways encode motor programs while VLM pathways encode goal semantics (2× greater behavioral displacement from expert injection), and subspace injection confirms these occupy separable activation subspaces. Per-token SAE processing is essential for action fidelity on most architectures, though mean-pooling improves fidelity on X-VLA. Contrastive identification recovers 82+ manipulation concepts, and causal ablation reveals sensitivity spanning 28–92% zero-effect rates independent of representation width. We release Action Atlas (https://action-atlas.com) for interactive exploration of VLA representations across all six models.

Bryce Grant, Xijia Zhao, Peng Wang
2026/3/20
Paper
arXiv
Trajectory
Mobility
MonoArt: Progressive Structural Reasoning for Monocular Articulated 3D Reconstruction

Reconstructing articulated 3D objects from a single image requires jointly inferring object geometry, part structure, and motion parameters from limited visual evidence. A key difficulty lies in the entanglement between motion cues and object structure, which makes direct articulation regression unstable. Existing methods address this challenge through multi-view supervision, retrieval-based assembly, or auxiliary video generation, often sacrificing scalability or efficiency. We present MonoArt, a unified framework grounded in progressive structural reasoning. Rather than predicting articulation directly from image features, MonoArt progressively transforms visual observations into canonical geometry, structured part representations, and motion-aware embeddings within a single architecture. This structured reasoning process enables stable and interpretable articulation inference without external motion templates or multi-stage pipelines. Extensive experiments on PartNet-Mobility demonstrate that MonoArt achieves state-of-the-art performance in both reconstruction accuracy and inference speed. The framework further generalizes to robotic manipulation and articulated scene reconstruction.

Haitian Li, Haozhe Xie, Junxiang Xu
2026/3/20
Paper
arXiv
Multimodal
Agent
NavTrust: Benchmarking Trustworthiness for Embodied Navigation

There are two major categories of embodied navigation: Vision-Language Navigation (VLN), where agents navigate by following natural language instructions; and Object-Goal Navigation (OGN), where agents navigate to a specified target object. However, existing work primarily evaluates model performance under nominal conditions, overlooking the potential corruptions that arise in real-world settings. To address this gap, we present NavTrust, a unified benchmark that systematically corrupts input modalities, including RGB, depth, and instructions, in realistic scenarios and evaluates their impact on navigation performance. To the best of our knowledge, NavTrust is the first benchmark that exposes embodied navigation agents to diverse RGB-Depth corruptions and instruction variations in a unified framework. Our extensive evaluation of seven state-of-the-art approaches reveals substantial performance degradation under realistic corruptions, which highlights critical robustness gaps and provides a roadmap toward more trustworthy embodied navigation systems. Furthermore, we systematically evaluate four distinct mitigation strategies to enhance robustness against RGB-Depth and instruction corruptions. Our base models include Uni-NaVid and ETPNav. We deployed them on a real mobile robot and observed improved robustness to corruptions. The project website is: https://navtrust.github.io.

Huaide Jiang, Yash Chaudhary, Yuping Wang
2026/3/20
Paper
arXiv
SpatialIntelligence
Trajectory
Bridging Semantic and Kinematic Conditions with Diffusion-based Discrete Motion Tokenizer

Prior motion generation largely follows two paradigms: continuous diffusion models that excel at kinematic control, and discrete token-based generators that are effective for semantic conditioning. To combine their strengths, we propose a three-stage framework comprising condition feature extraction (Perception), discrete token generation (Planning), and diffusion-based motion synthesis (Control). Central to this framework is MoTok, a diffusion-based discrete motion tokenizer that decouples semantic abstraction from fine-grained reconstruction by delegating motion recovery to a diffusion decoder, enabling compact single-layer tokens while preserving motion fidelity. For kinematic conditions, coarse constraints guide token generation during planning, while fine-grained constraints are enforced during control through diffusion-based optimization. This design prevents kinematic details from disrupting semantic token planning. On HumanML3D, our method significantly improves controllability and fidelity over MaskControl while using only one-sixth of the tokens, reducing trajectory error from 0.72 cm to 0.08 cm and FID from 0.083 to 0.029. Unlike prior methods that degrade under stronger kinematic constraints, ours improves fidelity, reducing FID from 0.033 to 0.014.

Chenyang Gu, Mingyuan Zhang, Haozhe Xie
2026/3/20
Paper
arXiv
SpatialIntelligence
Trajectory
Rethinking Vector Field Learning for Generative Segmentation

Taming diffusion models for generative segmentation has attracted increasing attention. While existing approaches primarily focus on architectural tweaks or training heuristics, there remains a limited understanding of the intrinsic mismatch between continuous flow matching objectives and discrete perception tasks. In this work, we revisit diffusion segmentation from the perspective of vector field learning. We identify two key limitations of the commonly used flow matching objective: gradient vanishing and trajectory traversing, which result in slow convergence and poor class separation. To tackle these issues, we propose a principled vector field reshaping strategy that augments the learned velocity field with a detached distance-aware correction term. This correction introduces both attractive and repulsive interactions, enhancing gradient magnitudes near centroids while preserving the original diffusion training framework. Furthermore, we design a computationally efficient, quasi-random category encoding scheme inspired by Kronecker sequences, which integrates seamlessly with an end-to-end pixel neural field framework for pixel-level semantic alignment. Extensive experiments consistently demonstrate significant improvements over vanilla flow matching approaches, substantially narrowing the performance gap between generative segmentation and strong discriminative specialists.

Chaoyang Wang, Yaobo Liang, Boci Peng
2026/3/20
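The abstract describes its quasi-random category encoding only as inspired by Kronecker sequences, i.e. points frac(s + n·α) with irrational α, which cover the unit cube evenly. A minimal sketch, assuming the generalized golden-ratio (R_d) choice of α, an offset of 0.5, and a rescaling of each code to [-1, 1); these choices and all function names are hypothetical, not the paper's construction.

```python
import math


def generalized_golden_ratio(dim, iters=200):
    """Positive root of x**(dim + 1) = x + 1, found by fixed-point iteration.

    dim = 1 gives the golden ratio; dim = 2 gives the plastic number.
    """
    x = 2.0
    for _ in range(iters):
        x = (1.0 + x) ** (1.0 / (dim + 1))
    return x


def kronecker_category_codes(num_classes, dim, offset=0.5):
    """Hypothetical encoder: class k -> frac(offset + (k + 1) * alpha), rescaled to [-1, 1).

    alpha_i = phi_dim ** -(i + 1) is the R_d choice of irrational step; it spreads
    class codes evenly over the cube without a learned embedding table.
    """
    phi = generalized_golden_ratio(dim)
    alpha = [phi ** -(i + 1) for i in range(dim)]
    codes = []
    for k in range(num_classes):
        point = [(offset + (k + 1) * a) % 1.0 for a in alpha]  # fractional part
        codes.append([2.0 * p - 1.0 for p in point])           # map [0, 1) to [-1, 1)
    return codes


def min_pairwise_distance(codes):
    """Smallest Euclidean distance between any two class codes."""
    return min(
        math.dist(codes[i], codes[j])
        for i in range(len(codes))
        for j in range(i + 1, len(codes))
    )


if __name__ == "__main__":
    codes = kronecker_category_codes(num_classes=20, dim=3)
    print(len(codes), round(min_pairwise_distance(codes), 3))
```

Each additional class only advances the sequence by one step, so codes for K classes cost O(K·d) to generate, which is one plausible reading of "computationally efficient" in the abstract.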
Paper
arXiv
SpatialIntelligence
Trajectory
Synthetic Control Misconceptions: Recommendations for Practice

To estimate the causal effect of an intervention, researchers need to identify a control group that represents what might have happened to the treatment group in the absence of that intervention. This is challenging without a randomized experiment and further complicated when few units (possibly only one) are treated. Nevertheless, when data are available on units over time, synthetic control (SC) methods provide an opportunity to construct a valid comparison by differentially weighting control units that did not receive the treatment so that their resulting pre-treatment trajectory is similar to that of the treated unit. The hope is that this weighted "pseudo-counterfactual" can serve as a valid counterfactual in the post-treatment time period. Since its origin twenty years ago, SC has been used over 5,000 times in the literature (Web of Science, December 2025), leading to a proliferation of descriptions of the method and guidance on proper usage that is not always accurate and does not always align with what the original developers appear to have intended. As such, a number of accepted pieces of wisdom have arisen: (1) SC is robust to various implementations; (2) covariates are unnecessary; and (3) pre-treatment prediction error should guide model selection. We describe each in detail and conduct simulations that suggest, both for standard and alternative implementations of SC, that these purported truths are not supported by empirical evidence and thus actually represent misconceptions about best practice. Instead of relying on these misconceptions, we offer practical advice for more cautious implementation and interpretation of results.

Robert Pickett, Jennifer Hill, Sarah Cowan
2026/3/20
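The basic SC step described above, choosing non-negative control weights that sum to one so the weighted pre-treatment trajectory tracks the treated unit, can be sketched as a least-squares fit constrained to the simplex. This is a deliberately minimal sketch with hypothetical function names: it matches pre-treatment outcomes only, without covariates or the predictor-weighting step of the original method, so it is exactly the kind of bare implementation whose assumptions the abstract questions.

```python
def project_to_simplex(v):
    """Euclidean projection onto {w : w_j >= 0, sum_j w_j = 1} (sort-and-threshold)."""
    u = sorted(v, reverse=True)
    running, theta = 0.0, 0.0
    for i, ui in enumerate(u, start=1):
        running += ui
        t = (running - 1.0) / i
        if ui - t > 0:  # holds for a prefix of indices; keep the last threshold
            theta = t
    return [max(x - theta, 0.0) for x in v]


def synthetic_control_weights(treated_pre, controls_pre, steps=5000):
    """Projected gradient descent on ||Y0 w - y1||^2 over the simplex.

    treated_pre:  T pre-treatment outcomes of the treated unit (y1).
    controls_pre: J lists of T pre-treatment outcomes, one per control unit (columns of Y0).
    """
    J, T = len(controls_pre), len(treated_pre)
    # Step 1/L with L = 2 * ||Y0||_F^2, an upper bound on the gradient's Lipschitz constant.
    lr = 1.0 / (2.0 * sum(x * x for unit in controls_pre for x in unit))
    w = [1.0 / J] * J  # start from equal weights
    for _ in range(steps):
        resid = [sum(w[j] * controls_pre[j][t] for j in range(J)) - treated_pre[t]
                 for t in range(T)]
        grad = [2.0 * sum(resid[t] * controls_pre[j][t] for t in range(T))
                for j in range(J)]
        w = project_to_simplex([w[j] - lr * grad[j] for j in range(J)])
    return w


def synthetic_counterfactual(weights, controls_post):
    """Weighted average of control outcomes in the post-treatment period."""
    T = len(controls_post[0])
    return [sum(w * unit[t] for w, unit in zip(weights, controls_post)) for t in range(T)]


if __name__ == "__main__":
    controls = [[1, 0, 0, 1], [0, 1, 0, 1], [0, 0, 1, 1]]
    treated = [0.3, 0.7, 0.0, 1.0]  # built as 0.3 * unit A + 0.7 * unit B
    w = synthetic_control_weights(treated, controls)
    print([round(x, 4) for x in w])
```

The post-treatment effect estimate would then be the treated unit's observed outcomes minus `synthetic_counterfactual(w, controls_post)`.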
Paper
arXiv
AI
OmniVTA: Visuo-Tactile World Modeling for Contact-Rich Robotic Manipulation

Contact-rich manipulation tasks, such as wiping and assembly, require accurate perception of contact forces, friction changes, and state transitions that cannot be reliably inferred from vision alone. Despite growing interest in visuo-tactile manipulation, progress is constrained by two persistent limitations: existing datasets are small in scale and narrow in task coverage, and current methods treat tactile signals as passive observations rather than using them to model contact dynamics or enable closed-loop control explicitly. In this paper, we present OmniViTac, a large-scale visuo-tactile-action dataset comprising 21,000+ trajectories across 86 tasks and 100+ objects, organized into six physics-grounded interaction patterns. Building on this dataset, we propose OmniVTA, a world-model-based visuo-tactile manipulation framework that integrates four tightly coupled modules: a self-supervised tactile encoder, a two-stream visuo-tactile world model for predicting short-horizon contact evolution, a contact-aware fusion policy for action generation, and a 60Hz reflexive controller that corrects deviations between predicted and observed tactile signals in a closed loop. Real-robot experiments across all six interaction categories show that OmniVTA outperforms existing methods and generalizes well to unseen objects and geometric configurations, confirming the value of combining predictive contact modeling with high-frequency tactile feedback for contact-rich manipulation. All data, models, and code will be made publicly available on the project website at https://mrsecant.github.io/OmniVTA.

Yuhang Zheng, Songen Gu, Weize Li
2026/3/20
Paper
arXiv
SpatialIntelligence
Trajectory
FASTER: Rethinking Real-Time Flow VLAs

Real-time execution is crucial for deploying Vision-Language-Action (VLA) models in the physical world. Existing asynchronous inference methods primarily optimize trajectory smoothness but neglect the critical latency in reacting to environmental changes. By rethinking the notion of reaction in action chunking policies, this paper presents a systematic analysis of the factors governing reaction time. We show that reaction time follows a uniform distribution determined jointly by the Time to First Action (TTFA) and the execution horizon. Moreover, we reveal that the standard practice of applying a constant schedule in flow-based VLAs can be inefficient and forces the system to complete all sampling steps before any movement can start, forming the bottleneck in reaction latency. To overcome this issue, we propose Fast Action Sampling for ImmediaTE Reaction (FASTER). By introducing a Horizon-Aware Schedule, FASTER adaptively prioritizes near-term actions during flow sampling, compressing the denoising of the immediate reaction tenfold, into a single step (e.g., in π0.5 and X-VLA), while preserving the quality of long-horizon trajectories. Coupled with a streaming client-server pipeline, FASTER substantially reduces the effective reaction latency on real robots, especially when deployed on consumer-grade GPUs. Real-world experiments, including a highly dynamic table tennis task, show that FASTER unlocks unprecedented real-time responsiveness for generalist policies, enabling rapid generation of accurate and smooth trajectories.

Yuxiang Lu, Zhe Liu, Xianzhe Fan
2026/3/20
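The abstract states that reaction time is uniformly distributed with parameters set by TTFA and the execution horizon. One simple timing model consistent with that statement: a new observation is taken each time an execution horizon of s actions (control period dt) has been executed, an event therefore waits up to s·dt for the next observation, and the first reactive action arrives TTFA after that. This model, its parameters, and the function names are assumptions for illustration, not the paper's derivation.

```python
import random
import statistics


def reaction_time_bounds(ttfa, exec_horizon, dt):
    """Assumed model: reaction time ~ Uniform[ttfa, ttfa + exec_horizon * dt].

    ttfa:         time to first action of a new chunk (s)
    exec_horizon: actions executed from each chunk before the next observation
    dt:           control period (s)
    """
    return ttfa, ttfa + exec_horizon * dt


def simulate_reaction_times(ttfa, exec_horizon, dt, n=100_000, seed=0):
    """Monte Carlo check of the assumed model with events at uniformly random times."""
    rng = random.Random(seed)
    period = exec_horizon * dt  # a fresh observation once per executed horizon
    samples = []
    for _ in range(n):
        event_t = rng.uniform(0.0, 1000.0 * period)
        next_obs = (event_t // period + 1.0) * period  # first observation after the event
        samples.append((next_obs - event_t) + ttfa)    # wait for it, then wait TTFA
    return samples


if __name__ == "__main__":
    lo, hi = reaction_time_bounds(ttfa=0.1, exec_horizon=10, dt=0.02)
    samples = simulate_reaction_times(0.1, 10, 0.02)
    print(lo, hi, round(statistics.mean(samples), 3))
```

Under this model, shortening TTFA shifts the whole distribution earlier while a shorter execution horizon narrows it, which is consistent with the abstract's focus on cutting the denoising needed before the first near-term action.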
Paper
arXiv
SpatialIntelligence
Trajectory
OS-Themis: A Scalable Critic Framework for Generalist GUI Rewards

Reinforcement Learning (RL) has the potential to improve the robustness of GUI agents in stochastic environments, yet training is highly sensitive to the quality of the reward function. Existing reward approaches struggle to achieve both scalability and performance. To address this, we propose OS-Themis, a scalable and accurate multi-agent critic framework. Unlike a single judge, OS-Themis decomposes trajectories into verifiable milestones to isolate critical evidence for decision making and employs a review mechanism to strictly audit the evidence chain before making the final verdict. To facilitate evaluation, we further introduce OmniGUIRewardBench (OGRBench), a holistic cross-platform benchmark for GUI outcome rewards, where all evaluated models achieve their best performance under OS-Themis. Extensive experiments on AndroidWorld show that OS-Themis yields a 10.3% improvement when used to support online RL training, and a 6.9% gain when used for trajectory validation and filtering in the self-training loop, highlighting its potential to drive agent evolution.

Zehao Li, Zhenyu Wu, Yibo Zhao
2026/3/20
Paper
arXiv
SpatialIntelligence
Trajectory
ADMM-Based Distributed MPC with Control Barrier Functions for Safe Multi-Robot Quadrupedal Locomotion

This paper proposes a fully decentralized model predictive control (MPC) framework with control barrier function (CBF) constraints for safety-critical trajectory planning in multi-robot legged systems. The incorporation of CBF constraints introduces explicit inter-agent coupling, which prevents direct decomposition of the resulting optimal control problems. To address this challenge, we reformulate the centralized safety-critical MPC problem using a structured distributed optimization framework based on the alternating direction method of multipliers (ADMM). By introducing a novel node-edge splitting formulation with consensus constraints, the proposed approach decomposes the global problem into independent node-local and edge-local quadratic programs that can be solved in parallel using only neighbor-to-neighbor communication. This enables fully decentralized trajectory optimization with symmetric computational load across agents while preserving safety and dynamic feasibility. The proposed framework is integrated into a hierarchical locomotion control architecture for quadrupedal robots, combining high-level distributed trajectory planning, mid-level nonlinear MPC enforcing single rigid body dynamics, and low-level whole-body control enforcing full-order robot dynamics. The effectiveness of the proposed approach is demonstrated through hardware experiments on two Unitree Go2 quadrupedal robots and numerical simulations involving up to four robots navigating uncertain environments with rough terrain and external disturbances. The results show that the proposed distributed formulation achieves performance comparable to centralized MPC while reducing the average per-cycle planning time by up to 51% in the four-agent case, enabling efficient real-time decentralized implementation.

Yicheng Zeng, Ruturaj S. Sambhus, Basit Muhammad Imran
2026/3/20
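The decomposition idea above, replacing one coupled problem with parallel local solves plus agreement updates, can be illustrated with textbook global-consensus ADMM on scalar quadratics f_i(x) = 0.5*a_i*(x - b_i)^2. This is a generic sketch with a hypothetical function name, not the paper's node-edge splitting: it has no CBF constraints and no neighbor graph, and all agents are coupled through one shared average z.

```python
def consensus_admm(a, b, rho=1.0, iters=500):
    """Minimise sum_i 0.5 * a_i * (x - b_i)^2 via local copies x_i and a consensus variable z.

    Returns (z, [x_1, ..., x_n]); at convergence every x_i equals z.
    """
    n = len(a)
    x = [0.0] * n
    u = [0.0] * n  # scaled dual variables, one per agent
    z = 0.0
    for _ in range(iters):
        # Local step: each agent minimises f_i(x) + rho/2 * (x - z + u_i)^2 in closed form.
        # These n updates are independent and could run in parallel.
        x = [(a[i] * b[i] + rho * (z - u[i])) / (a[i] + rho) for i in range(n)]
        # Consensus step: average of local copies plus their scaled duals.
        z = sum(x[i] + u[i] for i in range(n)) / n
        # Dual step: accumulate each agent's disagreement with the consensus.
        u = [u[i] + x[i] - z for i in range(n)]
    return z, x


if __name__ == "__main__":
    z, x = consensus_admm(a=[1.0, 2.0, 3.0], b=[0.0, 3.0, 6.0])
    print(round(z, 6), [round(v, 6) for v in x])
```

With these inputs the consensus value converges to the weighted mean sum(a_i*b_i) / sum(a_i) = 24 / 6 = 4, the minimiser of the centralized problem.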
Paper
arXiv
RemoteSensing
EarthObservation
Mobile Radio Networks and Weather Radars Dualism: Rainfall Measurement Revolution in Densely Populated Areas

This study demonstrates, for the first time, how a network of cellular base stations (BSs), the infrastructure of mobile radio networks, can be used as a distributed opportunistic radar for rainfall remote sensing. By adapting signal-processing techniques traditionally employed in Doppler weather radar systems, we demonstrate that BS signals can be used to retrieve typical weather radar products, including reflectivity factor, mean Doppler velocity, and spectral width. Due to the high spatial density of BS infrastructure in urban environments, combined with intrinsic technical features such as electronically steerable antenna arrays and wide receiver bandwidths, the proposed approach achieves unprecedented spatial and temporal resolutions, on the order of a few meters and several tens of seconds, respectively. Beyond limitations related to low transmitted power, limited antenna gain, and other system constraints, a major challenge arises from ground clutter contamination, which is exacerbated by the nearly horizontal orientation of BS antenna beams. This work provides a thorough assessment of clutter impact and demonstrates that, through appropriate processing, the resulting clutter-filtered radar moments reach a satisfactory level of quality when compared with raw observations and with measurements from independent BSs with overlapping fields of view. The findings highlight a transformative opportunity for urban hydrometeorology: leveraging existing telecommunications infrastructure to obtain rainfall information with unprecedented spatial granularity and temporal immediacy.

Davide Tornielli Bellini, Mario Montopoli, Dario Tagliaferri
2026/3/20
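In conventional Doppler weather radar, mean Doppler velocity and spectral width are commonly estimated with the pulse-pair method from the lag-0 and lag-1 autocorrelations of complex I/Q samples at one range gate. Whether the paper uses exactly this estimator on BS signals is an assumption; the sketch shows only the textbook form, with a hypothetical function name and an assumed sign convention (conventions differ between systems). Reflectivity would additionally require calibrating R0 with the radar constant and range correction, which is omitted.

```python
import cmath
import math


def pulse_pair_moments(iq, prt, wavelength):
    """Textbook pulse-pair estimates for one range gate (noise-free, Gaussian-spectrum assumption).

    iq:         complex baseband samples spaced `prt` seconds apart
    prt:        pulse repetition time (s)
    wavelength: radar wavelength (m)
    Returns (lag-0 power R0, mean Doppler velocity in m/s, spectral width in m/s).
    """
    n = len(iq)
    r0 = sum(abs(s) ** 2 for s in iq) / n  # lag-0 autocorrelation (uncalibrated power)
    r1 = sum(iq[k + 1] * iq[k].conjugate() for k in range(n - 1)) / (n - 1)  # lag-1
    # Mean velocity from the lag-1 phase; unambiguous only for |v| < wavelength / (4 * prt).
    velocity = -wavelength / (4.0 * math.pi * prt) * cmath.phase(r1)
    if abs(r1) == 0.0:
        return r0, velocity, math.inf
    # Width from the decorrelation R0 / |R1|; abs() guards against tiny negative logs.
    width = (wavelength / (2.0 * math.sqrt(2.0) * math.pi * prt)
             * math.sqrt(abs(math.log(r0 / abs(r1)))))
    return r0, velocity, width


if __name__ == "__main__":
    lam, prt, v_true = 0.05, 1e-3, 5.0
    # Synthetic single-tone echo whose phase advances according to v_true.
    iq = [2.0 * cmath.exp(-1j * 4 * math.pi * v_true * k * prt / lam) for k in range(64)]
    print(pulse_pair_moments(iq, prt, lam))
```

A single pure tone has |R1| = R0, so its estimated spectral width is (numerically) zero; turbulence or a spread of drop velocities lowers |R1| relative to R0 and widens the estimate.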
Paper
arXiv
SpatialIntelligence
Trajectory
From Inference Efficiency to Embodied Efficiency: Revisiting Efficiency Metrics for Vision-Language-Action Models

Vision-Language-Action (VLA) models have recently enabled embodied agents to perform increasingly complex tasks by jointly reasoning over visual, linguistic, and motor modalities. However, we find that the prevailing notion of "efficiency" in current VLA research, characterized by parameters, FLOPs, or token decoding throughput, does not reflect actual performance on robotic platforms. In real-world execution, efficiency is determined by system-level embodied behaviors such as task completion time, trajectory smoothness, cumulative joint rotation, and motion energy. Through controlled studies across model compression, token sparsification, and action sequence compression, we make several observations that challenge common assumptions. (1) Methods that reduce computation under conventional metrics often increase end-to-end execution cost or degrade motion quality, despite maintaining task success rates. (2) System-level embodied efficiency metrics reveal performance differences in the learned action policies that remain hidden under conventional evaluations. (3) Common adaptation methods such as in-context prompting or supervised fine-tuning show only mild and metric-specific improvements in embodied efficiency. While these methods can reduce targeted embodied-efficiency metrics such as jerk or action rate, the resulting gains may come with trade-offs in other metrics, such as longer completion time. Taken together, our results suggest that conventional inference efficiency metrics can overlook important aspects of embodied execution. Incorporating embodied efficiency provides a more complete view of policy behavior and practical performance, enabling fairer and more comprehensive comparisons of VLA models.

Zhuofan Li, Hongkun Yang, Zhenyang Chen
2026/3/20
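Several of the system-level metrics named above can be computed directly from a logged joint trajectory. The definitions below (finite-difference joint jerk, summed absolute joint travel) are common choices but are assumptions here, since the abstract does not define them; the function name is hypothetical, and motion energy is omitted because it needs torque or current measurements.

```python
def embodied_efficiency_metrics(joint_traj, dt):
    """Hypothetical system-level metrics for one executed trajectory.

    joint_traj: list of T joint-position vectors (rad), sampled every dt seconds.
    """
    T, J = len(joint_traj), len(joint_traj[0])
    completion_time = (T - 1) * dt
    # Cumulative joint rotation: absolute joint travel summed over joints and steps.
    cumulative_rotation = sum(
        abs(joint_traj[t + 1][j] - joint_traj[t][j])
        for t in range(T - 1)
        for j in range(J)
    )
    # Jerk from the third forward difference (q[t+3] - 3q[t+2] + 3q[t+1] - q[t]) / dt^3.
    sq_jerks = [
        ((joint_traj[t + 3][j] - 3 * joint_traj[t + 2][j]
          + 3 * joint_traj[t + 1][j] - joint_traj[t][j]) / dt ** 3) ** 2
        for t in range(T - 3)
        for j in range(J)
    ]
    mean_sq_jerk = sum(sq_jerks) / len(sq_jerks) if sq_jerks else 0.0
    return {
        "completion_time": completion_time,
        "cumulative_rotation": cumulative_rotation,
        "mean_sq_jerk": mean_sq_jerk,
    }


if __name__ == "__main__":
    smooth = [[0.1 * t, -0.05 * t] for t in range(11)]  # constant-velocity motion
    print(embodied_efficiency_metrics(smooth, dt=0.1))
```

Two policies with the same success rate can then be compared on these numbers, which is the kind of difference the abstract reports parameter counts, FLOPs, or throughput figures can hide.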
Paper
arXiv
SpatialIntelligence
Trajectory
Exact-Time Safety Recovery using Time-Varying Control Barrier Functions with Optimal Barrier Tracking

This paper is motivated by controllers developed for autonomous vehicles that occasionally result in conditions where safety is no longer guaranteed. We develop an exact-time safety recovery framework for any control-affine nonlinear system when its state is outside a safe region using time-varying Control Barrier Functions (CBFs) with optimal barrier tracking. Unlike conventional formulations that provide only conservative upper bounds on recovery time convergence, the proposed approach guarantees recovery to the safe set at a prescribed time. The key mechanism is an active barrier tracking condition that forces the barrier function to follow exactly a designer-specified recovery trajectory. This transforms safety recovery into a trajectory design problem. The recovery trajectory is parameterized and optimized to achieve optimal performance while preserving feasibility under input constraints, avoiding the aggressive corrective actions typically induced by conventional finite-time formulations. The safety recovery framework is applied to the roundabout traffic coordination problem for Connected and Automated Vehicles (CAVs), where any initially violated safe merging constraint is replaced by an exact-time recovery barrier constraint to guarantee that safety is restored before CAV conflict points are reached. Simulation results demonstrate improved feasibility and performance.

Yingqing Chen, Christos G. Cassandras, Wei Xiao
2026/3/20
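The barrier tracking mechanism described above can be illustrated on a toy scalar system x' = u with barrier h(x) = x (safe set x >= 0), started in the unsafe region. The smoothstep recovery profile, the proportional tracking gain, and the function names are assumptions for illustration; the paper's parameterized and optimized recovery trajectory, and its input-constrained QP, are not reproduced.

```python
def smoothstep(tau):
    """3*tau^2 - 2*tau^3, clamped to [0, 1]; zero slope at both ends."""
    tau = min(max(tau, 0.0), 1.0)
    return 3.0 * tau ** 2 - 2.0 * tau ** 3


def smoothstep_rate(tau):
    """Derivative of smoothstep with respect to tau (zero outside [0, 1])."""
    if tau <= 0.0 or tau >= 1.0:
        return 0.0
    return 6.0 * tau * (1.0 - tau)


def simulate_exact_time_recovery(x0, T, h_target=0.0, k=5.0, dt=1e-3):
    """Toy system x' = u with barrier h(x) = x, starting unsafe (x0 < 0).

    Designer-chosen recovery profile: h_d(t) = x0 + (h_target - x0) * smoothstep(t / T),
    which equals h_target exactly at t = T. The control enforces the tracking condition
    h' = h_d' - k * (h - h_d); since h(0) = h_d(0), h follows h_d exactly in continuous time.
    Returns (h at t = T, largest |u| applied).
    """
    x, max_u = x0, 0.0
    for i in range(round(T / dt)):
        t = i * dt
        h_d = x0 + (h_target - x0) * smoothstep(t / T)
        h_d_dot = (h_target - x0) * smoothstep_rate(t / T) / T
        u = h_d_dot - k * (x - h_d)  # for h(x) = x, h' = u
        max_u = max(max_u, abs(u))
        x += dt * u  # forward-Euler integration
    return x, max_u


if __name__ == "__main__":
    h_T, u_max = simulate_exact_time_recovery(x0=-1.0, T=2.0)
    print(round(h_T, 5), round(u_max, 4))
```

With this profile the peak input is about 1.5*|h_target - x0|/T, so a longer recovery time T trades speed for gentler corrective action, the same trade-off the abstract's trajectory optimization handles under input constraints.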
Paper
arXiv
AI
Isentropic hybrid stars in the Nambu-Jona-Lasinio model: effects of neutrino trapping

Binary neutron star mergers and proto-neutron stars provide unique environments where dense matter is hot, lepton rich, and potentially undergoes a transition from hadronic to deconfined quark matter. We investigate the thermodynamics and stellar properties of hybrid matter under such conditions. The hadronic phase is described within a covariant density functional framework, while the quark phase is modeled using a Nambu-Jona-Lasinio (NJL) model that includes repulsive vector interactions, the axial U_A(1)-breaking 't Hooft determinant interaction, and two-flavor color-superconducting (2SC) pairing. The phase transition between hadronic and quark matter is constructed using a mixed-phase prescription that enforces baryon and lepton number conservation, allowing us to follow thermodynamic trajectories at fixed entropy per baryon and fixed lepton fraction. We analyze the phase structure of dense matter at finite temperature and study the composition of the hadronic, mixed, and quark phases in both neutrino-trapped and neutrino-free regimes. Our results show that neutrino trapping significantly modifies the particle composition and shifts the onset of deconfinement to higher densities. Using the resulting equations of state, we compute static stellar configurations and examine the influence of temperature and lepton content on the mass-radius relation of hybrid stars. Hot, neutrino-rich configurations are found to have larger radii and slightly higher maximum masses than their cold counterparts.

Andrea Sabatucci, Armen Sedrakian
2026/3/20
论文
arXiv
GeoAI
GIS
Multi-Modal Building Change Detection for Large-Scale Small Changes: Benchmark and Baseline

Change detection in optical remote sensing imagery is susceptible to illumination fluctuations, seasonal changes, and variations in surface land-cover materials. Relying solely on RGB imagery often produces pseudo-changes and leads to semantic ambiguity in features. Incorporating near-infrared (NIR) information provides heterogeneous physical cues that are complementary to visible light, thereby enhancing the discriminability of building materials and tiny structures while improving detection accuracy. However, existing multi-modal datasets generally lack high-resolution and accurately registered bi-temporal imagery, and current methods often fail to fully exploit the inherent heterogeneity between these modalities. To address these issues, we introduce the Large-scale Small-change Multi-modal Dataset (LSMD), a bi-temporal RGB-NIR building change detection benchmark dataset targeting small changes in realistic scenarios, providing a rigorous testing platform for evaluating multi-modal change detection methods in complex environments. Based on LSMD, we further propose the Multi-modal Spectral Complementarity Network (MSCNet) to achieve effective cross-modal feature fusion. MSCNet comprises three key components: the Neighborhood Context Enhancement Module (NCEM) to strengthen local spatial details, the Cross-modal Alignment and Interaction Module (CAIM) to enable deep interaction between RGB and NIR features, and the Saliency-aware Multisource Refinement Module (SMRM) to progressively refine fused features. Extensive experiments demonstrate that MSCNet effectively leverages multi-modal information and consistently outperforms existing methods under multiple input configurations, validating its efficacy for fine-grained building change detection. The source code will be made publicly available at: https://github.com/AeroVILab-AHU/LSMD

Ye Wang, Wei Lu, Zhihui You
2026/3/20
Paper
arXiv
Agent
CAMO: A Conditional Neural Solver for the Multi-objective Multiple Traveling Salesman Problem

Robotic systems often require a team of robots to collectively visit multiple targets while optimizing competing objectives, such as total travel cost and makespan. This setting can be formulated as the Multi-Objective Multiple Traveling Salesman Problem (MOMTSP). Although learning-based methods have shown strong performance on the single-agent TSP and multi-objective TSP variants, they rarely address the combined challenges of multi-agent coordination and multi-objective trade-offs, which introduce dual sources of complexity. To bridge this gap, we propose CAMO, a conditional neural solver for MOMTSP that generalizes across varying numbers of targets, agents, and preference vectors, and yields high-quality approximations to the Pareto front (PF). Specifically, CAMO consists of a conditional encoder to fuse preferences into instance representations, enabling explicit control over multi-objective trade-offs, and a collaborative decoder that coordinates all agents by alternating agent selection and node selection to construct multi-agent tours autoregressively. To further improve generalization, we train CAMO with a REINFORCE-based objective over a mixed distribution of problem sizes. Extensive experiments show that CAMO outperforms both neural and conventional heuristics, achieving a closer approximation of PFs. In addition, ablation results validate the contributions of CAMO's key components, and real-world tests on a mobile robot platform demonstrate its practical applicability.

Fengxiaoxiao Li, Xiao Mao, Mingfeng Fan
2026/3/19
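CAMO conditions on preference vectors that trade off total travel cost against makespan. The sketch shows only how those two objectives, a preference-weighted score, and Pareto dominance might be computed for a candidate set of depot-based tours; the linear weighted-sum scalarization, the depot convention, and all names are assumptions, and the conditional encoder, collaborative decoder, and REINFORCE training are not reproduced.

```python
import math


def tour_lengths(tours, coords, depot=0):
    """Length of each agent's closed tour depot -> targets -> depot (Euclidean)."""
    lengths = []
    for tour in tours:
        path = [depot] + list(tour) + [depot]
        lengths.append(sum(math.dist(coords[a], coords[b]) for a, b in zip(path, path[1:])))
    return lengths


def objectives(tours, coords, depot=0):
    """(total travel cost, makespan) for a multi-agent solution; both are minimised."""
    lengths = tour_lengths(tours, coords, depot)
    return sum(lengths), max(lengths)


def scalarize(objs, preference):
    """Assumed linear weighted sum of the objective vector under a preference vector."""
    return sum(w * f for w, f in zip(preference, objs))


def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


if __name__ == "__main__":
    coords = [(0, 0), (1, 0), (2, 0), (-1, 0), (-2, 0)]  # node 0 is the depot
    one_agent_does_all = objectives([[1, 2, 3, 4], []], coords)
    split_evenly = objectives([[1, 2], [3, 4]], coords)
    print(one_agent_does_all, split_evenly, dominates(split_evenly, one_agent_does_all))
```

In this example both plans travel 8 units in total, but splitting the targets halves the makespan from 8 to 4, so the split plan dominates; a Pareto-front approximation keeps only such non-dominated solutions across preference vectors.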