Trajectory data augmentation is a promising approach to mitigating data scarcity in machine learning applications, but its utility has been limited by the difficulty of preserving spatio-temporal coherence. Although prior work demonstrated the viability of geometric perturbation, it relied on naive random selection, leaving open the question of which trajectories should be augmented for maximal benefit. This thesis addresses this gap by developing a scalable framework for evaluating five selection strategies: four systematic strategies (Outlierness, Diversity, Representativeness, and Uncertainty) and Random selection as the baseline. These strategies were tested across four datasets covering animal behavior (Foxes and Starkey), maritime traffic (AIS), and urban traffic (Car), using a suite of linear and non-linear machine learning models. As part of this evaluation, an Optuna-based hyperparameter optimization loop was integrated to empirically identify the best-performing augmentation parameters for each dataset within the explored search space. The results indicate that, while systematic selection is not a universal solution, it offers distinct advantages over the random baseline. Systematic strategies, particularly Outlierness and Uncertainty, were more stable and less prone to the performance degradation observed with random sampling in dense datasets. However, the findings also reveal that the value of augmentation is strictly conditional. Visual analysis via UMAP shows that while systematic augmentation repairs topological fragmentation in sparse datasets, it can act as corrupting noise in high-quality, dense datasets. Furthermore, the study identified physical limitations in high-velocity domains, where standard perturbation techniques lead to divergence in feature space...
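To make the contrast between systematic and random selection concrete, the following is a minimal sketch rather than the framework developed in the thesis. It assumes that each trajectory has already been summarized as a fixed-length feature vector, that outlierness can be approximated by the mean distance to the k nearest neighbours, and that geometric perturbation can be approximated by Gaussian jitter on 2D points. All function names and parameters (`knn_outlierness`, `select_by_outlierness`, `select_random`, `perturb`, `k`, `sigma`) are hypothetical and chosen for illustration only.

```python
import math
import random


def knn_outlierness(features, k=3):
    """Score each vector by its mean distance to its k nearest neighbours.

    Assumption: this stands in for whatever outlierness measure is used
    in the actual framework.
    """
    scores = []
    for i, a in enumerate(features):
        dists = sorted(math.dist(a, b) for j, b in enumerate(features) if j != i)
        nearest = dists[:k]
        scores.append(sum(nearest) / len(nearest))
    return scores


def select_by_outlierness(features, n, k=3):
    """Return the indices of the n most outlying trajectories."""
    scores = knn_outlierness(features, k)
    ranked = sorted(range(len(features)), key=lambda i: scores[i], reverse=True)
    return ranked[:n]


def select_random(features, n, seed=0):
    """Random baseline: return n distinct indices drawn uniformly."""
    rng = random.Random(seed)
    return rng.sample(range(len(features)), n)


def perturb(trajectory, sigma, seed=0):
    """Jitter every (x, y) point with Gaussian noise of scale sigma.

    Assumption: a simplified stand-in for geometric perturbation.
    """
    rng = random.Random(seed)
    return [(x + rng.gauss(0, sigma), y + rng.gauss(0, sigma)) for x, y in trajectory]


# Four tightly clustered feature vectors and one isolated vector.
features = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
picked = select_by_outlierness(features, n=1, k=2)
```

In this toy example the isolated vector at index 4 receives the highest score and is the one selected for augmentation, whereas `select_random` may return any index. In a setup like the one described above, `sigma` would be the kind of augmentation parameter tuned per dataset by the hyperparameter optimization loop.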