This is a guest post co-authored by Ajay K Gupta, Jean Felipe Teotonio and Paul A Churchyard from HSR.health.
HSR.health is a geospatial health risk analytics firm whose vision is that global health challenges are solvable through human ingenuity and the focused and accurate application of data analytics. In this post, we present one approach for zoonotic disease prevention that uses Amazon SageMaker geospatial capabilities to create a tool that gives health scientists more accurate information on disease spread, helping them save more lives, faster.

The main weapon public health has against the propagation of regional outbreaks is disease surveillance: an interlocking system of disease reporting, investigation, and data communication between the different levels of a public health system. This system depends not only on human factors, but also on the technology and resources needed to collect disease data, analyze patterns, and maintain a consistent and continuous flow of data from local to regional to central health authorities.

The speed at which COVID-19 went from a local outbreak to a global disease present on every continent should be a sobering example of the dire need to harness innovative technology to create more efficient and accurate disease surveillance systems.

The risk of zoonotic disease spillover is strongly correlated with multiple social, environmental, and geographic factors that influence how often human beings interact with wildlife. HSR.health’s Zoonotic Disease Spillover Risk Index uses over 20 distinct geographic, social, and environmental factors historically known to affect the risk of human-wildlife interaction and therefore zoonotic disease spillover risk. Many of these factors can be mapped through a combination of satellite imagery and remote sensing.

Machine learning (ML) is highly effective for anomaly detection on spatial or temporal data because it can learn from data without being explicitly programmed to identify specific types of anomalies.
Spatial data, which relates to the physical position and shape of objects, often contains complex patterns and relationships that may be difficult for traditional algorithms to analyze. Incorporating ML with geospatial data enhances the capability to detect anomalies and unusual patterns systematically, which is essential for early warning systems. These systems are crucial in fields such as environmental monitoring, disaster management, and security.

Predictive modeling using historical geospatial data allows organizations to identify and prepare for potential future events. These events range from natural disasters and traffic disruptions to, as this post discusses, disease outbreaks.

To predict zoonotic spillover risks, HSR.health has adopted a multimodal approach. By blending data types, including environmental, biogeographical, and epidemiological information, this method enables a comprehensive assessment of disease dynamics. Such a multifaceted perspective is critical for developing proactive measures and enabling a rapid response to outbreaks.

HSR.health’s workflow encompasses data preprocessing, feature extraction, and the creation of informative visualizations using ML techniques. This allows for a clear understanding of the data’s evolution from its raw form to actionable insights. HSR.health used several operations to preprocess the data and extract relevant features, including land cover classification, temperature variation mapping, and the calculation of vegetation indexes.

One vegetation index relevant for indicating vegetation health is the Normalized Difference Vegetation Index (NDVI). The NDVI quantifies vegetation health by comparing near-infrared light, which vegetation reflects, with red light, which vegetation absorbs. Monitoring the NDVI over time can reveal changes in vegetation, such as the impact of human activities like deforestation.
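To make the NDVI idea concrete, the following is a minimal sketch of the standard formula, NDVI = (NIR - Red) / (NIR + Red), applied cell by cell to two small grids and then differenced between two dates. It is not HSR.health's implementation: the function names, the 2x2 reflectance values, and the use of plain Python lists instead of real satellite rasters are all assumptions made for illustration.

```python
import math


def ndvi_pixel(nir: float, red: float) -> float:
    """Standard NDVI for one cell: (NIR - Red) / (NIR + Red).

    Returns NaN when both bands are zero, so no-data cells do not
    masquerade as real vegetation values.
    """
    denom = nir + red
    if denom == 0:
        return math.nan
    return (nir - red) / denom


def ndvi_grid(nir_band, red_band):
    """Apply the NDVI formula cell by cell to two equally shaped 2-D grids."""
    return [
        [ndvi_pixel(n, r) for n, r in zip(nir_row, red_row)]
        for nir_row, red_row in zip(nir_band, red_band)
    ]


# Hypothetical reflectance values (0 to 1) for a 2x2 patch on two dates.
nir_before = [[0.50, 0.45], [0.40, 0.00]]
red_before = [[0.10, 0.15], [0.10, 0.00]]
nir_after = [[0.30, 0.45], [0.40, 0.00]]
red_after = [[0.30, 0.15], [0.10, 0.00]]

before = ndvi_grid(nir_before, red_before)
after = ndvi_grid(nir_after, red_after)

# Per-cell change between the two dates; a strongly negative value
# flags a cell where vegetation may have been lost.
change = [
    [a - b for a, b in zip(after_row, before_row)]
    for after_row, before_row in zip(after, before)
]
```

In this toy patch only the top-left cell changes, dropping from a healthy-vegetation value to zero, which is the kind of signal that monitoring NDVI over time is meant to surface. A production workflow would run the same arithmetic on array-backed raster bands rather than nested lists.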
The steps outlined in this post demonstrate just one of the many raster-based features that HSR.health has extracted to create the risk index. After extracting the relevant features in raster format, HSR.health used zonal statistics to aggregate the raster data within the administrative boundary polygons to which the social and health data are assigned. The analysis therefore combines raster and vector geospatial data. This kind of aggregation allows the raster data to be managed in a geodataframe, which facilitates its integration with the health and social data to produce the final risk index.

To evaluate the extracted features effectively, ML models are used to predict factors representing each feature. One of the models used is a support vector machine (SVM). The SVM model helps reveal patterns and associations within the data that inform risk assessments. The risk index is a quantitative assessment of risk levels, calculated as a weighted average of these factors, that aids in understanding potential spillover events in various regions.

Solutions that use ML and geospatial data, such as the Zoonotic Disease Spillover Risk Index, can assist local public health authorities in prioritizing resource allocation to the areas of highest risk. By doing so, they can establish targeted and localized surveillance measures to detect and halt regional outbreaks before they extend beyond borders. This approach can significantly limit the impact of a disease outbreak and save lives.

Janosch Woschitz is a Senior Solutions Architect at AWS, specializing in geospatial AI/ML. With over 15 years of experience, he supports customers globally in leveraging AI and ML for innovative solutions that capitalize on geospatial data. His expertise spans machine learning, data engineering, and scalable distributed systems, augmented by a strong background in software engineering and industry expertise in complex domains such as autonomous driving.
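For readers who want a concrete picture of the aggregation step described in this post, the sketch below shows the two ideas in miniature: zonal statistics (averaging raster cells per administrative zone) and a weighted average of factors. It is a simplified illustration under stated assumptions, not HSR.health's pipeline. The zones are given as a grid of zone IDs rather than boundary polygons, and the factor names, values, and weights are hypothetical.

```python
from collections import defaultdict


def zonal_mean(values, zones):
    """Mean raster value per zone.

    `values` and `zones` are equally shaped 2-D grids; `zones` holds the
    ID of the administrative unit each cell falls in. Cells whose value
    is None are treated as no-data and skipped.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for value_row, zone_row in zip(values, zones):
        for value, zone in zip(value_row, zone_row):
            if value is None:
                continue
            totals[zone] += value
            counts[zone] += 1
    return {zone: totals[zone] / counts[zone] for zone in totals}


def weighted_index(factors, weights):
    """Weighted average of factor scores, normalized by the total weight."""
    total_weight = sum(weights.values())
    return sum(factors[name] * weights[name] for name in weights) / total_weight


# Hypothetical raster of a normalized factor (0 to 1) and its zone IDs.
vegetation_loss = [[0.2, 0.4, 0.8], [0.6, None, 0.6]]
zones = [["A", "A", "B"], ["A", "B", "B"]]

zone_means = zonal_mean(vegetation_loss, zones)

# Combine the aggregated raster factor with a (hypothetical) social factor
# already recorded for zone A, using illustrative weights.
factors_a = {"vegetation_loss": zone_means["A"], "population_density": 0.8}
weights = {"vegetation_loss": 1.0, "population_density": 3.0}
risk_a = weighted_index(factors_a, weights)
```

Once each raster feature is reduced to one number per zone, it sits in the same row as that zone's health and social attributes, which is what makes a per-region weighted average straightforward to compute.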