Welcome to 《四川大学学报(医学版)》 (Journal of Sichuan University, Medical Sciences)

利用掩码自编码器预训练的热红外视频呼吸监测方法

A Method for Respiratory Monitoring in Thermal Infrared Video Using Masked Autoencoder Pretraining

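  • Abstract (translated from the Chinese):
    Objective To construct spatiotemporal maps from thermal infrared video and, combined with self-supervised learning, automatically capture the breathing-induced temperature trends in the mouth and nose region, achieving long-term non-contact respiratory monitoring and providing technical support for early warning of emotional changes, stress responses, and respiratory diseases.
    Methods To address the limited ability of existing methods to represent temperature trends in thermal infrared video, a two-stage method is proposed: self-supervised reconstruction of spatiotemporal maps followed by downstream fine-tuning. First, an affine transformation matrix spatially registers the ordinary camera with the thermal infrared camera, so that key points can be tracked and the spatiotemporal map constructed. Because the regions sensitive to the respiratory signal differ between individuals, deep representations are first mined through self-supervised learning, and the latent vectors are then fine-tuned to fit the downstream monitoring task. Two new evaluation metrics are also proposed: the inhalation-to-exhalation volume ratio and the mean absolute error of respiratory duration.
    Results Experiments on the dataset show that the two-stage training method accurately captures the trend of the respiratory signal, and that the waveform-fitting accuracy of end-to-end feature learning is significantly better than that of traditional methods and currently popular models. The core performance indicators are: mean absolute error 0.07±0.02, root mean square error 0.69±0.11, Pearson correlation coefficient 0.15±0.04, inhalation-to-exhalation volume ratio 0.40±0.12/0.26±0.05, and mean absolute error of respiratory duration 0.79±0.19/0.79±0.10.
    Conclusion The self-supervised pretraining waveform analysis method based on a masked autoencoder shows advantages in both the frequency and time domains for respiratory monitoring. Morphological differences in the pulse signal between respiratory phases were also found: during exhalation, the kurtosis of the photoplethysmography (PPG) signal decreased significantly and its skewness was reduced. This finding offers a new perspective for evaluating cardiopulmonary coupling function and autonomic nervous regulation from the morphological features of PPG.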
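
The camera registration step named in the Methods can be illustrated with a minimal sketch. It assumes a set of matched calibration points is available in both views and estimates a 2x3 affine matrix by least squares. The function names (`estimate_affine`, `apply_affine`), the point coordinates, and the matrix `true_M` are hypothetical and chosen only for illustration; the abstract does not say how the authors obtain their affine matrix.

```python
import numpy as np

def estimate_affine(src_pts, dst_pts):
    """Least-squares 2x3 affine matrix mapping src_pts onto dst_pts."""
    src = np.asarray(src_pts, dtype=float)
    dst = np.asarray(dst_pts, dtype=float)
    # Homogeneous source coordinates: each row is [x, y, 1].
    A = np.hstack([src, np.ones((len(src), 1))])
    # Solve A @ X = dst for X (shape 3x2); the affine matrix is X transposed.
    X, *_ = np.linalg.lstsq(A, dst, rcond=None)
    return X.T

def apply_affine(M, pts):
    """Map an (N, 2) array of points with a 2x3 affine matrix."""
    pts = np.asarray(pts, dtype=float)
    return pts @ M[:, :2].T + M[:, 2]

# Hypothetical calibration points in the ordinary-camera image (pixels).
rgb_pts = np.array([[100.0, 100.0], [500.0, 120.0],
                    [300.0, 400.0], [120.0, 380.0]])
# Assumed ground-truth mapping, used here only to synthesize matching
# thermal-image points: scale by 0.5, then shift by (10, 20).
true_M = np.array([[0.5, 0.0, 10.0],
                   [0.0, 0.5, 20.0]])
thermal_pts = apply_affine(true_M, rgb_pts)

# Recover the matrix from the correspondences, then transfer a key point
# (for example a nose landmark tracked in the ordinary-camera image).
M = estimate_affine(rgb_pts, thermal_pts)
nose_rgb = np.array([[320.0, 260.0]])
nose_thermal = apply_affine(M, nose_rgb)
print(M)
print(nose_thermal)
```

At least three non-collinear point pairs are needed for the affine fit to be determined; additional pairs are averaged in the least-squares sense, which helps when the correspondences are noisy.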

     

    Abstract:
    Objective To construct spatiotemporal maps from thermal infrared video and, using self-supervised learning, automatically capture the breathing-induced temperature trends in the mouth and nose region, enabling long-term non-contact respiratory monitoring and providing technical support for early warning of emotional changes, stress responses, and respiratory diseases.
    Methods To address the limited ability of existing methods to represent temperature trends in thermal infrared video, a two-stage method is proposed: self-supervised reconstruction of spatiotemporal maps followed by downstream fine-tuning. First, an affine transformation matrix spatially registers the ordinary camera with the thermal infrared camera, so that key points can be tracked and the spatiotemporal map constructed. Because the regions sensitive to the respiratory signal differ between individuals, deep representations are first extracted through self-supervised learning, and the latent vectors are then fine-tuned for the downstream monitoring task. Two new evaluation metrics are also introduced: the mean absolute error of the inhalation-to-exhalation volume ratio and the mean absolute error of respiratory duration.
    Results Experiments on the dataset show that the two-stage training method accurately captures the trend of the respiratory signal. The waveform-fitting accuracy achieved by end-to-end feature learning is significantly better than that of traditional methods and currently popular models. The core performance indicators are: mean absolute error 0.07 ± 0.02, root mean square error 0.69 ± 0.11, Pearson correlation coefficient 0.15 ± 0.04, inhalation-to-exhalation volume ratio 0.40 ± 0.12/0.26 ± 0.05, and mean absolute error of respiratory duration 0.79 ± 0.19/0.79 ± 0.10.
    Conclusion The self-supervised pretraining waveform analysis method based on a masked autoencoder shows advantages in both the frequency and time domains for respiratory monitoring. Morphological differences in the pulse signal between respiratory phases were also identified: during exhalation, the kurtosis of the photoplethysmography (PPG) signal decreased significantly and its skewness was reduced. This finding offers a new perspective for evaluating cardiopulmonary coupling function and autonomic nervous regulation from the morphological features of PPG.
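
The three standard waveform metrics reported in the Results (mean absolute error, root mean square error, Pearson correlation coefficient) can be computed as in the sketch below. The signals are synthetic and hypothetical: a 0.25 Hz sine stands in for a reference respiratory waveform, and the "prediction" is that waveform with a constant offset. Any normalization the authors may have applied before scoring is not stated in the abstract, so none is applied here.

```python
import numpy as np

def mae(pred, ref):
    """Mean absolute error between two equal-length waveforms."""
    pred, ref = np.asarray(pred, float), np.asarray(ref, float)
    return float(np.mean(np.abs(pred - ref)))

def rmse(pred, ref):
    """Root mean square error between two equal-length waveforms."""
    pred, ref = np.asarray(pred, float), np.asarray(ref, float)
    return float(np.sqrt(np.mean((pred - ref) ** 2)))

def pearson_r(pred, ref):
    """Pearson correlation coefficient between two waveforms."""
    pred, ref = np.asarray(pred, float), np.asarray(ref, float)
    return float(np.corrcoef(pred, ref)[0, 1])

# Hypothetical data: 20 s of a 0.25 Hz breathing-like reference signal.
t = np.linspace(0.0, 20.0, 401)
ref = np.sin(2.0 * np.pi * 0.25 * t)
# Hypothetical prediction: the same waveform with a constant 0.1 offset.
pred = ref + 0.1

print(mae(pred, ref), rmse(pred, ref), pearson_r(pred, ref))
```

A constant offset raises both error metrics while leaving the correlation at 1, which is one reason the three figures are usually reported together. The two metrics proposed in the paper are not sketched here because the abstract does not define them.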

     
