Objective To automatically detect the breathing-induced temperature changes in the oral and nasal regions by constructing spatiotemporal maps from thermal infrared video and applying self-supervised learning, enabling long-term, non-contact respiratory monitoring. This provides technical support for early warning of emotional changes, stress responses, and respiratory diseases.
Methods To address the limitations of existing methods in characterizing temperature variation trends in thermal infrared video, a two-stage method is proposed: self-supervised reconstruction of the spatiotemporal map, followed by downstream fine-tuning. First, the visible-light camera and the thermal infrared camera are spatially registered using an affine transformation matrix, which enables key point tracking and construction of the spatiotemporal map. Because the regions most sensitive to the respiratory signal differ between individuals, deep representations are first extracted through self-supervised learning, and the latent vectors are then fine-tuned for the downstream monitoring task. Additionally, two new evaluation metrics are introduced: the mean absolute error of the ratio of inhaled to exhaled gas volumes and the mean absolute error of respiratory duration.
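The registration step above can be illustrated with a minimal sketch, assuming the affine matrix is estimated by least squares from matched calibration points seen by both cameras. The function names `estimate_affine` and `apply_affine` and the sample coordinates are hypothetical and are not taken from the original work, which does not state how its matrix was obtained.

```python
import numpy as np

def estimate_affine(src, dst):
    """Least-squares 2x3 affine matrix mapping src points onto dst points.

    src, dst: arrays of shape (N, 2) with N >= 3 matched points
    (hypothetical calibration correspondences between the two cameras).
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    # Homogeneous coordinates: each row is [x, y, 1].
    A = np.hstack([src, np.ones((src.shape[0], 1))])
    # Solve A @ X = dst for X with shape (3, 2).
    X, *_ = np.linalg.lstsq(A, dst, rcond=None)
    return X.T  # shape (2, 3)

def apply_affine(M, pts):
    """Map (N, 2) points through the 2x3 affine matrix M."""
    pts = np.asarray(pts, dtype=float)
    return pts @ M[:, :2].T + M[:, 2]

# Illustrative values only: a known transform used to generate matched points.
true_M = np.array([[0.5, 0.0, 10.0],
                   [0.0, 0.5, -4.0]])
visible_pts = np.array([[0.0, 0.0], [100.0, 0.0], [0.0, 80.0], [60.0, 40.0]])
thermal_pts = apply_affine(true_M, visible_pts)

M = estimate_affine(visible_pts, thermal_pts)
# Key points tracked in the visible-light frame, mapped into the thermal frame.
mapped = apply_affine(M, np.array([[20.0, 30.0]]))
```

Once the matrix is known, key points tracked in the visible-light frames can be mapped into each thermal frame, and the temperature sampled around them over time forms the rows of a spatiotemporal map.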
Results Experiments on the dataset show that the two-stage training method accurately captures the trend of the respiratory signal. The waveform fitting accuracy of end-to-end feature learning is significantly better than that of traditional methods and currently popular models. The core performance indicators are as follows: mean absolute error 0.07 ± 0.02, root mean square error 0.69 ± 0.11, Pearson correlation coefficient 0.15 ± 0.04, mean absolute error of the ratio of inhaled to exhaled volume 0.40 ± 0.12/0.26 ± 0.05, and mean absolute error of respiratory duration 0.79 ± 0.19/0.79 ± 0.10.
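For reference, the three standard waveform indicators listed above can be computed as in the following minimal sketch. The function name `waveform_metrics` and the sample arrays are hypothetical, and the sketch assumes the predicted and reference respiratory waveforms are already aligned and equally sampled; the two newly proposed metrics are not reproduced because their exact definitions are not given here.

```python
import numpy as np

def waveform_metrics(pred, ref):
    """Return (MAE, RMSE, Pearson r) between two aligned 1-D waveforms."""
    pred = np.asarray(pred, dtype=float)
    ref = np.asarray(ref, dtype=float)
    err = pred - ref
    mae = np.mean(np.abs(err))            # mean absolute error
    rmse = np.sqrt(np.mean(err ** 2))     # root mean square error
    r = np.corrcoef(pred, ref)[0, 1]      # Pearson correlation coefficient
    return mae, rmse, r

# Illustrative values only, not data from the study.
ref_wave = np.array([1.0, 2.0, 3.0, 6.0])
pred_wave = np.array([1.0, 2.0, 3.0, 4.0])
mae, rmse, r = waveform_metrics(pred_wave, ref_wave)
```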
Conclusion The self-supervised pre-training waveform analysis method based on a masked autoencoder shows advantages in both the frequency and time domains for respiratory monitoring. Additionally, morphological differences in the pulse signal during the respiratory phase were identified: the kurtosis of the photoplethysmography (PPG) signal decreased significantly, and the skewness was reduced during the exhalation phase. This finding offers a new perspective for evaluating cardiopulmonary coupling and autonomic nervous regulation from the morphological features of the PPG signal.