
Automatic Segmentation of Digital Pathology Slides Based on Unsupervised Learning

    Abstract:
      Objective  To segment images with an unsupervised method, as an alternative to manual labeling.
      Methods  A total of 100 whole slide images (WSI) of HE-stained and Pap-stained slides were selected as the research and test material, including 70 breast slides, 20 lung slides and 10 thyroid slides. To ensure the diversity of the data, the breast slides covered normal tissue, inflammation and tumor, the lung slides were mainly lower-lobe neoplasms (including inflammation and tumor), and the thyroid slides were fine-needle aspiration specimens of benign cells. The maximum total magnification (original magnification) of each image was 400×, and the file format was NDPI. Each WSI was manually annotated over an area of more than 10 fields of view, and the annotations were used for validity verification. An unsupervised image segmentation technique based on superpixels and a fully convolutional neural network was constructed and used to segment arbitrary regions of interest (ROI) of unlabeled WSIs. It was compared with the region adjacency graph merging method: the segmentation quality of the two methods was assessed with under-segmentation error, boundary recall and mean Intersection-over-Union, and their efficiency was also compared (illustrative sketches of both pipelines and of these metrics are given after the abstract). For the efficiency comparison, the measured time included superpixel preprocessing and excluded the time needed to load the deep learning engine.
      Results  Unsupervised automatic segmentation by texture and color was achieved for arbitrary ROIs of WSIs. The results for breast, lung and thyroid slides differed only slightly and remained stable across repeated tests, although the method performed only moderately in distinguishing inflammation from tumor. Its under-segmentation error, boundary recall and mean Intersection-over-Union were 19.10%, 82.06% and 45.06%, respectively, versus 21.52%, 78.39% and 44.81% for the region adjacency graph merging method. The proposed method took 0.27 s on average in GPU mode and 1.30 s in CPU mode; the region adjacency graph merging method, which has no GPU implementation, took 10.5 s on average in CPU mode.
      Conclusion  This method produced ideal pixel-level labeling results through simple human-computer interaction and can effectively reduce the cost of annotating digital pathology slides. Compared with the region adjacency graph merging method, it handled image texture better and ran faster.
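The abstract describes the proposed segmentation only at a high level: superpixels combined with a fully convolutional neural network, trained without labels. The following is a minimal sketch of one way such a superpixel-constrained unsupervised segmentation can be set up, in the spirit of unsupervised segmentation by backpropagation; the network architecture, the hyperparameters and the file name roi.png are illustrative assumptions, not the authors' actual implementation.

    # Minimal sketch of superpixel-constrained unsupervised segmentation
    # (self-training in the spirit of "unsupervised segmentation by
    # backpropagation"). Architecture and hyperparameters are illustrative
    # assumptions, not the configuration reported in the paper.
    import numpy as np
    import torch
    import torch.nn as nn
    from skimage import io, segmentation

    class TinyFCN(nn.Module):
        """Small fully convolutional network producing per-pixel class scores."""
        def __init__(self, in_ch=3, feat=64, n_classes=32):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(in_ch, feat, 3, padding=1), nn.BatchNorm2d(feat), nn.ReLU(),
                nn.Conv2d(feat, feat, 3, padding=1), nn.BatchNorm2d(feat), nn.ReLU(),
                nn.Conv2d(feat, n_classes, 1),
            )
        def forward(self, x):
            return self.body(x)

    def unsupervised_segment(rgb, n_superpixels=400, n_iter=60, lr=0.1, device="cpu"):
        """Cluster the pixels of one ROI by color/texture without any labels."""
        sp = segmentation.slic(rgb, n_segments=n_superpixels, compactness=10, start_label=0)
        x = torch.from_numpy(rgb / 255.0).float().permute(2, 0, 1)[None].to(device)
        net = TinyFCN().to(device)
        opt = torch.optim.SGD(net.parameters(), lr=lr, momentum=0.9)
        ce = nn.CrossEntropyLoss()
        for _ in range(n_iter):
            out = net(x)[0]                         # (C, H, W) class scores
            labels = out.argmax(0).cpu().numpy()    # per-pixel pseudo-labels
            # Superpixel constraint: every pixel inside a superpixel is forced
            # to take that superpixel's majority pseudo-label.
            refined = labels.copy()
            for sp_id in np.unique(sp):
                mask = sp == sp_id
                vals, counts = np.unique(labels[mask], return_counts=True)
                refined[mask] = vals[counts.argmax()]
            target = torch.from_numpy(refined).long().to(device)
            loss = ce(out[None], target[None])      # train against refined labels
            opt.zero_grad()
            loss.backward()
            opt.step()
        with torch.no_grad():
            return net(x)[0].argmax(0).cpu().numpy()

    if __name__ == "__main__":
        roi = io.imread("roi.png")[:, :, :3]  # hypothetical ROI cropped from a WSI
        seg = unsupervised_segment(roi)

The superpixel step is what ties the label-free training to local color and texture: pixels within one SLIC superpixel are forced to share a label, which stabilizes the clustering and is consistent with the abstract's note that superpixel preprocessing is part of the measured runtime.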

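The region adjacency graph merging method used as the baseline is likewise not specified in the abstract. The sketch below is one common implementation of that idea using scikit-image: SLIC superpixels are merged whenever the mean-color distance between adjacent regions falls below a threshold. The threshold value and the file name roi.png are illustrative assumptions, and the paper's baseline settings may differ.

    # One common region adjacency graph (RAG) merging baseline, built with
    # scikit-image. In scikit-image >= 0.20 the graph module is skimage.graph;
    # older releases expose the same functions as skimage.future.graph.
    from skimage import io, segmentation, graph

    def rag_merge_segment(rgb, n_superpixels=400, thresh=30):
        """Over-segment with SLIC, then merge adjacent regions of similar mean color."""
        sp = segmentation.slic(rgb, n_segments=n_superpixels, compactness=10, start_label=0)
        rag = graph.rag_mean_color(rgb, sp)          # edge weight = mean-color distance
        return graph.cut_threshold(sp, rag, thresh)  # merge edges below the threshold

    if __name__ == "__main__":
        roi = io.imread("roi.png")[:, :, :3]  # hypothetical ROI cropped from a WSI
        seg = rag_merge_segment(roi)

This style of baseline operates on a CPU region graph rather than on GPU tensors, which matches the abstract's note that only a CPU time (10.5 s on average) could be measured for it.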
     
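The abstract reports three evaluation metrics without restating their definitions. The formulas below give the standard forms these names usually denote in the superpixel and segmentation literature; they are provided for reference only, and the paper may use slightly different variants (for example, a different boundary tolerance d).

    % Under-segmentation error: N is the number of evaluated pixels, g ranges
    % over ground-truth segments and s over predicted segments overlapping g.
    \mathrm{USE} = \frac{1}{N}\sum_{g}\ \sum_{s:\,|s\cap g|>0}\min\bigl(|s\cap g|,\ |s\setminus g|\bigr)

    % Boundary recall: share of ground-truth boundary pixels that have a
    % predicted boundary pixel within a small tolerance d (often 2 px).
    \mathrm{BR} = \frac{\bigl|\{p\in\mathcal{B}_{gt}\ :\ \exists\,q\in\mathcal{B}_{pred},\ \lVert p-q\rVert\le d\}\bigr|}{\bigl|\mathcal{B}_{gt}\bigr|}

    % Mean Intersection-over-Union over the C compared regions/classes,
    % with predicted masks P_c and ground-truth masks G_c.
    \mathrm{mIoU} = \frac{1}{C}\sum_{c=1}^{C}\frac{|P_c\cap G_c|}{|P_c\cup G_c|}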
