Welcome to the Journal of Sichuan University (Medical Sciences)

Screening for Characteristic Genes of Different Traditional Chinese Medicine Syndromes of Psoriasis Vulgaris: A Study Based on Bioinformatics and Machine Learning

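  • Abstract (translated from the Chinese):
    Objective To use bioinformatics and machine learning to screen for the key characteristic genes of psoriasis vulgaris (PV) with blood-heat syndrome (BHS), blood stasis syndrome (BSS), and blood-dryness syndrome (BDS), and thereby provide a scientific basis for the clinical diagnosis and treatment of PV of different traditional Chinese medicine (TCM) syndrome types.
    Methods The GSE192867 dataset was downloaded from the Gene Expression Omnibus (GEO) database. The limma package was used to screen for differentially expressed genes (DEGs) between patients and healthy controls for PV overall and for BHS, BSS, and BDS, and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis was performed. The DEGs identified for PV, BHS, BSS, and BDS were intersected separately to obtain the distinct characteristic genes. Of two algorithms, support vector machine (SVM) and random forest (RF), the better-performing one was used to analyze the characteristic genes, and the top 5 ranked genes were taken as the key characteristic genes. The pROC package was used to plot receiver operating characteristic (ROC) curves for the key characteristic genes and to calculate the area under the curve (AUC) in order to evaluate their diagnostic performance.
    Results The numbers of DEGs associated with PV, BHS, BSS, and BDS were 7699, 7291, 7654, and 6578, respectively. KEGG enrichment was concentrated mainly in the Janus kinase (JAK)/signal transducer and activator of transcription (STAT), cyclic adenosine monophosphate (cAMP), mitogen-activated protein kinase (MAPK), and apoptosis pathways. Machine learning identified a total of 13 key characteristic genes. Malectin (MLEC), TUB like protein 3 (TULP3), SET domain containing 9 (SETD9), nuclear envelope integral membrane protein 2 (NEMP2), and BTG anti-proliferation factor 3 (BTG3) were the key characteristic genes of BHS. Dual specificity phosphatase 15 (DUSP15), C1q and tumor necrosis factor related protein 7 (C1QTNF7), solute carrier family 12 member 5 (SLC12A5), tripartite motif containing 63 (TRIM63), and ubiquitin associated protein 1 like (UBAP1L) were the key characteristic genes of BSS. Ribosomal RNA adenine dimethylase domain containing 1 (RRNAD1), ArfGAP with SH3 domain, ankyrin repeat and PH domain 3 (ASAP3), and myomesin 2 (MYOM2) were the key characteristic genes of BDS. The ROC curves of the characteristic genes of every PV syndrome type showed high diagnostic performance.
    Conclusion The characteristic genes of the different PV syndrome types differ significantly and may serve as potential biomarkers for diagnosing the TCM syndrome types of PV.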
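The intersection step in the Methods can be illustrated with plain set operations. The abstract does not spell out the exact rule, so the sketch below assumes one possible reading: a gene is characteristic of a syndrome if it is a DEG for both PV overall and that syndrome, but not a DEG for either of the other two syndromes. The function name `syndrome_specific_genes` and the gene labels `G1` to `G7` are hypothetical placeholders, not data from the study.

```python
def syndrome_specific_genes(pv_degs, syndrome_degs):
    """Per syndrome, keep DEGs shared with PV and absent from the other syndromes.

    This selection rule is an assumption made for illustration only.
    """
    specific = {}
    for name, degs in syndrome_degs.items():
        # Union of the DEG sets of every other syndrome.
        others = set().union(
            *(d for other, d in syndrome_degs.items() if other != name)
        )
        specific[name] = (set(degs) & set(pv_degs)) - others
    return specific


# Toy DEG lists (hypothetical gene labels).
pv = {"G1", "G2", "G3", "G4", "G5"}
syndromes = {
    "BHS": {"G1", "G2", "G6"},
    "BSS": {"G2", "G3"},
    "BDS": {"G2", "G4", "G7"},
}

result = syndrome_specific_genes(pv, syndromes)
for name in sorted(result):
    print(name, sorted(result[name]))
```

In this toy input, `G2` is shared by all three syndromes and so is dropped everywhere, while `G6` and `G7` are dropped because they are not PV DEGs.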

     

    Abstract:
    Objective To screen for the key characteristic genes of psoriasis vulgaris (PV) patients with different Traditional Chinese Medicine (TCM) syndromes, including blood-heat syndrome (BHS), blood stasis syndrome (BSS), and blood-dryness syndrome (BDS), through bioinformatics and machine learning, and to provide a scientific basis for the clinical diagnosis and treatment of PV of different TCM syndrome types.
    Methods The GSE192867 dataset was downloaded from Gene Expression Omnibus (GEO). The limma package was used to screen for differentially expressed genes (DEGs) between patients and healthy controls for PV overall and for BHS, BSS, and BDS. In addition, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis was performed. The DEGs identified for PV, BHS, BSS, and BDS were intersected separately to obtain the distinct characteristic genes. Of two algorithms, support vector machine (SVM) and random forest (RF), the one with the better performance was used to analyze the characteristic genes, and the top 5 ranked genes were taken as the key characteristic genes. The receiver operating characteristic (ROC) curves of the key characteristic genes were plotted with the pROC package, and the area under the curve (AUC) was calculated to evaluate their diagnostic performance.
    Results The numbers of DEGs associated with PV, BHS, BSS, and BDS were 7699, 7291, 7654, and 6578, respectively. KEGG enrichment was concentrated mainly in the Janus kinase (JAK)/signal transducer and activator of transcription (STAT), cyclic adenosine monophosphate (cAMP), mitogen-activated protein kinase (MAPK), and apoptosis pathways. A total of 13 key characteristic genes were identified by machine learning. Of these, malectin (MLEC), TUB like protein 3 (TULP3), SET domain containing 9 (SETD9), nuclear envelope integral membrane protein 2 (NEMP2), and BTG anti-proliferation factor 3 (BTG3) were the key characteristic genes of BHS; dual specificity phosphatase 15 (DUSP15), C1q and tumor necrosis factor related protein 7 (C1QTNF7), solute carrier family 12 member 5 (SLC12A5), tripartite motif containing 63 (TRIM63), and ubiquitin associated protein 1 like (UBAP1L) were the key characteristic genes of BSS; and ribosomal RNA adenine dimethylase domain containing 1 (RRNAD1), ArfGAP with SH3 domain, ankyrin repeat and PH domain 3 (ASAP3), and myomesin 2 (MYOM2) were the key characteristic genes of BDS. The ROC curves of the characteristic genes of every PV syndrome type showed high diagnostic performance.
    Conclusion The characteristic genes of the different TCM syndrome types of PV differ significantly and may serve as potential biomarkers for diagnosing the TCM syndrome types of PV.
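To make the AUC evaluation concrete, here is a minimal sketch of how the area under the ROC curve can be computed for a single gene. It is not the pROC implementation used in the study; it uses the equivalent rank-based definition (the probability that a randomly chosen case has a higher value than a randomly chosen control, with ties counted as one half). The function name `auc_single_gene` and the expression values are hypothetical, and the sketch assumes that higher expression indicates the syndrome.

```python
def auc_single_gene(expression, labels):
    """AUC of one gene as a classifier, via pairwise case/control comparison.

    labels: 1 for a case (syndrome present), 0 for a healthy control.
    Assumes higher expression points toward the case group.
    """
    cases = [x for x, y in zip(expression, labels) if y == 1]
    controls = [x for x, y in zip(expression, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")

    wins = 0.0
    for c in cases:
        for h in controls:
            if c > h:
                wins += 1.0   # case ranked above control
            elif c == h:
                wins += 0.5   # a tie counts as half
    return wins / (len(cases) * len(controls))


# Hypothetical expression values for one gene in six samples.
expression = [2.1, 3.5, 2.4, 4.2, 1.0, 2.5]
labels = [0, 1, 1, 1, 0, 0]

auc = auc_single_gene(expression, labels)
print(round(auc, 3))
```

An AUC near 1 means the gene separates cases from controls almost perfectly, while 0.5 means it does no better than chance. For a gene that is lower in cases, the same code would return a value below 0.5, so the direction has to be fixed before comparing genes.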

     
