Journal of Sichuan University (Medical Sciences)

Construction of a Risk Prediction Model for Lung Cancer Based on Lifestyle Behaviors in the UK Biobank Large-Scale Population Cohort


    Abstract:
      Objective  To identify lifestyle-related risk factors for lung cancer incidence, to build a lung cancer risk prediction model that identifies high-risk individuals in the population, and thereby to support early lung cancer screening.
      Methods  The data were obtained from the UK Biobank, which collected information from 502389 participants between March 2006 and October 2010. Criteria for identifying the high-risk population were determined with reference to domestic and international lung cancer screening guidelines and high-quality studies of lung cancer risk factors. Univariate Cox regression and stepwise regression were used to screen for lung cancer risk factors, and a multivariable lung cancer risk prediction model, which takes survival time into account, was constructed with Cox proportional hazards regression. The best-fitting model satisfying the proportional hazards assumption was selected by comparing Akaike information criterion (AIC) values and the results of Schoenfeld residual tests. Participants were randomly divided into a training set and a validation set at a ratio of 7:3; the model was built on the training set and its performance was internally validated on the validation set. The area under the receiver operating characteristic (ROC) curve (AUC) was used to evaluate model performance. Participants were categorized into low-, moderate-, and high-risk groups according to a predicted probability of lung cancer onset of 0% to <25%, 25% to <75%, and 75% to 100%, respectively, and the proportion of incident cases in each group was calculated.
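The abstract names the modelling steps but not the code or software used. As a minimal sketch in plain Python (an assumption; the authors' tooling is not stated), the following shows the quantity a Cox model maximizes, the Breslow log partial likelihood, together with AIC = 2k - 2 log L for comparing candidate models, and a seeded 7:3 random split. The function names `split_70_30`, `cox_log_partial_likelihood`, and `aic`, the seed value, and the Breslow treatment of tied event times are hypothetical choices for illustration; estimating the coefficients and running the Schoenfeld residual test would normally be left to an established survival analysis package.

```python
# Minimal sketch under stated assumptions, not the study authors' implementation.
# Function names, the seed, and Breslow handling of tied event times are hypothetical choices.
import math
import random
from typing import List, Sequence, Tuple


def split_70_30(ids: Sequence[int], seed: int = 2024) -> Tuple[List[int], List[int]]:
    """Randomly assign participants to a 70% training set and a 30% validation set."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    cut = round(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def cox_log_partial_likelihood(
    beta: Sequence[float],
    x: Sequence[Sequence[float]],
    time: Sequence[float],
    event: Sequence[int],
) -> float:
    """Breslow log partial likelihood of a Cox proportional hazards model."""
    # Linear predictor eta_i = x_i . beta (log hazard ratio relative to baseline).
    eta = [sum(b * v for b, v in zip(beta, row)) for row in x]
    log_lik = 0.0
    for i, t_i in enumerate(time):
        if not event[i]:
            continue  # censored participants enter only through the risk sets
        # Risk set: everyone still under follow-up at participant i's diagnosis time.
        denom = sum(math.exp(eta[j]) for j, t_j in enumerate(time) if t_j >= t_i)
        log_lik += eta[i] - math.log(denom)
    return log_lik


def aic(log_lik: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 log L (lower is better)."""
    return 2 * n_params - 2 * log_lik
```

Lower AIC is better: a candidate model with more covariates is preferred only if its gain in log partial likelihood outweighs the penalty of 2 per added coefficient.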
      Results  A total of 453558 individuals were included. During 5505402 person-years of cumulative follow-up, 2330 incident lung cancer cases were diagnosed. Cox proportional hazards regression selected 10 independent variables for the model: age, body mass index (BMI), education, income, physical activity, smoking status, alcohol consumption frequency, fresh fruit intake, family history of cancer, and tobacco exposure. Internal validation showed that 8 of these variables (all except BMI and fresh fruit intake) were significantly associated with lung cancer (P<0.05). In the training set, the AUCs for predicting lung cancer at 1, 5, and 10 years were 0.825, 0.785, and 0.777, respectively; in the validation set, they were 0.857, 0.782, and 0.765, respectively. Screening the high-risk group would identify 68.38% of the individuals who later developed lung cancer.
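The time-specific AUCs and the share of future cases captured by the high-risk group are evaluation quantities whose estimators the abstract does not specify. A minimal, dependency-free sketch follows, assuming (1) a cumulative/dynamic AUC that counts participants diagnosed by the horizon as cases and those still followed beyond it as controls, simply dropping anyone censored earlier instead of applying inverse-probability-of-censoring weights, and (2) that the 0% to <25%, 25% to <75%, and 75% to 100% cut-offs are percentile ranks of predicted risk. All function names are hypothetical.

```python
# Minimal sketch under stated assumptions, not the study authors' implementation.
# Assumes percentile-rank risk groups and an unweighted cumulative/dynamic AUC.
from typing import Dict, List, Sequence


def incidence_per_100k(cases: int, person_years: float) -> float:
    """Crude incidence rate per 100,000 person-years."""
    return cases / person_years * 100_000


def cumulative_dynamic_auc(
    risk: Sequence[float],
    time: Sequence[float],
    event: Sequence[int],
    horizon: float,
) -> float:
    """Time-dependent AUC at `horizon` (same unit as `time`, e.g. years).

    Cases: lung cancer diagnosed at or before the horizon.
    Controls: still under follow-up after the horizon, whatever happens later.
    Participants censored before the horizon are dropped (no IPCW weighting).
    """
    cases = [r for r, t, e in zip(risk, time, event) if e and t <= horizon]
    controls = [r for r, t in zip(risk, time) if t > horizon]
    if not cases or not controls:
        raise ValueError("need at least one case and one control at this horizon")
    # Probability that a random case has a higher predicted risk than a random control.
    score = 0.0
    for rc in cases:
        for rn in controls:
            if rc > rn:
                score += 1.0
            elif rc == rn:
                score += 0.5  # ties count as half-concordant
    return score / (len(cases) * len(controls))


def stratify_by_percentile(risk: Sequence[float]) -> List[str]:
    """Label low (<25th), moderate (25th to <75th), high (>=75th percentile) risk."""
    n = len(risk)
    order = sorted(range(n), key=lambda i: risk[i])  # ties broken by input order
    labels = [""] * n
    for rank, i in enumerate(order):
        pct = rank / n  # share of participants ranked below participant i
        labels[i] = "low" if pct < 0.25 else "moderate" if pct < 0.75 else "high"
    return labels


def share_of_cases(labels: Sequence[str], event: Sequence[int]) -> Dict[str, float]:
    """Fraction of all incident cases that fall into each risk group."""
    total = sum(event)
    return {
        group: sum(e for lab, e in zip(labels, event) if lab == group) / total
        for group in ("low", "moderate", "high")
    }
```

With the totals reported above, `incidence_per_100k(2330, 5505402)` gives a crude rate of about 42.3 lung cancer cases per 100,000 person-years, and 5505402 / 453558 corresponds to a mean follow-up of roughly 12.1 years per participant.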
      Conclusion  This study established a lung cancer risk prediction model based on lifestyle behaviors in a large population cohort. The model shows good discriminative ability and provides a tool for developing standardized lung cancer screening strategies.

     
