ZANG Zheng-qing, ZHAO Yong-hong, JIAN Hui, et al. Method of Identifying Male Lineages Based on Main Haplotype and Analysis of the Distribution of Y-STR Haplotype Mismatch Based on the Bayesian Theory[J]. Journal of Sichuan University (Medical Sciences), 2021, 52(4): 671-678. DOI: 10.12182/20210760107

Method of Identifying Male Lineages Based on Main Haplotype and Analysis of the Distribution of Y-STR Haplotype Mismatch Based on the Bayesian Theory

  • Abstract:
      Objective  To establish a classification method for distinguishing male lineages in a large population, to study how Y-STR locus mismatches are distributed among members of Han Chinese male lineages, and to explore the probability distribution of the number of meioses separating pairs of individuals with different mismatch levels.
      Methods  Peripheral blood samples were collected from 269 male individuals belonging to 12 Han Chinese lineages and from 45 unrelated male individuals. The samples were typed with the Yfiler Plus™ and ZGWZ FSY or Yfiler Platinum amplification kits, yielding 314 Y-STR haplotypes. Haplotypes observed three or more times were defined as main haplotypes, and the most frequent main haplotype was taken as the center of the first cluster. Haplotypes whose mismatches with this center fell within five loci and six steps were merged into the cluster. The most frequent main haplotype in the remaining data was then taken as the second center, and clustering was repeated in this way until no main haplotype remained. Pairwise comparisons were made among lineage members and among unrelated individuals, and the mismatch distribution was tallied for each group. The average mismatch rate of each locus was then calculated, and Bayes' formula was used to compute the probability distribution of the number of separating meioses under different mismatch conditions.
      Results  Cluster analysis divided the 269 lineage members into 12 groups that corresponded 100% to the 12 known lineages, while the 45 unrelated individuals remained scattered. Mismatches between lineage members fell within 0-7 loci and 0-7 steps, whereas unrelated individuals differed at 11 or more loci and by 15 or more steps. The loci showing the most one-step and two-step mismatches differed from lineage to lineage and were therefore lineage-specific. The minimum mutation count and the average mismatch rate of each locus were both significantly correlated with the mutation rate. Two individuals with no mismatch had a 19.7% probability of being separated by one meiosis and a 71.2% probability of being separated by a number of meioses within six. Two individuals with three one-step mismatches had a 65.2% probability of being separated by more than 10 meioses.
      Conclusion  Clustering around main haplotypes can quickly and effectively distinguish lineages in large-scale male lineage samples. Together with the probability distributions of separating meioses obtained under different mismatch conditions, it can provide research ideas, a screening tool, and an important reference for future lineage investigation, data analysis, and practical application of Y-STR databases.
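
The two computational steps named in the abstract, greedy clustering around main haplotypes and a Bayesian estimate of the number of separating meioses, can be illustrated with a minimal Python sketch. It is not the authors' implementation. All function and parameter names are hypothetical, haplotypes are simplified to tuples of integer repeat counts (no multi-copy loci, microvariants, or null alleles), and the thresholds are treated as inclusive. The Bayesian part additionally assumes a uniform prior over the number of meioses, one shared per-locus mutation rate, and a binomial likelihood, none of which is stated in the abstract, so its output should not be expected to reproduce the reported percentages.

```python
from collections import Counter
from math import comb
from typing import List, Sequence, Tuple

Haplotype = Tuple[int, ...]  # repeat counts at a fixed, ordered set of loci


def mismatch(a: Haplotype, b: Haplotype) -> Tuple[int, int]:
    """Return (number of differing loci, total repeat-step difference)."""
    diffs = [abs(x - y) for x, y in zip(a, b) if x != y]
    return len(diffs), sum(diffs)


def cluster_by_main_haplotype(
    haplotypes: Sequence[Haplotype],
    min_count: int = 3,  # a haplotype seen at least 3 times is "main"
    max_loci: int = 5,   # merge threshold: within 5 loci (assumed inclusive)
    max_steps: int = 6,  # and within 6 steps (assumed inclusive)
) -> Tuple[List[List[int]], List[int]]:
    """Greedy clustering; returns (clusters of sample indices, unassigned)."""
    remaining = list(range(len(haplotypes)))
    clusters: List[List[int]] = []
    while remaining:
        counts = Counter(haplotypes[i] for i in remaining)
        center, n = counts.most_common(1)[0]
        if n < min_count:
            break  # no main haplotype is left among the remaining samples
        members, rest = [], []
        for i in remaining:
            loci, steps = mismatch(center, haplotypes[i])
            close = loci <= max_loci and steps <= max_steps
            (members if close else rest).append(i)
        clusters.append(members)
        remaining = rest
    return clusters, remaining


def meiosis_posterior(
    k: int, n_loci: int, mu: float, max_meioses: int = 20
) -> List[float]:
    """Posterior P(m | k mismatching loci) for m = 1..max_meioses.

    Assumptions (not from the paper): uniform prior over m, the same
    mutation rate mu at every locus, and a locus counts as mismatching
    if it mutated at least once in m meioses (back mutation ignored).
    """
    weights = []
    for m in range(1, max_meioses + 1):
        p = 1.0 - (1.0 - mu) ** m  # chance that one locus has changed
        weights.append(comb(n_loci, k) * p**k * (1.0 - p) ** (n_loci - k))
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    a, a_var = (14, 13, 30, 24), (15, 13, 30, 24)
    b, other = (17, 11, 27, 20), (10, 16, 33, 28)
    samples = [a, a, a_var, b, b, b, a, other, a]
    print(cluster_by_main_haplotype(samples))
    print(meiosis_posterior(k=0, n_loci=27, mu=0.003)[:3])
```

Under these assumptions a zero-mismatch pair is most likely separated by few meioses, and a pair with several mismatching loci by many, which is the qualitative pattern the Results describe. A realistic analysis would substitute locus-specific mutation rates and a prior informed by the pedigree structure.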

     
