Abstract:
Objective To establish a classification method for identifying different male lineages in a large population, to study the distribution patterns of Y-STR locus mismatches among members of Han Chinese male lineages, and to explore the mismatch probability distribution among family members separated by different meiosis intervals.
Methods Peripheral blood samples were collected from 269 male individuals belonging to 12 Han Chinese lineages and from 45 unrelated male individuals. The samples were amplified with the Yfiler Plus™ and ZGWZ FSY or Yfiler Platinum amplification kits, yielding 314 Y-STR haplotypes. Y-STR haplotypes observed 3 or more times were selected as main haplotypes, and the most frequent of these was taken as the first data center. Following the Y-STR genotype standard, haplotypes with mismatches within five loci and six steps were clustered and merged. The most frequent main haplotype in the remaining data was then taken as the second data center, and the cluster analysis was repeated in turn until no main haplotype remained. Pairwise comparisons were performed among lineage members and among unrelated individuals, and the mismatch distribution was calculated separately for each group. The average mismatch rate of each locus was then calculated, as was the mismatch probability distribution among lineage members separated by different meiosis intervals.
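The iterative main-haplotype clustering described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each haplotype is a tuple of integer repeat counts at single-copy loci, that a "step" is the absolute difference in repeat number at a locus, that steps are summed across loci, and that mismatches are measured against the current data center. The function names, the parameters (`min_copies`, `max_loci`, `max_steps`) and the toy data are hypothetical. Real kit data would also need rules for multi-copy loci such as DYS385a/b, intermediate alleles and null alleles, which are omitted here.

```python
from collections import Counter


def mismatch(h1, h2):
    """Return (number of mismatched loci, total repeat-step difference)."""
    diffs = [abs(a - b) for a, b in zip(h1, h2) if a != b]
    return len(diffs), sum(diffs)


def within(center, hap, max_loci, max_steps):
    """True if hap lies within both mismatch thresholds of the center."""
    loci, steps = mismatch(center, hap)
    return loci <= max_loci and steps <= max_steps


def cluster_by_main_haplotype(haplotypes, min_copies=3, max_loci=5, max_steps=6):
    """Cluster samples around main haplotypes.

    haplotypes: dict mapping sample id -> tuple of integer alleles.
    Returns (clusters, unassigned): clusters is a list of
    (center haplotype, member sample ids); unassigned is a sorted id list.
    """
    remaining = dict(haplotypes)
    clusters = []
    while True:
        counts = Counter(remaining.values())
        # Main haplotypes: observed at least `min_copies` times in what is left.
        mains = [(n, hap) for hap, n in counts.items() if n >= min_copies]
        if not mains:
            break
        # Most frequent main haplotype; ties are broken by tuple order (assumption).
        _, center = max(mains)
        members = [sid for sid, hap in remaining.items()
                   if within(center, hap, max_loci, max_steps)]
        clusters.append((center, members))
        for sid in members:
            del remaining[sid]
    return clusters, sorted(remaining)


# Hypothetical toy data with four loci (real kits type far more loci).
toy = {
    "A1": (14, 13, 29, 24),
    "A2": (14, 13, 29, 24),
    "A3": (14, 13, 29, 24),
    "A4": (14, 13, 30, 24),  # one locus, one step away from A1-A3
    "B1": (17, 11, 32, 21),
    "B2": (17, 11, 32, 21),
    "B3": (17, 11, 32, 21),
    "U1": (13, 15, 27, 20),  # singleton, far from both groups
}

clusters, unassigned = cluster_by_main_haplotype(toy)
for center, members in clusters:
    print(center, sorted(members))
print("unassigned:", unassigned)
```

In this sketch the samples left over after the last main haplotype has been used remain unassigned, which corresponds to the scattered unrelated individuals. With only four toy loci the five-locus threshold never comes into play, so the step threshold alone separates the groups.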
Results Of the 314 individuals, 269 were divided into 12 groups by the cluster analysis method, and the cluster groups thus identified matched the 12 known lineages with 100% accuracy. The remaining 45 unrelated individuals were scattered. Mismatches among lineage members ranged from 0 to 7 loci and 0 to 7 steps, whereas mismatches between unrelated individuals involved at least 11 loci and 15 steps. The loci with the largest numbers of one-step and two-step mismatches differed between lineages and showed lineage-specific features. The minimum mutation count and the average mismatch rate of each locus were significantly correlated with the mutation rate. Two individuals with no mismatch had a 19.7% probability of being separated by 1 meiosis interval and a 71.2% probability of being separated by fewer than 6 meiosis intervals. Two individuals with mismatches at 3 loci had a 65.2% probability of being separated by more than 10 meiosis intervals.
Conclusion The cluster analysis method based on main haplotypes presented in this paper can quickly and effectively differentiate male lineages in large sample sets. The clustering method and the resulting mismatch probability distribution for different meiosis intervals can provide new ideas for research and for screening instruments, and an important reference for lineage investigation, data analysis and the practical application of Y-STR databases in the future.