Pan-Genome Analysis Methods: An Introduction

#Differences between a pan-genome and conventional whole-genome data

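A pan-genome and conventional whole-genome data differ markedly in scope and use. Conventional whole-genome data are produced by sequencing, assembly, and annotation, usually for a single individual or reference strain. They typically comprise the genome (the complete DNA sequence), CDS (coding sequences), protein (protein sequences), and a GFF3 gene annotation file. These files describe the genomic features of that one individual and are suited to studying its gene function and structure.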

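By contrast, a pan-genome is the collection of all genes or DNA sequences found across the individuals of a species. It comprises the core genome (genes shared by all individuals, typically responsible for basic metabolic functions) and the variable genome (genes present in only some individuals, which may underlie trait differences such as disease resistance). A pan-genome therefore captures the genetic diversity and variation within a species more fully, and is suited to studying evolution, genetic diversity, and differences among populations.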
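The core/variable split described above can be sketched with plain set operations. This is a minimal illustration, assuming each individual has already been reduced to a set of gene-family identifiers; the function name `core_and_variable` and the toy identifiers (`ind_A`, `fam1`, and so on) are hypothetical and do not come from any particular tool.

```python
from typing import Dict, Set, Tuple


def core_and_variable(
    gene_sets: Dict[str, Set[str]]
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (pan, core, variable) gene-family sets.

    gene_sets maps an individual's name to the set of gene families it carries.
    """
    if not gene_sets:
        return set(), set(), set()
    all_sets = list(gene_sets.values())
    pan = set().union(*all_sets)         # every family seen in any individual
    core = set.intersection(*all_sets)   # families shared by all individuals
    variable = pan - core                # families missing from at least one
    return pan, core, variable


# Toy input (hypothetical identifiers, for illustration only).
individuals = {
    "ind_A": {"fam1", "fam2", "fam3"},
    "ind_B": {"fam1", "fam2", "fam4"},
    "ind_C": {"fam1", "fam3", "fam5"},
}

pan, core, variable = core_and_variable(individuals)
print("pan:", sorted(pan))
print("core:", sorted(core))
print("variable:", sorted(variable))
```

In this toy input only `fam1` is carried by all three individuals, so it is the whole core genome, and the remaining families form the variable genome.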

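For example, conventional whole-genome data may miss genes that are specific to certain individuals, whereas a pan-genome aims to include all such variation. This matters particularly in studies of bacteria such as Escherichia coli and of plants such as rice.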

#Can pan-genome data be downloaded for a species?

Yes. Pan-genome data can be downloaded for some species, particularly those that have been the subject of extensive genomic research.

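  • For bacteria such as Escherichia coli, pan-genome data can be downloaded from databases such as ProPan, which covers 1504 species.
  • For plants such as rice, the RPAN database provides rice pan-genome data.
  • For humans, the Human Pangenome Reference Consortium is developing a pan-genome reference that represents global human genomic variation.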

#Pan-genome analysis

Pan-genome analysis was conducted using protein sequences from 81 Solanaceae species. Gene families were inferred using OrthoFinder (20). Following established methodologies (43,44), gene clusters were classified as core, soft-core, dispensable or specific. Core clusters were those present in all 81 genomes, soft-core clusters those present in 79–80 species, dispensable clusters those present in 2–78 species, and specific clusters those unique to a single species. The number of protein-coding genes in both the pan-genome and the core genome was estimated using PanGP (v1.0.1) with a random sampling algorithm, a sample size of 2000 and 80 replicates.

Pairwise genome alignments between potato and tomato were generated using the nucmer program from the MUMmer package (v4.0.0beta2) (45). Structural variants (SVs) were detected using SVMU (v0.4-alpha) (46), yielding copy number variants (CNVs), insertions and deletions. Separately, minimap2 (v2.21-r1071) (47) was used to generate pairwise genome alignments, which were then processed by SyRI (v1.2) (48) to identify and classify variants, including insertions, deletions, single nucleotide polymorphisms (SNPs), inversions and translocations.
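The classification thresholds above (81, 79–80, 2–78, 1) and the idea of estimating pan-genome and core-genome size by random sampling can be sketched as follows. This is an illustrative sketch only: it assumes the gene clusters are available as a mapping from cluster ID to the set of genomes containing it, it counts clusters rather than genes, and the permutation-based curve is a simplified stand-in, not PanGP's own algorithm (its sample-size parameter is not modelled). The function names, genome labels and toy clusters are all hypothetical.

```python
import random
from statistics import mean
from typing import Dict, List, Set, Tuple


def classify_clusters(
    presence: Dict[str, Set[str]], n_genomes: int, soft_core_missing: int = 2
) -> Dict[str, str]:
    """Label each cluster by the number of genomes it occurs in.

    With n_genomes=81 and soft_core_missing=2 this gives: core = 81,
    soft-core = 79-80, dispensable = 2-78, specific = 1.
    """
    labels = {}
    for cluster, genomes in presence.items():
        k = len(genomes)
        if k == n_genomes:
            labels[cluster] = "core"
        elif k >= 2 and k >= n_genomes - soft_core_missing:
            labels[cluster] = "soft-core"
        elif k >= 2:
            labels[cluster] = "dispensable"
        elif k == 1:
            labels[cluster] = "specific"
    return labels


def pan_core_curve(
    presence: Dict[str, Set[str]],
    genomes: List[str],
    replicates: int = 80,
    seed: int = 0,
) -> Tuple[List[float], List[float]]:
    """Mean pan and core cluster counts as genomes are added in random order."""
    rng = random.Random(seed)
    # Invert the mapping: genome -> clusters it contains.
    by_genome: Dict[str, Set[str]] = {g: set() for g in genomes}
    for cluster, members in presence.items():
        for g in members:
            by_genome[g].add(cluster)
    n = len(genomes)
    pan_sizes: List[List[int]] = [[] for _ in range(n)]
    core_sizes: List[List[int]] = [[] for _ in range(n)]
    for _ in range(replicates):
        order = rng.sample(genomes, n)  # one random permutation of the genomes
        pan: Set[str] = set()
        core = None
        for i, g in enumerate(order):
            pan |= by_genome[g]
            core = set(by_genome[g]) if core is None else core & by_genome[g]
            pan_sizes[i].append(len(pan))
            core_sizes[i].append(len(core))
    return [mean(x) for x in pan_sizes], [mean(x) for x in core_sizes]


# Toy data: 81 hypothetical genomes and four hypothetical clusters.
genomes = [f"g{i}" for i in range(81)]
presence = {
    "c_core": set(genomes),       # present in all 81 genomes
    "c_soft": set(genomes[:79]),  # present in 79 genomes
    "c_disp": set(genomes[:40]),  # present in 40 genomes
    "c_spec": {genomes[0]},       # present in 1 genome
}

labels = classify_clusters(presence, n_genomes=len(genomes))
pan_curve, core_curve = pan_core_curve(presence, genomes, replicates=80)
print(labels)
print("pan size with all genomes:", pan_curve[-1])
print("core size with all genomes:", core_curve[-1])
```

The pan curve can only grow and the core curve can only shrink as genomes are added, which is the usual shape of such accumulation curves. In practice the presence mapping would be derived from the orthogroup output of a tool such as OrthoFinder rather than written by hand.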