Ka/Ks calculation and positive selection analysis: English writing and methods

0 Tree construction
 

Phylogeny of tomato accessions

A total of 1249 single-copy orthologs from 61 genomes of distantly related wild tomato species, ancestral, cultivated and domesticated tomato, together with four representative plant genomes, A. trichopoda, V. vinifera, A. thaliana and S. tuberosum (a close relative of tomato in the genus Solanum; Boris et al., 2011), were used in the phylogenetic analysis. Multiple sequence alignments of the single-copy orthologous sequences were generated with MAFFT v7.520 (Rozewicki et al., 2019), and conserved blocks were extracted from the alignments using GBLOCKS 0.91b with the parameters ‘-b4=5 -b5=h -t=p -e=.2’ (Castresana, 2000). The conserved sequences from the same sample were then merged into one sequence using SeqKit v0.3.1.1 with the parameter ‘-w’ (Shen et al., 2016). IQ-Tree v2.3.1 with the parameters ‘-m MFP -bb 1000 -bnni -T 30’ was used to perform the phylogenetic analysis, which included four outgroup genomes and 58 tomato samples (Nguyen et al., 2015).
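The merging step, in which the conserved blocks of every ortholog are joined into one sequence per sample, can be illustrated with a minimal Python sketch. It is not the SeqKit command used above. The function names `parse_fasta` and `concatenate_alignments` are hypothetical, and padding an absent sample with gaps is an assumption that should never trigger for true single-copy orthologs present in all samples.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into {sample_id: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the sample identifier.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {sample: "".join(parts) for sample, parts in records.items()}


def concatenate_alignments(alignments, gap="-"):
    """Join per-gene alignments into one long sequence per sample.

    `alignments` is a list of {sample_id: aligned_sequence} dicts, one per
    ortholog. A sample absent from a gene is padded with gap characters
    (an assumption made for this sketch only).
    """
    samples = sorted({sample for aln in alignments for sample in aln})
    merged = {sample: [] for sample in samples}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("all sequences in one alignment must have equal length")
        width = lengths.pop()
        for sample in samples:
            merged[sample].append(aln.get(sample, gap * width))
    return {sample: "".join(parts) for sample, parts in merged.items()}


if __name__ == "__main__":
    gene1 = parse_fasta(">S1\nMKT\n>S2\nMRT\n")
    gene2 = parse_fasta(">S1\nAA-\nG\n>S2\nAAVG\n")
    for sample, seq in concatenate_alignments([gene1, gene2]).items():
        print(f">{sample}\n{seq}")
```

The concatenated sequences would then be written to a single FASTA file and passed to the tree-building program.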

 

1 Ka/Ks

The whole-genome duplicated gene pairs between the tomato and grape genomes were analysed for rates of nonsynonymous (Ka) and synonymous (Ks) substitutions, as well as the Ka/Ks ratio, using KaKs_Calculator 2.0 (Wang et al., 2010). The analysis generated 58 separate output files for each of Ka, Ks and Ka/Ks. The files containing Ka values were consolidated into a single dataset. This dataset was then split into core, softcore, dispensable and private pan-gene sets of Ka values according to the classification of the tomato gene in each whole-genome gene pair shared between the tomato and grape genomes. Subsequently, the Ks and Ka/Ks values were processed in the same way, following the pipeline used for the Ka analysis.
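The consolidation and categorisation steps can be sketched in Python as follows. This is an illustration only: the nested-dict input layout, the gene identifiers and the function names `consolidate` and `split_by_category` are hypothetical, and reading the KaKs_Calculator output files themselves is left out.

```python
from statistics import median

# Pan-gene categories named in the text.
CATEGORIES = ("core", "softcore", "dispensable", "private")


def consolidate(per_accession):
    """Flatten {accession: {gene_id: value}} into (accession, gene_id, value) rows."""
    rows = []
    for accession in sorted(per_accession):
        for gene_id, value in sorted(per_accession[accession].items()):
            rows.append((accession, gene_id, value))
    return rows


def split_by_category(rows, gene_category):
    """Group values by the pan-gene category of the tomato gene in each pair.

    Genes without a known category are skipped (an assumption of this sketch).
    """
    groups = {category: [] for category in CATEGORIES}
    for _accession, gene_id, value in rows:
        category = gene_category.get(gene_id)
        if category in groups:
            groups[category].append(value)
    return groups


if __name__ == "__main__":
    # Toy Ka values for two accessions; real input would come from the 58 files.
    ka = {"acc1": {"g1": 0.1, "g2": 0.3}, "acc2": {"g1": 0.2}}
    category_of = {"g1": "core", "g2": "private"}
    groups = split_by_category(consolidate(ka), category_of)
    for category in CATEGORIES:
        values = groups[category]
        summary = median(values) if values else "NA"
        print(category, len(values), summary)
```

The same two functions could be reused unchanged for the Ks and Ka/Ks tables, which mirrors the statement that all three measures follow one pipeline.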

2.

Positively selected and rapidly evolving genes

A total of 1249 single-copy orthologs from 61 tomato genomes and a phylogenetic tree constructed from 65 genomes were used to identify positively selected and rapidly evolving genes in the tomato clade. MAFFT v7.520 was used to perform multiple sequence alignments of the coding sequences of the orthologous genes (Rozewicki et al., 2019), and PAML v4.4b was used to estimate dN/dS ratios from the multiple coding sequence alignments (Yang, 2007). First, we estimated dN/dS ratios using branch models (model = 2 and NSsites = 0) across the 65 genomes with the following parameters: CodonFreq = 2; kappa = 2.5; initial omega = 0.2. Three hypotheses were examined: (i) H0, in which all branches share one dN/dS ratio; (ii) H1, in which the branch of the target tomato groups has one dN/dS ratio while all other branches share another; and (iii) H2, in which every branch has its own dN/dS ratio. A likelihood-ratio test (LRT) was used to select target genes for which the likelihood of H1 was significantly larger than that of H0 (adjusted LRT P-value < 0.01) and the likelihood of H2 was not significantly larger than that of H1. Genes with larger dN/dS ratios in the target tomato groups than in the other branches were defined as rapidly evolving [false discovery rate (FDR)-corrected P-value < 0.01]. The branch-site models (model = 2 and NSsites = 2) were then employed to identify genes with positively selected sites in the tomato groups. The settings ‘fix_omega’ = 1 and ‘omega’ = 1 were used for the null hypothesis, whereas ‘fix_omega’ = 0 and ‘omega’ = 1.5 were used for the alternative hypothesis, with the inferred phylogenetic tree. We used an FDR-corrected LRT P-value (adjusted LRT P-value) cutoff of < 0.01 to identify genes with positively selected sites in the target tomato groups.
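The LRT and FDR filtering described above can be sketched in Python. The sketch makes several assumptions: the log-likelihoods have already been read from the codeml output, the two compared models differ by one free parameter (as in H0 versus H1, or the branch-site null versus alternative), and the FDR correction is the Benjamini-Hochberg procedure, which the text does not name. The H1 versus H2 comparison has more degrees of freedom and is not covered here. All function names are hypothetical.

```python
import math


def lrt_pvalue_df1(lnl_null, lnl_alt):
    """Likelihood-ratio test for nested models differing by one parameter.

    Returns (statistic, p_value), where the statistic is 2 * (lnL_alt - lnL_null)
    and the p-value is the chi-square upper tail with 1 degree of freedom,
    which equals erfc(sqrt(statistic / 2)).
    """
    statistic = max(2.0 * (lnl_alt - lnl_null), 0.0)
    return statistic, math.erfc(math.sqrt(statistic / 2.0))


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so monotonicity can be enforced.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_genes(lnl_by_gene, cutoff=0.01):
    """Keep genes whose adjusted LRT p-value is below `cutoff`.

    `lnl_by_gene` maps gene_id -> (lnL_null, lnL_alt).
    """
    genes = sorted(lnl_by_gene)
    raw = [lrt_pvalue_df1(*lnl_by_gene[g])[1] for g in genes]
    adjusted = benjamini_hochberg(raw)
    return [g for g, q in zip(genes, adjusted) if q < cutoff]


if __name__ == "__main__":
    # Toy log-likelihoods, not real results.
    toy = {"geneA": (-1000.0, -990.0), "geneB": (-500.0, -499.9)}
    print(select_genes(toy))
```

For the rapidly evolving set, the selected genes would additionally need a larger dN/dS ratio on the target branch than on the other branches, a check that requires the omega estimates and is not shown in the sketch.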