Redundancy removal

We agree that the relatively high number of predicted proteins in A. camansi (83,061) likely reflects redundancy introduced by the highly fragmented genome assembly (scaffold N50 = 2.43 kb). When a gene spans several short scaffolds, it can be split into fragmented gene models, each yielding a separate partial protein prediction.

In the ArGD database, all predicted protein sequences were retained in the global annotation modules to preserve the completeness of the original gene models and ensure traceability to the source genome assemblies, which is a common practice for genome resource databases.

For downstream analyses that are sensitive to sequence redundancy, such as metabolic pathway reconstruction and gene family mining, we applied MMseqs2 to reduce redundancy and generate representative protein sets, thereby minimizing potential biases introduced by fragmented or highly similar sequences. We have now clarified this workflow in the revised manuscript.
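
For illustration only, a redundancy-reduction step of this kind could be sketched as follows. This is a minimal sketch, not the exact command used for ArGD: the identity and coverage thresholds (0.9 and 0.8), the coverage mode, and all file names are assumptions chosen for the example, and the helper functions are hypothetical. It relies on the `mmseqs easy-cluster` workflow, which writes the representative sequences to `<prefix>_rep_seq.fasta`.

```python
import subprocess
from pathlib import Path


def build_mmseqs_cmd(proteins_fasta, out_prefix, tmp_dir,
                     min_seq_id=0.9, coverage=0.8):
    """Build an `mmseqs easy-cluster` command line (hypothetical helper).

    The thresholds are illustrative assumptions, not the values used
    in the manuscript.
    """
    return [
        "mmseqs", "easy-cluster",
        str(proteins_fasta), str(out_prefix), str(tmp_dir),
        "--min-seq-id", str(min_seq_id),  # minimum sequence identity
        "-c", str(coverage),              # minimum alignment coverage
        "--cov-mode", "1",                # coverage measured on the target sequence
    ]


def representative_fasta(out_prefix):
    """Path of the representative-sequence FASTA written by easy-cluster."""
    return Path(f"{out_prefix}_rep_seq.fasta")


def run_clustering(proteins_fasta, out_prefix, tmp_dir):
    """Run MMseqs2 (must be installed and on PATH) and return the output path."""
    cmd = build_mmseqs_cmd(proteins_fasta, out_prefix, tmp_dir)
    subprocess.run(cmd, check=True)
    return representative_fasta(out_prefix)
```

Calling `run_clustering("A_camansi.proteins.fasta", "out/acam", "tmp")` (hypothetical file names) would yield one representative protein per cluster, which could then feed the pathway and gene family analyses while the full protein set remains in the global annotation modules.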

We also explicitly indicate the fragmented nature of the A. camansi assembly in the database and caution users that analyses relying on long-range genomic structure should be interpreted carefully.