Why not merge into a single database

Although FicusGD and ArtocarpusGD both cover genera of the Moraceae family, they were designed as independent databases because each genus has distinct evolutionary, morphological, and biochemical characteristics. Ficus species are mainly model trees for fig–pollinator coevolution, whereas Artocarpus species (jackfruit, breadfruit, etc.) are of major agricultural and nutritional importance. Separate databases allow more focused functional modules, such as ArtocarpusCYC for metabolic pathway curation and TPS/ADH/BADH family analysis specific to fruit aroma biosynthesis.

The consumer acceptability and commercial value of Artocarpus fruits are significantly influenced by their distinct aroma profiles, which are critical traits for breeding and selection (1). In Artocarpus heterophyllus (jackfruit), the characteristic "fruity" and "sweet" aroma is not primarily driven by terpenes, but is overwhelmingly dominated by a complex mixture of volatile esters, alcohols, and aldehydes (1). Robust studies utilizing gas chromatography-olfactometry (GC-O) and aroma extract dilution analysis (AEDA) have identified key odor-active compounds with high odor activity values (OAVs), including ethyl 3-methylbutanoate, ethyl butanoate, 3-methylbutanal, and various isomers of butyl and methylbutyl acetate (1). These volatile esters are the terminal products of the fatty acid and amino acid metabolic pathways, synthesized from precursor alcohols by specific enzymes (2). Consequently, the Alcohol Dehydrogenase (ADH) gene family, which converts aldehydes to alcohols, and the BAHD (BEAT, AHCT, HCBT, DAT) acyltransferase superfamily, which includes the Alcohol Acyltransferases (AATs) that catalyze the final esterification step, are principal targets for understanding and improving fruit flavor (2).

Separately, the Artocarpus genus is a well-documented reservoir of other high-value phytochemicals, particularly bioactive terpenoids (e.g., triterpenoids) and flavonoids, found in non-pulp tissues such as leaves, bark, and wood (3). As the largest and most functionally diverse class of plant secondary metabolites, terpenoids are fundamental to plant survival (4). They are not merely metabolic byproducts but active agents in mediating ecological interactions, including providing defense against pathogens and herbivores, attracting pollinators, and serving as signaling molecules (4). The immense structural diversity of these compounds is generated by the Terpene Synthase (TPS) gene family (5). Therefore, a genomic investigation of the ADH, BAHD, and TPS superfamilies provides a unified strategy to address the dual goals of Artocarpus improvement: enhancing fruit aroma for consumer quality and harnessing the rich arsenal of terpenoid-based bioactives for agronomic defense and novel bioproducts.

References

(1) Steinhaus, M., Sinuco, D. C., & Grimm, C. (2019). Characterization of the Major Odor-Active Compounds in Jackfruit Pulp. Journal of Agricultural and Food Chemistry, 67(19), 5510–5517. DOI: 10.1021/acs.jafc.8b01435

(2) Lu, H., Zhao, H., Zhong, T., Chen, D., Wu, Y., & Xie, Z. (2024). Molecular Regulatory Mechanisms Affecting Fruit Aroma. Foods, 13(12), 1870. DOI: 10.3390/foods13121870

(3) Mand, P., Rathee, P., & Tokas, J. (2021). An updated review of phytochemical compounds and pharmacology activities of Artocarpus genus. Trends in Phytochemical Research, 5(3), 114–129. DOI: 10.30495/TPR.2021.1925348.1147

(4) Khanam, S., Mishra, P., Faruqui, T., Alam, P., Albalawi, T., Siddiqui, F., Rafi, Z., & Khan, S. (2025). Biological properties and therapeutic potential of terpenoids: A comprehensive review. Frontiers in Pharmacology, 16, 1587215. DOI: 10.3389/fphar.2025.1587215

(5) Feng, J., Wang, Y., Zhang, Y., & Chen, X. (2019). Genome-wide identification, classification and expression analysis of the terpene synthase gene family in plants. PeerJ, 7, e6956. DOI: 10.7717/peerj.6956

Assessment of the Artocarpus Genome Database for Publication

Overview of the Artocarpus Genus Database

You have developed a comprehensive Artocarpus genome database integrating seven genome assemblies of Artocarpus (jackfruit and related species). The platform offers a rich set of functionalities, including:

  • Sequence search (BLAST) for finding genes of interest by similarity.
  • Genome browsers (JBrowse) for each assembly, allowing detailed exploration of genomic features.
  • Synteny visualization (e.g. a circos or linear synteny view) to show collinear blocks and genome alignments across different Artocarpus species.
  • Interactive gene family analysis tools, specifically for gene families like TPS, ADH, and BADH (important in aroma/volatile biosynthesis).
  • A custom metabolic pathway database, "ArtocarpusCYC", enabling browsing of predicted biochemical pathways for these species (similar in concept to PlantCyc).
  • Expression data integration with RNA-seq expression profiles (e.g. heatmaps of gene expression across tissues or conditions).
  • Additional utilities such as SSR/microsatellite search (MISA-web), primer design tools, GO/KEGG functional enrichment analysis, and a general gene function annotation search.

This suite of features is analogous to those found in other modern plant genome databases. For example, the Ficus Genome Database (FGD), which you previously published, also consolidated multiple species’ genomes and provided BLAST, JBrowse, a synteny viewer, expression heatmaps, gene family identification, GO enrichment, and pathway analysis. Many crop genome databases (Citrus, Rosaceae GDR, CottonGen, Cucurbit Genomics, etc.) use similar tools, underscoring that your Artocarpus database meets the standard of functionality expected in this field. The key question is whether the content and focus of this database offer sufficient novelty and utility to merit a publication.

Novelty and Scientific Value of the Database

From an academic perspective, the novelty of your Artocarpus database appears strong, provided it is framed correctly:

  • First Genus-Wide Database for Artocarpus: To date, no dedicated genomic database for the Artocarpus genus has been reported in the literature (analogous to how, prior to FGD, no Ficus genus database existed). Your resource fills this gap by bringing together all available Artocarpus genomic data. Integrating seven genomes of the genus in one platform is a significant effort. It enables comparative genomics within Artocarpus (e.g. identifying conserved genes, structural variations, or lineage-specific genes across species), which was previously difficult when data were scattered across individual publications.
  • Unified Re-annotation and Data Improvement: You mentioned re-annotating and functionally annotating these genomes uniformly. This is valuable – often the publicly available genomes have heterogeneous annotation quality. By cleaning up and re-annotating, you ensure consistency, which can reveal new insights (for example, consistent gene family naming and GO terms across species). The high BUSCO completeness scores (all >93%) indicate the assemblies are of good quality, giving confidence in the data completeness. Publishing a database that includes draft genomes is acceptable as long as their quality metrics are provided. In fact, many genome databases include a mix of chromosome-level and scaffold-level assemblies; for instance, the Ficus database included five species with varying assembly qualities, yet it was published because each genome was the best available for that species. Reviewers will appreciate seeing a table of assembly statistics (genome size, N50, BUSCO%, etc.) to confirm data quality, but BUSCO >93% is generally considered very good completeness for draft genomes.
  • Specialized Tools & Focus (Innovation): A major selling point of your database is the inclusion of custom analysis modules tailored to Artocarpus research. In particular:
    • The TPS, ADH, and BADH gene family identification tool is quite innovative. These gene families are known to be involved in volatile compound synthesis (e.g., terpenoid synthesis by TPS enzymes, and possibly alcohol dehydrogenases (ADH) and betaine aldehyde dehydrogenases (BADH) in aroma biosynthesis). By providing an online tool to identify and analyze these gene families, you cater to a specific biological question – understanding aroma and flavor compounds in jackfruit and its relatives. This kind of focused functionality is often viewed as an innovation highlight in database papers, as it goes beyond generic data display to enable new discoveries. (For comparison, the Ficus database also emphasized terpene synthases because of their role in fig-wasp pollination signals. Your Artocarpus database extends this idea to fruit aroma, which is a novel angle.)
    • The ArtocarpusCYC metabolic pathway portal is another innovative aspect. Not all genome databases include a pathway database. By using Pathway Tools to create ArtocarpusCYC, you allow researchers to explore biochemical pathways predicted in these genomes. This is particularly useful for connecting genotype to phenotype (e.g., identifying which genes in a pathway are present or different between jackfruit and breadfruit, or exploring secondary metabolism pathways unique to certain species).
    • Transcriptomics integration: Including RNA-seq data (with expression heatmaps and potentially differential expression analysis) adds a functional dimension. It means the database isn’t just static genome data, but also shows gene activity patterns. For example, users could find a candidate aroma gene and immediately check its expression in fruit vs leaf. This can accelerate hypothesis generation. The presence of an expression module was well-received in FGD, and similarly will be a strong point for the Artocarpus DB.
  • New Biological Insights: While the primary purpose is to provide a data platform, adding some analysis of the integrated data in your manuscript can substantially increase the perceived novelty. For instance, since you have seven genomes, you could report summary findings: how many orthologous genes are shared, any significant synteny or whole-genome duplication signals, or specific gene families expanded in one species relative to others. Even if the database is the focus, these comparative insights show the scientific value of integrating the genomes. If such insights are unique (not already in the individual genome papers), they strengthen the case that your database enables new research.
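The orthology summary suggested in the last bullet can be sketched in a few lines of Python. This is a minimal illustration, assuming orthogroups have already been inferred by an external tool and loaded into a dictionary; the function name `summarize_orthogroups`, the species labels, and the gene IDs are all hypothetical placeholders, not part of the database.

```python
from collections import Counter


def summarize_orthogroups(orthogroups, species):
    """Count orthogroups by how many of the given species contribute a gene.

    orthogroups: dict mapping orthogroup ID -> {species label: [gene IDs]}
    species:     list of species labels to consider
    """
    wanted = set(species)
    summary = Counter()
    for members in orthogroups.values():
        # Species with at least one gene in this orthogroup
        present = {sp for sp, genes in members.items() if genes and sp in wanted}
        if len(present) == len(wanted):
            summary["core"] += 1              # found in every species
        elif len(present) == 1:
            summary["species_specific"] += 1  # found in exactly one species
        elif present:
            summary["shared_by_some"] += 1    # found in a subset of species
    return dict(summary)


# Toy input with placeholder labels (assumption: not real Artocarpus data)
species = ["sp1", "sp2", "sp3"]
orthogroups = {
    "OG0001": {"sp1": ["g1"], "sp2": ["g2"], "sp3": ["g3", "g4"]},
    "OG0002": {"sp1": ["g5"], "sp2": [], "sp3": []},
    "OG0003": {"sp1": ["g6"], "sp3": ["g7"]},
}
print(summarize_orthogroups(orthogroups, species))
```

With seven species, the "core" count gives the shared orthogroups and the "species_specific" count points to candidate lineage-specific families worth a sentence in the manuscript.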

In summary, the combination of being the first database for an important tropical fruit genus, the integration of multiple high-quality genomes, and the inclusion of specialized analysis tools constitutes a solid innovation package. This appears sufficient for publication in a journal that appreciates genomic database resources.

Data Quality Considerations (Genome Assemblies)

You expressed concern that only two of the seven genomes are chromosome-level assemblies, with others at scaffold-level. In practice, this is quite normal for a multi-genome database:

  • Many plant genus databases include a mix of assembly qualities. What’s important is transparency about the assembly status. In your manuscript, you can include a table (as done in FGD) listing each Artocarpus species, the assembly type (e.g. “chromosome-scale” vs “scaffold”), genome size, N50, gene count, and BUSCO%. This will preempt reviewer concerns. Since all your assemblies have >93% BUSCO completeness, it demonstrates that even the scaffold-level genomes are mostly complete and thus valuable for analysis.
  • Reviewers are usually accepting of draft assemblies if they are the best available data. The fact you have two chromosome-level genomes (likely jackfruit and one other) is a plus, as it provides high-confidence reference points. The others can be labeled as draft but still usable. BUSCO >93% suggests only a small fraction of expected genes might be missing or fragmented, which is reasonable. For context, one of the first Artocarpus genome papers (jackfruit draft by Sahu et al., 2020) had many scaffolds and was still useful. Since then, improvements like the chromosome-level jackfruit assembly (Lin et al., 2022) have appeared, and your database includes them.
  • If any assembly is significantly weaker (e.g., extremely fragmented or lower BUSCO), you might acknowledge it and possibly limit certain analyses for that species. But given your BUSCO stats, it seems all are within a high quality range.
  • In conclusion, including scaffold-level genomes is acceptable, and your data completeness metrics support that the database is reliable. Just be sure to document the quality. No reviewer should fault you for using the available genomes – on the contrary, assembling these data in one place is a benefit to the community.
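The assembly statistics table recommended above is easy to generate programmatically. The following is a minimal sketch, assuming the scaffold or chromosome lengths of each assembly are already available as a list of integers and that BUSCO percentages are taken from a separate BUSCO run; the helper names, assembly labels, and all numbers are hypothetical placeholders.

```python
def n50(lengths):
    """Return the N50: the largest length L such that sequences of
    length >= L together cover at least half of the assembly."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


def assembly_row(name, level, lengths, busco_pct):
    """Build one row of the assembly statistics table."""
    return {
        "assembly": name,
        "level": level,
        "size_bp": sum(lengths),
        "n_seqs": len(lengths),
        "n50_bp": n50(lengths),
        "busco_pct": busco_pct,  # supplied by the caller, not computed here
    }


# Placeholder inputs (assumption: invented numbers, not real assemblies)
rows = [
    assembly_row("assembly_A", "chromosome", [8, 8, 4, 3, 3, 2, 2, 2], 0.0),
    assembly_row("assembly_B", "scaffold", [4, 3, 2, 1], 0.0),
]
for row in rows:
    print("\t".join(f"{key}={value}" for key, value in row.items()))
```

Printing one row per assembly in this form maps directly onto the table columns reviewers expect (assembly level, genome size, N50, BUSCO%).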

URL and Platform (Domain Name Considerations)

Your current deployment is at http://morac.ficusgd.com/Artocarpus. Let's break down the concerns about the URL and how a reviewer might view it:

  • Functionality and Stability: The most important aspect is that the database is accessible and stable. Using the existing ficusgd.com server suggests you have a maintained infrastructure (since ficusgd.com has been online). Reviewers will test the link; as long as it loads and the tools function without login or errors, that addresses the primary concern. Ensure that during the review period, the site remains up 24/7 or provide an alternate link if there’s downtime.
  • Clarity of the Domain: The name “morac.ficusgd.com/Artocarpus” is a bit unusual because it combines Morac(eae) (family name) and ficusgd (from your fig database) in the URL, which might confuse users initially. However, this is not a critical issue for publication. In the paper, you will likely present a formal name for the database (perhaps “Artocarpus Genome Database (AGD)” or similar) and then the URL in parentheses. As long as the name in the text is clear, the somewhat composite URL is fine. There is precedent of subdomains being used (for example, some resources hosted under a broader platform).
  • Use of “morac” vs “jack”: You asked whether switching to jack.ficusgd.com/Artocarpus would be better, or if I have other suggestions. Here are some thoughts:
    • If the intent is to brand this as a jackfruit database, a subdomain like “jackfruit” or “jack” might intuitively signal that to users. However, note that your database covers multiple Artocarpus species, not just jackfruit (A. heterophyllus). If you call it “JackfruitDB” or use "jack" in the URL, it might inadvertently downplay that breadth. Some of those species (breadfruit, cempedak, etc.) are also economically important, so a genus-oriented name might be more inclusive.
    • “Artocarpus” in the URL or name is scientifically precise. Perhaps you could use artocarpus.ficusgd.com as a subdomain to directly host it (without the /Artocarpus path). If creating a new subdomain is easy on your server, artocarpus.ficusgd.com would be straightforward and clear.
    • The prefix “morac” suggests you envisioned a Moraceae family portal. If in the future you plan to host databases for other Moraceae genera (such as mulberry, Morus) under the same site, then using a family-wide domain makes sense. Note that Parasponia belongs to Cannabaceae rather than Moraceae, so it would fit only a broader Rosales portal. In the family-wide case, you might have morac.ficusgd.com/Artocarpus and potentially morac.ficusgd.com/Ficus (though Ficus already has its own site). Reviewers could ask why the Artocarpus database is under a ficus domain; you can explain that it’s part of a unified Moraceae database initiative (if that’s your plan). This forward-looking rationale can actually be a positive, showing you intend to integrate related resources.
    • If you do not plan on combining databases, then the simplest approach is often best: either stick to the current URL or choose one like jackfruitdb.com (just as an example) if you want a completely independent identity. Obtaining a new domain (e.g., ArtocarpusDB.org or JackfruitDB.org) might be worth considering for user-friendliness, but it’s not mandatory for publication. Many papers just use whatever URL is available to the authors.
  • Reviewer Perspective: Reviewers rarely reject a paper over a URL naming issue. At most, they might suggest using a more intuitive name. If one does mention it, it would likely be a minor comment. You can respond that the site is hosted under the existing FicusGD infrastructure for convenience and will be accessible long-term at that address. If needed, you could mention you’ll set up an alias URL (and actually do so). But again, this is a minor point.

Recommendation: Use a clear title for the database in the manuscript (e.g., "Artocarpus Genome Database") and then provide the URL. If possible, consider securing a simpler URL or at least a shorter subdomain (artocarpus.ficusgd.com). Otherwise, morac.ficusgd.com/Artocarpus is acceptable. The key is to ensure the URL in the paper is correct and that the site doesn’t require any VPN or special access for international users (since Artocarpus researchers worldwide will try to use it). If your server is reliable, the current URL should not hinder publication.

Separate Database vs. a Unified Moraceae Database

One question you anticipate (either from yourself or a reviewer) is: Why create a separate Artocarpus database instead of merging it with the Ficus database into a single Moraceae family database? This is a strategic decision, and both approaches have merit. Here's how to address it:

  • Distinct Research Communities and Use Cases: Ficus and Artocarpus are both in Moraceae, but they are quite different in terms of research focus and user communities. Ficus (figs) research often centers on ecology, evolution (especially fig–wasp coevolution), and some unique reproductive biology. In contrast, Artocarpus (jackfruit, breadfruit, etc.) research is more about fruit crop improvement, domestication, and traits like yield, nutrition, and flavor. By having a dedicated database, you can tailor content to each genus. For example, your Artocarpus DB highlights aroma-related genes and fruit development pathways, which wouldn’t be as relevant on the Ficus site. Similarly, FicusGD included modules specific to fig biology (like data on fig wasp attractants, TPS gene family across 24 fig species). Keeping them separate allows each to shine in its domain without overwhelming users with extraneous data.
  • Practicality and Timing: The Ficus Genome Database (FGD) was published already as a standalone resource. To now merge Ficus and Artocarpus into one new portal, you would be republishing a lot of the Ficus content. This raises issues:
    • Journals typically expect mostly new content in a paper. If you merged, you’d have to write a combined paper (covering Ficus + Artocarpus) for a new journal, which would be tricky since FGD is already in the literature. It could look like duplicate publication of Ficus data unless you substantially update it.
    • By focusing the new paper solely on Artocarpus, you ensure that 100% of the content is new (new genomes, new gene families analysis, etc.), avoiding any perception of self-plagiarism or duplication.
    • In the future, if you want to create a Moraceae super-database, you could. But that might be better as a next step, perhaps when more Moraceae genera such as Morus (mulberry) are added, or as part of a broader Rosales project that could include relatives outside the family such as Cannabis. For now, it’s sensible to establish the Artocarpus database on its own.
  • Combining Data for Cross-Genus Insights: Some reviewers might argue that combining Ficus and Artocarpus could enable family-wide genomic studies (e.g., comparing synteny or gene families across genera). This is true, and it’s something you could mention as a long-term vision. However, you can also point out that anyone interested in such comparisons can still do so by using both databases side by side – for instance, by downloading data from FGD and ArtocarpusDB. Moreover, the divergence between Ficus and Artocarpus is quite large (they’re different tribes within Moraceae), so many comparative analyses would anyway require external tools. A single database might become unwieldy if it tries to cover the entire family (which has diverse genera).
  • Precedent: It’s not uncommon for the same group to maintain separate but related databases for different taxa. For example, the team behind the Genome Database for Rosaceae (GDR) also developed databases for Citrus, Cotton, Vaccinium, etc., each published separately and targeted to that community. They share a similar platform (often Tripal/Chado) and even the same developers, but they remain distinct resources. This approach has been accepted by journals (some in the NAR Database issue, some in other journals). The key is that each resource serves a distinct scientific purpose. In your case, FGD serves fig researchers, and the Artocarpus database will serve jackfruit/breadfruit researchers – both are valuable, and not redundant.
  • How to explain to reviewers: If a reviewer does ask “Why not merge into one Moraceae DB?”, you can respond along the lines of: “We agree that a unified Moraceae database is an interesting idea, and indeed our subdomain morac.ficusgd.com reflects an initial step in that direction. However, the current resource focuses on the genus Artocarpus, which has its own specific datasets and research questions (e.g., fruit domestication, aroma biosynthesis) distinct from genus Ficus. For clarity and depth, we chose to curate and present Artocarpus data in its own dedicated platform. This allowed us to provide more detailed genus-specific functionalities (like ArtocarpusCYC pathways and aroma gene families) without potential confusion with unrelated data. We have ensured that the Artocarpus database stands independently as a complete resource. In the future, we may consider cross-linking or integrating data across Moraceae as more genomes become available, but that is beyond the scope of this publication.” – This kind of explanation highlights the deliberate choice and should satisfy the inquiry.

In summary, maintaining separate databases is a valid approach and not considered duplicate publication so long as the content (genomic data and tools) is different. Your Artocarpus database will be evaluated on its own merits. Just make sure the manuscript writing is original (don’t copy-paste too much text from the Ficus paper; rephrase and introduce the genus with its own importance). That way, there is no issue of self-plagiarism, and it will be clear that this is a new contribution.

Highlighted Features and Potential Improvements

Your description of the database features is quite comprehensive. Here we discuss their significance and address the question of any additional features (like a volcano plot for differential expression) that might enhance the resource:

  • Genome Browser and Synteny Viewer: These are essential tools in any genome database. By enabling researchers to browse each genome’s sequence and annotations (via JBrowse) and to compare genomic regions across species (via the synteny/circos view), you facilitate both locus-specific research and big-picture evolutionary studies. For example, a user could inspect a QTL region for fruit size in jackfruit and see if the syntenic region exists in breadfruit or other species, which might hint at conserved vs. diverged genes. These visualization tools greatly increase the utility of the raw genome data.
  • Gene Family Identification Tool: As noted, this is a standout feature. It presumably allows users to input a gene ID or family name and retrieve all members of that family in the Artocarpus genomes. Focusing on the ADH, BADH, and TPS families for online identification is clever, because it targets a niche of high interest (aroma and flavor compounds in fruits). For instance, jackfruit’s distinct smell is due to various volatile compounds; having a quick way to find all TPS (terpene synthase) genes or alcohol dehydrogenases in each species can spur investigations into which genes underlie those volatiles. This is an innovation because generic genome databases might not highlight these families. By doing so, you demonstrate a value-added analysis on top of the raw data. Make sure in the paper to describe how the gene family identification works – e.g., did you pre-compute HMM profiles or BLAST searches for those families? And if you identified novel TPS or ADH genes through this process, mention that as a result.
  • ArtocarpusCYC Pathway Database: Using Pathway Tools to create a pathway database is a significant undertaking. In Ficus, they predicted ~395-400 pathways for each genome and built FicusCyc. You likely did something similar. This means a user can search for a metabolite or enzyme and find if that pathway is present in jackfruit or breadfruit, etc. It’s especially relevant for fruit biochemical pathways (sugars, acids, secondary metabolites) which are key to crop quality. In the manuscript, highlight any interesting pathway observations (e.g., “We annotated over 380 metabolic pathways in each Artocarpus genome, including those for flavor-related metabolites like terpenoids and flavonoids”). The presence of ArtocarpusCYC will signal to reviewers that the database enables systems biology approaches, not just sequence queries.
  • Functional Annotation and Enrichment Tools: You mentioned GO/KEGG enrichment analysis tools and a function annotation search. These are fairly standard, but necessary for a one-stop analysis platform. A user might, for example, upload a list of genes (perhaps genes upregulated in a transcriptome experiment) and use the GO enrichment tool to find what biological processes are overrepresented. Or they might search the annotation database to find all genes annotated as, say, “starch synthase”. Having these integrated saves researchers time. Ensure that the web interface for these is user-friendly (some Tripal-based sites have modules for GO enrichment, etc., which you likely employed).
  • Microsatellite (SSR) search and Primers: MISA-web integration means users can find simple sequence repeats which could be useful for genetic marker development. Artocarpus breeders or conservation geneticists might appreciate this, as SSRs (or primers for them) can be used for genotyping studies. Including a primer design function (perhaps linked to SSR loci or any gene region) is a nice extra. It shows the database can support breeding applications (for example, designing primers to amplify a gene of interest in various cultivars).
  • RNA-Seq Expression Data: You have an expression heatmap feature, indicating you loaded RNA-seq data (possibly from various tissues or developmental stages of one or more species). This is a powerful addition because it brings the data to life. Make sure to describe the source of the RNA-seq data in the paper (e.g., “we incorporated transcriptome data from ripe and unripe fruit of jackfruit” or “leaf vs. fruit comparisons”, etc., or any public RNA-seq for Artocarpus if available). The heatmap viewer allows users to quickly see if a gene is expressed highly in a certain tissue. This kind of integration is often praised by reviewers, as it extends the database from genomic sequence to functional genomics.
  • Differential Expression Analysis (Volcano Plot): You asked whether adding a differential expression (DE) analysis (perhaps visualized as a volcano plot) is necessary or beneficial. Here’s how to consider it:
    • If you already have specific comparisons in your RNA-seq data (for instance, ripe vs unripe fruit, or fruit pulp vs seed, etc.), running a DE analysis to identify which genes are significantly up- or down-regulated is relatively straightforward. Presenting those results can provide biological insights. A volcano plot is a great way to visualize DE results (with log fold-change vs. significance, highlighting genes of interest).
    • Including a static volcano plot on the website or in the paper could highlight, for example, aroma-related genes that are highly upregulated in ripe fruit. This would directly tie into the theme of aroma genes and show the database’s capability to connect genotype with expression phenotype.
    • However, not every database includes such analysis on the site itself. Many stick to providing the expression values and let users do their own DE analyses. Is it necessary? Strictly speaking, no – you can publish without it. But is it a nice enhancement? Definitely yes, if you have the data. It demonstrates the utility of your platform by providing a case study result. Reviewers often appreciate seeing one or two biologically interesting findings enabled by the database. A volcano plot of DE genes can serve as one such figure in the manuscript.
    • Perhaps a compromise: perform a DE analysis as part of the study and include the volcano plot in the publication figures, even if the website itself only provides the underlying data (expression heatmaps). In the paper, you can say “using the integrated RNA-seq module, we identified X genes upregulated in ripe fruit vs unripe fruit (see Figure X for a volcano plot of differentially expressed genes). Notably, many TPS genes showed higher expression in ripe fruit, consistent with their role in volatile production.” – This kind of result both validates the database’s data and provides a novel biological insight.
    • If time permits and data is available, I would recommend adding a differential expression analysis, as it aligns well with your focus on aroma genes and will strengthen the narrative that “this database not only stores data but also helped us discover XYZ.”
  • User Interface & Experience: Though not asked explicitly, from an academic standpoint it’s worth ensuring the interface is polished. For example, the synteny viewer should be intuitive (if using a circos plot, have it clearly labeled, or if using JBrowse 2’s linear synteny, ensure the documentation for how to use it is provided). The site should have a help page or tutorial (perhaps adapted from FGD’s help) so new users know how to use features like gene family search or pathway queries. In FGD’s publication, they emphasized the user-friendly nature of the database; you should do similarly for the Artocarpus DB.
  • Performance: Academic reviewers might check that queries run reasonably fast (BLAST searches returning results, browser tracks loading without excessive delay). Given your experience with FGD, you likely have optimized this, but it’s good to double-check everything with the current data load (7 genomes worth of data).
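For the GO/KEGG enrichment tool discussed in the list above, the manuscript should state which statistical test is used. A common choice is a one-sided hypergeometric test, sketched below in plain Python. This is an illustration under assumptions, not a description of how the Artocarpus site computes enrichment: the function names, the term label, and the gene IDs are hypothetical, and multiple-testing correction (e.g., Benjamini-Hochberg) is deliberately left out and would be needed in practice.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n genes are drawn without replacement from a
    background of N genes, K of which are annotated with the term."""
    upper = min(K, n)
    total = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return total / comb(N, n)


def go_enrichment(query, background, term2genes):
    """Return (term, hits, annotated_in_background, p_value) per term,
    sorted by raw p-value. No multiple-testing correction is applied."""
    background = set(background)
    query = set(query) & background
    N, n = len(background), len(query)
    results = []
    for term, genes in term2genes.items():
        annotated = set(genes) & background
        hits = len(annotated & query)
        if hits == 0:
            continue  # nothing to test for this term
        p = hypergeom_sf(hits, N, len(annotated), n)
        results.append((term, hits, len(annotated), p))
    return sorted(results, key=lambda row: row[3])


# Toy data with placeholder IDs (assumption: not real annotations)
background = [f"g{i}" for i in range(10)]
term2genes = {"term_A": ["g0", "g1", "g2", "g3", "g4"], "term_B": ["g8", "g9"]}
query = ["g0", "g1", "g2", "g3", "g4"]
for row in go_enrichment(query, background, term2genes):
    print(row)
```

Stating the background set explicitly (all annotated genes of one assembly versus all genes) matters, because the p-values depend on N.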

In summary, the features you have are comprehensive and at the level of other publishable genome databases. The additional suggestion of including a volcano plot or any differential expression result is mainly to bolster the manuscript’s impact. It shows an example of how the database can be used to derive new knowledge, which is often a question reviewers ask (“what’s the biological insight or use-case of this database?”). By pre-emptively answering that with a concrete example (like highlighting aroma gene expression), you make the paper more compelling.
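To make the volcano plot suggestion concrete, the sketch below turns a differential expression table into plot coordinates and up/down/not-significant labels. It assumes the log2 fold changes and adjusted p-values were already produced by a dedicated tool such as DESeq2 or edgeR; the function name, the cutoffs (|log2FC| >= 1, adjusted p < 0.05), and the gene IDs are illustrative assumptions rather than values from your data.

```python
import math


def volcano_points(results, lfc_cutoff=1.0, padj_cutoff=0.05):
    """Convert (gene_id, log2_fold_change, adjusted_p) tuples into
    volcano plot points: x = log2FC, y = -log10(adjusted p)."""
    points = []
    for gene, lfc, padj in results:
        # Floor the p-value so that padj == 0 does not break log10
        y = -math.log10(max(padj, 1e-300))
        if padj < padj_cutoff and lfc >= lfc_cutoff:
            status = "up"
        elif padj < padj_cutoff and lfc <= -lfc_cutoff:
            status = "down"
        else:
            status = "ns"  # not significant or below the fold-change cutoff
        points.append({"gene": gene, "x": lfc, "y": y, "status": status})
    return points


# Placeholder results (assumption: invented values for illustration)
toy = [
    ("geneA", 2.5, 1e-6),
    ("geneB", -1.8, 0.001),
    ("geneC", 0.3, 0.2),
    ("geneD", 3.0, 0.5),
]
for point in volcano_points(toy):
    print(point["gene"], point["status"], round(point["y"], 2))
```

The resulting x/y values and status labels can be passed to any plotting library for the manuscript figure, with aroma-related candidates (for example TPS or ADH family members) highlighted by gene ID.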

Potential Journals and Publication Prospects

Now, considering where to publish and the likelihood of acceptance:

1. Nucleic Acids Research (NAR) – Database Issue: This is a top choice for database papers, as they have a dedicated annual issue for new databases. Your database fits the criteria of novel data integration and broad utility (especially since jackfruit and breadfruit are important food crops). NAR expects high-quality and enduring databases. They will check that the website is robust and that the content is significant. If you aim for NAR, emphasize the global importance of Artocarpus (food security, tropical fruit, underutilized crop with rising interest) and how your database will facilitate research and breeding for these species. Note that NAR is quite competitive; the work should be presented as a substantial resource. Including comparisons among 7 species, unique tools, and possibly some new biological findings (as discussed) will help. Since FicusGD was published elsewhere, you can cite it as a related effort and then argue that ArtocarpusDB is another major contribution, possibly even cross-referencing the idea of a Moraceae family if that strengthens it. If accepted, NAR gives high visibility.

2. Database: The Journal of Biological Databases and Curation (Oxford): This journal specializes in database articles and is less impact-driven than NAR, focusing on utility and technical solidity. It’s a very suitable venue if NAR is not pursued. Many niche or organism-specific databases are published here. The review process will look at whether the data are properly curated, the site is accessible, and the paper clearly describes the content and usage. Your work would likely be well-received, as integrating multiple plant genomes and providing analysis tools aligns with their scope.

3. GigaScience or Scientific Data: These journals focus on data resources and often require that data and code are publicly available (for reproducibility). If you choose GigaScience, you might need to also release the underlying database dumps or software pipelines. GigaScience has published some genome databases, especially if accompanied by new genome assemblies or big data. If any of your 7 genomes are newly sequenced by your group (rather than all from previous publications), then GigaScience could be a fit since you’d be presenting new data. Scientific Data (by Nature) would treat it as a data descriptor; you’d emphasize the dataset (7 genomes reannotated) and the database as a means to access it. These are good if you want an open-data angle, but they typically don’t require as much novelty in analysis, just rigor in data presentation.

4. Disciplinary Journals (Genomics/Plant journals):

  • Horticulture Research (Oxford University Press/Nature): This is a high-impact journal in plant science. They have published genome papers and some resources for horticultural crops. If you aim here, you should include more biological analysis in the paper. For example, compare the 7 Artocarpus genomes to derive insights into fruit evolution or domestication (similar to how a 2022 Horticulture Research paper studied jackfruit domestication in China). You could use your database to do a comparative analysis (maybe identify a gene family expanded in all cultivated Artocarpus vs wild ones, or analyze synteny to see if any major chromosomal rearrangements happened). Essentially, you’d be writing a hybrid of a resource article and a comparative genomics study. If you have the capacity to do that, it could make a strong submission. Horticulture Research will care about the novelty and biological significance, not just the existence of the database.
  • Frontiers in Plant Science (section: Bioinformatics and Computational Biology or Plant Genetics): Frontiers journals can consider database papers especially if they highlight new tools or significant data. The peer review might be a bit more variable, but if you get reviewers who understand databases, they would focus on how the database will advance plant science. Frontiers also allows a lot of detail in methods, which is good for describing your platform’s construction.
  • BMC Plant Biology or BMC Genomics: These have published genome databases and genome resources as well. The paper would need to show some example use-cases. The acceptance rate is decent, and they appreciate data integration efforts.
  • Plant Methods: If your emphasis is on the technical framework (for instance, if you extended the Tripal platform with new modules for gene family search or new visualizations), Plant Methods could be appropriate. It values the methodology behind tools and databases.
  • MDPI Journals (e.g., Horticulturae or Genes): Your Ficus database was published in Horticulturae (MDPI). MDPI journals generally have a faster process and are open access. Genes (MDPI) even has a section for database articles and published the draft Artocarpus genomes paper by Sahu et al. 2020. An MDPI option might be considered “easier” in terms of acceptance if you want a sure publication, albeit at the cost of lower prestige compared to NAR or Horticulture Research. Since you already have one database paper in Horticulturae, you could either publish in the same journal for consistency or try a different one for variety. Just be sure to maintain high-quality writing and clear figures, as reviewers will still critique clarity and significance.

5. Conference or Others: Sometimes database descriptions appear as conference proceedings or smaller communications (e.g., ISMB proceedings in Bioinformatics or Database brief articles). Given the work you put in, you likely prefer a full journal article.

Recommendation: Start by assessing your goals. If you want high impact and broad recognition, prepare the manuscript for NAR’s Database Issue (deadline is usually mid-year for the January issue) or a top plant journal like Horticulture Research. The manuscript should be detailed, with figures showing the database interface, an example synteny plot, maybe an expression analysis (volcano or heatmap), etc., and it should emphasize how this database enables research on underutilized tropical fruits and comparative genomics of Moraceae. If aiming slightly lower, Database (Oxford) is a solid choice where the bar is that the database is useful and well-implemented, without needing a big novel biological discovery.

In terms of likelihood of acceptance: Given the success of your Ficus database publication and the evident effort in this Artocarpus database, the chances are high that it will be publishable in a suitable journal. To maximize success:

  • Ensure the manuscript is comprehensive: describe data sources, how each feature works, and include screenshots or figures of the database in action. Keep the writing clear (the guidelines you provided on readability and logical flow will help).
  • Cite relevant works: e.g., cite the individual genome papers for the 7 species, cite FicusGD paper as a related resource, and cite other crop databases to position your work in context. This shows reviewers you are aware of similar efforts and how yours is distinct.
  • Address potential critiques in the paper itself: For instance, explicitly state that this is the first database for Artocarpus and why that’s needed (maybe reference increasing genomic data for Artocarpus and lack of any unified platform). Also, note the importance of Artocarpus (e.g., jackfruit as a sustainable food source, breadfruit for food security, etc., supported by citations about its significance).
  • Plan for maintenance: Some reviewers might worry, “will this site stay online?” If you can, mention that the database will be maintained and updated as new genomes or data become available, perhaps with version updates (this was mentioned in FicusGD’s outlook). If you have funding or institutional support for it, that is even better to note.

Overall, the publication possibility is very good. I do not see any red flags that would make it unpublishable. It is more a question of where it will be published rather than if. If written and executed well, this database could become a go-to resource for researchers studying jackfruit, breadfruit, and their relatives, much like CitrusGenomeDB is for citrus researchers or CottonGen for cotton.

Conclusion and Suggestions

In conclusion, your Artocarpus genome database project appears robust and academically valuable. It can certainly be published, especially if you emphasize the unique aspects and ensure the presentation is clear. To recap key points and suggestions:

  1. Publishability & Novelty: Yes, the database is publishable. It offers the first integrated platform for a group of important tropical fruit species – this is a clear gap it fills. Its novelty lies in multi-species integration and specialized analysis tools (aroma gene families, pathway database) that go beyond basic genome browsers. These functions can and should be highlighted as innovations in the manuscript to convince reviewers of its merit.
  2. Data Quality: Having two chromosome-level assemblies and the rest high-quality scaffolds (with BUSCO >93%) is absolutely fine. Provide transparency through assembly stats. The completeness and consistency added via re-annotation are strengths to mention. No reviewer should object as long as you document the data sources and quality.
  3. Website URL: The current URL (morac.ficusgd.com/Artocarpus) is acceptable for publication. If you can make it more intuitive (like an Artocarpus-specific subdomain or domain), it might be a slight improvement, but it’s not strictly necessary. The key is that the site is accessible and clearly branded on the webpage itself as an Artocarpus database. In the paper, use a good name (e.g., “Artocarpus Genome Database (AGD)”) so people remember it, and list the URL. Reviewers are unlikely to “question” the URL beyond possibly suggesting a name change; just be ready to justify it as part of a broader platform if needed.
  4. Separate vs. Combined Resource: It’s not considered a redundant publication that you made a Ficus database before and now an Artocarpus database – they cover different organisms and data. Be prepared to explain why separate is beneficial (as we outlined: different focus, clarity, avoiding mixing too much data at once). This should address any reviewer’s curiosity on that front. There’s no penalty for you publishing a “series” of databases, as long as each stands alone in content and has its own novel aspects (which yours does). In fact, your experience with FicusGD likely means this new database is well-implemented, which is a plus.
  5. Enhancements (Volcano Plot and more): While your current features are great, consider adding a bit more analysis output if feasible, such as a differential expression result. It’s not that the absence would make it unpublishable, but the presence can elevate the impact. It shows the database isn’t just a static repository but has been used to generate insights. Even if not on the website, including such analysis in the paper (with figures) will strengthen the academic contribution. Also, double-check if there are any other minor features to include before publishing – for example, do you have a download page for sequences and annotations? Most databases provide downloadable FASTA/GFF files for each genome so researchers can do offline analysis; if you don’t, consider adding that. Reviewers appreciate when databases also allow data export, not just browsing.
  6. Journal Choice: Target a journal that matches your desired impact and prepare the manuscript accordingly. If aiming high (NAR, Horticulture Research), focus on broad relevance and novel findings enabled by the database. If aiming for a dedicated database journal (Database, BMC, etc.), focus on the technical integration and community resource aspect. In any case, the writing should be thorough (covering the database construction, content, and usage examples) and the tone confident about the usefulness of the resource.
  7. Future Perspective: It can be good to include a short statement about future plans (e.g., “We plan to update ArtocarpusDB as new genome assemblies (such as other Artocarpus species or improvements of existing ones) become available. The modular framework (built on Tripal, if it is) will also allow incorporation of additional analysis tools or even expansion to other Moraceae in the future.”). This tells the reviewers and readers that the database will not stagnate. It’s a minor point but contributes to the overall impression of a long-term resource.
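
For the assembly stats mentioned in point 2, here is a minimal Python sketch of how sequence lengths and N50 could be computed from FASTA text for a summary table. The function names are hypothetical, and in practice you may prefer an established assembly-statistics tool; this only shows the calculation itself.

```python
def fasta_lengths(lines):
    """Return the length of each sequence in FASTA-formatted lines."""
    lengths = []
    current = None  # None until the first header line is seen
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths

def n50(lengths):
    """Return N50: the length L such that sequences of length >= L
    together cover at least half of the total assembly size."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no sequence lengths given")
```

Calling `n50(fasta_lengths(open(path)))` for each genome (with `path` pointing at that genome's FASTA file) would give one column of a per-assembly stats table; total size and sequence count come from `sum()` and `len()` on the same list.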

Final evaluation: The project shows high academic value and feasibility for publication. It addresses a real need in the plant genomics community and is executed with a comprehensive approach. By following the suggestions above and articulating the strengths of your Artocarpus database in the manuscript, you should be able to successfully publish it. I also recommend reaching out to collaborators or colleagues to maybe test the site and read your draft – fresh eyes can spot any usability issues or unclear explanations, which you can refine before submission. Good luck, and I look forward to seeing the Artocarpus database serving researchers worldwide!