About Ark

What is Ark?

Ark is a mammal genome availability atlas based on the Mammal Diversity Database (MDD). It lists every mammal species currently recognized by the MDD and marks which of them have an NCBI genome assembly downloaded into this local dataset.

Ark is a static, read-only website. It does not perform live NCBI lookups, does not host genome files for download, and does not reflect global genome availability. It is a snapshot of the assemblies that have been downloaded and indexed locally.

Data Sources

Mammal Diversity Database (MDD)
Taxonomic backbone with 6,871 mammal species, including IUCN status, distribution, and authority information. Used as the primary species list.
NCBI Datasets
Genome assembly metadata downloaded via the NCBI Datasets API for mammalian taxa. Provides assembly level, annotation product availability, and accession identifiers.

Matching Method

MDD species are matched to NCBI records using scientific name comparison:

  • Exact match (979): The NCBI species name exactly matches the MDD scientific name.
  • Binomial or subspecies match (34): The NCBI record contains the MDD binomial (genus + specificEpithet) as a substring, which typically indicates a subspecies or domestic-breed record that is assigned to its parent species.
  • No match (5,858): No NCBI assembly record matches this MDD species in the local dataset.

59 NCBI records could not be matched to any MDD species and are listed on the Unmatched page.
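
The matching rules above can be sketched in a few lines of Python. This is an illustrative sketch, not Ark's actual code: the function names `normalize`, `classify_species`, and `unmatched_records` are hypothetical, the whitespace normalization is an assumption, and the inputs are assumed to be plain name strings.

```python
def normalize(name: str) -> str:
    # Assumption: collapse runs of whitespace; letter case is left untouched.
    return " ".join(name.split())


def classify_species(genus: str, specific_epithet: str, ncbi_names: list[str]) -> str:
    """Return the match type for one MDD species (hypothetical helper)."""
    binomial = normalize(f"{genus} {specific_epithet}")
    names = [normalize(n) for n in ncbi_names]
    if binomial in names:
        # An NCBI species name is identical to the MDD binomial.
        return "exact"
    if any(binomial in n for n in names):
        # The binomial appears inside a longer NCBI name (e.g. a subspecies).
        return "binomial_or_subspecies"
    return "none"


def unmatched_records(mdd_binomials: list[str], ncbi_names: list[str]) -> list[str]:
    """Return NCBI names that contain no MDD binomial (hypothetical helper)."""
    binomials = [normalize(b) for b in mdd_binomials]
    return [n for n in ncbi_names if not any(b in normalize(n) for b in binomials)]
```

For example, `classify_species("Canis", "lupus", ["Canis lupus familiaris"])` returns `"binomial_or_subspecies"`. A plain substring test can also accept a name that merely starts with the same letters as the binomial, so an implementation may prefer to compare whole words.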

Coverage Summary

MDD species rows: 6871
Downloaded NCBI records: 1169
MDD rows matched to downloaded genome: 1013
MDD rows missing downloaded genome: 5858
Downloaded NCBI records with no exact or binomial MDD match: 59

Match types:
none: 5858
exact: 979
binomial_or_subspecies: 34

Matched assembly levels:
Scaffold: 462
Chromosome: 366
Contig: 179
Complete Genome: 6

Matched availability:
with RNA: 243
with protein: 285
with GFF: 285
with GTF: 284
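
The counts in this summary can be cross-checked with a little arithmetic. The sketch below only re-adds the figures listed above and derives the share of MDD species with a matched genome. The function name `coverage_summary` is hypothetical and not part of Ark.

```python
def coverage_summary() -> dict:
    # Counts copied from the Coverage Summary above.
    match_types = {"none": 5858, "exact": 979, "binomial_or_subspecies": 34}
    assembly_levels = {
        "Scaffold": 462,
        "Chromosome": 366,
        "Contig": 179,
        "Complete Genome": 6,
    }
    total_species = sum(match_types.values())
    matched = match_types["exact"] + match_types["binomial_or_subspecies"]
    return {
        "total_species": total_species,                  # all MDD rows
        "matched": matched,                              # exact + binomial
        "levels_total": sum(assembly_levels.values()),   # should equal matched
        "coverage_pct": round(100 * matched / total_species, 2),
    }
```

With these counts, the match types sum to 6,871 species, the assembly levels sum to the 1,013 matched species, and roughly 14.7% of MDD species have a matched downloaded genome.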

Limitations

  • Ark reflects only genomes that have been downloaded to this local dataset. A species shown as "no genome" may still have assemblies on NCBI that were simply not downloaded here.
  • Assembly metadata is a static snapshot from the time of download. NCBI may have newer assemblies or annotations not reflected here.
  • IUCN status reflects the MDD snapshot and may not match the current IUCN Red List.
  • Matching uses scientific names only, with no fuzzy matching or synonym resolution beyond binomial/subspecies extraction.
  • Genome files are indexed by their metadata only and cannot be downloaded from Ark. Visit NCBI Datasets to access the sequence data itself.

What v1 Does Not Do

  • Live NCBI Datasets API queries
  • Genome file downloads or data export
  • Fuzzy name matching or synonym resolution
  • Phylogenetic trees or comparative analyses
  • Search by geographic region or biogeographic filtering
  • User accounts or personalization