About Ark
What is Ark?
Ark is a mammal genome availability atlas based on the Mammal Diversity Database (MDD). It lists every mammal species recognized in the MDD and marks which of them have NCBI genome assemblies downloaded into this local dataset.
Ark is a static, read-only website: it does not perform live NCBI lookups, does not host genome files for download, and does not reflect global genome availability. It is a snapshot of the assemblies that have been downloaded and indexed locally.
Data Sources
- Mammal Diversity Database (MDD): Taxonomic backbone with 6,871 mammal species, including IUCN status, distribution, and authority information. Used as the primary species list.
- NCBI Datasets: Genome assembly metadata downloaded via the NCBI Datasets API for mammalian taxa. Provides assembly-level information, annotation product availability, and accession identifiers.
Matching Method
MDD species are matched to NCBI records using scientific name comparison:
- Exact match (979): The NCBI species name exactly matches the MDD scientific name.
- Binomial or subspecies match (34): The NCBI species name contains the MDD binomial (genus + specificEpithet) as a substring. This typically means the NCBI record is for a subspecies or domestic breed, which is assigned to its parent species.
- No match (5,858): No NCBI assembly record matches this MDD species in the local dataset.
59 NCBI records could not be matched to any MDD species and are listed on the Unmatched page.
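The matching rules above can be sketched in Python as follows. This is a minimal illustration, not Ark's actual implementation: the function names `match_species` and `unmatched_records`, the whitespace normalization, the case-sensitive comparison, and the whole-word check on the substring test are all assumptions. The sample names are illustrative and are not drawn from the dataset.

```python
def normalize(name: str) -> str:
    """Collapse runs of whitespace so names compare consistently."""
    return " ".join(name.split())


def match_species(mdd_binomials, ncbi_names):
    """Classify each MDD binomial as exact, binomial_or_subspecies, or none.

    Returns {mdd_binomial: (match_type, [matching NCBI names])}.
    """
    ncbi = [normalize(n) for n in ncbi_names]
    results = {}
    for raw in mdd_binomials:
        binomial = normalize(raw)
        # Substring test, padded with spaces so only whole words match
        # (assumption: avoids hits inside a longer genus or epithet).
        hits = [n for n in ncbi if f" {binomial} " in f" {n} "]
        if binomial in hits:
            match_type = "exact"
        elif hits:
            match_type = "binomial_or_subspecies"
        else:
            match_type = "none"
        results[binomial] = (match_type, hits)
    return results


def unmatched_records(results, ncbi_names):
    """Return NCBI names that were not assigned to any MDD species."""
    matched = {n for _, names in results.values() for n in names}
    return [n for n in map(normalize, ncbi_names) if n not in matched]


# Illustrative input only.
mdd = ["Bos taurus", "Canis lupus", "Panthera uncia"]
ncbi = ["Bos taurus", "Canis lupus familiaris", "Equus asinus x Equus caballus"]

results = match_species(mdd, ncbi)
for species, (match_type, records) in results.items():
    print(species, match_type, records)
print("unmatched:", unmatched_records(results, ncbi))
```

In this sketch the hybrid record stays unmatched because neither parent binomial is in the sample MDD list. Records of that kind would be expected to end up on the Unmatched page.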
Coverage Summary
- MDD species rows: 6,871
- Downloaded NCBI records: 1,169
- MDD rows matched to downloaded genome: 1,013
- MDD rows missing downloaded genome: 5,858
- Downloaded NCBI records not matched to MDD (exact/binomial): 59
Match types:
- none: 5,858
- exact: 979
- binomial_or_subspecies: 34
Matched assembly levels:
- Scaffold: 462
- Chromosome: 366
- Contig: 179
- Complete Genome: 6
Matched availability:
- with RNA: 243
- with protein: 285
- with GFF: 285
- with GTF: 284
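The counts in the coverage summary can be cross-checked against each other. The sketch below is a hypothetical consistency check, not part of Ark: the dictionary keys and the `check_summary` function are invented for illustration. It also assumes that exactly one assembly level is counted per matched MDD row, which the published totals are consistent with.

```python
# Figures copied from the coverage summary; key names are hypothetical.
summary = {
    "mdd_rows": 6871,
    "ncbi_records": 1169,
    "matched": 1013,
    "missing": 5858,
    "unmatched_ncbi": 59,
    "match_types": {"none": 5858, "exact": 979, "binomial_or_subspecies": 34},
    "assembly_levels": {
        "Scaffold": 462,
        "Chromosome": 366,
        "Contig": 179,
        "Complete Genome": 6,
    },
    "availability": {"RNA": 243, "protein": 285, "GFF": 285, "GTF": 284},
}


def check_summary(s):
    """Return a list of human-readable inconsistencies (empty if none)."""
    problems = []
    types = s["match_types"]
    if sum(types.values()) != s["mdd_rows"]:
        problems.append("match types do not sum to MDD rows")
    if types["exact"] + types["binomial_or_subspecies"] != s["matched"]:
        problems.append("exact + binomial does not equal matched rows")
    if types["none"] != s["missing"]:
        problems.append("'none' does not equal missing rows")
    # Assumption: one assembly level is reported per matched MDD row.
    if sum(s["assembly_levels"].values()) != s["matched"]:
        problems.append("assembly levels do not sum to matched rows")
    if any(v > s["matched"] for v in s["availability"].values()):
        problems.append("an availability count exceeds matched rows")
    return problems


problems = check_summary(summary)
coverage_pct = 100 * summary["matched"] / summary["mdd_rows"]
print(problems)
print(f"{coverage_pct:.1f}% of MDD species have a downloaded genome")
```

With the figures above, the check reports no inconsistencies, and roughly 14.7% of MDD species have a downloaded genome in this dataset.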
Limitations
- Ark reflects only genomes that have been downloaded to this local dataset. A species shown as "no genome" may still have assemblies on NCBI — they were simply not downloaded here.
- Assembly metadata is a static snapshot from the time of download. NCBI may have newer assemblies or annotations not reflected here.
- IUCN status reflects the MDD snapshot and may not match the current IUCN Red List.
- Matching uses scientific names only. There is no fuzzy matching or synonym resolution beyond binomial/subspecies extraction, so a species listed under a different name at NCBI will appear as unmatched.
- Genome files are indexed in metadata but not publicly downloadable from Ark. Visit NCBI Datasets to access actual sequence data.
What v1 Does Not Do
- Live NCBI Datasets API queries
- Genome file downloads or data export
- Fuzzy name matching or synonym resolution
- Phylogenetic trees or comparative analyses
- Search by geographic region or biogeographic filtering
- User accounts or personalization