Difference between revisions of "Data repositories"

From Open Access Directory
 
(51 intermediate revisions by the same user not shown)
Line 42: Line 42:
  
 
* [http://nssdc.gsfc.nasa.gov/ National Space Science Data Center].  From the US [http://www.nasa.gov/ National Aeronautics and Space Administration] (NASA).
 
* [http://nssdc.gsfc.nasa.gov/ National Space Science Data Center].  From the US [http://www.nasa.gov/ National Aeronautics and Space Administration] (NASA).
 +
 +
* [http://simbad.u-strasbg.fr/simbad/ SIMBAD Astronomical Database]([https://perma.cc/G9PM-25C7 perma.cc]). The SIMBAD astronomical database provides basic data, cross-identifications, bibliography and measurements for astronomical objects outside the solar system.
 +
 +
* [http://starchive.org/ Starchive]([https://perma.cc/7AEH-WA8U perma.cc]). An open source, open access stellar archive.
 +
 +
* [https://www.ukssdc.ac.uk/ UK Solar System Data Centre]([https://perma.cc/C5UD-J9FA perma.cc]).
  
 
== Biology ==
 
== Biology ==
Line 49: Line 55:
  
 
* [http://www.arabidopsis.org/submit/index.jsp The Arabidopsis Information Resource] - The Arabidopsis Information Resource (TAIR) maintains a [http://www.arabidopsis.org/search/ERwin/Tair.htm database] of genetic and [http://www.arabidopsis.org/about/datasources.jsp molecular biology data] for the model higher plant [http://www.arabidopsis.org/portals/education/aboutarabidopsis.jsp ''Arabidopsis thaliana''].
 
* [http://www.arabidopsis.org/submit/index.jsp The Arabidopsis Information Resource] - The Arabidopsis Information Resource (TAIR) maintains a [http://www.arabidopsis.org/search/ERwin/Tair.htm database] of genetic and [http://www.arabidopsis.org/about/datasources.jsp molecular biology data] for the model higher plant [http://www.arabidopsis.org/portals/education/aboutarabidopsis.jsp ''Arabidopsis thaliana''].
 +
 +
* [https://www.ebi.ac.uk/arrayexpress/ Array Express]([https://perma.cc/596W-E4RC perma.cc]). Archive of Functional Genomics Data stores data from high-throughput functional genomics experiments, and provides these data for reuse to the research community.
 +
 +
* [http://www.bmrb.wisc.edu/ Biological Magnetic Resonance Databank]([https://perma.cc/6UYB-XMVM perma.cc]). A Repository for Data from NMR Spectroscopy on Proteins, Peptides, Nucleic Acids, and other Biomolecules.
 +
 +
* [https://thebiogrid.org/ Biological General Repository for Interaction Datasets] - BioGRID. ([https://perma.cc/34CE-XK79 perma.cc]). BioGRID is an interaction repository with data compiled through comprehensive curation efforts.
 +
 +
* [http://www.ebi.ac.uk/biomodels/ BioModels]([https://perma.cc/YT9S-GEQ9 perma.cc]). BioModels is a repository of mathematical models of biological and biomedical systems.
  
 
* [http://bond.unleashedinformatics.com/ BOND] (Biomolecular Object Network Databank). From [http://www.unleashedinformatics.com/ Unleashed Informatics].
 
* [http://bond.unleashedinformatics.com/ BOND] (Biomolecular Object Network Databank). From [http://www.unleashedinformatics.com/ Unleashed Informatics].
 +
 +
* [https://www.cancerimagingarchive.net/ Cancer Imaging Archive]([https://perma.cc/C3AC-VTFQ perma.cc]). TCIA is a service which de-identifies and hosts a large archive of medical images of cancer accessible for public download.
  
 
* [http://cellimagelibrary.org/pages/contribute The Cell: An Image Library] Images of all cell types from all organisms, including intracellular structures and movies or animations demonstrating functions. This project relies upon the cell biology community to populate the library. The Cell: An Image Library™ is a freely accessible, easy-to-search, public repository of reviewed and annotated images, videos, and animations of cells from a variety of organisms, showcasing cell architecture, intracellular functionalities, and both normal and abnormal processes. The purpose of this database is to advance research, education, and training, with the ultimate goal of improving human health.
 
* [http://cellimagelibrary.org/pages/contribute The Cell: An Image Library] Images of all cell types from all organisms, including intracellular structures and movies or animations demonstrating functions. This project relies upon the cell biology community to populate the library. The Cell: An Image Library™ is a freely accessible, easy-to-search, public repository of reviewed and annotated images, videos, and animations of cells from a variety of organisms, showcasing cell architecture, intracellular functionalities, and both normal and abnormal processes. The purpose of this database is to advance research, education, and training, with the ultimate goal of improving human health.
 +
 +
* [http://www.cxidb.org/ Coherent X-ray Imaging Data Bank (CXIDB)]([https://perma.cc/UYU7-PBMZ perma.cc]). An open repository for coherent X-ray imaging (CXI) experimental data.
 +
 +
* [http://www.crystallography.net/cod/ Crystallography Open Database (COD)]([https://perma.cc/FL8F-7W97 perma.cc]). Open-access collection of crystal structures of organic, inorganic, and metal-organic compounds and minerals, excluding biopolymers.
  
 
* [http://sysbio.unl.edu/DFVF/ Database of Virulence Factors in Fungal Pathogens] (DFVF)([https://perma.cc/NK4J-Z73P perma.cc]). The database is expected to greatly stimulate and facilitate further studies in fungal pathogens; both experimental biologists and computational biologists can use the database and/or the predicted virulence factors to guide their search for new virulence factors and/or discovery of new pathogen-host interaction mechanisms in fungi.
 
* [http://sysbio.unl.edu/DFVF/ Database of Virulence Factors in Fungal Pathogens] (DFVF)([https://perma.cc/NK4J-Z73P perma.cc]). The database is expected to greatly stimulate and facilitate further studies in fungal pathogens; both experimental biologists and computational biologists can use the database and/or the predicted virulence factors to guide their search for new virulence factors and/or discovery of new pathogen-host interaction mechanisms in fungi.
Line 59: Line 79:
  
 
* [http://databasin.org/ DataBasin].  OA data in conservation.  From the [http://www.consbio.org/ Conservation Biology Institute] in partnership with [http://www.rhizalabs.com/ Rhiza Labs].
 
* [http://databasin.org/ DataBasin].  OA data in conservation.  From the [http://www.consbio.org/ Conservation Biology Institute] in partnership with [http://www.rhizalabs.com/ Rhiza Labs].
 +
 +
* [https://www.ddbj.nig.ac.jp/index-e.html DNA Databank of Japan - DDBJ]([https://perma.cc/3ZKD-G9RX perma.cc]). The Bioinformation and DDBJ Center provides sharing and analysis services for life science research data, with the aim of advancing science.
 +
 +
* [https://www.ncbi.nlm.nih.gov/gap/ dbGaP]([https://perma.cc/AZ8G-P68T perma.cc]). The database of Genotypes and Phenotypes (dbGaP) was developed to archive and distribute the data and results from studies that have investigated the interaction of genotype and phenotype in humans.
 +
 +
* [https://www.ncbi.nlm.nih.gov/snp/ dbSNP]([https://perma.cc/L3AW-6EXP perma.cc]). dbSNP contains human single nucleotide variations, microsatellites, and small-scale insertions and deletions along with publication, population frequency, molecular consequence, and genomic and RefSeq mapping information for both common variations and clinical mutations.
 +
 +
* [https://www.ncbi.nlm.nih.gov/dbvar/ dbVar]([https://perma.cc/JDA9-K62V perma.cc]). dbVar is NCBI's database of human genomic Structural Variation — large variants >50 bp including insertions, deletions, duplications, inversions, mobile elements, translocations, and complex variants.
  
 
* [http://www.datadryad.org/ Dryad] Dryad is an international repository of data underlying scientific and medical publications, particularly data for which no specialized repository exists. All material in Dryad is associated with a scholarly publication. Most data in the repository are associated with peer-reviewed articles, although data associated with non-peer reviewed publications from reputable academic sources, such as dissertations, are also accepted.  Dryad is a non-profit organization.  
 
* [http://www.datadryad.org/ Dryad] Dryad is an international repository of data underlying scientific and medical publications, particularly data for which no specialized repository exists. All material in Dryad is associated with a scholarly publication. Most data in the repository are associated with peer-reviewed articles, although data associated with non-peer reviewed publications from reputable academic sources, such as dissertations, are also accepted.  Dryad is a non-profit organization.  
 +
 +
* [http://www.emdataresource.org/ Electron Microscopy Data Bank (EMDB)]([https://perma.cc/BM6D-K6R3 perma.cc]). Global resource for 3-Dimensional Electron Microscopy (3DEM) structure data archiving and retrieval, news, events, software tools, data standards, validation methods, and community challenges.
 +
 +
* [https://eupathdb.org/eupathdb/ Eukaryotic Pathogen Database Resources (EuPathDB)]([https://perma.cc/AUR7-GH9N perma.cc]). EuPathDB (formerly ApiDB) is an integrated database covering the eukaryotic pathogens in the genera listed on its Data Summary page.
 +
 +
* [https://ega-archive.org/ The European Genome-phenome Archive (EGA)]([https://perma.cc/W3RN-RH7Q perma.cc]). The European Genome-phenome Archive (EGA) is a service for permanent archiving and sharing of all types of personally identifiable genetic and phenotypic data resulting from biomedical research projects.
 +
 +
* [https://www.ebi.ac.uk/eva/ European Variation Archive]([https://perma.cc/NH45-7GPB perma.cc]). An open-access database of all types of genetic variation data from all species.
 +
 +
* [http://flybase.org/ FlyBase]([https://perma.cc/B3US-9Y4D perma.cc]). A Database of Drosophila Genes & Genomes.
 +
 +
* [https://gin.g-node.org/ G-Node GIN]([https://perma.cc/QZ6B-E5VL perma.cc]). Modern Research Data Management for Neuroscience.
  
 
* [http://www.ncbi.nlm.nih.gov/geo/info/submission.html Gene Expression Omnibus] High-throughput functional genomic data, including all array-based applications and some high-throughput sequencing data.
 
* [http://www.ncbi.nlm.nih.gov/geo/info/submission.html Gene Expression Omnibus] High-throughput functional genomic data, including all array-based applications and some high-throughput sequencing data.
  
 
* [http://www.gbif.org/ Global Biodiversity Information Facility] (GBIF) ([https://perma.cc/BGE5-CBFH perma.cc]).  "Free and open access to biodiversity data."  Data portal launched in 2007 by institutions in 17 countries under a non-binding inter-governmental agreement.
 
* [http://www.gbif.org/ Global Biodiversity Information Facility] (GBIF) ([https://perma.cc/BGE5-CBFH perma.cc]).  "Free and open access to biodiversity data."  Data portal launched in 2007 by institutions in 17 countries under a non-binding inter-governmental agreement.
 +
 +
* [https://www.proteinatlas.org/ Human Protein Atlas]([https://perma.cc/2FML-ANKD perma.cc]). All the data in the knowledge resource is open access to allow scientists both in academia and industry to freely access the data for exploration of the human proteome.
 +
 +
* [http://idr.openmicroscopy.org/about/ Image Data Resource - IDR]([https://perma.cc/G2DX-8X9E perma.cc]). The Image Data Resource (IDR) is a public repository of reference image datasets from published scientific studies. IDR enables access, search and analysis of these highly annotated datasets.
 +
 +
* [https://perma.cc/N2H8-3XDY ImmPort Shared Data]([https://perma.cc/N2H8-3XDY perma.cc]). The ImmPort project provides advanced information technology support in the archiving and exchange of scientific data for the diverse community of life science researchers supported by NIAID/DAIT and serves as a long-term, sustainable archive of research and clinical data.
 +
 +
* [https://www.fludb.org/brc/home.spg?decorator=influenza Influenza Research Database]([https://perma.cc/HS5A-YHZR perma.cc]). This resource contains avian and non-human mammalian influenza surveillance data, human clinical data associated with virus extracts, phenotypic characteristics of viruses isolated from extracts, and all genomic and proteomic data available in public repositories for influenza viruses.
 +
 +
* [https://www.itis.gov/ Integrated Taxonomic Information System (ITIS)]([https://perma.cc/4GDQ-C2PZ perma.cc]). ITIS provides authoritative taxonomic information on plants, animals, fungi, and microbes of North America and the world.
 +
 +
* [https://www.kimosys.org/ KiMoSys]([https://perma.cc/R6FY-W6KY perma.cc]). A web application for quantitative KInetic MOdels of biological SYStems.
 +
 +
* [https://www.ebi.ac.uk/metabolights/ MetaboLights]([https://perma.cc/6RPT-R26W perma.cc]). MetaboLights is a database for Metabolomics experiments and derived information.
 +
 +
* [https://www.ebi.ac.uk/metagenomics/ MGnify]([https://perma.cc/FDA2-58XY perma.cc]). MGnify offers an automated pipeline for the analysis and archiving of microbiome data to help determine the taxonomic diversity and functional & metabolic potential of environmental samples.
  
 
* [http://shirleyfung.com/mbdb/ Molecular Biology Databases].  From Shirley Fung. A list of [http://shirleyfung.com/mbdb/filter.php?by=alltab 34 databases] with annotations to show their openness under six criteria.  Also see her list of [http://shirleyfung.com/mbdb/filter.php?by=compliant 7 databases] which comply with the [http://sciencecommons.org/projects/publishing/open-access-data-protocol/ Science Commons Open Access Data Protocol].
 
* [http://shirleyfung.com/mbdb/ Molecular Biology Databases].  From Shirley Fung. A list of [http://shirleyfung.com/mbdb/filter.php?by=alltab 34 databases] with annotations to show their openness under six criteria.  Also see her list of [http://shirleyfung.com/mbdb/filter.php?by=compliant 7 databases] which comply with the [http://sciencecommons.org/projects/publishing/open-access-data-protocol/ Science Commons Open Access Data Protocol].
  
* [http://morphobank.org/ MorphoBank]. "Homology of phenotypes over the web."  Hosted by the [http://www.stonybrook.edu/ State University of New York at Stony Brook].
+
* [http://morphobank.org/ MorphoBank]([https://perma.cc/39QH-EKGT perma.cc]). "Homology of phenotypes over the web."  Hosted by the [http://www.stonybrook.edu/ State University of New York at Stony Brook]. MorphoBank assists scientists building the Tree of Life - the genealogy of all living and extinct species.
 +
 
 +
* [https://perma.cc/Z83U-Z6FH Mouse Genome Informatics (MGI)]([https://perma.cc/Z83U-Z6FH perma.cc]). MGI is the international database resource for the laboratory mouse, providing integrated genetic, genomic, and biological data to facilitate the study of human health and disease.
 +
 
 +
* [https://www.movebank.org/ Movebank Data Repository]([https://perma.cc/2KAM-G2YF perma.cc]). Movebank is a free, online database of animal tracking data hosted by the [http://www.ab.mpg.de/ Max Planck Institute of Animal Behavior].
  
 
* [http://www.nbii.gov/portal/server.pt?open=512&objID=236&mode=2&cached=true National Biological Information Infrastructure] A broad, collaborative program to provide increased access to data and information on the nation's biological resources. The NBII links diverse, high-quality biological databases, information products, and analytical tools maintained by [http://www.nbii.gov/portal/server.pt/community/nbii_partners/413 NBII partners] and other contributors in government agencies, academic institutions, non-government organizations, and private industry. (Note: In the President's budget for Fiscal Year 2012  the repository was terminated.)
 
* [http://www.nbii.gov/portal/server.pt?open=512&objID=236&mode=2&cached=true National Biological Information Infrastructure] A broad, collaborative program to provide increased access to data and information on the nation's biological resources. The NBII links diverse, high-quality biological databases, information products, and analytical tools maintained by [http://www.nbii.gov/portal/server.pt/community/nbii_partners/413 NBII partners] and other contributors in government agencies, academic institutions, non-government organizations, and private industry. (Note: In the President's budget for Fiscal Year 2012  the repository was terminated.)
 +
 +
* [https://www.ncbi.nlm.nih.gov/taxonomy NCBI Taxonomy]([https://perma.cc/QFM7-C8HW perma.cc]). The Taxonomy Database is a curated classification and nomenclature for all of the organisms in the public sequence databases.
 +
 +
* [http://www.ndexbio.org/ The Network Data Exchange (NDEx)]([https://perma.cc/J7UN-DGCP perma.cc]). The NDEx Project provides an open-source framework where scientists and organizations can share, store, manipulate, and publish biological network knowledge.
 +
 +
* [http://neuromorpho.org/neuroMorpho/index.jsp NeuroMorpho.org]([https://perma.cc/YDX2-JNT7 perma.cc]). NeuroMorpho.Org is a centrally curated inventory of digitally reconstructed neurons associated with peer-reviewed publications.
 +
 +
* [https://openneuro.org/ OpenNEURO]([https://perma.cc/G8NZ-WXG7 perma.cc]). A free and open platform for sharing MRI, MEG, EEG, iEEG, and ECoG data.
  
 
* [http://paleodb.org/cgi-bin/bridge.pl PaleoBiology Database].  "We are bringing together taxonomic and distributional information about the entire fossil record of plants and animals."  From a large number of [http://paleodb.org/cgi-bin/bridge.pl?action=displayAuthorizers researchers] at a large number of [http://paleodb.org/cgi-bin/bridge.pl?action=displayInstitutions institutions].
 
* [http://paleodb.org/cgi-bin/bridge.pl PaleoBiology Database].  "We are bringing together taxonomic and distributional information about the entire fossil record of plants and animals."  From a large number of [http://paleodb.org/cgi-bin/bridge.pl?action=displayAuthorizers researchers] at a large number of [http://paleodb.org/cgi-bin/bridge.pl?action=displayInstitutions institutions].
  
* [http://mips.helmholtz-muenchen.de/projects/plants/PlaNetPortal/index_html Planet] A network of European Plant Databases
+
* [http://www.peptideatlas.org/ PeptideAtlas]([https://perma.cc/2ER8-7XU8 perma.cc]). A multi-organism, publicly accessible compendium of peptides identified in a large set of tandem mass spectrometry proteomics experiments.
  
 
* [http://www.ncbi.nlm.nih.gov/peptidome/ Peptidome].  For "tandem mass spectrometry peptide and protein identification data."  From the US [http://www.ncbi.nlm.nih.gov/ National Center for Biotechnology Information].
 
* [http://www.ncbi.nlm.nih.gov/peptidome/ Peptidome].  For "tandem mass spectrometry peptide and protein identification data."  From the US [http://www.ncbi.nlm.nih.gov/ National Center for Biotechnology Information].
 +
 +
* [http://mips.helmholtz-muenchen.de/projects/plants/PlaNetPortal/index_html Planet] A network of European Plant Databases.
 +
 +
* [https://www.ebi.ac.uk/pride/ PRIDE Proteomics IDentifications Database]([https://perma.cc/MWF8-DC8P perma.cc]). This service is part of the ELIXIR infrastructure. 
 +
 +
* [https://pcddb.cryst.bbk.ac.uk/ Protein Circular Dichroism Data Bank (PCDDB)]([https://perma.cc/Y9KL-UL2J perma.cc]). The Protein Circular Dichroism Data Bank (PCDDB) is a public repository that archives and freely distributes circular dichroism (CD) and synchrotron radiation CD (SRCD) spectral data and their associated experimental metadata.
  
 
* [http://www.rcsb.org/pdb/home/home.do RCSB Protein Data Bank].  From the [http://home.rcsb.org/ Research Collaboratory for Structural Bioinformatics] (RCSB).
 
* [http://www.rcsb.org/pdb/home/home.do RCSB Protein Data Bank].  From the [http://home.rcsb.org/ Research Collaboratory for Structural Bioinformatics] (RCSB).
 +
 +
* [http://www.proteomexchange.org/ ProteomeXchange]([https://perma.cc/YD3Q-TYWP perma.cc]). The ProteomeXchange Consortium was established to provide globally coordinated standard data submission and dissemination pipelines involving the main proteomics repositories, and to encourage open data policies in the field.
 +
 +
* [https://rgd.mcw.edu/ Rat Genome Database]([https://perma.cc/9HNH-F9PY perma.cc]). The Rat Genome Database (RGD) was established in 1999.
  
 
* [http://sabio.h-its.org/ SABIO Biochemical Reaction Kinetics Database]([https://perma.cc/5X9B-H9G2 perma.cc]). SABIO-RK is a curated database that contains information about biochemical reactions, their kinetic rate equations with parameters and experimental conditions.
 
* [http://sabio.h-its.org/ SABIO Biochemical Reaction Kinetics Database]([https://perma.cc/5X9B-H9G2 perma.cc]). SABIO-RK is a curated database that contains information about biochemical reactions, their kinetic rate equations with parameters and experimental conditions.
 +
 +
* [https://www.sasbdb.org/ Small Angle Scattering Biological Data Bank]([https://perma.cc/2248-7JQT perma.cc]). Curated repository for small angle scattering data and models.
 +
 +
* [https://data.sbgrid.org/ Structural Biology Data Grid]. ([https://perma.cc/Q347-5XYD perma.cc]). It supports publication of X-ray diffraction, MicroED, LLSM datasets, as well as structural models.
  
 
* [http://www.treebase.org TreeBASE]. "A Database of Phylogenetic Knowledge."  Released in March 2010 based on a prototype launched in 1994.  Hosted by the [http://www.phylorf.org/ Phyloinformatics Research Foundation].
 
* [http://www.treebase.org TreeBASE]. "A Database of Phylogenetic Knowledge."  Released in March 2010 based on a prototype launched in 1994.  Hosted by the [http://www.phylorf.org/ Phyloinformatics Research Foundation].
Line 87: Line 169:
  
 
* [http://www.ubio.org/index.php?pagename=home uBio]([https://perma.cc/HNR7-RUMU perma.cc]). uBio uses names and taxonomic intelligence to manage information about organisms.
 
* [http://www.ubio.org/index.php?pagename=home uBio]([https://perma.cc/HNR7-RUMU perma.cc]). uBio uses names and taxonomic intelligence to manage information about organisms.
 +
 +
* [https://www.vectorbase.org/ VectorBase]([https://perma.cc/9L49-N7SV perma.cc]). A National Institute of Allergy and Infectious Diseases (NIAID) Bioinformatics Resource Center (BRC) providing genomic, phenotypic and population-centric data to the scientific community for invertebrate vectors of human pathogens.
 +
 +
* [https://www.wwpdb.org/ Worldwide Protein Data Bank (wwPDB)]([https://perma.cc/YF7J-N9ZR perma.cc]).  The Protein Data Bank archive (PDB) has served as the single repository of information about the 3D structures of proteins, nucleic acids, and complex assemblies.
 +
 +
* [http://zfin.org/ Zebrafish Model Organism Database (ZFIN)]([https://perma.cc/M89K-E5N7 perma.cc]). A database of genetic and genomic data for the zebrafish (Danio rerio) as a model organism.
  
 
== Chemistry ==
 
== Chemistry ==
Line 114: Line 202:
  
 
* [http://moltable.ncl.res.in/chemstar/index.jsp ChemStar].  Maintained by India's [http://www.ncl-india.org/ National Chemical Laboratory] and sponsored by India's [http://www.dsir.gov.in/ Department for Scientific & Industrial Research].
 
* [http://moltable.ncl.res.in/chemstar/index.jsp ChemStar].  Maintained by India's [http://www.ncl-india.org/ National Chemical Laboratory] and sponsored by India's [http://www.dsir.gov.in/ Department for Scientific & Industrial Research].
 +
 +
* [https://www.iochem-bd.org/ ioChem-BD Computational Chemistry Datasets]([https://perma.cc/C6FE-NV94 perma.cc]). The Computational Chemistry Results Repository.
  
 
* [http://onschallenge.wikispaces.com/ Open Notebook Science Solubility Challenge].  Maintained by Jean-Claude Bradley, Rajarshi Guha, Andrew Lang and Cameron Neylon.  A database of non-aqueous solubility measurements with links to lab notebook pages where experiments were recorded.  The database can be searched via [http://rguha.ath.cx/~rguha/cicc/jcsol/sol.html Web Query] or [http://onschallenge.wikispaces.com/list+of+experiments alternate means].
 
* [http://onschallenge.wikispaces.com/ Open Notebook Science Solubility Challenge].  Maintained by Jean-Claude Bradley, Rajarshi Guha, Andrew Lang and Cameron Neylon.  A database of non-aqueous solubility measurements with links to lab notebook pages where experiments were recorded.  The database can be searched via [http://rguha.ath.cx/~rguha/cicc/jcsol/sol.html Web Query] or [http://onschallenge.wikispaces.com/list+of+experiments alternate means].
Line 128: Line 218:
  
 
* [http://www.caida.org/data/ Cooperative Association for Internet Data Analysis (CAIDA)] Archive of data for scientific analysis of network functions.
 
* [http://www.caida.org/data/ Cooperative Association for Internet Data Analysis (CAIDA)] Archive of data for scientific analysis of network functions.
 +
 +
* [https://www.freestatistics.org/ FreeStatistics of Irreproducible Research]([https://perma.cc/2MJ6-69LP perma.cc]). The purpose of this project is to facilitate the creation, maintenance, and permanent storage of statistical computation objects that empower authors to publish reproducible and reusable research (in the form of a Compendium) through a series of web services.
  
 
* [https://github.com GitHub] keeps your public and private code available, secure, and backed up.
 
* [https://github.com GitHub] keeps your public and private code available, secure, and backed up.
Line 134: Line 226:
  
 
* [https://code.launchpad.net Launchpad] can host your project’s source code using the Bazaar version control system. We also import over 2000 CVS, SVN, Git and Mercurial projects, so you can use Bazaar with those too.  
 
* [https://code.launchpad.net Launchpad] can host your project’s source code using the Bazaar version control system. We also import over 2000 CVS, SVN, Git and Mercurial projects, so you can use Bazaar with those too.  
 +
 +
* [https://www.reprozip.org/ ReproZip!]([https://perma.cc/4574-F9M7 perma.cc]). ReproZip can automatically pack your research along with all necessary data files, libraries, environment variables and options into a self-contained bundle. Then ReproZip can use that bundle to automatically set up the same original environment so anybody can reproduce the research on a different machine, without tracking down and installing the dependencies, or even having to run the same operating system.
  
 
* [http://sourceforge.net/ SourceForge] 2.7 million developers create powerful software in over 260,000 projects. Our popular directory connects more than 46 million consumers with these open source projects and serves more than 2,000,000 downloads a day. SourceForge is where open source happens.
 
* [http://sourceforge.net/ SourceForge] 2.7 million developers create powerful software in over 260,000 projects. Our popular directory connects more than 46 million consumers with these open source projects and serves more than 2,000,000 downloads a day. SourceForge is where open source happens.
Line 141: Line 235:
 
* [http://konect.uni-koblenz.de/ KONECT ] (the Koblenz Network Collection) is a project to collect large network datasets of all types in order to perform research in network science and related fields, collected by the Institute of Web Science and Technologies at the University of Koblenz–Landau.
 
* [http://konect.uni-koblenz.de/ KONECT ] (the Koblenz Network Collection) is a project to collect large network datasets of all types in order to perform research in network science and related fields, collected by the Institute of Web Science and Technologies at the University of Koblenz–Landau.
  
* [http://pajek.imfm.si/doku.php?id=data:urls:index pajek's ] network data sources
+
* [http://pajek.imfm.si/doku.php?id=data:urls:index pajek's ] network data sources.
  
 
== Energy ==
 
== Energy ==
Line 150: Line 244:
  
 
* [http://en.openei.org/ OpenEI: Open Energy Information].  Freely-available energy data, tools, models, and other resources.
 
* [http://en.openei.org/ OpenEI: Open Energy Information].  Freely-available energy data, tools, models, and other resources.
 +
 +
== Engineering ==
 +
* ''Also see'' Multidisciplinary repositories.
 +
 +
* [https://trid.trb.org/ TRID]([https://perma.cc/P7C7-WH5B perma.cc]). An integrated database that combines the records from TRB’s Transportation Research Information Services ([http://www.trb.org/InformationServices/InformationServices.aspx TRIS]) Database and the OECD’s Joint Transport Research Centre’s International Transport Research Documentation ([https://www.itf-oecd.org/international-transport-research-documentation-public ITRD]) Database. TRID provides access to more than 1.25 million records of transportation research worldwide.
  
 
== Environmental sciences ==
 
== Environmental sciences ==
Line 167: Line 266:
 
*[http://his.cuahsi.org/datapublishers.html Consortium of Universities for the Advancement of Hydrologic Science, Inc] HIS stands for Hydrologic Information System. CUAHSI's HIS is an internet based system to support the sharing of hydrologic data. It consists of databases connected using the internet through web services as well as software for data discovery, access and publication.
 
*[http://his.cuahsi.org/datapublishers.html Consortium of Universities for the Advancement of Hydrologic Science, Inc] HIS stands for Hydrologic Information System. CUAHSI's HIS is an internet based system to support the sharing of hydrologic data. It consists of databases connected using the internet through web services as well as software for data discovery, access and publication.
  
*[http://www.marine-geo.org/contribute.php The Marine Geoscience Data System (MGDS)] The Marine Geoscience Data System (MGDS) provides access to data portals for the NSF-supported Ridge 2000 and MARGINS programs, the Antarctic and Southern Ocean Data Synthesis, the Global Multi-Resolution Topography Synthesis, and Seismic Reflection Field Data Portal.
+
* [https://portal.edirepository.org/nis/home.jsp EDI Data Portal]([https://perma.cc/XF2L-B3NC perma.cc]). The EDI Data Portal contains environmental and ecological data packages contributed by a number of participating organizations.
 +
 
 +
*[http://www.marine-geo.org/contribute.php The Marine Geoscience Data System (MGDS)]([https://perma.cc/4G5F-JJKV perma.cc]). The Marine Geoscience Data System (MGDS) provides access to data portals for the NSF-supported Ridge 2000 and MARGINS programs, the Antarctic and Southern Ocean Data Synthesis, the Global Multi-Resolution Topography Synthesis, and Seismic Reflection Field Data Portal.
  
 
* [http://nsidc.org/data/submit.html National Snow and Ice Data Center (NSIDC)] Cryospheric datasets from ground field research and satellites.  
 
* [http://nsidc.org/data/submit.html National Snow and Ice Data Center (NSIDC)] Cryospheric datasets from ground field research and satellites.  
  
 
* [http://www.neoninc.org/ National Ecological Observatory Network] (NEON).  A joint project of 50+ US [http://www.neoninc.org/neon-membership/neon-member-institutions.html universities and laboratories].
 
* [http://www.neoninc.org/ National Ecological Observatory Network] (NEON).  A joint project of 50+ US [http://www.neoninc.org/neon-membership/neon-member-institutions.html universities and laboratories].
 +
 +
* [https://nerc.ukri.org/research/sites/data/ NERC Data Centers]([https://perma.cc/P7KA-AG8C perma.cc]). NERC has a network of environmental data centres that provide a focal point for NERC's scientific data and information. These centres hold data from environmental scientists working in the UK and around the world.
  
 
* [http://www.polardata.ca Polar Data Catalogue] A primarily Canadian archive of free RADARSAT imagery as well as Arctic, Antarctic, and other cryospheric data sets covering a range of disciplines, from natural sciences and policy to health and social sciences.
 
* [http://www.polardata.ca Polar Data Catalogue] A primarily Canadian archive of free RADARSAT imagery as well as Arctic, Antarctic, and other cryospheric data sets covering a range of disciplines, from natural sciences and policy to health and social sciences.
  
* [https://sedac.ciesin.columbia.edu/data-submission Socioeconomic Data and Applications Center (SEDAC)] specializes in spatial data and services in support of human-environment research and applications, in the context of NASA’s Earth science mission and the overall U.S. Global Change Research Program.  
+
* [https://sedac.ciesin.columbia.edu/data-submission Socioeconomic Data and Applications Center (SEDAC)] specializes in spatial data and services in support of human-environment research and applications, in the context of NASA’s Earth science mission and the overall U.S. Global Change Research Program.
  
 
== Geology ==
 
== Geology ==
Line 202: Line 305:
  
 
* [http://www.geongrid.org/about.html The Geosciences Network (GEON)] project is a collaboration among a dozen PI institutions and a number of other partner projects, institutions, and agencies to develop cyberinfrastructure in support of an environment for integrative geoscience research. GEON is funded by the NSF Information Technology Research (ITR) program.  
 
* [http://www.geongrid.org/about.html The Geosciences Network (GEON)] project is a collaboration among a dozen PI institutions and a number of other partner projects, institutions, and agencies to develop cyberinfrastructure in support of an environment for integrative geoscience research. GEON is funded by the NSF Information Technology Research (ITR) program.  
 +
 +
* [https://www2.earthref.org/MagIC Magnetics Information Consortium (MagIC)]([https://perma.cc/7KF7-MLD4 perma.cc]). Improves research capacity in the Earth and Ocean sciences by maintaining an open community digital data archive for rock and paleomagnetic data with portals that allow users to archive, search, visualize, download, and combine these versioned datasets.
  
 
* [http://www.nodc.noaa.gov/General/NODC-Submit/ National Oceanographic Data Center] Archive of national and international marine environmental and ecosystem datasets.
 
* [http://www.nodc.noaa.gov/General/NODC-Submit/ National Oceanographic Data Center] Archive of national and international marine environmental and ecosystem datasets.
Line 211: Line 316:
 
* [http://www.polardata.ca Polar Data Catalogue] A primarily Canadian archive of free RADARSAT imagery as well as Arctic, Antarctic, and other cryospheric data sets covering a range of disciplines, from natural sciences and policy to health and social sciences.
 
* [http://www.polardata.ca Polar Data Catalogue] A primarily Canadian archive of free RADARSAT imagery as well as Arctic, Antarctic, and other cryospheric data sets covering a range of disciplines, from natural sciences and policy to health and social sciences.
  
* [http://edina.ac.uk/projects/sharegeo/ ShareGeo].  Integrating the older [http://gradedemo.edina.ac.uk/dspace/index.jsp GRADE] (Geospatial Repository for Academic Deposit and Extraction) repository.  From [http://edina.ac.uk/ EDINA].
+
* [http://edina.ac.uk/projects/sharegeo/ ShareGeo].  Integrating the older [http://gradedemo.edina.ac.uk/dspace/index.jsp GRADE] (Geospatial Repository for Academic Deposit and Extraction) repository.  From [http://edina.ac.uk/ EDINA]. (Repository discontinued.)
  
 
== Linguistics ==
 
== Linguistics ==
Line 233: Line 338:
  
 
* ''Also see'' Entrez databases, listed under Multidisciplinary repositories.
 
* ''Also see'' Entrez databases, listed under Multidisciplinary repositories.
 +
 +
* [https://cananolab.nci.nih.gov/caNanoLab/#/ caNanoLab]([https://perma.cc/4TQA-BC8P perma.cc]) A data sharing portal designed to facilitate information sharing across the international biomedical nanotechnology research community to expedite and validate the use of nanotechnology in biomedicine.
 +
 +
* [ebi.ac.uk/chembl/ ChEMBL]. ChEMBL is a manually curated database of bioactive molecules with drug-like properties. It brings together chemical, bioactivity and genomic data to aid the translation of genomic information into effective new drugs.
  
 
* [http://www.datadryad.org/ Dryad] Dryad is an international repository of data underlying scientific and medical publications, particularly data for which no specialized repository exists. All material in Dryad is associated with a scholarly publication. Most data in the repository are associated with peer-reviewed articles, although data associated with non-peer reviewed publications from reputable academic sources, such as dissertations, are also accepted.  Dryad is a non-profit organization.  
 
* [http://www.datadryad.org/ Dryad] Dryad is an international repository of data underlying scientific and medical publications, particularly data for which no specialized repository exists. All material in Dryad is associated with a scholarly publication. Most data in the repository are associated with peer-reviewed articles, although data associated with non-peer reviewed publications from reputable academic sources, such as dissertations, are also accepted.  Dryad is a non-profit organization.  
 +
 +
* [http://flowrepository.org/ FlowRepository]([https://perma.cc/35NS-MCGB perma.cc]). FlowRepository is a database of flow cytometry experiments where you can query and download data collected and annotated according to the MIFlowCyt standard.
  
 
* [http://www.ncbi.nlm.nih.gov/Genbank/index.html GenBank].  From the U.S. [http://www.ncbi.nlm.nih.gov/ National Center for Biotechnology Information] of the [http://www.nih.gov/ National Institutes of Health].
 
* [http://www.ncbi.nlm.nih.gov/Genbank/index.html GenBank].  From the U.S. [http://www.ncbi.nlm.nih.gov/ National Center for Biotechnology Information] of the [http://www.nih.gov/ National Institutes of Health].
  
 
* [http://www.ncbi.nlm.nih.gov/geo/ Gene Expression Omnibus].  From the U.S. [http://www.ncbi.nlm.nih.gov/ National Center for Biotechnology Information] of the [http://www.nih.gov/ National Institutes of Health].
 
* [http://www.ncbi.nlm.nih.gov/geo/ Gene Expression Omnibus].  From the U.S. [http://www.ncbi.nlm.nih.gov/ National Center for Biotechnology Information] of the [http://www.nih.gov/ National Institutes of Health].
 
* [https://opentrials.net/ OpenTrials]. OpenTrials is a repository of clinical trial data hosted by [https://okfn.org/ Open Knowledge International].
 
  
 
* [http://www.icpsr.umich.edu/icpsrweb/HMCA/index.jsp The Health and Medical Care Archive (HMCA)] is the data archive of the Robert Wood Johnson Foundation (RWJF), the largest philanthropy devoted exclusively to health and health care in the United States. Operated by the Inter-university Consortium for Political and Social Research (ICPSR) at the University of Michigan, HMCA preserves and disseminates data collected by selected research projects funded by the Foundation and facilitates secondary analyses of the data. The data collections in HMCA include surveys of health care professionals and organizations, investigations of access to medical care, surveys on substance abuse, and evaluations of innovative programs for the delivery of health care. Our goal is to increase understanding of health and health care in the United States through secondary analysis of RWJF-supported data collections.  
 
* [http://www.icpsr.umich.edu/icpsrweb/HMCA/index.jsp The Health and Medical Care Archive (HMCA)] is the data archive of the Robert Wood Johnson Foundation (RWJF), the largest philanthropy devoted exclusively to health and health care in the United States. Operated by the Inter-university Consortium for Political and Social Research (ICPSR) at the University of Michigan, HMCA preserves and disseminates data collected by selected research projects funded by the Foundation and facilitates secondary analyses of the data. The data collections in HMCA include surveys of health care professionals and organizations, investigations of access to medical care, surveys on substance abuse, and evaluations of innovative programs for the delivery of health care. Our goal is to increase understanding of health and health care in the United States through secondary analysis of RWJF-supported data collections.  
Line 247: Line 356:
  
 
* [http://www.mmmp.org/MMMP/ Melanoma Molecular Map Project].  On melanoma biology and treatment.
 
* [http://www.mmmp.org/MMMP/ Melanoma Molecular Map Project].  On melanoma biology and treatment.
 +
 +
* [National Addiction & HIV Data Archive Program (NAHDAP)]([https://perma.cc/HSC9-S2RB perma.cc]). The scope of the data housed at NAHDAP covers a wide range of legal and illicit drugs (alcohol, tobacco, marijuana, cocaine, synthetic drugs, and others) and the trajectories, patterns, and consequences of drug use as well as related predictors and outcomes.
  
 
* [http://www.ncbi.nlm.nih.gov/guide/all/#all_/ National Center for Biotechnology Information (NCBI)] The National Center for Biotechnology Information advances science and health by providing access to biomedical and genomic information.
 
* [http://www.ncbi.nlm.nih.gov/guide/all/#all_/ National Center for Biotechnology Information (NCBI)] The National Center for Biotechnology Information advances science and health by providing access to biomedical and genomic information.
 +
 +
* [https://nda.nih.gov/ National Database for Autism Research (NDAR)]([https://perma.cc/7FP8-8JYW perma.cc]). The National Institute of Mental Health Data Archive (NDA) makes available human subjects data collected from hundreds of research projects across many scientific domains.
  
 
* [http://www.nitrc.org/ir/ Neuroimaging Informatics Tools and Resources Clearinghouse Image Repository (NITRC-IR)]
 
* [http://www.nitrc.org/ir/ Neuroimaging Informatics Tools and Resources Clearinghouse Image Repository (NITRC-IR)]
  
 
* [http://neuromorpho.org/neuroMorpho/index.jsp NeuroMorpho].  Neuronal morphology data.  From the [http://krasnow.gmu.edu/ Krasnow Institute for Advanced Study] at [http://www.gmu.edu/ George Mason University].
 
* [http://neuromorpho.org/neuroMorpho/index.jsp NeuroMorpho].  Neuronal morphology data.  From the [http://krasnow.gmu.edu/ Krasnow Institute for Advanced Study] at [http://www.gmu.edu/ George Mason University].
 +
 +
* [https://octopus.zoo.ox.ac.uk/beta/ Ocean Tool for Public Understanding and Science]([https://perma.cc/Z59D-C2YS perma.cc]). OcToPUS relies on established free and open-source geospatial technology to provide interactive access to dynamically updated, multi-dimensional data on the marine environment.
 +
 +
* [https://opentrials.net/ OpenTrials]. OpenTrials is a repository of clinical trial data hosted by [https://okfn.org/ Open Knowledge International].
  
 
* [https://www.projectdatasphere.org Project Data Sphere, LLC,] is a repository to broadly share, integrate and analyze historical, de-identified, patient-level data from academic and industry cancer Phase II-III clinical trials.  Access to the Project Data Sphere platform is available to researchers affiliated with life science companies, hospitals and institutions, as well as independent researchers, at no cost and without requiring a research proposal.
 
* [https://www.projectdatasphere.org Project Data Sphere, LLC,] is a repository to broadly share, integrate and analyze historical, de-identified, patient-level data from academic and industry cancer Phase II-III clinical trials.  Access to the Project Data Sphere platform is available to researchers affiliated with life science companies, hospitals and institutions, as well as independent researchers, at no cost and without requiring a research proposal.
 +
 +
* [https://www.smir.ch/ SICAS Medical Image Repository]([https://perma.cc/DMH2-EZQV perma.cc]). A place to store medical research data.
 +
 +
* [https://vivli.org/ Vivli]([https://perma.cc/2TSK-9EFB perma.cc]). From the Center for Global Clinical Research Data. The Vivli platform includes an independent data repository, in-depth search engine and a secure research environment.
  
 
== Multidisciplinary repositories ==
 
== Multidisciplinary repositories ==
Line 266: Line 387:
  
 
* [http://www.datacite.org.s3-website-eu-west-1.amazonaws.com/index.html DataCite] ([https://perma.cc/U3GN-AYBU perma.cc]).  DataCite is a leading global non-profit organisation that provides persistent identifiers (DOIs) for research data and other research outputs.
 
* [http://www.datacite.org.s3-website-eu-west-1.amazonaws.com/index.html DataCite] ([https://perma.cc/U3GN-AYBU perma.cc]).  DataCite is a leading global non-profit organisation that provides persistent identifiers (DOIs) for research data and other research outputs.
 +
 +
* [http://dataconservancy.org/community/ Data Conservancy]([https://perma.cc/J99T-HLC7 perma.cc]). Data Conservancy is devoted to developing institutional solutions for the challenges of data collection, preservation and re-use.
  
 
* [https://datahub.io/ DataHub]([https://perma.cc/6HF6-6RTL perma.cc]). There are thousands of datasets from financial market data and population growth to cryptocurrency prices.  
 
* [https://datahub.io/ DataHub]([https://perma.cc/6HF6-6RTL perma.cc]). There are thousands of datasets from financial market data and population growth to cryptocurrency prices.  
Line 290: Line 413:
  
 
* [http://kpbc.umk.pl/dlibra KPBC]. Regional academic repository for data in all fields. Poland
 
* [http://kpbc.umk.pl/dlibra KPBC]. Regional academic repository for data in all fields. Poland
 +
 +
* [https://msropendata.com/ Microsoft Research Open Data]([https://perma.cc/F59P-ANLM perma.cc]). A collection of free datasets from Microsoft Research to advance state-of-the-art research in areas such as natural language processing, computer vision, and domain specific sciences. Download or copy directly to a cloud-based Data Science Virtual Machine for a seamless development experience.
  
 
* [http://www.occ-data.org/ Open Commons Consortium (OCC)].  The OCC is a not for profit that manages and operates cloud computing and data commons infrastructure to support scientific, medical, health care and environmental research. OCC members span the globe and include over 30 universities, companies, government agencies and national laboratories.   
 
* [http://www.occ-data.org/ Open Commons Consortium (OCC)].  The OCC is a not for profit that manages and operates cloud computing and data commons infrastructure to support scientific, medical, health care and environmental research. OCC members span the globe and include over 30 universities, companies, government agencies and national laboratories.   
Line 304: Line 429:
  
 
* [http://www.science3point0.com/opendata/index.php Science 3.0 Open Data].  A repository for RDF datasets in the public domain, in any field.  From [http://www.science3point0.com/ Science 3.0].
 
* [http://www.science3point0.com/opendata/index.php Science 3.0 Open Data].  A repository for RDF datasets in the public domain, in any field.  From [http://www.science3point0.com/ Science 3.0].
 +
 +
* [https://figshare.com/articles/Scientific_Data_recommended_repositories_June_2015/1434640 Scientific Data recommended repositories]([https://perma.cc/FLG9-BB7Z perma.cc]). Spreadsheet listing data repositories that are recommended by Scientific Data (Springer Nature) as being suitable for hosting data associated with peer-reviewed articles. Please see the repository list on Scientific Data's website for the most up-to-date list.
 +
 +
* [http://site.uit.no/trolling/about/ Tromsø Repository of Language and Linguistics (TROLLing)]([https://perma.cc/9BCZ-J42Y perma.cc]). TROLLing is designed as an archive of linguistic data and statistical code. The archive is open access, which means that all information is available to everyone. All postings are accompanied by searchable metadata that identify the researchers, the languages and linguistic phenomena involved, the statistical methods applied, and scholarly publications based on the data (where relevant).
  
 
* [http://repository.usu.ac.id/ USU Repository] University of Sumatera Utara, Medan, Indonesia.
 
* [http://repository.usu.ac.id/ USU Repository] University of Sumatera Utara, Medan, Indonesia.
Line 330: Line 459:
  
 
* ''Also see'' Multidisciplinary repositories.
 
* ''Also see'' Multidisciplinary repositories.
 +
 +
* [https://archaeologydataservice.ac.uk/ Archaeology Data Service]([https://perma.cc/A9KH-N3EZ perma.cc]). Heritage data, with over 20 years of experience supporting research, learning and teaching with free, high quality and dependable digital resources.
  
 
* [http://www.thearda.com/about/ Association of Religion Data Archives] Coverage includes international surveys, U.S. church membership data, and U.S. Surveys.
 
* [http://www.thearda.com/about/ Association of Religion Data Archives] Coverage includes international surveys, U.S. church membership data, and U.S. Surveys.
Line 342: Line 473:
  
 
* [http://www.esds.ac.uk/ Economic and Social Science Data Service]. From the [http://www.data-archive.ac.uk/ UK Data Archive (UKDA)] and [http://www.iser.essex.ac.uk/ Institute for Social and Economic Research (ISER)], University of Essex; [http://www.mimas.ac.uk/ Manchester Information and Associated Services (MIMAS)], and the [http://www.ccsr.ac.uk/ Cathie Marsh Centre for Census and Survey Research (CCSR)], University of Manchester. Access to data requires registration.
 
* [http://www.esds.ac.uk/ Economic and Social Science Data Service]. From the [http://www.data-archive.ac.uk/ UK Data Archive (UKDA)] and [http://www.iser.essex.ac.uk/ Institute for Social and Economic Research (ISER)], University of Essex; [http://www.mimas.ac.uk/ Manchester Information and Associated Services (MIMAS)], and the [http://www.ccsr.ac.uk/ Cathie Marsh Centre for Census and Survey Research (CCSR)], University of Manchester. Access to data requires registration.
 +
 +
* [https://www.ebi.ac.uk/ena European Nucleotide Archive]([https://perma.cc/ZE72-PECM perma.cc]). The European Nucleotide Archive (ENA) provides a comprehensive record of the world's nucleotide sequencing information, covering raw sequencing data, sequence assembly information and functional annotation.
  
 
* [http://www.icpsr.umich.edu/icpsrweb/ICPSR/ ICPSR] (Inter-University Consortium for Political and Social Research).  At the University of Michigan.
 
* [http://www.icpsr.umich.edu/icpsrweb/ICPSR/ ICPSR] (Inter-University Consortium for Political and Social Research).  At the University of Michigan.
  
 
* [http://www.icpsr.umich.edu/icpsrweb/NACJD/index.jsp National Archive of Criminal Justice Data] holds over 700 data collections relating to criminal justice.
 
* [http://www.icpsr.umich.edu/icpsrweb/NACJD/index.jsp National Archive of Criminal Justice Data] holds over 700 data collections relating to criminal justice.
 +
 +
* [http://nomad-repository.eu/ NOMAD Repository]([https://perma.cc/SU9A-YTRQ perma.cc]). Host, organize, and share materials data.
  
 
* [https://www.openicpsr.org/openicpsr/ openICPSR]([https://perma.cc/2QEZ-84DS perma.cc]). openICPSR is a great place to share and store your social and behavioral science research data. Your data will be preserved as-is and be available to data users at no cost.
 
* [https://www.openicpsr.org/openicpsr/ openICPSR]([https://perma.cc/2QEZ-84DS perma.cc]). openICPSR is a great place to share and store your social and behavioral science research data. Your data will be preserved as-is and be available to data users at no cost.

Latest revision as of 12:28, 20 May 2020

This list is part of the Open Access Directory.

  • This is a list of repositories and databases for open data.
  • Please annotate the entries to indicate the hosting organization, scope, licensing, and usage restrictions (if any). If a repository is open in some respects but not others, please include it with an annotation rather than exclude it.
  • If you're not sure whether a given dataset or data collection is open, post your query to Is It Open Data?
  • Related lists in OAD: Disciplinary repositories (primarily for texts, not data).
  • For news about data repositories, including some newly launched repositories not yet listed here, follow the oa.repositories.data tag of the Open Access Tracking Project.
  • See also:

Archaeology

  • Also see Social sciences.
  • Fasti Online. Subdivided into Excavation, Restoration and Survey.

Astronomy

  • Also see Physics.
  • SIMBAD Astronomical Database(perma.cc). The SIMBAD astronomical database provides basic data, cross-identifications, bibliography and measurements for astronomical objects outside the solar system.

Biology

  • Also see BCO-DMO, Marine Biology data, listed with Marine Sciences repositories.
  • Also see DataONE, Entrez databases, KNB, and PANGAEA, listed under Multidisciplinary repositories.
  • Array Express(perma.cc). Archive of Functional Genomics Data stores data from high-throughput functional genomics experiments, and provides these data for reuse to the research community.
  • BioModels(perma.cc). BioModels is a repository of mathematical models of biological and biomedical systems.
  • Cancer Imaging Archive(perma.cc). TCIA is a service which de-identifies and hosts a large archive of medical images of cancer accessible for public download.
  • The Cell: An Image Library Images of all cell types from all organisms, including intracellular structures and movies or animations demonstrating functions. This project relies upon the cell biology community to populate the library. The Cell: An Image Library™ is a freely accessible, easy-to-search, public repository of reviewed and annotated images, videos, and animations of cells from a variety of organisms, showcasing cell architecture, intracellular functionalities, and both normal and abnormal processes. The purpose of this database is to advance research, education, and training, with the ultimate goal of improving human health.
  • Database of Virulence Factors in Fungal Pathogens (DFVF)(perma.cc). The database is expected to greatly stimulate and facilitate further studies in fungal pathogens; both experimental biologists and computational biologists can use the database and/or the predicted virulence factors to guide their search for new virulence factors and/or discovery of new pathogen-host interaction mechanisms in fungi.
  • dbGaP(perma.cc). The database of Genotypes and Phenotypes (dbGaP) was developed to archive and distribute the data and results from studies that have investigated the interaction of genotype and phenotype in humans.
  • dbSNP(perma.cc). dbSNP contains human single nucleotide variations, microsatellites, and small-scale insertions and deletions along with publication, population frequency, molecular consequence, and genomic and RefSeq mapping information for both common variations and clinical mutations.
  • dbVar(perma.cc). dbVar is NCBI's database of human genomic Structural Variation — large variants >50 bp including insertions, deletions, duplications, inversions, mobile elements, translocations, and complex variants.
  • Dryad Dryad is an international repository of data underlying scientific and medical publications, particularly data for which no specialized repository exists. All material in Dryad is associated with a scholarly publication. Most data in the repository are associated with peer-reviewed articles, although data associated with non-peer reviewed publications from reputable academic sources, such as dissertations, are also accepted. Dryad is a non-profit organization.
  • Electron Microscopy Data Bank (EMDB)(perma.cc). Global resource for 3-Dimensional Electron Microscopy (3DEM) structure data archiving and retrieval, news, events, software tools, data standards, validation methods, and community challenges.
  • Eukaryotic Pathogen Database Resources (EuPathDB)(perma.cc). EuPathDB (formerly ApiDB) is an integrated database covering the eukaryotic pathogens in the genera listed in the Data Summary page.
  • The European Genome-phenome Archive (EGA)(perma.cc). The European Genome-phenome Archive (EGA) is a service for permanent archiving and sharing of all types of personally identifiable genetic and phenotypic data resulting from biomedical research projects.
  • Gene Expression Omnibus High-throughput functional genomic data, including all array-based applications and some high-throughput sequencing data.
  • Human Protein Atlas(perma.cc). All the data in the knowledge resource is open access to allow scientists both in academia and industry to freely access the data for exploration of the human proteome.
  • ImmPort Shared Data(perma.cc). The ImmPort project provides advanced information technology support in the archiving and exchange of scientific data for the diverse community of life science researchers supported by NIAID/DAIT and serves as a long-term, sustainable archive of research and clinical data.
  • Influenza Research Database(perma.cc). This resource contains avian and non-human mammalian influenza surveillance data, human clinical data associated with virus extracts, phenotypic characteristics of viruses isolated from extracts, and all genomic and proteomic data available in public repositories for influenza viruses.
  • KiMoSys(perma.cc). A web application for quantitative KInetic MOdels of biological SYStems.
  • MetaboLights(perma.cc). MetaboLights is a database for Metabolomics experiments and derived information.
  • MGnify(perma.cc). MGnify offers an automated pipeline for the analysis and archiving of microbiome data to help determine the taxonomic diversity and functional & metabolic potential of environmental samples.
  • Mouse Genome Informatics (MGI)(perma.cc). MGI is the international database resource for the laboratory mouse, providing integrated genetic, genomic, and biological data to facilitate the study of human health and disease.
  • National Biological Information Infrastructure A broad, collaborative program to provide increased access to data and information on the nation's biological resources. The NBII links diverse, high-quality biological databases, information products, and analytical tools maintained by NBII partners and other contributors in government agencies, academic institutions, non-government organizations, and private industry. (Note: In the President's budget for Fiscal Year 2012 the repository was terminated.)
  • NCBI Taxonomy(perma.cc). The Taxonomy Database is a curated classification and nomenclature for all of the organisms in the public sequence databases.
  • The Network Data Exchange (NDEx)(perma.cc). The NDEx Project provides an open-source framework where scientists and organizations can share, store, manipulate, and publish biological network knowledge.
  • NeuroMorpho.org(perma.cc). NeuroMorpho.Org is a centrally curated inventory of digitally reconstructed neurons associated with peer-reviewed publications.
  • OpenNEURO(perma.cc). A free and open platform for sharing MRI, MEG, EEG, iEEG, and ECoG data.
  • PeptideAtlas(perma.cc). A multi-organism, publicly accessible compendium of peptides identified in a large set of tandem mass spectrometry proteomics experiments.
  • Planet A network of European Plant Databases.
  • Protein Circular Dichroism Data Bank (PCDDB)(perma.cc). The Protein Circular Dichroism Data Bank (PCDDB) is a public repository that archives and freely distributes circular dichroism (CD) and synchrotron radiation CD (SRCD) spectral data and their associated experimental metadata.
  • ProteomeXChange(perma.cc). The ProteomeXchange Consortium was established to provide globally coordinated standard data submission and dissemination pipelines involving the main proteomics repositories, and to encourage open data policies in the field.
  • The Universal Protein Resource (UniProt) is a comprehensive resource for protein sequence and annotation data. The UniProt databases are the UniProt Knowledgebase (UniProtKB), the UniProt Reference Clusters (UniRef), and the UniProt Archive (UniParc). The UniProt Metagenomic and Environmental Sequences (UniMES) database is a repository specifically developed for metagenomic and environmental data.
  • uBio(perma.cc). uBio uses names and taxonomic intelligence to manage information about organisms.
  • VectorBase(perma.cc). A National Institute of Allergy and Infectious Diseases (NIAID) Bioinformatics Resource Center (BRC) providing genomic, phenotypic and population-centric data to the scientific community for invertebrate vectors of human pathogens.

Chemistry

  • Also see BCO-DMO, Marine Biology data, listed with Marine Sciences repositories.
  • Also see Entrez databases, listed under Multidisciplinary repositories.
  • Cambridge Structural Database The CCDC is a non-profit, charitable Institution whose objectives are the general advancement and promotion of the science of chemistry and crystallography for the public benefit.
  • ChemSpider. Hosted by the Royal Society of Chemistry.
  • ChemSynthesis. A database of chemicals and their physical properties.
  • eCrystals. From the Southampton Chemical Crystallography Group and the EPSRC UK National Crystallography Service.

Computer Science

  • CiteSeerX provides its databases of nearly 2 million documents and the associated texts and pdfs for research.
  • FreeStatistics of Irreproducible Research(perma.cc). The purpose of this project is to facilitate the creation, maintenance, and permanent storage of statistical computation objects that empower authors to publish reproducible and reusable research (in the form of a Compendium) through a series of web services.
  • GitHub keeps your public and private code available, secure, and backed up.
  • Google Code Project Hosting Project Hosting on Google Code provides a free collaborative development environment for open source projects. Each project comes with its own member controls, Subversion/Mercurial repository, issue tracker, wiki pages, and downloads section. Our project hosting service is simple, fast, reliable, and scalable, so that you can focus on your own open source development.
  • Launchpad can host your project’s source code using the Bazaar version control system. We also import over 2000 CVS, SVN, Git and Mercurial projects, so you can use Bazaar with those too.
  • ReproZip!(perma.cc). ReproZip can automatically pack your research along with all necessary data files, libraries, environment variables and options into a self-contained bundle. Then ReproZip can use that bundle to automatically set up the same original environment so anybody can reproduce the research on a different machine, without tracking down and installing the dependencies, or even having to run the same operating system.
  • SourceForge 2.7 million developers create powerful software in over 260,000 projects. Our popular directory connects more than 46 million consumers with these open source projects and serves more than 2,000,000 downloads a day. SourceForge is where open source happens.
  • SNAP Stanford Large Network Dataset Collection. The SNAP library has been actively developed since 2004 and is growing organically as a result of our research pursuits in the analysis of large social and information networks. The largest network we have analyzed so far using the library is the Microsoft Instant Messenger network from 2006, with 240 million nodes and 1.3 billion edges.
  • KONECT (the Koblenz Network Collection) is a project to collect large network datasets of all types in order to perform research in network science and related fields, collected by the Institute of Web Science and Technologies at the University of Koblenz–Landau.

Energy

Engineering

  • Also see Multidisciplinary repositories.
  • TRID(perma.cc). An integrated database that combines the records from TRB’s Transportation Research Information Services (TRIS) Database and the OECD’s Joint Transport Research Centre’s International Transport Research Documentation (ITRD) Database. TRID provides access to more than 1.25 million records of transportation research worldwide.

Environmental sciences

  • Also see BCO-DMO, Marine Biology data, listed with Marine Sciences repositories.
  • Also see DataONE, KNB, and PANGAEA, listed under Multidisciplinary repositories.
  • Also see Dryad, listed with Biology repositories.
  • EDI Data Portal(perma.cc). The EDI Data Portal contains environmental and ecological data packages contributed by a number of participating organizations.
  • The Marine Geoscience Data System (MGDS)(perma.cc). The Marine Geoscience Data System (MGDS) provides access to data portals for the NSF-supported Ridge 2000 and MARGINS programs, the Antarctic and Southern Ocean Data Synthesis, the Global Multi-Resolution Topography Synthesis, and Seismic Reflection Field Data Portal.
  • NERC Data Centers(perma.cc). NERC has a network of environmental data centres that provide a focal point for NERC's scientific data and information. These centres hold data from environmental scientists working in the UK and around the world.
  • Polar Data Catalogue. A primarily Canadian archive of free RADARSAT imagery as well as Arctic, Antarctic, and other cryospheric data sets covering a range of disciplines, from natural sciences and policy to health and social sciences.
  • Socioeconomic Data and Applications Center (SEDAC) specializes in spatial data and services in support of human-environment research and applications, in the context of NASA’s Earth science mission and the overall U.S. Global Change Research Program.

Geology

  • Also see PANGAEA, listed under Multidisciplinary repositories.
  • IRIS (Incorporated Research Institutions for Seismology). From 100+ US universities and the National Science Foundation.

Geosciences and geospatial data

  • Also see DataONE and PANGAEA, listed under Multidisciplinary repositories.
  • EarthChem Library(perma.cc). The EarthChem Library is a data repository that archives, publishes and makes accessible data and other digital content from geoscience research (analytical data, data syntheses, models, technical reports, etc).
  • GeoNames. A database of placenames, under a CC-BY license. Founded by Marc Wick.
  • The Geosciences Network (GEON) project is a collaboration among a dozen PI institutions and a number of other partner projects, institutions, and agencies to develop cyberinfrastructure in support of an environment for integrative geoscience research. GEON is funded by the NSF Information Technology Research (ITR) program.
  • Magnetics Information Consortium (MagIC)(perma.cc). Improves research capacity in the Earth and Ocean sciences by maintaining an open community digital data archive for rock and paleomagnetic data with portals that allow users access to archive, search, visualize, download, and combine these versioned datasets.
  • National Space Science Data Center serves as the permanent archive for NASA space science mission data. "Space science" means astronomy and astrophysics, solar and space plasma physics, and planetary and lunar science. As permanent archive, NSSDC teams with NASA's discipline-specific space science "active archives" which provide access to data to researchers and, in some cases, to the general public.
  • OpenTopography(perma.cc). OpenTopography facilitates community access to high-resolution, Earth science-oriented, topography data, and related tools and resources.
  • Polar Data Catalogue. A primarily Canadian archive of free RADARSAT imagery as well as Arctic, Antarctic, and other cryospheric data sets covering a range of disciplines, from natural sciences and policy to health and social sciences.
  • ShareGeo. Integrating the older GRADE (Geospatial Repository for Academic Deposit and Extraction) repository. From EDINA. (Repository discontinued.)

Linguistics

  • See the 40+ members of the Open Language Archives Community (OLAC).
  • TROLLing. Hosted by UiT. TROLLing "is designed as an archive of linguistic data and statistical code. The archive is open access, which means that all information is available to everyone. All postings are accompanied by searchable metadata that identify the researchers, the languages and linguistic phenomena involved, the statistical methods applied, and scholarly publications based on the data (where relevant). Linguists worldwide are invited to post datasets and statistical models used in linguistic research."

Marine sciences

  • Also see DataONE and PANGAEA, listed under Multidisciplinary repositories.
  • BCO-DMO. The Biological and Chemical Oceanography Data Management Office provides access to data sets contributed by investigators funded by the Biological and Chemical Oceanography sections of the US National Science Foundation (NSF).
  • SEANOE - Sea Open Scientific Data Publication(perma.cc). SEANOE (SEA scieNtific Open data Edition) is a publisher of scientific data in the field of marine sciences. Data published by SEANOE are available free of charge. They can be used in accordance with the terms of the Creative Commons license selected by the author of the data.

Medicine

  • Also see Entrez databases, listed under Multidisciplinary repositories.
  • caNanoLab(perma.cc). A data sharing portal designed to facilitate information sharing across the international biomedical nanotechnology research community to expedite and validate the use of nanotechnology in biomedicine.
  • ChEMBL (ebi.ac.uk/chembl/). ChEMBL is a manually curated database of bioactive molecules with drug-like properties. It brings together chemical, bioactivity and genomic data to aid the translation of genomic information into effective new drugs.
  • Dryad. Dryad is an international repository of data underlying scientific and medical publications, particularly data for which no specialized repository exists. All material in Dryad is associated with a scholarly publication. Most data in the repository are associated with peer-reviewed articles, although data associated with non-peer reviewed publications from reputable academic sources, such as dissertations, are also accepted. Dryad is a non-profit organization.
  • FlowRepository(perma.cc). FlowRepository is a database of flow cytometry experiments where you can query and download data collected and annotated according to the MIFlowCyt standard.
  • The Health and Medical Care Archive (HMCA) is the data archive of the Robert Wood Johnson Foundation (RWJF), the largest philanthropy devoted exclusively to health and health care in the United States. Operated by the Inter-university Consortium for Political and Social Research (ICPSR) at the University of Michigan, HMCA preserves and disseminates data collected by selected research projects funded by the Foundation and facilitates secondary analyses of the data. The data collections in HMCA include surveys of health care professionals and organizations, investigations of access to medical care, surveys on substance abuse, and evaluations of innovative programs for the delivery of health care. Our goal is to increase understanding of health and health care in the United States through secondary analysis of RWJF-supported data collections.
  • MIRAGE (Middlesex medical Image Repository with a CBIR ArchivinG Environment). From JISC and Middlesex University.
  • National Addiction & HIV Data Archive Program (NAHDAP)(perma.cc). The scope of the data housed at NAHDAP covers a wide range of legal and illicit drugs (alcohol, tobacco, marijuana, cocaine, synthetic drugs, and others) and the trajectories, patterns, and consequences of drug use as well as related predictors and outcomes.
  • Project Data Sphere, LLC, is a repository to broadly share, integrate and analyze historical, de-identified, patient-level data from academic and industry cancer Phase II-III clinical trials. Access to the Project Data Sphere platform is available to researchers affiliated with life science companies, hospitals and institutions, as well as independent researchers, at no cost and without requiring a research proposal.
  • Vivli(perma.cc). From the Center for Global Clinical Research Data. The Vivli platform includes an independent data repository, in-depth search engine and a secure research environment.

Multidisciplinary repositories

  • Also see Social Sciences.
  • Also see BCO-DMO, Marine Biology data, listed with Marine Sciences repositories.
  • DataCite (perma.cc). DataCite is a leading global non-profit organisation that provides persistent identifiers (DOIs) for research data and other research outputs.
  • Data Conservancy(perma.cc). Data Conservancy is devoted to developing institutional solutions for the challenges of data collection, preservation and re-use.
  • DataHub(perma.cc). Thousands of datasets, ranging from financial market data and population growth to cryptocurrency prices.
  • DataONE. DataONE is an international federation of data repositories containing earth observations data, including data from fields such as ecology, biology, evolution, and environmental sciences such as hydrology, oceanography, and atmospheric science. DataONE is a federation with participation from hundreds of field stations, universities, and government agencies through the DataONE Member Nodes.
  • Dryad. Dryad is an international repository of data underlying scientific and medical publications, particularly data for which no specialized repository exists. All material in Dryad is associated with a scholarly publication. Most data in the repository are associated with peer-reviewed articles, although data associated with non-peer reviewed publications from reputable academic sources, such as dissertations, are also accepted. Dryad is a non-profit organization.
  • EASY(perma.cc). EASY offers sustainable archiving of research data and access to thousands of datasets.
  • EUDAT(perma.cc). EUDAT offers heterogeneous research data management services and storage resources, supporting multiple research communities as well as individuals, through a geographically distributed, resilient network spanning 15 European nations. Data is stored alongside some of Europe’s most powerful supercomputers.
  • FigShare. Scientific publishing as it stands is an inefficient way to do science on a global scale. A lot of time and money is being wasted by groups around the world duplicating research that has already been carried out. FigShare allows you to share all of your data, negative results and unpublished figures. In doing this, other researchers will not duplicate the work, but instead may publish with your previously wasted figures, or offer collaboration opportunities and feedback on preprint figures.
  • KPBC. Regional academic repository for data in all fields. Poland.
  • Microsoft Research Open Data(perma.cc). A collection of free datasets from Microsoft Research to advance state-of-the-art research in areas such as natural language processing, computer vision, and domain specific sciences. Download or copy directly to a cloud-based Data Science Virtual Machine for a seamless development experience.
  • Open Commons Consortium (OCC). The OCC is a not-for-profit organization that manages and operates cloud computing and data commons infrastructure to support scientific, medical, health care and environmental research. OCC members span the globe and include over 30 universities, companies, government agencies and national laboratories.
  • Open Science Data Cloud (OSDC). The OSDC is a data science ecosystem in which researchers can house and share their own scientific data, access complementary public datasets, build and share customized virtual machines with whatever tools necessary to analyze their data, and perform the analysis to answer their research questions. It is a one-stop shop for making scientific research faster and easier.
  • Open Science Framework (OSF). Open Science Framework serves as a scholarly commons for documentation, files, collaboration, and connecting to services for research outputs.
  • Scientific Data recommended repositories(perma.cc). Spreadsheet listing data repositories that are recommended by Scientific Data (Springer Nature) as being suitable for hosting data associated with peer-reviewed articles. Please see the repository list on Scientific Data's website for the most up to date list.
  • Tromsø Repository of Language and Linguistics (TROLLing)(perma.cc). TROLLing is designed as an archive of linguistic data and statistical code. The archive is open access, which means that all information is available to everyone. All postings are accompanied by searchable metadata that identify the researchers, the languages and linguistic phenomena involved, the statistical methods applied, and scholarly publications based on the data (where relevant).
  • UPSpace. University of Pretoria Research Repository, South Africa.
  • Webscope(perma.cc). The Yahoo Webscope Program is a reference library of interesting and scientifically useful datasets for non-commercial use by academics and other scientists. All datasets have been reviewed to conform to Yahoo's data protection standards, including strict controls on privacy.
  • Zenodo(perma.cc). All research outputs from across all fields of research.

Physics

  • Also see Astronomy.
  • HEP Data. The data comprise total and differential cross sections, structure functions, fragmentation functions, distributions of jet measures, polarisations, etc., from a wide range of interactions.
  • NIST Atomic Spectra Database. The Atomic Spectra Database (ASD) contains data for radiative transitions and energy levels in atoms and atomic ions. Data are included for observed transitions of 99 elements and energy levels of 56 elements.

Social sciences

  • Also see Multidisciplinary repositories.
  • Archaeology Data Service(perma.cc). Heritage data, with over 20 years of experience supporting research, learning and teaching with free, high quality and dependable digital resources.
  • Databrary. A repository for sharing and reusing research video data and related metadata in the developmental and learning sciences. Hosted at New York University with support from The Pennsylvania State University.
  • European Nucleotide Archive(perma.cc). The European Nucleotide Archive (ENA) provides a comprehensive record of the world's nucleotide sequencing information, covering raw sequencing data, sequence assembly information and functional annotation.
  • ICPSR (Inter-University Consortium for Political and Social Research). At the University of Michigan.
  • openICPSR(perma.cc). openICPSR is a great place to share and store your social and behavioral science research data. Your data will be preserved as-is and be available to data users at no cost.
  • Qualitative Data Repository(perma.cc). QDR curates, stores, preserves, publishes, and enables the download of digital data generated through qualitative and multi-method research in the social sciences.