In today’s world, scientists in many disciplines and a growing number of journalists live and breathe data. There are many thousands of data repositories on the web, providing access to millions of datasets; and local and national governments around the world publish their data as well. To enable easy access to this data, we launched Dataset Search, so that scientists, data journalists, data geeks, or anyone else can find the data required for their work and their stories, or simply to satisfy their intellectual curiosity.
Similar to how Google Scholar works, Dataset Search lets you find datasets wherever they’re hosted, whether it’s a publisher’s site, a digital library, or an author’s personal web page. To create Dataset Search, we developed guidelines for dataset providers to describe their data in a way that helps Google (and other search engines) better understand the content of their pages. These guidelines cover salient information about datasets: who created the dataset, when it was published, how the data was collected, what the terms are for using the data, and so on. We then collect and link this information, analyze where different versions of the same dataset might be, and find publications that may describe or discuss the dataset. Our approach is based on an open standard for describing this information (schema.org), and anybody who publishes data can describe their dataset this way. We encourage dataset providers, large and small, to adopt this common standard so that all datasets are part of this robust ecosystem.
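To make this concrete, here is a minimal sketch of what such a description could look like when generated with Python. It is an illustration under stated assumptions, not the official guidelines: the helper names (`build_dataset_jsonld`, `wrap_in_script_tag`) and every example value (the dataset name, organization, and example.org URLs) are hypothetical. The property names (`name`, `description`, `creator`, `datePublished`, `license`, `measurementTechnique`) come from the schema.org `Dataset` type and map onto the questions above: who created the dataset, when it was published, how the data was collected, and what the terms of use are.

```python
import json


def build_dataset_jsonld(name, description, url, creator_name,
                         date_published, license_url, measurement_technique):
    """Return a schema.org Dataset description as a JSON-LD string.

    Hypothetical helper for illustration; only the property names
    are taken from the schema.org vocabulary.
    """
    record = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        # Who created the dataset.
        "creator": {"@type": "Organization", "name": creator_name},
        # When it was published (ISO 8601 date).
        "datePublished": date_published,
        # What the terms are for using the data.
        "license": license_url,
        # How the data was collected.
        "measurementTechnique": measurement_technique,
    }
    return json.dumps(record, indent=2)


def wrap_in_script_tag(jsonld):
    """Wrap JSON-LD in a script element for embedding in a dataset's web page."""
    return '<script type="application/ld+json">\n' + jsonld + "\n</script>"


if __name__ == "__main__":
    # All values below are made up for the example.
    jsonld = build_dataset_jsonld(
        name="Example Daily Weather Records",
        description="Daily temperature and precipitation readings from example stations.",
        url="https://example.org/datasets/daily-weather",
        creator_name="Example Weather Agency",
        date_published="2018-09-05",
        license_url="https://creativecommons.org/publicdomain/zero/1.0/",
        measurement_technique="Automated weather station sensors",
    )
    print(wrap_in_script_tag(jsonld))
```

A provider would place the resulting script element in the HTML of the page that describes the dataset. Which properties are required or recommended is defined by the published guidelines, so treat the set shown here as a starting point only.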
In this new release, you can find references to most datasets in environmental and social sciences, as well as data from other disciplines, including government data and data provided by news organizations such as ProPublica. As more data repositories use the schema.org standard to describe their datasets, the variety and coverage of datasets that users will find in Dataset Search will continue to grow.
Dataset Search works in multiple languages, with support for additional languages coming soon. Simply enter what you are looking for, and we will help guide you to the published dataset on the repository provider’s site.
For example, if you wanted to analyze daily weather records, you might try this query in Dataset Search:
You’ll see data from NASA and NOAA, as well as from academic repositories such as Harvard’s Dataverse and the Inter-university Consortium for Political and Social Research (ICPSR). Ed Kearns, Chief Data Officer at NOAA, is a strong supporter of this project and helped NOAA make many of their datasets searchable in this tool. “This type of search has long been the dream for many researchers in the open data and science communities,” he said. “And for NOAA, whose mission includes the sharing of our data with others, this tool is key to making our data more accessible to an even wider community of users.”
This launch is one of a series of initiatives to bring datasets more prominently into our products. We recently made it easier to discover tabular data in Search, which uses this same metadata along with the linked tabular data to provide answers to queries directly in search results. While that initiative focused more on news organizations and data journalists, Dataset Search can be useful to a much broader audience, whether you’re looking for scientific data, government data, or data provided by news organizations.
A search tool like this one is only as good as the metadata that data publishers are willing to provide. We hope to see many of you use the open standards to describe your data, enabling our users to find the data that they are looking for. If you publish data and don’t see it in the results, visit the instructions on our developers site, which also include a link to ask questions and provide feedback.