Many companies believe that they have to create their own datasets in order to see the benefits of data analytics, but this is far from the truth. There are hundreds of thousands of datasets on the internet that anyone can access free of charge. These datasets are useful for anyone looking to learn how to analyze data, create data visualizations, or simply improve their data literacy.
In 2015, the United States Government pledged to make all government data available for free online. Data.gov lets you search more than 200,000 datasets from a variety of sources. Topics include agriculture, finance, public safety, education, the environment, energy, and many others.
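For readers who would rather search the catalog from a script than from the website, here is a minimal Python sketch. It assumes the Data.gov catalog exposes the standard CKAN package_search action at the URL shown and returns the usual CKAN response shape; the function names are illustrative and not part of any official client, so check the current Data.gov API documentation before relying on it.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the catalog serves the standard CKAN "package_search" action here.
CATALOG_SEARCH_URL = "https://catalog.data.gov/api/3/action/package_search"


def build_search_url(query, rows=5):
    """Return the search URL for a keyword query, limited to `rows` results."""
    return CATALOG_SEARCH_URL + "?" + urlencode({"q": query, "rows": rows})


def extract_titles(response):
    """Pull dataset titles out of a decoded CKAN search response."""
    if not response.get("success"):
        return []
    return [item["title"] for item in response["result"]["results"]]


def search_datasets(query, rows=5):
    """Query the catalog over the network and return matching dataset titles."""
    with urlopen(build_search_url(query, rows), timeout=30) as reply:
        return extract_titles(json.load(reply))
```

Calling search_datasets("public safety") would then return up to five dataset titles, provided the endpoint behaves as assumed.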
With Google Trends, users can find search term data on almost any topic. You can check how often people search for your company on Google, and you can even download the data for analysis in another program. Google offers a wide variety of filters, allowing you to narrow down your search by location, time range, category, or even specific search type (e.g., image or video results).
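As an illustration of analyzing a download in another program, the Python sketch below reads an interest-over-time export and finds the period with the highest score. The layout it expects (a few preamble lines, a header row, then one row per period with the score in the second column) is an assumption based on typical exports, and the function names are made up for this example.

```python
import csv
import io


def parse_trends_csv(text):
    """Parse an interest-over-time export into (period, value) pairs.

    Assumed layout: optional preamble lines (such as a category line and a
    blank line), then a header row, then one row per period whose second
    column is the interest score. Scores reported as "<1" are counted as 0.
    """
    # Keep only rows with at least two columns; this drops the preamble.
    rows = [row for row in csv.reader(io.StringIO(text)) if len(row) >= 2]
    points = []
    for period, raw in (row[:2] for row in rows[1:]):  # rows[0] is the header
        value = 0 if raw.strip() == "<1" else int(raw)
        points.append((period, value))
    return points


def peak_period(points):
    """Return the (period, value) pair with the highest interest score."""
    return max(points, key=lambda point: point[1])
```

To use it, read the downloaded file into a string, pass that string to parse_trends_csv, and hand the result to peak_period.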
Amazon Web Services Open Data Registry
Amazon offers just over 100 datasets for public use, covering a wide range of topics, such as the Encyclopedia of DNA Elements (ENCODE), satellite data, and trip data from taxis and limousines in New York City. Amazon also includes “usage examples” that link to work other organizations and groups have done with the data.
Just like the United States, the United Kingdom posts its data for public use free of charge. So do many other countries, such as Singapore, Australia, and India. With so many countries offering their data to the public, it shouldn’t be hard to find a good dataset to experiment with.
The Pew Research Center’s mission is to collect and analyze data from all over the world. They cover all sorts of topics like journalism, religion, politics, the economy, online privacy, social media, and demographic trends. They are nonprofit, nonpartisan and nonadvocacy. While they do their own work with the data they collect, they also offer it to the public for further analysis. To gain access to the data, all you need to do is register for a free account, and credit Pew Research Center as the source for the data.
Some members of r/datasets on Reddit have released a dataset of all comments on the site dating back to 2005. The data is organized by year and is free for anyone to download. Analyzing it to see what can be discovered about Reddit commenters could be a fun project.
Another great source for datasets is Earthdata, which is part of NASA’s Earth Science Data Systems Program. Its purpose is to process and record Earth science data from aircraft, satellites, and field measurements.
UNICEF’s data page is a great source for datasets on nutrition, development, education, diseases, gender equality, immunization, and other issues relating to women and children. About 40 datasets are open to the public.
National Climatic Data Center
The National Climatic Data Center is the largest archive of environmental data in the world. Here you can find weather and climate datasets from all around the United States, along with meteorological, geophysical, atmospheric, and oceanic datasets.