by Sayantani Sanyal
January 15, 2022
These platforms provide large volumes of information that can be used in data science projects in 2022.
Data science means different things to different roles. At its core, the field revolves around extracting knowledge and insights from data using a range of tools and applications. Its growing use has led many working and aspiring data science professionals to create and take part in projects and assignments covering data visualization, data cleaning, and machine learning. Practicing on such projects can help professionals sharpen their skills and excel in their careers. In this article, we list 10 platforms where professionals can get datasets for their data science projects in 2022.
• Kaggle: Kaggle is a platform where professionals can learn, practice, and sharpen their data analytics and data science skills. The platform hosts a large number of public datasets and lets users share code, so that they can learn best practices from one another.
• FiveThirtyEight: FiveThirtyEight is an interactive news and sports platform with some excellent material for data visualization projects. The platform makes much of its data available to the public, so anyone can download and use it at their own convenience.
• Google Dataset Search: Google Dataset Search is one of the most comprehensive dataset search engines available. It claims to index more than 25 million online datasets and helps scientists and researchers locate them more easily. It also offers filters for narrowing results by data type, last-updated date, and more.
• Data.gov: Data.gov allows its users to download and explore data from multiple US government agencies. The information ranges from government budgets to climate data. The datasets are well documented, which makes them easier for users to navigate.
• AWS Public Datasets: AWS Public Datasets allow users to download the data and work with it on their own devices, or to analyse it in the cloud using EC2 and Hadoop via EMR. Amazon has a page that lists all the datasets and also offers free access for new accounts.
• UCI Machine Learning Repository: UCI Machine Learning Repository is one of the oldest sources of datasets on the internet. The datasets are generally contributed by users, and thus have varying levels of documentation and cleanliness. Users can download datasets from the repository without any registration.
• Quandl: Quandl is a repository of economic and financial data. Most of this information is free, but some datasets must be purchased. The platform is extremely useful for building models to predict economic indicators and stock prices.
• data.world: data.world describes itself as the social network for data professionals. It is a platform where they can search for, copy, analyze, and download datasets. In addition, they can upload their own data and use it to collaborate with others.
• BuzzFeed News: BuzzFeed News makes the datasets, analysis, libraries, tools, and guides used in its articles available on GitHub. It is quite a popular platform and is used by millions of data professionals.
• Academic Torrents: Academic Torrents is a site geared toward sharing the datasets from scientific papers. It is new to the dataset platform market and allows its users to browse the datasets directly on the site.
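Whichever platform a dataset comes from, documentation and cleanliness can vary, so a quick first check of a downloaded file is often worthwhile. The sketch below is one possible way to do that, assuming the dataset is a CSV file; the `summarize_csv` helper and the sample data are hypothetical and are not part of any of the platforms listed above.

```python
import csv
import io


def summarize_csv(text):
    """Return row count, column names, and per-column missing-value counts."""
    reader = csv.DictReader(io.StringIO(text))
    columns = reader.fieldnames or []
    missing = {name: 0 for name in columns}
    rows = 0
    for row in reader:
        rows += 1
        for name in columns:
            value = row.get(name)
            # Treat absent or blank cells as missing.
            if value is None or value.strip() == "":
                missing[name] += 1
    return {"rows": rows, "columns": columns, "missing": missing}


# Hypothetical sample standing in for the contents of a downloaded file.
SAMPLE = "city,year,temperature\nOslo,2020,6.9\nLima,2020,\nCairo,,22.1\n"

summary = summarize_csv(SAMPLE)
print(summary)
```

For a real download, the same helper could be given the text of the file, read from wherever it was saved.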