Datasets for Data Science, Machine Learning, AI & Analytics

Welcome to Data Impala’s Dataset Collection, a handpicked archive of open datasets aimed at helping learners and explorers sharpen their skills in data science, analytics, and intelligent systems.

This space is designed to support those who want to get hands-on with real data, whether you’re experimenting with models, running your first analysis, or exploring patterns hidden in the numbers.

This collection grows over time, so you’re encouraged to check back whenever you’re looking for something new to work with.

Data Repository


Agricultural Dataset

1. Iris Flower Dataset

  • What it is: Classic dataset containing sepal and petal measurements (length and width) of 150 iris flowers, 50 from each of 3 species.
  • Great for: Beginners learning classification, visualization, and clustering.

Disaster Dataset

1. Titanic Survival Dataset

  • What it is: Data on passengers aboard the Titanic, including who survived.
  • Great for: Logistic regression, classification problems.

Business, Finance & E-Commerce

1. Online Retail Dataset

  • What it is: Transaction data for a UK-based online store.
  • Great for: Market basket analysis, clustering, RFM (recency, frequency, monetary) segmentation.

2. Credit Card Fraud Detection

  • What it is: Anonymized credit card transactions labeled as fraudulent or not.
  • Great for: Anomaly detection, classification tasks.
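
To make the RFM idea mentioned for the Online Retail Dataset concrete, here is a minimal sketch in plain Python. It assumes each transaction boils down to a customer ID, a purchase date, and an amount; the rows, the `rfm_table` helper, and the snapshot date are all hypothetical and are not taken from the dataset itself.

```python
from collections import defaultdict
from datetime import date

# Hypothetical toy transactions: (customer_id, purchase_date, amount).
# These rows are made up for illustration; they are NOT from the dataset.
transactions = [
    ("C1", date(2024, 1, 5), 20.0),
    ("C1", date(2024, 3, 1), 35.0),
    ("C2", date(2024, 2, 10), 120.0),
    ("C3", date(2024, 1, 20), 15.0),
    ("C3", date(2024, 1, 25), 10.0),
    ("C3", date(2024, 3, 9), 5.0),
]


def rfm_table(rows, snapshot):
    """Return {customer: (recency_days, frequency, monetary)}."""
    last_seen = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer, day, amount in rows:
        # Keep the most recent purchase date per customer.
        if customer not in last_seen or day > last_seen[customer]:
            last_seen[customer] = day
        frequency[customer] += 1      # number of purchases
        monetary[customer] += amount  # total amount spent
    return {
        c: ((snapshot - last_seen[c]).days, frequency[c], monetary[c])
        for c in last_seen
    }


# The snapshot date is an assumption: the day the analysis is run.
rfm = rfm_table(transactions, snapshot=date(2024, 3, 10))
for customer, (recency, freq, total) in sorted(rfm.items()):
    print(customer, recency, freq, total)
```

On real data, the three values per customer would typically be binned into scores (for example, quartiles) before grouping customers into segments; that step is left out here to keep the sketch short.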

Government & Open Access Data Portals

1. Data.gov (USA)

  • What it is: Huge collection of U.S. government datasets across industries.
  • Great for: Research on energy, health, finance, weather, etc.

2. EU Open Data Portal

  • What it is: Official datasets from the European Union.
  • Great for: Studies of European policy, transport, energy, and population.

Health & Medical Data

1. Heart Disease UCI Dataset

  • What it is: Patient data used to predict heart disease.
  • Great for: Classification models in healthcare.

2. Breast Cancer Wisconsin Dataset

  • What it is: Features of cell nuclei from biopsies for cancer classification.
  • Great for: Binary classification, healthcare AI.

Real Estate, Housing, and Environment

1. Ames Housing Dataset

  • What it is: Detailed information about houses in Ames, Iowa.
  • Great for: Regression, price prediction, data cleaning practice.

Study and Research Dataset

1. Student Performance Dataset

  • What it is: Academic performance of students, with demographic and school data.
  • Great for: Regression, correlation analysis, education-related models.
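
As a rough illustration of the correlation analysis suggested for this kind of data, the sketch below computes a Pearson correlation coefficient by hand. The two columns (`study_hours`, `final_grade`) and their values are hypothetical and are not taken from the Student Performance Dataset.

```python
from math import sqrt

# Hypothetical toy values, made up for illustration only.
study_hours = [2.0, 4.0, 6.0, 8.0]
final_grade = [10.0, 12.0, 15.0, 17.0]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


r = pearson(study_hours, final_grade)
print(round(r, 3))
```

A value near 1 indicates a strong positive linear relationship, a value near -1 a strong negative one, and a value near 0 little linear relationship. Correlation on its own says nothing about cause.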

Social, Economic & Demographic Data

1. World Bank Open Data

  • What it is: Global economic, social, health, and development indicators.
  • Great for: Time series, global comparisons, real-world analytics.

2. UNICEF Data

  • What it is: Global data on child health, education, nutrition, and protection.
  • Great for: Social research, public policy, humanitarian analysis.

3. U.S. Census Data

  • What it is: Demographic and population statistics from the U.S. Census Bureau.
  • Great for: Urban planning, marketing, demographic analysis.

Miscellaneous

1. Wine Quality Dataset

  • What it is: Chemical properties and quality scores for red and white wine samples.
  • Great for: Regression, classification, and exploratory data analysis (EDA).
