Welcome to Data Impala’s Dataset Collection — a handpicked archive of open datasets aimed at helping learners and explorers sharpen their skills in data science, analytics, and intelligent systems.
This space is built for hands-on work with real data, whether you’re experimenting with models, running your first analysis, or exploring patterns hidden in the numbers.
This collection grows over time, so you’re encouraged to check back whenever you’re looking for something new to work with.
Data Repository
Agricultural Dataset
1. Iris Dataset
- What it is: Classic dataset containing measurements of 150 iris flowers from 3 species.
- Great for: Beginners learning classification, visualization, and clustering.
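To make the classification idea concrete, here is a minimal nearest-centroid classifier in plain Python. It assumes each row holds the four Iris measurements (sepal length, sepal width, petal length, petal width, in cm). The six sample rows and the `centroids`/`predict` helpers are illustrative and do not come from any library. A natural next step would be to load the full 150-row file and hold out a test split.

```python
import math
from collections import defaultdict

# Illustrative rows: (sepal length, sepal width, petal length, petal width) in cm, species.
TRAIN = [
    ((5.1, 3.5, 1.4, 0.2), "setosa"),
    ((4.9, 3.0, 1.4, 0.2), "setosa"),
    ((7.0, 3.2, 4.7, 1.4), "versicolor"),
    ((6.4, 3.2, 4.5, 1.5), "versicolor"),
    ((6.3, 3.3, 6.0, 2.5), "virginica"),
    ((5.8, 2.7, 5.1, 1.9), "virginica"),
]

def centroids(rows):
    """Average each measurement per species to get one centroid per class."""
    grouped = defaultdict(list)
    for features, species in rows:
        grouped[species].append(features)
    return {
        species: tuple(sum(col) / len(col) for col in zip(*feats))
        for species, feats in grouped.items()
    }

def predict(model, features):
    """Return the species whose centroid is nearest in Euclidean distance."""
    return min(model, key=lambda species: math.dist(model[species], features))

model = centroids(TRAIN)
print(predict(model, (5.0, 3.4, 1.5, 0.2)))  # setosa
print(predict(model, (6.6, 3.1, 4.6, 1.4)))  # versicolor
print(predict(model, (6.7, 3.0, 5.2, 2.3)))  # virginica
```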
Disaster Dataset
1. Titanic Dataset
- What it is: Data on passengers aboard the Titanic, including who survived.
- Great for: Logistic regression, classification problems.
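Below is a sketch of logistic regression fitted by batch gradient descent, using only the standard library. The two binary features (`is_female`, `is_first_class`) and the eleven rows are hypothetical stand-ins for columns you would derive from the real passenger table. The model outputs a survival probability between 0 and 1.

```python
import math

def sigmoid(z):
    """Logistic function mapping a real-valued score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, b, x):
    """Estimated survival probability for one feature vector."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n = len(X[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # log-loss gradient w.r.t. the score
            for j in range(n):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

# Hypothetical toy rows: (is_female, is_first_class) -> survived (1) or not (0).
X = [(1, 1), (1, 1), (1, 0), (1, 0), (1, 0), (0, 1), (0, 1), (0, 0), (0, 0), (0, 0), (0, 0)]
y = [1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0]

w, b = train_logistic(X, y)
print(round(predict_proba(w, b, (1, 1)), 2), round(predict_proba(w, b, (0, 0)), 2))
```

In practice, a library implementation such as scikit-learn's `LogisticRegression` adds regularization and a faster solver. The loop here only shows what the fit is doing.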
Business, Finance & E-Commerce
1. Online Retail Dataset
- What it is: Transaction data for a UK-based online store.
- Great for: Market basket analysis, clustering, RFM (recency, frequency, monetary) segmentation.
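RFM segmentation scores each customer on three values: recency (days since the last purchase), frequency (number of distinct invoices), and monetary value (total spend). The sketch below computes these raw values from invoice line items. The tuple layout `(customer_id, invoice_no, invoice_date, quantity, unit_price)` and all values are hypothetical, so map them to the actual columns in your copy of the data.

```python
from collections import defaultdict
from datetime import date

# Hypothetical invoice line items:
# (customer_id, invoice_no, invoice_date, quantity, unit_price)
LINES = [
    ("C1", "INV1", date(2011, 11, 1), 2, 3.50),
    ("C1", "INV1", date(2011, 11, 1), 1, 10.00),
    ("C1", "INV4", date(2011, 12, 5), 4, 2.50),
    ("C2", "INV2", date(2011, 6, 15), 10, 1.25),
    ("C3", "INV3", date(2011, 12, 8), 1, 99.00),
]

def rfm(lines, as_of):
    """Return {customer_id: (recency_days, frequency, monetary)}."""
    last_purchase = {}
    invoices = defaultdict(set)
    spend = defaultdict(float)
    for customer, invoice, day, quantity, unit_price in lines:
        last_purchase[customer] = max(day, last_purchase.get(customer, day))
        invoices[customer].add(invoice)  # frequency counts distinct invoices
        spend[customer] += quantity * unit_price
    return {
        customer: (
            (as_of - last_purchase[customer]).days,
            len(invoices[customer]),
            round(spend[customer], 2),
        )
        for customer in last_purchase
    }

for customer, scores in rfm(LINES, as_of=date(2011, 12, 10)).items():
    print(customer, scores)
# C1 (5, 2, 27.0)
# C2 (178, 1, 12.5)
# C3 (2, 1, 99.0)
```

A common follow-up is to bin each value into quantiles (for example, scores from 1 to 5) and combine the scores into segments. Lower recency is better, so its scale is usually inverted.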
2. Credit Card Fraud Detection
- What it is: Anonymized credit card transactions labeled as fraudulent or not.
- Great for: Anomaly detection, classification tasks.
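One simple unsupervised starting point for anomaly detection is the modified z-score. It uses the median and the median absolute deviation (MAD), so extreme values do not inflate the baseline the way they would inflate a standard deviation. The amounts below are hypothetical, and the 3.5 cutoff is a commonly cited rule of thumb rather than a tuned threshold. Because this dataset also carries fraud labels, any flagged transactions can be checked against them.

```python
from statistics import median

def mad_outliers(amounts, threshold=3.5):
    """Return indices whose modified z-score exceeds `threshold`.

    Modified z-score = 0.6745 * |x - median| / MAD, where MAD is the
    median absolute deviation from the median.
    """
    med = median(amounts)
    mad = median(abs(a - med) for a in amounts)
    if mad == 0:
        return []  # no spread around the median, so the score is undefined
    return [i for i, a in enumerate(amounts) if 0.6745 * abs(a - med) / mad > threshold]

# Hypothetical transaction amounts; index 5 is the obvious outlier.
amounts = [12.5, 8.99, 15.0, 9.75, 11.2, 1450.0, 10.4, 13.1]
print(mad_outliers(amounts))  # [5]
```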
Government & Open Access Data Portals
1. Data.gov
- What it is: A large catalog of U.S. government datasets across many domains.
- Great for: Research on energy, health, finance, weather, etc.
2. EU Open Data Portal
- What it is: Official datasets from the European Union.
- Great for: Studies of European policy, transport, energy, and population.
Health & Medical Data
1. Heart Disease Dataset
- What it is: Patient data used to predict heart disease.
- Great for: Classification models in healthcare.
2. Breast Cancer Wisconsin Dataset
- What it is: Features describing cell nuclei in digitized images of breast tissue biopsies, labeled as benign or malignant.
- Great for: Binary classification, healthcare AI.
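For binary medical classifiers, accuracy alone can hide missed positive cases. For that reason, sensitivity (recall) and specificity are usually reported alongside it. The helper below computes these metrics from true and predicted labels. The label lists are hypothetical, with `1` standing for the positive class (disease or malignant).

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for one positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn

def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, sensitivity (recall), and specificity."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }

# Hypothetical labels: 1 = positive (disease/malignant), 0 = negative.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
print(binary_metrics(y_true, y_pred))
```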
Real Estate, Housing, and Environment
1. Ames Housing Dataset
- What it is: Detailed information about houses in Ames, Iowa.
- Great for: Regression, price prediction, data cleaning practice.
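A first regression baseline for price prediction can be a single-feature least-squares line. The sketch uses hypothetical living-area and sale-price values. With the real Ames data, you would pick the matching columns, handle missing values, and probably move on to multiple regression.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

living_area = [1000, 1500, 2000, 2500]         # hypothetical square feet
sale_price = [155000, 195000, 255000, 295000]  # hypothetical dollars
slope, intercept = fit_line(living_area, sale_price)
print(slope, intercept)          # 96.0 57000.0
print(slope * 1800 + intercept)  # 229800.0
```

With these made-up numbers, the fitted line adds $96 per square foot, so a hypothetical 1,800 sq ft house is estimated at $229,800.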
Study and Research Dataset
1. Student Performance Dataset
- What it is: Academic performance of students, with demographic and school data.
- Great for: Regression, correlation analysis, education-related models.
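Correlation analysis often starts with Pearson's coefficient, which ranges from -1 to 1. The sketch below implements it directly. The study-hours and final-grade values are hypothetical placeholders for whichever numeric columns you want to compare.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

study_hours = [1, 2, 3, 4, 5]       # hypothetical weekly study hours
final_grade = [10, 11, 13, 14, 17]  # hypothetical final grades
print(round(pearson(study_hours, final_grade), 3))  # 0.981
```

A high coefficient signals a linear association, not that one variable causes the other.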
Social, Economic & Demographic Data
- What it is: Global economic, social, health, and development indicators.
- Great for: Time series, global comparisons, real-world analytics.
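Development indicators are often published as yearly series. Two quick time-series transforms for them are year-over-year percentage change and a trailing moving average. The four-year series below is hypothetical, and `pct_change` and `moving_average` are illustrative helpers, not functions from any particular library.

```python
def pct_change(values):
    """Percentage change from each period to the next; the first entry has no prior."""
    return [None] + [100 * (cur - prev) / prev for prev, cur in zip(values, values[1:])]

def moving_average(values, window=3):
    """Trailing mean over `window` periods; None until enough history exists."""
    return [
        None if i + 1 < window else sum(values[i + 1 - window:i + 1]) / window
        for i in range(len(values))
    ]

series = [200, 210, 231, 231]  # hypothetical yearly indicator values
print(pct_change(series))      # [None, 5.0, 10.0, 0.0]
print(moving_average(series))
```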
2. UNICEF Data
- What it is: Global data on child health, education, nutrition, and protection.
- Great for: Social research, public policy, humanitarian analysis.
3. U.S. Census Bureau Data
- What it is: Demographic and population statistics from the U.S. Census Bureau.
- Great for: Urban planning, marketing, demographic analysis.
Miscellaneous
1. Wine Quality Dataset
- What it is: Chemical properties and quality scores for red and white wine samples.
- Great for: Regression, classification, and exploratory data analysis (EDA).
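A typical first EDA step is a per-column summary of count, mean, median, spread, and range. This sketch parses a small in-memory CSV excerpt with the standard `csv` module. The column subset and values are hypothetical, and the semicolon delimiter is an assumption to check against the file you download.

```python
import csv
import io
from statistics import mean, median, stdev

# Hypothetical excerpt: a few columns with made-up values, semicolon-delimited.
SAMPLE = """alcohol;pH;quality
9.4;3.51;5
9.8;3.20;5
10.0;3.26;6
12.8;3.35;7
11.0;3.16;6
"""

def summarize(text, delimiter=";"):
    """Return per-column descriptive statistics for an all-numeric CSV."""
    rows = list(csv.DictReader(io.StringIO(text), delimiter=delimiter))
    summary = {}
    for column in rows[0]:
        values = [float(row[column]) for row in rows]
        summary[column] = {
            "count": len(values),
            "mean": mean(values),
            "median": median(values),
            "stdev": stdev(values),  # sample standard deviation
            "min": min(values),
            "max": max(values),
        }
    return summary

for column, stats in summarize(SAMPLE).items():
    print(column, {k: round(v, 3) for k, v in stats.items()})
```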