UCI Machine Learning Repository

About the UCI Machine Learning Repository


The UC Irvine Machine Learning Repository is hosted by the Center for Machine Learning and Intelligent Systems at UC Irvine, which maintains the data as a service to the machine learning community. You may view all datasets through the repository's searchable interface. The repository also accepts dataset donations; please consult its donation policy before submitting.

Access the Data


Contents of the Data

As of this writing, there are 189 datasets on the UCI Machine Learning Repository. They are not categorized, but there's a listing of the most popular datasets (in terms of hits since 2007):

  • Iris - Famous database; from Fisher, 1936
  • Adult - Predict whether income exceeds $50K/yr based on census data. Also known as the "Census Income" dataset.
  • Wine - Using chemical analysis to determine the origin of wines
  • Breast Cancer Wisconsin (Diagnostic) - Diagnostic Wisconsin Breast Cancer Database
  • Abalone - Predict the age of abalone from physical measurements
  • Poker Hand - Purpose is to predict poker hands
  • Car Evaluation - Derived from a simple hierarchical decision model, this database may be useful for testing constructive induction and structure discovery methods.
  • Forest Fires - This is a difficult regression task, where the aim is to predict the burned area of forest fires, in the northeast region of Portugal, by using meteorological and other data
  • Yeast - Predicting the Cellular Localization Sites of Proteins
  • Internet Advertisements - This dataset represents a set of possible advertisements on Internet pages.
  • Bag of Words - This data set contains five text collections in the form of bags-of-words.
  • SPECT Heart - Data on cardiac Single Photon Emission Computed Tomography (SPECT) images. Each patient is classified into one of two categories: normal or abnormal.

Please visit the website to see all the datasets!
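
Most of the datasets listed above are distributed as plain comma-separated text files, so they can usually be read with standard tools once downloaded. The following is a minimal Python sketch for the Iris dataset, assuming the commonly distributed layout of four numeric measurements followed by a class label, with no header row. The column names, the parse_iris helper, and the three inline sample rows are illustrative choices made here, not something the repository prescribes; check the description file that accompanies each dataset for its actual format.

```python
import csv
import io

# Column names are an assumption for illustration; the raw file has no header row.
IRIS_COLUMNS = ["sepal_length", "sepal_width", "petal_length", "petal_width", "species"]


def parse_iris(text):
    """Parse comma-separated Iris records into a list of dicts."""
    records = []
    for row in csv.reader(io.StringIO(text)):
        if not row:  # skip blank lines, such as a trailing empty line
            continue
        *measurements, species = row
        record = dict(zip(IRIS_COLUMNS[:4], map(float, measurements)))
        record["species"] = species
        records.append(record)
    return records


# A few sample rows stand in for the downloaded file contents.
SAMPLE = """5.1,3.5,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica

"""

records = parse_iris(SAMPLE)
print(len(records), records[0]["species"])
```

To use this on the real data, read the downloaded file into a string and pass it to parse_iris in place of SAMPLE. Other datasets in the list use different columns and sometimes different delimiters, so the column list would need to be adapted for each one.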