
About Orange


Orange is component-based data mining software. It includes a range of preprocessing, modelling, and data exploration techniques. It is built on C++ components, which can be accessed directly (not very common), through Python scripts (easier and generally preferable), or through GUI objects called Orange Widgets.

Orange is distributed free of charge under the GPL.

Access the Software


Orange is a component-based framework, which means you can use existing components and build your own. You can even prototype your own components in Python and use them in place of standard C++-based Orange components. For instance, you may craft your own function for attribute quality estimation and use it within Orange's classification tree induction algorithm. Orange provides elementary components and more complex components built from elementary ones, and uses Python as a glue language.
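The idea of swapping in your own attribute quality measure can be illustrated with a minimal, library-independent sketch. It does not use Orange's actual API: the names `information_gain`, `majority_purity`, and `best_attribute`, as well as the toy data, are hypothetical and chosen only for illustration. The point is that the attribute selection step accepts any scoring function as a component.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def group_by(rows, attribute, target):
    """Collect the class labels of rows for each value of the attribute."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    return groups


def information_gain(rows, attribute, target):
    """Reduction in class entropy obtained by splitting rows on attribute."""
    base = entropy([row[target] for row in rows])
    groups = group_by(rows, attribute, target)
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


def majority_purity(rows, attribute, target):
    """Hypothetical custom measure: share of rows that belong to the
    majority class of their attribute-value group."""
    groups = group_by(rows, attribute, target)
    covered = sum(Counter(g).most_common(1)[0][1] for g in groups.values())
    return covered / len(rows)


def best_attribute(rows, attributes, target, measure=information_gain):
    """Pick the attribute that scores highest under the pluggable measure."""
    return max(attributes, key=lambda a: measure(rows, a, target))


# Toy data set (invented for this sketch).
rows = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "yes"},
]

chosen_default = best_attribute(rows, ["outlook", "windy"], "play")
chosen_custom = best_attribute(rows, ["outlook", "windy"], "play",
                               measure=majority_purity)
```

A tree induction routine written this way would call `best_attribute` at every node, so replacing the measure changes how the tree is grown without touching the rest of the algorithm.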

Some of the readily available features of Orange include:

  • Data input/output: Orange can read from and write to tab-delimited files and C4.5 files, and also supports some more exotic formats.
  • Preprocessing: feature subset selection, discretization, feature utility estimation for predictive tasks.
  • Predictive modelling: classification trees, naive Bayesian classifier, k-NN, majority classifier, support vector machines, logistic regression, rule-based classifiers (e.g., CN2).
  • Ensemble methods, including boosting, bagging, and random forests.
  • Data description methods: various visualizations (in widgets), self-organizing maps, hierarchical clustering, k-means clustering, multi-dimensional scaling, and others.
  • Model validation techniques, including different data sampling and validation schemes (such as cross-validation and random sampling) and various statistics for model validation (classification accuracy, AUC, sensitivity, specificity, ...).
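
To make the model validation item more concrete, here is a minimal sketch of k-fold cross-validation of a majority classifier, scored with classification accuracy, sensitivity, and specificity. It is written in plain Python and is not Orange's own API: the function names (`k_fold_indices`, `majority_learner`, `cross_validate`, `score`) and the toy data are assumptions made for this example.

```python
from collections import Counter


def k_fold_indices(n, k):
    """Split the indices 0..n-1 into k interleaved folds."""
    return [list(range(i, n, k)) for i in range(k)]


def majority_learner(train_examples, train_labels):
    """Return a classifier that always predicts the most frequent
    training label, ignoring the example it is given."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return lambda example: majority


def cross_validate(examples, labels, learner, k=3):
    """Train on k-1 folds, test on the remaining one, and collect
    (actual, predicted) pairs over all k test folds."""
    results = []
    for test_idx in k_fold_indices(len(examples), k):
        held_out = set(test_idx)
        train = [i for i in range(len(examples)) if i not in held_out]
        classify = learner([examples[i] for i in train],
                           [labels[i] for i in train])
        results.extend((labels[i], classify(examples[i])) for i in test_idx)
    return results


def score(results, positive):
    """Classification accuracy, sensitivity, and specificity with respect
    to the given positive class."""
    tp = sum(1 for a, p in results if a == positive and p == positive)
    tn = sum(1 for a, p in results if a != positive and p != positive)
    fp = sum(1 for a, p in results if a != positive and p == positive)
    fn = sum(1 for a, p in results if a == positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(results),
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


# Toy data (invented for this sketch): six positive and three negative cases.
examples = list(range(9))
labels = ["yes"] * 6 + ["no"] * 3

stats = score(cross_validate(examples, labels, majority_learner, k=3),
              positive="yes")
```

Because the majority classifier always predicts the more frequent class, it finds every positive case and no negative one on this data, which is why accuracy alone is usually reported together with sensitivity and specificity.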