Datasets used for classification: comparison of results

Computational Intelligence Laboratory | Department of Informatics, Nicolaus Copernicus University


Before using any new dataset it should be described here!
Results from the Statlog project are here.
Logical rules derived for data are here.

Appendicitis |
Breast cancer (Wisconsin) |
Breast Cancer (Ljubljana) |
Diabetes (Pima Indian) |
Heart disease (Cleveland) |
Heart disease (Statlog version) |
Hepatitis |
Hypothyroid |
Hepatobiliary disorders |
Other datasets:
Ionosphere |
Satellite image dataset (Statlog version) |
Sonar |
Telugu Vowel |
Vowel |
Wine |
Other data: Glass, DNA |
More results for Statlog datasets.

A note of caution: comparison of different classifiers is not an easy task. Before you get into ranking of methods using the numbers presented in tables below please note the following facts.
Many results we have collected give only a single number (even results from the StatLog project!), without standard deviation. Since most classifiers may give results that differ by several percent on slightly different data partitions, single numbers do not mean much.
Leave-one-out tests have been criticized as a basis for accuracy evaluation; the conclusion is that crossvalidation is safer, cf.:
Kohavi, R. (1995). A study of cross-validation and bootstrap for accuracy estimation and model selection. In: Proc. of the 14th Int. Joint Conference on Artificial Intelligence, Morgan Kaufmann, pp. 1137-1143.

Crossvalidation (CV) tests are also not ideal. Theoretically about 2/3 of the results should lie within one standard deviation of the average and 95% within two standard deviations, so in a 10-fold crossvalidation you should very rarely see results that are better or worse than the average by more than two standard deviations. Running CV several times may also give you different answers. The search for the best estimator continues. Cf.:
Dietterich, T. (1998) Approximate statistical tests for comparing supervised classification learning algorithms. Neural Computation, 10 (7), 1895-1924;
Nadeau C, Bengio Y. (1999) Inference for the Generalization Error. Tech. rep. 99s-25, CIRANO, J. Machine Learning (Kluwer, in print).
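The fold-to-fold spread discussed above can be illustrated with a minimal sketch in plain Python. It builds stratified folds and reports the mean and standard deviation of the per-fold accuracy for a majority-class baseline. The helper names (`stratified_folds`, `cv_accuracies`, `majority`) are hypothetical and are not part of GhostMiner or any other package mentioned on this page; the 85 + 21 class split is borrowed from the appendicitis data only as an illustration.

```python
import random
from collections import Counter, defaultdict
from statistics import mean, stdev


def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    pos = 0  # running position, so leftovers of each class land in different folds
    for y in sorted(by_class):
        members = by_class[y]
        rng.shuffle(members)
        for idx in members:
            folds[pos % k].append(idx)
            pos += 1
    return folds


def cv_accuracies(labels, folds, fit_predict):
    """Per-fold accuracy; fit_predict(train_idx, test_idx) returns predicted labels."""
    accs = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        preds = fit_predict(train_idx, test_idx)
        hits = sum(p == labels[i] for p, i in zip(preds, test_idx))
        accs.append(hits / len(test_idx))
    return accs


# Illustrative labels only: 85 cases of one class and 21 of the other.
labels = [0] * 85 + [1] * 21


def majority(train_idx, test_idx):
    """Baseline: always predict the most frequent class of the training part."""
    top = Counter(labels[i] for i in train_idx).most_common(1)[0][0]
    return [top] * len(test_idx)


folds = stratified_folds(labels, k=10)
accs = cv_accuracies(labels, folds, majority)
m, s = mean(accs), stdev(accs)
within_one_std = sum(abs(a - m) <= s for a in accs)
print(f"accuracy {100 * m:.1f} ± {100 * s:.1f} %, "
      f"{within_one_std}/10 folds within one standard deviation")
```

Even this trivial classifier shows a non-zero spread between folds, simply because the folds differ slightly in size and class composition; a real classifier adds its own variance on top of that.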
Even the best accuracy and variance estimation is not sufficient, since performance cannot be characterized by a single number. It would be much better to provide full Receiver Operating Characteristic (ROC) curves. Combining ROC with variance estimation would be ideal.
Unfortunately this still remains to be done. All we can do now is to collect some numbers in tables.
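As a rough illustration of what a full ROC curve adds to a single accuracy number, the sketch below sweeps a decision threshold over classifier scores and integrates the resulting curve. The function names and the six scores are hypothetical, made up for this example; label 1 is assumed to be the positive class.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) points, one per distinct threshold."""
    pairs = sorted(zip(scores, labels), key=lambda t: -t[0])
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume all samples tied at this score before emitting a point.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical classifier outputs, higher score = more likely positive.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
curve = roc_points(scores, labels)
area = auc(curve)
print(curve, round(area, 3))
```

Repeating this computation on each crossvalidation fold would give a variance estimate for the curve, which is the combination suggested above.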
Our results are usually obtained with the GhostMiner package, developed in our group.
Some publications with results are on my page.
TuneIT, Testing Machine Learning & Data Mining Algorithms - Automated Tests, Repeatable Experiments, Meaningful Results.

Results of hand-written signs and numbers classification are here.


Appendicitis

106 vectors, 8 attributes, two classes (85 acute appendicitis + 21 other, or 80.2% + 19.8%), data from Shalom Weiss;
Results obtained with the leave-one-out test, % of accuracy given
Attribute names: WBC1, MNEP, MNEA, MBAP, MBAA, HNEP, HNEA

Method Accuracy % Reference
PVM (logical rules) 89.6 Weiss, Kapouleas
C-MLP2LN (logical rules) 89.6±? our
k-NN, stand. Manhattan, k=8,9,22-25 k=4,5, stand. Euclid, f2+f4 removed 88.7 our (WD/KG)
9-NN, stand. Euclidean 87.7 our (KG)
RIAC (prob. inductive) 86.9 Hamilton
1-NN, stand. Euclidean, f2+f4 rem 86.8 our (WD/KG)
MLP+backpropagation 85.8 Weiss, Kapouleas
CART, C4.5 (dec. trees) 84.9 Weiss, Kapouleas
FSM 84.9 our (RA)
Bayes rule (statistical) 83.0 Weiss, Kapouleas

For 90% accuracy and p=0.95 confidence level 2-tailed bounds are: [82.8%,94.4%]
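The page does not say how these bounds were computed. They are consistent with the Wilson score interval for a binomial proportion, so the sketch below assumes that formula; `wilson_interval` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(accuracy, n, confidence=0.95):
    """Two-tailed Wilson score interval for an accuracy measured on n cases."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    denom = 1 + z * z / n
    center = (accuracy + z * z / (2 * n)) / denom
    half = z * sqrt(accuracy * (1 - accuracy) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# 90% accuracy on the 106 appendicitis vectors.
low, high = wilson_interval(0.90, 106)
print(f"[{100 * low:.1f}%, {100 * high:.1f}%]")  # prints [82.8%, 94.4%]
```

With only 106 cases the interval is almost 12 percentage points wide, so most of the methods in the table above cannot be separated on accuracy alone.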
S.M. Weiss, I. Kapouleas, "An empirical comparison of pattern recognition, neural nets and machine learning classification methods", in: J.W. Shavlik and T.G. Dietterich, Readings in Machine Learning, Morgan Kaufmann Publ., CA 1990
H.J. Hamilton, N. Shan, N. Cercone, RIAC: a rule induction algorithm based on approximate classification, Tech. Rep. CS 96-06, Regina University 1996.
For C-MLP2LN (logical rules) the leave-one-out accuracy is only estimated, since the rules are similar to those of PVM.
3 crisp logical rules, overall 91.5% accuracy
Results for 10-fold stratified crossvalidation

Method Accuracy % Reference
NBC+WX+G(WX) ??.5±7.7 TM-GM
NBC+G(WX) ??.2±6.7 TM-GM
kNN auto+G(WX) Eucl. ??.2±6.7 TM-GM
C-MLP2LN 89.6 our logical rules
20-NN, stand. Eucl. f 4,1,7 89.3±8.6 our (KG); feature sel. from CV on the whole data set
SSV beam leaves 88.7±8.5 WD
SVM linear C=1 88.1±8.6 WD
6-NN, stand. Eucl. 88.0±7.9 WD
SSV default 87.8±8.7 WD
SSV beam pruning 86.9±9.8 WD
kNN, k=auto, Eucl 86.7±6.6 WD
FSM, a=0.9, Gauss, cluster 86.1±8.8 WD-GM
NBC 85.9±10.2 TM-GM
VSS 1 neuron, 4 it 84.9±7.4 WD/MK
SVM Gauss C=32, s=0.1 84.4±8.2 WD
MLP+BP (Tooldiag) 83.9 Rafał Adamczak
RBF (Tooldiag) 80.2 Rafał Adamczak

Maszczyk T, Duch W, Support Feature Machine, WCCI 2010 (submitted).

Wisconsin breast cancer.

From UCI repository, 699 cases, 9 attributes, two classes, 458 (65.5%) & 241 (34.5%).
Results obtained with the leave-one-out test, % of accuracy given.

F6 has 16 missing values, removing these vectors leaves 683 examples.

Method Accuracy % Reference
FSM 98.3 our (RA)
3-NN stand. Manhattan 97.1 our (KG)
21-NN stand. Euclidean 96.9 our (KG)
C4.5 (decision tree) 96.0 Hamilton
RIAC (prob. inductive) 95.0 Hamilton

H.J. Hamilton, N. Shan, N. Cercone, RIAC: a rule induction algorithm based on approximate classification, Tech. Rep. CS 96-06, Regina University 1996.
Results obtained with the 10-fold crossvalidation, 16 vectors with F6 values missing removed, 683 samples left, % of accuracy given.

method Accuracy % Reference
Naive MFT 97.1 Opper, Winther, L-1-O est. 97.3
SVM Gauss, C=1,s=0.1 97.0±2.3 WD-GM
SVM (10xCV) 96.9 Opper, Winther
SVM lin, opt C 96.9±2.2 WD-GM, same with Minkowski kernel
Cluster means, 2 prototypes 96.5±2.2 MB
Default, majority 65.5 --

Results obtained with the 10-fold crossvalidation, % of accuracy given, all data, missing values handled in different ways.

method Accuracy % Reference
NB + kernel est 97.5±1.8 WD, WEKA, 10X10CV
SVM (5xCV) 97.2 Bennett and Blue
kNN with DVDM distance 97.1 our (KG)
GM k-NN, k=3, raw, Manh 97.0±2.1 WD, 10X10CV
GM k-NN, k=opt, raw, Manh 97.0±1.7 WD, 10CV only
VSS, 8 it/2 neurons 96.9±1.8 WD/MK; 98.1% train
FSM-Feature Space Mapping 96.9±1.4 RA/WD, a=.99 Gaussian
Fisher linear discr. anal 96.8 Ster, Dobnikar
MLP+BP 96.7 Ster, Dobnikar
MLP+BP (Tooldiag) 96.6 Rafał Adamczak
LVQ 96.6 Ster, Dobnikar
kNN, Euclidean/Manhattan f. 96.6 Ster, Dobnikar
SNB, semi-naive Bayes (pairwise dependent) 96.6 Ster, Dobnikar
SVM lin, opt C 96.4±1.2 WD-GM, 16 missing with -10
VSS, 8 it/1 neuron! 96.4±2.0 WD/MK, train 98.0%
GM IncNet 96.4±2.1 NJ/WD; FKF, max. 3 neurons
NB - naive Bayes (completely independent) 96.4 Ster, Dobnikar
SSV opt nodes, 3CV int 96.3±2.2 WD/GM; training 96.6±0.5
IB1 96.3±1.9 Zarndt
DB-CART (decision tree) 96.2 Shang, Breiman
GM SSV Tree, opt nodes BFS 96.0±2.9 WD/KG (beam search 94.0)
LDA - linear discriminant analysis 96.0 Ster, Dobnikar
OC1 DT (5xCV) 95.9 Bennett and Blue
RBF (Tooldiag) 95.9 Rafał Adamczak
GTO DT (5xCV) 95.7 Bennett and Blue
ASI - Assistant I tree 95.6 Ster, Dobnikar
MLP+BP (Weka) 95.4±0.2 TW/WD
OCN2 95.2±2.1 Zarndt
IB3 95.0±4.0 Zarndt
MML tree 94.8±1.8 Zarndt
ASR - Assistant R (RELIEF criterion) tree 94.7 Ster, Dobnikar
C4.5 tree 94.7±2.0 Zarndt
LFC, Lookahead Feature Constr binary tree 94.4 Ster, Dobnikar
CART tree 94.4±2.4 Zarndt
ID3 94.3±2.6 Zarndt
C4.5 (5xCV) 93.4 Bennett and Blue
C 4.5 rules 86.7±5.9 Zarndt
Default, majority 65.5 --
QDA - quadratic discr anal 34.5 Ster, Dobnikar

For 97% accuracy and p=0.95 confidence level 2-tailed bounds are: [95.5%,98.0%]
K.P. Bennett, J. Blue, A Support Vector Machine Approach to Decision Trees, R.P.I Math Report No. 97-100, Rensselaer Polytechnic Institute, Troy, NY, 1997
N. Shang, L. Breiman, ICONIP'96, p.133
B. Ster and A. Dobnikar, Neural networks in medical diagnosis: Comparison with other methods. In A. Bulsari et al., editor, Proceedings of the International Conference EANN '96, pages 427-430, 1996.
F. Zarndt, A Comprehensive Case Study: An Examination of Machine Learning and Connectionist Algorithms, MSc Thesis, Dept. of Computer Science, Brigham Young University, 1995

Breast Cancer (Ljubljana data)

From UCI repository (restricted): 286 instances, 201 no-recurrence-events (70.3%), 85 recurrence-events (29.7%);
9 attributes, between 2-13 values each, 9 missing values
Results - 10xCV? Sometimes methodology was unclear;
difficult, noisy data, some methods are below the base rate (70.3%).

Method Accuracy, % test Reference
C-MLP2LN/SSV single rule 76.2±0.0 WD/K. Grabczewski, stable rule
SSV Tree rule 75.7±1.1 WD, av. from 10x10CV
MML Tree 75.3±7.8 Zarndt
SVM Gauss, C=1, s =0.1 73.8±4.3 WD, GM
MLP+backprop 73.5±9.4 Zarndt
SVM Gauss, C, s opt 72.4±5.1 WD, GM
IB1 71.8±7.5 Zarndt
CART 71.4±5.0 Zarndt
ODT trees 71.3±4.2 Blanchard
SVM lin, C=opt 71.0±4.7 WD, GM
UCN 2 70.7±7.8 Zarndt
SFC, Stack filters 70.6±4.2 Porter
Default, majority 70.3±0.0
SVM lin, C=1 70.0±5.6 WD, GM
C 4.5 rules 69.7±7.2 Zarndt
Bayes rule 69.3±10.0 Zarndt
C 4.5 69.2±4.9 Blanchard
Weighted networks 68-73.5 Tan, Eshelman
IB3 67.9±7.7 Zarndt
ID3 rules 66.2±8.5 Zarndt
AQ15 66-72 Michalski e.a.
Inductive 65-72 Clark, Niblett

For 78% accuracy and p=0.95 confidence level 2-tailed bounds are: [72.9%,82.4%]

  • Assistant-86 achieved 78%, but this seems to be the best result that happens in some crossvalidations, not the average.
  • Cestnik, G., Kononenko, I., & Bratko, I. (1987). Assistant-86: A Knowledge-Elicitation Tool for Sophisticated Users. In I. Bratko & N. Lavrac (Eds.) Progress in Machine Learning, 31-45, Sigma Press.
  • Blanchard, G., Schafer, C., Rozenholc, Y., & Muller, K.-R. (2007) Optimal dyadic decision trees. Machine Learning 66: 709-717.
  • Clark, P. & Niblett, T. (1987). Induction in Noisy Domains. In: Progress in Machine Learning (from the Proceedings of the 2nd European Working Session on Learning), 11-30, Bled, Yugoslavia: Sigma Press.
  • Porter R.B., G. Beate Zimmer, Don R. Hush: Stack Filter Classifiers. ISMM 2009: 282-294
  • Michalski, R.S., Mozetic, I., Hong, J., & Lavrac, N. (1986). The Multi-Purpose Incremental Learning System AQ15 and its Testing Application to Three Medical Domains. In Proceedings of the Fifth National Conference on Artificial Intelligence, 1041-1045, Philadelphia, PA: Morgan Kaufmann.
  • Tan, M., & Eshelman, L. (1988). Using weighted networks to represent classification knowledge in noisy domains. Proceedings of the Fifth International Conference on Machine Learning, 121-134, Ann Arbor, MI.
  • F. Zarndt, A Comprehensive Case Study: An Examination of Machine Learning and Connectionist Algorithms, MSc Thesis, Dept. of Computer Science, Brigham Young University, 1995
  • S.M. Weiss, I. Kapouleas. An empirical comparison of pattern recognition, neural nets and machine learning classification methods, in: J.W. Shavlik and T.G. Dietterich, Readings in Machine Learning, Morgan Kaufmann Publ., CA 1990

They used leave-one-out tests and obtained:
MLP+backprop: 75.7% train, 71.5% test;
Bayes: 75.9% train, 71.8% test;
CART & PVM: 77.4% train, 77.1% test;
k-NN: 65.3% test.

Hepatitis

From UCI repository, 155 vectors, 19 attributes,
Two classes, die with 32 (20.6%), live with 123 (79.4%).
Many missing values! F18 has 67 missing values, F15 has 29, F17 has 16 and other features between 0 and 11.
Results obtained with the leave-one-out test, % of accuracy given

Method Accuracy % Reference
21-NN, stand Manhattan 90.3 our (KG)
FSM 90.0 our (RA)
14-NN, stand. Euclid 89.0 our (KG)
LDA 86.4 Weiss & K
CART (decision tree) 82.7 Weiss & K
MLP+backprop 82.1 Weiss & K

MLP, CART, LDA results from (check it ?) S.M. Weiss, I. Kapouleas, "An empirical comparison of pattern recognition, neural nets and machine learning classification methods", in: J.W. Shavlik and T.G. Dietterich, Readings in Machine Learning, Morgan Kaufmann Publ., CA 1990.
Other results - our own;
Results obtained with the 10-fold crossvalidation, % of accuracy given; our results with stratified crossvalidation, other results - who knows? Differences for this dataset are rather small, 0.1-0.2%.

Method Accuracy % Reference
Weighted 9-NN 92.9±? Karol Grudziński
18-NN, stand. Manhattan 90.2±0.7 Karol Grudziński
FSM with rotations 89.7±? Rafał Adamczak
15-NN, stand. Euclidean 89.0±0.5 Karol Grudziński
VSS 4 neurons, 5 it 86.5±8.8 WD/MK, train 97.1
FSM without rotations 88.5 Rafał Adamczak
LDA, linear discriminant analysis 86.4 Ster & Dobnikar
Naive Bayes and Semi-NB 86.3 Ster & Dobnikar
IncNet 86.0 Norbert Jankowski
QDA, quadratic discriminant analysis 85.8 Ster & Dobnikar
1-NN 85.3±5.4 Ster & Dobnikar, std added by WD
VSS 2 neurons, 5 it 85.1±7.4 WD/MK, train 95.0
ASR 85.0 Ster & Dobnikar
Fisher discriminant analysis 84.5 Ster & Dobnikar
LVQ 83.2 Ster & Dobnikar
CART (decision tree) 82.7 Ster & Dobnikar
MLP with BP 82.1 Ster & Dobnikar
ASI 82.0 Ster & Dobnikar
LFC 81.9 Ster & Dobnikar
RBF (Tooldiag) 79.0 Rafał Adamczak
MLP+BP (Tooldiag) 77.4 Rafał Adamczak

Results on BP, LVQ, ..., SNB are from: B. Ster and A. Dobnikar, Neural networks in medical diagnosis: Comparison with other methods. In A. Bulsari et al., editor, Proceedings of the International Conference EANN '96, pages 427-430, 1996.
Do our good results reflect superior handling of missing values?
Duch W, Grudziński K (1998) A framework for similarity-based methods. Second Polish Conference on Theory and Applications of Artificial Intelligence, Lodz, 28-30 Sept. 1998, pp. 33-60
Weighted kNN: Duch W, Grudzinski K and Diercksen G.H.F (1998) Minimal distance neural methods. World Congress of Computational Intelligence, May 1998, Anchorage, Alaska, IJCNN'98 Proceedings, pp. 1299-1304

Statlog version of Cleveland Heart disease.

13 attributes (extracted from 75), no missing values.
270=150+120 observations selected from the 303 cases (Cleveland Heart).
Attribute Information:

1. age
2. sex
3. chest pain type (4 values)
4. resting blood pressure
5. serum cholesterol in mg/dl
6. fasting blood sugar > 120 mg/dl
7. resting electrocardiographic results (values 0, 1, 2)
8. maximum heart rate achieved
9. exercise induced angina
10. oldpeak = ST depression induced by exercise relative to rest
11. the slope of the peak exercise ST segment
12. number of major vessels (0-3) colored by fluoroscopy
13. thal: 3 = normal; 6 = fixed defect; 7 = reversible defect

Attribute types: Real: 1, 4, 5, 8, 10, 12; Ordered: 11; Binary: 2, 6, 9; Nominal: 3, 7, 13
Classes: absence (1) or presence (2) of heart disease.
In the Statlog experiments on the heart data a cost (risk) matrix was used with 9-fold crossvalidation, and only cost values are given.
Results below are obtained with the 10-fold crossvalidation, % of accuracy given, no risk matrix

Method Accuracy % Reference
Lin SVM 2D QCP 85.9±5.5 MG, 10xCV
kNN auto+WX ??.8±5.6 TM GM 10xCV
SVM Gauss+WX+G(WX), C=1 s=2-5 ??.8±6.4 TM GM 10xCV
SVM lin, C=0.01 84.9±7.9 WD, GM 10x(9xCV)
SFM, G(WX), default C=1 ??±5.1 TM, GM 10xCV
Naive-Bayes 84.5±6.3 TM, GM 10xCV
Naive-Bayes 83.6 RA, WEKA
SVML default C=1 82.5±6.4 TM, GM 10xCV
K* 76.7 WEKA, RA
IB1c 74.0 WEKA, RA
1R 71.4 WEKA, RA
T2 68.1 WEKA, RA
MLP+BP 65.6 ToolDiag, RA
RBF 60.0 ToolDiag, RA
InductH 58.5 WEKA, RA
Base rate (majority classifier) 55.7
IB1-4 50.0 ToolDiag, RA

Results for Heart and other Statlog datasets are collected here.

Cleveland heart disease.

From UCI repository, 303 cases, 13 attributes (4 cont, 9 nominal), 7 vectors with missing values ?
2 (no, yes) or 5 classes (no, degree 1, 2, 3, 4).
Class distribution: 164 (54.1%) no, 55+36+35+13 yes (45.9%) with disease degree 1-4.
Results obtained with the leave-one-out test, % of accuracy given, 2 classes used.

Method Accuracy % Reference
LDA 84.5 Weiss ?
25-NN, stand, Euclid 83.6±0.5 WD/KG repeat??
C-MLP2LN 82.5 RA, estimated?
FSM 82.2 Rafał Adamczak
MLP+backprop 81.3 Weiss ?
CART 80.8 Weiss ?

MLP, CART, LDA: where are these results from?
Other results - our own.
Results obtained with the 10-fold crossvalidation, % of accuracy given.
Ster & Dobnikar reject 6 vectors (leaving 297) with missing values.
We use all 303 vectors, replacing missing values by the means for their class; for kNN we have used the Statlog convention, 297 vectors.

Method Accuracy % Reference
IncNet+transformations 90.0 Norbert Jankowski; check again!
28-NN, stand, Euclid, 7 features 85.1±0.5 WD/KG
LDA 84.5 Ster & Dobnikar
Fisher discriminant analysis 84.2 Ster & Dobnikar
k=7, Euclid, std 84.2±6.6 WD, GhostMiner
16-NN, stand, Euclid 84±0.6 WD/KG
FSM, 82.4-84% on test only 84.0 Rafał Adamczak
k=1:10, Manhattan, std 83.8±5.3 WD, GhostMiner
Naive Bayes 82.5-83.4 Rafał; Ster, Dobnikar
SNB 83.1 Ster & Dobnikar
LVQ 82.9 Ster & Dobnikar
GTO DT (5xCV) 82.5 Bennett and Blue
kNN, k=19, Euclidean 82.1±0.8 Karol Grudziński
k=7, Manhattan, std 81.8±10.0 WD, GhostMiner
SVM (5xCV) 81.5 Bennett and Blue
kNN (k=1? raw data?) 81.5 Ster & Dobnikar
MLP+BP (standardized) 81.3 Ster, Dobnikar, Rafał Adamczak
Cluster means, 2 prototypes 80.8±6.4 MB
CART 80.8 Ster & Dobnikar
RBF (Tooldiag, standardized) 79.1 Rafał Adamczak
Gaussian EM, 60 units 78.6 Stensmo & Sejnowski
ASR 78.4 Ster & Dobnikar
C4.5 (5xCV) 77.8 Bennett and Blue
IB1c (WEKA) 77.6 Rafał Adamczak
QDA 75.4 Ster & Dobnikar
LFC 75.1 Ster & Dobnikar
ASI 74.4 Ster & Dobnikar
K* (WEKA) 74.2 Rafał Adamczak
OC1 DT (5xCV) 71.7 Bennett and Blue
1 R (WEKA) 71.0 Rafał Adamczak
T2 (WEKA) 69.0 Rafał Adamczak
FOIL (WEKA) 66.4 Rafał Adamczak
InductH (WEKA) 61.3 Rafał Adamczak
Default, majority 54.1
C4.5 rules 53.8±5.9 Zarndt
IB1-4 (WEKA) 46.2 Rafał Adamczak

For 85% accuracy and p=0.95 confidence level 2-tailed bounds are: [80.5%,88.6%]
Results obtained with BP, LVQ, ..., SNB are from: B. Ster and A. Dobnikar, Neural networks in medical diagnosis: Comparison with other methods. In: A. Bulsari et al., editor, Proceedings of the International Conference EANN '96, pages 427-430, 1996.

Magnus Stensmo and Terrence J. Sejnowski, A Mixture Model System for Medical and Machine Diagnosis, Advances in Neural Information Processing Systems 7 (1995) 1077-1084

Kristin P. Bennett, J. Blue, A Support Vector Machine Approach to Decision Trees, R.P.I Math Report No. 97-100, Rensselaer Polytechnic Institute, Troy, NY, 1997
Other results for this dataset (methodology sometimes uncertain):
D. Wettschereck, averaging 25 runs with 70% train and 30% test, variants of k-NN with different metric functions and scaling.
David Aha & Dennis Kibler - From UCI repository past usage

Method Accuracy % Reference
k-NN, Value Distance Metric (VDM) 82.6 D. Wettschereck
k-NN, Euclidean 82.4±0.8 D. Wettschereck
k-NN, Variable Similarity Metric 82.4 D. Wettschereck
k-NN, Modified VDM 83.1 D. Wettschereck
Other k-NN variants < 82.4 D. Wettschereck
k-NN, Mutual Information 81.8 D. Wettschereck
CLASSIT (hierarchical clustering) 78.9 Gennari, Langley, Fisher
NTgrowth (instance-based) 77.0 Aha & Kibler
C4 74.8 Aha & Kibler
Naive Bayes 82.8±1.3 Friedman, 5xCV, 296 vectors

Gennari, J.H., Langley, P, Fisher, D. (1989). Models of incremental concept formation. Artificial Intelligence, 40, 11-61.
Friedman N, Geiger D, Goldszmidt M (1997). Bayesian network classifiers. Machine Learning 29: 131-163


From the UCI repository, dataset "Pima Indian diabetes":
2 classes, 8 attributes, 768 instances: 500 (65.1%) negative (class 1) and 268 (34.9%) positive (class 2) tests for diabetes.
All patients were females at least 21 years old of Pima Indian heritage.
Attributes used:
1. Number of times pregnant
2. Plasma glucose concentration at 2 hours in an oral glucose tolerance test
3. Diastolic blood pressure (mm Hg)
4. Triceps skin fold thickness (mm)
5. 2-Hour serum insulin (mu U/ml)
6. Body mass index (weight in kg/(height in m)^2)
7. Diabetes pedigree function
8. Age (years)
Results obtained with the 10-fold crossvalidation, % of accuracy given; Statlog results are with 12-fold crossvalidation

Method Accuracy % Reference
Logdisc 77.7 Statlog
IncNet 77.6 Norbert Jankowski
DIPOL92 77.6 Statlog
Linear Discr. Anal. 77.5-77.2 Statlog; Ster & Dobnikar
SVM, linear, C=0.01 77.5±4.2 WD-GM, 10XCV averaged 10x
SVM, Gauss, C, sigma opt 77.4±4.3 WD-GM, 10XCV averaged 10x
SMART 76.8 Statlog
GTO DT (5xCV) 76.8 Bennett and Blue
kNN, k=23, Manh, raw, W 76.7±4.0 WD-GM, feature weighting 3CV
kNN, k=1:25, Manh, raw 76.6±3.4 WD-GM, most cases k=23
ASI 76.6 Ster & Dobnikar
Fisher discr. analysis 76.5 Ster & Dobnikar
MLP+BP 76.4 Ster & Dobnikar
MLP+BP 75.8±6.2 Zarndt
LVQ 75.8 Ster & Dobnikar
LFC 75.8 Ster & Dobnikar
RBF 75.7 Statlog
NB 75.5-73.8 Ster & Dobnikar; Statlog
kNN, k=22, Manh 75.5 Karol Grudziński
MML 75.5±6.3 Zarndt
SNB 75.4 Ster & Dobnikar
BP 75.2 Statlog
SSV DT 75.0±3.6 WD-GM, SSV BS, node 5CV MC
kNN, k=18, Euclid, raw 74.8±4.8 WD-GM
CART DT 74.7±5.4 Zarndt
CART DT 74.5 Statlog
DB-CART 74.4 Shang & Breiman
ASR 74.3 Ster & Dobnikar
ODT, dyadic trees 74.0±2.3 Blanchard
Cluster means, 2 prototypes 73.7±3.7 MB
SSV DT 73.7±4.7 WD-GM, SSV BS, node 10CV strat
SFC, stacking filters 73.3±1.9 Porter
C4.5 DT 73.0 Statlog
C4.5 DT 72.7±6.6 Zarndt
Bayes 72.2±6.9 Zarndt
C4.5 (5xCV) 72.0 Bennett and Blue
CART 72.8 Ster & Dobnikar
Kohonen 72.7 Statlog
C4.5 DT 72.1±2.6 Blanchard (averaged over 100 runs)
kNN 71.9 Ster & Dobnikar
ID3 71.7±6.6 Zarndt
IB3 71.7±5.0 Zarndt
IB1 70.4±6.2 Zarndt
kNN, k=1, Euclidean, raw 69.4±4.4 WD-GM
kNN 67.6 Statlog
C4.5 rules 67.0±2.9 Zarndt
OCN2 65.1±1.1 Zarndt
Default, majority 65.1
QDA 59.5 Ster, Dobnikar

For 77.7% accuracy and p=0.95 confidence level 2-tailed bounds are: [74.6%,80.5%]
Results on BP, LVQ, ..., SNB are from: B. Ster and A. Dobnikar, Neural networks in medical diagnosis: Comparison with other methods. In A. Bulsari et al., editor, Proceedings of the International Conference EANN '96, pages 427-430, 1996.

Other results (with different tests):

Method Accuracy % Reference
SVM (5xCV) 77.6 Bennett and Blue
C4.5 76.0±0.9 Friedman, 5xCV
Semi-Naive Bayes 76.0±0.8 Friedman, 5xCV
Naive Bayes 74.5±0.9 Friedman, 5xCV
Default, majority 65.1

Friedman N, Geiger D, Goldszmidt M (1997). Bayesian network classifiers. Machine Learning 29: 131-163
Opper and Winther use 200 training and 332 test examples (following Ripley); TAP MFT gives 81% on the test set, SVS 80.1% and the best NN 77.4%.


Thyroid, From UCI repository, dataset "": A Thyroid database suited for training ANNs.
3772 learning and 3428 testing examples; primary hypothyroid, compensated hypothyroid, normal.
Training: 93+191+3488 or 2.47%, 5.06%, 92.47%
Test: 73+177+3178 or 2.13%, 5.16%, 92.71%
21 attributes (15 binary, 6 continuous); 3 classes
The problem is to determine whether a patient referred to the clinic is hypothyroid. Therefore three classes are built: normal (not hypothyroid), hyperfunction and subnormal functioning. Because 92 percent of the patients are not hyperthyroid, a good classifier must be significantly better than 92%.
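The 92% threshold is simply the accuracy of always predicting the largest class. A minimal sketch of that arithmetic for the test partition follows; `majority_baseline` is a hypothetical helper, and the assignment of the three counts to class names assumes they are listed in the same order as the classes above.

```python
def majority_baseline(class_counts):
    """Return (majority label, baseline accuracy, number of errors)."""
    total = sum(class_counts.values())
    label, count = max(class_counts.items(), key=lambda kv: kv[1])
    return label, count / total, total - count


# Test partition: 73 + 177 + 3178 cases (class order assumed as listed above).
test_counts = {
    "primary hypothyroid": 73,
    "compensated hypothyroid": 177,
    "normal": 3178,
}
label, accuracy, errors = majority_baseline(test_counts)
print(f"{label}: {100 * accuracy:.1f}% accuracy, {errors} errors")
```

Any method scoring near this figure has learned essentially nothing about the two rare classes, which is why the table below includes the default classifier as a reference row.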
Note: these are the data Quinlan used in the case study of his article "Simplifying Decision Trees" (International Journal of Man-Machine Studies (1987) 221-234).
Names: I (W.D.) have investigated this issue, and after some mail exchange with Chris Merz, who maintains the UCI repository, here is the conclusion:

1 age: continuous 2 sex: {M, F} 3 on thyroxine: logical
4 maybe on thyroxine: logical 5 on antithyroid medication: logical 6 sick - patient reports malaise: logical
7 pregnant: logical 8 thyroid surgery: logical 9 I131 treatment: logical
10 test hypothyroid: logical 11 test hyperthyroid: logical 12 on lithium: logical
13 has goitre: logical 14 has tumor: logical 15 hypopituitary: logical
16 psychological symptoms: logical 17 TSH: continuous 18 T3: continuous
19 TT4: continuous 20 T4U: continuous 21 FTI: continuous


Method % training % test Reference
C-MLP2LN rules+ASA 99.90 99.36 Rafał/Krzysztof/Grzegorz
CART 99.80 99.36 Weiss
PVM 99.80 99.33 Weiss
SSV beam search 99.80 99.33 WD
IncNet 99.68 99.24 Norbert Jankowski
MLP+SCG, 4 neurons 99.81 99.24 SVNT paper
SVM Minkowski kernel 100.0 99.18 SVNT paper
SSV opt leaves or pruning 99.7 99.1 WD
MLP init+ a,b opt. 99.5 99.1 Rafał
C-MLP2LN rules 99.7 99.0 Rafał/Krzysztof
MLP+SCG, 4 neurons, 67 SV 99.95 99.0 SVNT paper
MLP+SCG, 4 neurons, 45 SV 100 98.9 SVNT paper
MLP+SCG, 12 neurons 100 98.8 SVNT paper
Cascade correlation 100.0 98.5 Schiffmann
Local adapt. rates 99.6 98.5 Schiffmann
BP+genetic opt. 99.4 98.4 Schiffmann
Quickprop 99.6 98.3 Schiffmann
RPROP 99.6 98.0 Schiffmann
3-NN, Euclidean, with 3 features 98.7 97.9 W.D./Karol
1-NN, Euclidean, with 3 features 98.4 97.7 W.D./Karol
Best backpropagation 99.1 97.6 Schiffmann
1-NN, Euclidean, 8 features used -- 97.3 Karol/W.D.
SVM Gauss, C=8 s=0.1 98.3 96.1 WD
Bayesian classif. 97.0 96.1 Weiss?
SVM Gauss, C=1 s=0.1 95.4 94.7 WD
BP+conj. gradient 94.6 93.8 Schiffmann
1-NN Manhattan, std data 93.8 Karol G./WD
SVM lin, C=1 94.1 93.3 WD
SVM Gauss, C=8 s=5 100 92.8 WD
Default, majority 250 test errors 92.7
1-NN Manhattan, raw data 92.2 Karol G./WD

For 99.90% accuracy on training and p=0.95 confidence level 2-tailed bounds are: [99.74%,99.96%]
Most NN results from W. Schiffmann, M. Joost, R. Werner, 1993; MLP2LN and Init+a,b ours.
k-NN, PVM and CART from S.M. Weiss, I. Kapouleas, "An empirical comparison of pattern recognition, neural nets and machine learning classification methods", in: J.W. Shavlik and T.G. Dietterich, Readings in Machine Learning, Morgan Kaufmann Publ., CA 1990
SVM with linear and Gaussian kernels gives quite poor results on this data.
3 crisp logical rules using TSH, FTI, T3, on_thyroxine, thyroid_surgery, TT4 give 99.3% of accuracy on the test set.

Hepatobiliary disorders

Contains medical records of 536 patients admitted to a university-affiliated Tokyo-based hospital, with four types of hepatobiliary disorders: alcoholic liver damage, primary hepatoma, liver cirrhosis and cholelithiasis. The records included results of 9 biochemical tests and sex of the patient. The same 163 cases as in [Hayashi] were used as the test data.
FSM gives about 60 Gaussian or triangular membership functions achieving accuracy of 75.5-75.8%. Rotation of these functions (i.e. introducing linear combinations of inputs to the rules) does not improve this accuracy. 10-fold crossvalidation tests on the mixed, training plus test data, give similar results. The best results were obtained with the K* method based on algorithmic complexity optimization, giving 78.5% on the test set, and kNN with Manhattan distance function, k=1 and selection of features (using the leave-one-out method on the training data, features 2, 5, 6 and 9 were removed), giving 80.4% accuracy. Simulated annealing optimization of the scaling factors for the remaining 5 features gives 81.0%, and optimizing scaling factors using all input features gives 82.8%. The scaling factors are: 0.92, 0.60, 0.91, 0.92, 0.07, 0.41, 0.55, 0.86, 0.30. Similar accuracy is obtained using the multisimplex method for optimization of the scaling factors.
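The feature selection and scaling described above both amount to choosing per-feature weights for the distance function, with weight 0 removing a feature. The sketch below shows only the objective that such a search would evaluate: leave-one-out accuracy of 1-NN with a weighted Manhattan distance. It is not the procedure used to obtain the numbers on this page; the function names and the toy data are hypothetical, and the optimizer itself (simulated annealing or multisimplex) is not shown.

```python
def weighted_manhattan(a, b, weights):
    """Manhattan distance with one scaling factor per feature."""
    return sum(w * abs(x - y) for w, x, y in zip(weights, a, b))


def loo_accuracy_1nn(X, y, weights):
    """Leave-one-out accuracy of 1-NN: each vector is classified by its nearest other vector."""
    hits = 0
    for i, query in enumerate(X):
        nearest = min(
            (j for j in range(len(X)) if j != i),
            key=lambda j: weighted_manhattan(query, X[j], weights),
        )
        hits += y[nearest] == y[i]
    return hits / len(X)


# Toy data: feature 0 separates the classes, feature 1 is misleading noise.
X = [(0.0, 5.0), (0.2, 0.0), (0.1, 9.0), (1.0, 5.1), (1.2, 0.1), (1.1, 9.1)]
y = [0, 0, 0, 1, 1, 1]

print(loo_accuracy_1nn(X, y, weights=(1.0, 1.0)))  # noise feature dominates
print(loo_accuracy_1nn(X, y, weights=(1.0, 0.0)))  # noise feature switched off
```

An optimizer would call `loo_accuracy_1nn` on the training data for each candidate weight vector and keep the best one; the test set should be touched only once, at the end.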

Method Training set Test set Reference
IB2-IB4 81.2-85.5 43.6-44.6 WEKA, our calculation
Naive Bayes -- 46.6 WEKA, our calculation
1R (rules) 58.4 50.3 WEKA, our calculation
T2 (rules from decision tree) 67.5 53.3 WEKA, our calculation
FOIL (inductive logic) 99 60.1 WEKA, our calculation
FSM, initial 49 crisp logical rules 83.5 63.2 FSM, our calculation
LDA (statistical) 68.4 65.0 our calculation
DLVQ (38 nodes) 100 66.0 our calculation
C4.5 decision rules 64.5 66.3 our calculation
Best fuzzy MLP model 75.5 66.3 Mitra et al.
MLP with RPROP 68.0 our calculation
Cascade Correlation 71.0 our calculation
Fuzzy neural network 100 75.5 Hayashi
C4.5 decision tree 94.4 75.5 our calculation
FSM, Gaussian functions 93 75.6 our calculation
FSM, 60 triangular functions 93 75.8 our calculation
IB1c (instance-based) -- 76.7 WEKA, our calculation
kNN, k=1, Canberra, raw 76.1 80.4 WD/SBL
K* method -- 78.5 WEKA, our calculation
1-NN, 4 features removed, Manhattan 76.9 80.4 our calculation, KG
1-NN, Canberra, raw, removed f2, 6, 8, 9 77.2 83.4 our calculation, KG

Y. Hayashi, A. Imura, K. Yoshida, "Fuzzy neural expert system and its application to medical diagnosis", in: 8th International Congress on Cybernetics and Systems, New York City 1990, pp. 54-61
S. Mitra, R. De, S. Pal, "Knowledge based fuzzy MLP for classification and rule generation", IEEE Transactions on Neural Networks 8, 1338-1350, 1997. A knowledge-based fuzzy MLP system gives results on the test set in the range from 33% to 66.3%, depending on the actual fuzzy model used.
W. Duch and K. Grudzinski, "Prototype Based Rules - New Way to Understand the Data", Int. Joint Conference on Neural Networks, Washington D.C., pp. 1858-1863, 2001. Contains best results with 1-NN, Canberra and feature selection, 83.4% on the test.

Other, non-medical data

Landsat Satellite image dataset (STATLOG version)

4435 training and 2000 test cases, 36 semi-continuous [0 to 255] attributes (= 4 spectral bands x 9 pixels in neighbourhood) and 6 decision classes: 1, 2, 3, 4, 5 and 7 (class 6 has been removed because of doubts about the validity of this class).
The StatLog database consists of the multi-spectral values of pixels in 3x3 neighbourhoods in a satellite image, and the classification associated with the central pixel in each neighbourhood. The aim is to predict this classification, given the multi-spectral values. In the sample database, the class of a pixel is coded as a number.

Method % training % test Time train Time test
MLP+SCG 96.0 91.0 reg alfa=0.5, 36 hidden nodes, 1400 it fast; WD
k-NN -- 90.9 auto-k=3, Manhattan, std data GM 2.0
k-NN 91.1 90.6 2105, Statlog 944; parameters?
k-NN -- 90.4 auto-k=5, Euclidean, std data GM 2.0
k-NN -- 90.0 k=1, Manhattan, std data, no training fast, GM 2.0
FSM 95.1 89.7 std data, a=0.95 fast, GM 2.0; best NN result
LVQ 95.2 89.5 1273 44
k-NN -- 89.4 k=1, Euclidean, std data, no training fast, GM 2.0
Dipol92 94.9 88.9 746 111
MLP+SCG 94.4 88.5 5000 it; active learning+reg a=0.5, 8-12 hidden fast; WD
SVM 91.6 88.4 std data, Gaussian kernel fast, GM 2.0; unclassified 4.3%
Radial 88.9 87.9 564 74
Alloc80 96.4 86.8 63840 28757
IndCart 97.7 86.2 2109 9
CART 92.1 86.2 330 14
MLP+BP 88.8 86.1 72495 53
Bayesian Tree 98.0 85.3 248 10
C4.5 96.0 85.0 434 1
New ID 93.3 85.0 226 53
QuaDisc 89.4 84.5 157 53
SSV 90.9 84.3 default par. very fast, GM 2.0
Cascade 88.8 83.7 7180 1
Log DA, Disc 88.1 83.7 4414 41
LDA, Discrim 85.1 82.9 68 12
Kohonen 89.9 82.1 12627 129
Bayes 69.2 71.3 75 17

The original database was generated from Landsat Multi-Spectral Scanner image data. The sample database was generated taking a small section (82 rows and 100 columns) from the original data. One frame of Landsat MSS imagery consists of four digital images of the same scene in different spectral bands. Two of these are in the visible region (corresponding approximately to green and red regions of the visible spectrum) and two are in the (near) infra-red. Each pixel is an 8-bit binary word, with 0 corresponding to black and 255 to white. The spatial resolution of a pixel is about 80m x 80m. Each image contains 2340 x 3380 such pixels.
The database is a (tiny) sub-area of a scene, consisting of 82 x 100 pixels. Each line of data corresponds to a 3x3 square neighbourhood of pixels completely contained within the 82x100 sub-area. Each line contains the pixel values in the four spectral bands (converted to ASCII) of each of the 9 pixels in the 3x3 neighbourhood and a number indicating the classification label of the central pixel. In each line of data the four spectral values for the top-left pixel are given first followed by the four spectral values for the top-middle pixel and then those for the top-right pixel, and so on with the pixels read out in sequence left-to-right and top-to-bottom. Thus, the four spectral values for the central pixel are given by attributes 17,18,19 and 20. If you like you can use only these four attributes, while ignoring the others. This avoids the problem which arises when a 3x3 neighbourhood straddles a boundary.
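The attribute layout described above (pixels read left-to-right, top-to-bottom, four spectral values per pixel) can be written as a small index calculation. This is a minimal sketch; `attribute_numbers` and `central_pixel` are hypothetical helper names, and a record is assumed to be a list of the 36 attribute values followed by the class label.

```python
def attribute_numbers(row, col, bands=4, width=3):
    """1-based attribute numbers of the spectral values of pixel (row, col).

    row and col are 0-based positions inside the 3x3 neighbourhood.
    """
    first = (row * width + col) * bands + 1
    return list(range(first, first + bands))


def central_pixel(record):
    """Pick the four spectral values of the central pixel from one data line."""
    return [record[n - 1] for n in attribute_numbers(1, 1)]


print(attribute_numbers(0, 0))  # top-left pixel: [1, 2, 3, 4]
print(attribute_numbers(1, 1))  # central pixel: [17, 18, 19, 20]

# Illustrative record: attribute i holds the value i, last entry is the class.
record = list(range(1, 37)) + [7]
print(central_pixel(record))
```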
All results are from the Statlog book, except those marked GM (GhostMiner calculations, W. Duch).
N Description Train Test
1 red soil 1072 (24.17%) 461 (23.05%)
2 cotton crop 479 (10.80%) 224 (11.20%)
3 grey soil 961 (21.67%) 397 (19.85%)
4 damp grey soil 415 (09.36%) 211 (10.55%)
5 veg. Stubble 470 (10.60%) 237 (11.85%)
6 Mixture class 0 0
7 very damp grey soil 1038 (23.40%) 470 (23.50%)

Machine Learning, Neural and Statistical Classification, D. Michie, D.J. Spiegelhalter, C.C. Taylor (eds), Statlog project book.


Ionosphere

351 data records, with class division 224 (63.8%) + 126 (35.9%). Usually the first 200 vectors are taken for training and the last 151 for testing, but this split is very unbalanced: the training set has 101 (50.5%) and 99 (49.5%) vectors from classes 1 and 2, while the test set has 123 (82%) and 27 (18%).
34 attributes, but f2=0 always and should be removed; f1 is binary, the remaining 32 attributes are continuous.
2 classes - different types of radar signals reflected from the ionosphere.
Some vectors (8, 18, 20, 22, 24, 30, 38, 52, 76, 78, 80, 82, 103, 163, 169, 171, 183, 187, 189, 191, 201, 215, 219, 221, 223, 225, 227, 229, 231, 233, 249) are either binary (0, 1) or have only 3 values (-1, 0, +1).
For example, vector 169 has only one component = 1, all others are 0.
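The two clean-up checks mentioned above (dropping an attribute that is constant, and spotting vectors whose components take only the values -1, 0, +1) could be sketched as follows. The helper names and the toy rows are hypothetical; real data would have 34 attributes per row.

```python
def constant_columns(rows):
    """Indices of attributes that take a single value in every row."""
    n_attr = len(rows[0])
    return [j for j in range(n_attr) if len({row[j] for row in rows}) == 1]

def drop_columns(rows, cols):
    """Copy of rows without the listed attribute indices."""
    cols = set(cols)
    return [[v for j, v in enumerate(row) if j not in cols] for row in rows]

def low_variety_rows(rows, allowed=(-1.0, 0.0, 1.0)):
    """1-based numbers of vectors whose components all lie in `allowed`."""
    allowed = set(allowed)
    return [i + 1 for i, row in enumerate(rows) if set(row) <= allowed]

# Toy data with 4 attributes; the second one is always 0.
rows = [
    [1.0, 0.0, 0.53, -0.21],
    [1.0, 0.0, 1.0, -1.0],
    [0.0, 0.0, 0.0, 0.0],
]
const = constant_columns(rows)
cleaned = drop_columns(rows, const)
print(const, low_variety_rows(cleaned))  # [1] [2, 3]
```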
Method Accuracy % Reference
3-NN + simplex 98.7 Our own weighted kNN
VSS 2 epochs 96.7 MLP with numerical gradient
3-NN 96.7 KG, GM with or without weights
IB3 96.7 Aha, 5 errors on test
1-NN, Manhattan 96.0 GM kNN (our)
MLP+BP 96.0 Sigillito
SVM Gaussian 94.9±2.6 GM (our), defaults, similar for C=1-100
C4.5 94.9 Hamilton
3-NN Canberra 94.7 GM kNN (our)
RIAC 94.6 Hamilton
C4 (no windowing) 94.0 Aha
C4.5 93.7 Bennett and Blue
SVM 93.2 Bennett and Blue
Non-lin perceptron 92.0 Sigillito
FSM + rotation 92.8 our
1-NN, Euclidean 92.1 Aha, GM kNN (our)
DB-CART 91.3 Shang, Breiman
Linear perceptron 90.7 Sigillito
OC1 DT 89.5 Bennett and Blue
CART 88.9 Shang, Breiman
SVM linear 87.1±3.9 GM (our), defaults
GTO DT 86.0 Bennett and Blue

Perceptron+MLP results:
Sigillito, V. G., Wing, S. P., Hutton, L. V., & Baker, K. B. (1989) Classification of radar returns from the ionosphere using neural networks. Johns Hopkins APL Technical Digest, 10, 262-266.
N. Shang, L. Breiman, ICONIP'96, p.133
David Aha: k-NN+C4+IB3, from Aha, D. W., & Kibler, D. (1989). Noise-tolerant instance-based learning algorithms. In Proceedings of the Eleventh International Joint Conference on Artificial Intelligence (pp. 794-799). Detroit, MI: Morgan Kaufmann.
IB3 parameter settings: 70% and 80% for acceptance and dropping respectively.
RIAC, C4.5 from: H.J. Hamilton, N. Shan, N. Cercone, RIAC: a rule induction algorithm based on approximate classification, Tech. Rep. CS 96-06, Regina University 1996.
K.P. Bennett, J. Blue, A Support Vector Machine Approach to Decision Trees, R.P.I Math Report No. 97-100, Rensselaer Polytechnic Institute, Troy, NY, 1997
The training/test division is not very good in this case; the class distributions in the two sets differ.
10xCV results:
Method Accuracy % Reference
SFM+G+G(WX) ??±2.6 GM (our), C=1, s=2-5
kNN auto+WX+G(WX) ??.4±3.6 GM (our)
SVM Gaussian 94.6±4.3 GM (our), C=1, s=2-5
VSS-MKNN 91.5±4.3 MK, 12 neurons (similar 8-17)
SVM lin 89.5±3.8 GM (our), C=1, s=2-5
SSV tree 87.8±4.5 GM (our), default
1-NN 85.8±4.9 GM std, Euclid
3-NN 84.0±5.4 GM std, Euclid

VSS is an MLP with search, implemented by Mirek Kordos, used with 3 epochs; neurons may be sigmoidal or step-wise (64 values).
Maszczyk T, Duch W, Support Feature Machine, WCCI 2010 (submitted).

Sonar: Mines vs Rocks

208 cases, 60 continuous attributes, 2 classes, 111 metal, 97 rock.
From the CMU benchmark repository
This dataset has been used in two kinds of experiments:
1. The "aspect-angle independent" experiments use all 208 cases with 13-fold crossvalidation, averaged over 10 runs to get std.
2. The "aspect-angle dependent" experiments use training / test sets with 104 vectors each. Class distribution in training is 49 + 55, in test 62 + 42.
Estimation of L1O on the whole dataset (Opper and Winther) gives only 78.2%; is the test so easy? Some of these results were obtained without standardization of the data, which is very important here!
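Why standardization matters for distance-based methods can be seen in a small sketch. The data below are invented (two attributes on very different scales), and the helper names are assumptions; the standardization parameters are estimated from the training data only.

```python
import math

def zscore_params(rows):
    """Per-attribute mean and standard deviation estimated from `rows`."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = []
    for j in range(d):
        var = sum((r[j] - means[j]) ** 2 for r in rows) / n
        stds.append(math.sqrt(var) or 1.0)  # constant attribute: avoid division by 0
    return means, stds

def standardize(rows, means, stds):
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]

def nearest_label(train, labels, x, p=2):
    """1-NN label of x; p=1 ranks by Manhattan, p=2 by Euclidean distance."""
    def dist(a, b):  # p-th power of the distance, enough for ranking
        return sum(abs(u - v) ** p for u, v in zip(a, b))
    best = min(range(len(train)), key=lambda i: dist(train[i], x))
    return labels[best]

# Toy data: the second attribute has a much larger scale and carries no class information.
train = [[0.1, 500.0], [0.2, 100.0], [0.9, 480.0], [0.8, 120.0]]
labels = ["rock", "rock", "metal", "metal"]
query = [0.85, 105.0]

raw = nearest_label(train, labels, query)
means, stds = zscore_params(train)
std = nearest_label(standardize(train, means, stds), labels,
                    standardize([query], means, stds)[0])
print(raw, std)  # rock metal
```

On raw data the large-scale attribute dominates the distance; after standardization the informative attribute decides the nearest neighbour.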
The "aspect-angle dependent" experiments with training / test sets.
Method Train % Test % Reference
1-NN, 5D from MDS, Euclid, std 97.1 our, GM (WD)
1-NN, Manhattan std 97.1 our, GM (WD)
1-NN, Euclid std 96.2 our, GM (WD)
TAP MFT Bayesian -- 92.3 Opper, Winther
Naive MFT Bayesian -- 90.4 Opper, Winther
SVM -- 90.4 Opper, Winther
MLP+BP, 12 hidden, best MLP -- 90.4 Gorman, Sejnowski
1-NN, Manhattan raw 92.3 our, GM (WD)
1-NN, Euclid raw 91.3 our, GM (WD)
FSM - methodology ? 83.6 our (RA)

The "aspect-angle independent" experiments with 13-fold CV on all data.
1-NN Euclid on 5D MDS input 87.5±0.8 our GM (WD)
1-NN Euclidean, std data 86.8±1.2 our GM (WD)
1-NN Manhattan, std data 86.3±0.3 our GM (WD)
MLP+BP, 12 hidden 99.8±0.1 84.7±5.7 Gorman, Sejnowski
1-NN Manhattan, raw data 84.5±0.4 our GM (WD)
MLP+BP, 24 hidden 99.8±0.1 84.5±5.7 Gorman, Sejnowski
MLP+BP, 6 hidden 99.7±0.2 83.5±5.6 Gorman, Sejnowski
SVM linear, C=0.1 82.7±8.5 our GM (WD), std data
1-NN Euclidean, raw data 82.1±0.9 our GM (WD)
SVM Gauss, C=1, s=0.1 77.4±10.1 our GM (WD), std data
SVM linear, C=1 76.9±11.9 our GM (WD), raw data
SVM linear, C=1 76.0±9.8 our GM (WD), std data

DB-CART, 10xCV 81.8 Shang, Breiman
CART, 10xCV 67.9 Shang, Breiman
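The protocol of 13-fold crossvalidation averaged over 10 runs, used for the all-data experiments, might be organized as in the sketch below. Random fold assignment, the seed, and the `evaluate` callback are assumptions made for illustration; the majority-class evaluator only stands in for a real classifier.

```python
import random
import statistics

def kfold_indices(n, k, rng):
    """Shuffle 0..n-1 and cut into k folds whose sizes differ by at most one."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def repeated_cv(n, k, runs, evaluate, seed=0):
    """Mean and standard deviation (over runs) of the CV accuracy in percent.

    `evaluate(train_idx, test_idx)` is supplied by the caller and returns the
    number of correctly classified test cases.
    """
    rng = random.Random(seed)
    run_acc = []
    for _ in range(runs):
        correct = 0
        for fold in kfold_indices(n, k, rng):
            held_out = set(fold)
            train_idx = [i for i in range(n) if i not in held_out]
            correct += evaluate(train_idx, fold)
        run_acc.append(100.0 * correct / n)
    return statistics.mean(run_acc), statistics.stdev(run_acc)

# Stand-in classifier: always predict the majority class of the training part.
labels = ["metal"] * 111 + ["rock"] * 97  # class counts quoted for this dataset

def majority_eval(train_idx, test_idx):
    train_labels = [labels[i] for i in train_idx]
    guess = max(set(train_labels), key=train_labels.count)
    return sum(labels[i] == guess for i in test_idx)

mean_acc, std_acc = repeated_cv(len(labels), k=13, runs=10, evaluate=majority_eval)
print(f"{mean_acc:.1f} ± {std_acc:.1f}")
```

With 208 cases and 13 folds every fold holds exactly 16 cases, so each case is tested once per run and the standard deviation reflects only the variation between partitions.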

M. Opper and O. Winther, Gaussian Processes and SVM: Mean Field Results and Leave-One-Out. In: Advances in Large Margin Classifiers, Eds. A. J. Smola, P. Bartlett, B. Schölkopf, D. Schuurmans, MIT Press, 311-326, 2000; same methodology as Gorman with Sejnowski.
N. Shang, L. Breiman, ICONIP'96, p.133, 10xCV
Gorman, R. P., and Sejnowski, T. J. (1988). "Analysis of Hidden Units in a Layered Network Trained to Classify Sonar Targets", Neural Networks 1, pp. 75-89, 13xCV
Our results: kNN results from 10xCV and from 13xCV are quite similar, so the Shang and Breiman 10xCV results should not differ much from 13-fold CV.
WD Leave-one-out (L1O) estimations on std data:
L1O with k=1, Euclidean distance, for all data gives 87.50%; other values of k and other distance functions do not give significant improvement.
SVM linear, C=1, L1O 75.0%, for Gaussian kernel, C=1, L1O is 78.8%
Other L1O results taken from C. Domeniconi, J. Peng, D. Gunopulos, "An adaptive metric for pattern classification".

Discriminant Adaptive NN, DANN 92.3
Adaptive metric NN 90.9
kNN 87.5
SVM Gauss C=1 78.8
C4.5 76.9
SVM linear C=1 75.0
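A leave-one-out estimate for 1-NN, as used for the kNN entries above, can be sketched in a few lines. The function name and the toy data are illustrative, and the data are assumed to be standardized already.

```python
def loo_1nn_accuracy(rows, labels, p=2):
    """Leave-one-out accuracy (percent) of 1-NN; p=1 Manhattan, p=2 Euclidean."""
    def dist(a, b):  # p-th power of the distance, enough for ranking
        return sum(abs(u - v) ** p for u, v in zip(a, b))
    correct = 0
    for i, x in enumerate(rows):
        # Each case is classified by its nearest neighbour among all other cases.
        others = [j for j in range(len(rows)) if j != i]
        nearest = min(others, key=lambda j: dist(rows[j], x))
        correct += labels[nearest] == labels[i]
    return 100.0 * correct / len(rows)

rows = [[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1], [0.5, 0.4]]
labels = ["rock", "rock", "metal", "metal", "metal"]
print(loo_1nn_accuracy(rows, labels))  # 80.0
```

In the toy example the last case lies closer to the "rock" cluster and is the single error.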


Vowel

528 training, 462 test cases, 10 continuous attributes, 11 classes.
From the UCI benchmark repository.
Speaker-independent recognition of the eleven steady-state vowels of British English using a specified training set of LPC-derived log area ratios.
Results on the total set
Method Train Test Reference
DB-CART, 10xCV on total set !!! 90.0 Shang, Breiman
CART, 10xCV on total set 78.2 Shang, Breiman
Method Train Test Reference
Square node network, 88 units 54.8 UCI
Gaussian node network, 528 units 54.6 UCI
1-NN, Euclidean, raw 99.24 56.3 WD/KG
Radial Basis Function, 528 units 53.5 UCI
Gaussian node network, 88 units 53.5 UCI
FSM Gauss, 10CV on the training set 92.60 51.94 our (RA)
Square node network, 22 51.1 UCI
Multi-layer perceptron, 88 hidden 50.6 UCI
Modified Kanerva Model, 528 units 50.0 UCI
Radial Basis Function, 88 units 47.6 UCI
Single-layer perceptron, 88 hidden 33.3 UCI

N. Shang, L. Breiman, ICONIP'96, p.133, used 10xCV instead of the test set.

Telugu Vowel

871 patterns, 6 overlapping vowel classes (Indian Telugu vowel sounds), 3 features (formant frequencies).
Method Test Reference
10xCV tests below
3-NN, Manhattan 87.8±4.0 Kosice
3-NN, Canberra 87.8±4.2 WD/GM
FSM, 65 Gaussian nodes 87.4±4.5 Kosice
3-NN, Euclid 87.3±3.9 WD/GM
SSV dec. tree, 22 rules 86.0±?? Kosice
SVM Gauss opt C~1000, s~1 85.0±4.0 WD, GhostMiner
SVM Gauss C=1000, s=1 83.5±4.1 WD, GhostMiner
SVM, Gauss, C=1, s=0.1 76.6±2.5 WD, GhostMiner
2xCV tests below
3-NN, Euclidean 86.1±0.6 Kosice
FSM, 40 Gaussian nodes 85.2±1.2 Kosice
MLP 84.6 Pal
Fuzzy MLP 84.2 Pal
SSV dec. tree, beam search 83.3±0.9 Kosice
SSV dec. tree, best first 83.0±1.0 Kosice
Bayes Classifier 79.2 Pal
Fuzzy SOM 73.5 Pal

The SVM parameters were optimized separately in each CV partition, so different parameters were used in different folds and only approximate values can be quoted. If they are fixed to C=1000, s=1, the results are a bit worse.
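Selecting parameters inside each crossvalidation partition can be sketched as a nested crossvalidation. To keep the example self-contained, a simple kNN with k as the tuned parameter stands in for the SVM; the fold counts, the parameter grid and the synthetic data are all assumptions.

```python
import random
from collections import Counter

def knn_predict(train, x, k):
    """Majority label among the k nearest (squared Euclidean) training cases."""
    near = sorted(train, key=lambda t: sum((a - b) ** 2 for a, b in zip(t[0], x)))[:k]
    return Counter(lbl for _, lbl in near).most_common(1)[0][0]

def accuracy(train, test, k):
    return sum(knn_predict(train, x, k) == y for x, y in test) / len(test)

def folds(data, n_folds):
    return [data[i::n_folds] for i in range(n_folds)]

def cv_accuracy(data, k, n_folds):
    """Plain crossvalidation accuracy for a fixed parameter value."""
    parts = folds(data, n_folds)
    scores = []
    for i, test in enumerate(parts):
        train = [c for j, p in enumerate(parts) if j != i for c in p]
        scores.append(accuracy(train, test, k))
    return sum(scores) / len(scores)

def nested_cv(data, grid, outer=5, inner=3):
    """Return (chosen parameter, test accuracy) for every outer fold."""
    results = []
    parts = folds(data, outer)
    for i, test in enumerate(parts):
        train = [c for j, p in enumerate(parts) if j != i for c in p]
        # The parameter is chosen using the training part only.
        best = max(grid, key=lambda k: cv_accuracy(train, k, inner))
        results.append((best, accuracy(train, test, best)))
    return results

# Synthetic two-class data: two overlapping Gaussian clouds.
rng = random.Random(1)
data = [([rng.gauss(c, 1.0), rng.gauss(c, 1.0)], c) for c in (0, 2) for _ in range(30)]
rng.shuffle(data)
results = nested_cv(data, grid=[1, 3, 5, 7])
for best_k, acc in results:
    print(best_k, round(acc, 2))
```

Because the chosen value may differ from fold to fold, a single "optimal" parameter can only be quoted approximately, which is the situation described above.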
Papers using this data:

  • S. K. Pal and D. Dutta Majumder, "Fuzzy sets and decision making approaches in vowel and speaker recognition", IEEE Transactions on Systems, Man, and Cybernetics, Vol. 7, pp. 625-629, 1977.
  • S. Mitra, M. Banerjee and S. K. Pal, Rough knowledge-based network, fuzziness and classification, Neural Computing & Applications 7, 17-25, 1998.
  • Duch W and Hayashi Y, Computational intelligence methods and data understanding. In: Quo Vadis computational Intelligence? New trends and approaches in computational intelligence. Eds. P. Sincak, J. Vascak, Springer studies in fuzziness and soft computing, Vol. 54 (2000), pp. 256-270.
  • Chaoshun Li, Jianzhong Zhou, Qingqing Li and Xiuqiao Xiang, A Fuzzy Cluster Algorithm Based on Mutative Scale Chaos Optimization, LNCS 5264, 259-267, 2008.

Wine data

Source: UCI, described in Forina, M. et al, PARVUS - An Extendible Package for Data Exploration, Classification and Correlation. Institute of Pharmaceutical and Food Analysis and Technologies, Via Brigata Salerno, 16147 Genoa, Italy.
These data are the results of a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars. The analysis determined the quantities of 13 constituents found in each of the three types of wines.
Class distribution: 178 cases = [59, 71, 48] in Class 1-3;
13 continuous attributes: alcohol, malic-acid, ash, alkalinity, magnesium, phenols, flavanoids, nonflavanoid phenols, proanthocyanins, color, hue, OD280/OD315, proline.
Method Test Reference
Leave-one-out test results
RDA 100 [1]
QDA 99.4 [1]
LDA 98.9 [1]
kNN, Manhattan, k=1 98.7 GM-WD, std data
1NN 96.1 [1] z-transformed data
kNN, Euclidean, k=1 95.5 GM-WD, std data
kNN, Chebyshev, k=1 93.3 GM-WD, std data
10xCV tests below
kNN, Manhattan, auto k=1-10 98.9±2.3 GM-WD, 2D data, after MDS/PCA
IncNet, 10CV, def, Gauss 98.9±2.4 GM-WD, std data, up to 3 neurons
10 CV SSV, opt prune 98.3±2.7 GM-WD, 2D data, after MDS/PCA
10 CV SSV, node count 7 98.3±2.7 GM-WD, 2D data, after MDS/PCA
kNN, Euclidean, k=1 97.8±2.8 GM-WD, 2D data, after MDS/PCA
kNN, Manhattan, k=1 97.8±2.9 GM-WD, 2D data, after MDS/PCA
kNN, Manhattan, auto k=1-10 97.8±3.9 GM-WD
kNN, Euclidean, k=3, weighted features 97.8±4.7 GM-WD
IncNet, 10CV, def, bicentral 97.2±2.9 GM-WD, std data, up to 3 neurons
kNN, Euclidean, auto k=1-10 97.2±4.0 GM-WD
10 CV SSV, opt node 97.2±5.4 GM-WD, 2D data, after MDS/PCA
FSM a=.99, def 96.1±3.7 GM-WD, 2D data, after MDS/PCA
FSM 10CV, Gauss, a=.999 96.1±4.7 GM-WD, std data, 8-11 neurons
FSM 10CV, triang, a=.99 96.1±5.9 GM-WD, raw data
kNN, Euclidean, k=1 95.5±4.4 GM-WD
10 CV SSV, opt node, BFS 92.8±3.7 GM-WD
10 CV SSV, opt node, BS 91.6±6.5 GM-WD
10 CV SSV, opt prune, BFS 90.4±6.1 GM-WD

UCI past usage:
[1] S. Aeberhard, D. Coomans and O. de Vel, Comparison of Classifiers in High Dimensional Settings, Tech. Rep. no. 92-02, (1992), Dept. of Computer Science and Dept. of Mathematics and Statistics, James Cook University of North Queensland (submitted to Technometrics).
[2] S. Aeberhard, D. Coomans and O. de Vel, "The classification performance of RDA" Tech. Rep. no. 92-01, (1992), Dept. of Computer Science and Dept. of Mathematics and Statistics, James Cook University of North Queensland (submitted to Journal of Chemometrics).

Other Data

Glass identification

Shang, Breiman CART 71.4% accuracy, DB-CART 70.6%.
Leave-one-out results taken from C. Domeniconi, J. Peng, D. Gunopulos, "An adaptive metric for pattern classification".

Adaptive metric NN 75.2
Discriminant Adaptive NN, DANN 72.9
kNN 72.0
C4.5 68.2

DNA-Primate splice-junction gene sequences, with associated imperfect domain theory.

Statlog data: splice junctions are points on a DNA sequence at which 'superfluous' DNA is removed during the process of protein creation in higher organisms. The problem posed in this dataset is to recognize, given a sequence of DNA, the boundaries between exons (the parts of the DNA sequence retained after splicing) and introns (the parts of the DNA sequence that are spliced out).

This problem consists of two subtasks: recognizing exon/intron boundaries (referred to as EI sites), and recognizing intron/exon boundaries (IE sites). (In the biological community, IE borders are referred to as "acceptors" while EI borders are referred to as "donors".)
Number of Instances: 3190. Class distribution:
Class Train Test
1 464 (23.20%) 303 (25.55%)
2 485 (24.25%) 280 (23.61%)
3 1051 (52.55%) 603 (50.84%)
All 2000 (100%) 1186 (100%)
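The "Default, majority" row of the results table further below follows directly from these class counts, as a quick check shows. The variable names are illustrative.

```python
# Class counts from the table above.
train = {1: 464, 2: 485, 3: 1051}
test = {1: 303, 2: 280, 3: 603}

# The default classifier always predicts the most frequent training class.
majority = max(train, key=train.get)
train_acc = 100.0 * train[majority] / sum(train.values())
test_acc = 100.0 * test[majority] / sum(test.values())
print(majority, round(train_acc, 2), round(test_acc, 2))  # 3 52.55 50.84
```

Any method scoring near 50.8% on the test set is therefore doing no better than guessing class 3.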

Number of attributes: originally 60 attributes {a,c,t,g}, usually converted to 180 binary indicator variables {(0,0,0), (0,0,1), (0,1,0), (1,0,0)}, or 240 binary variables.
Much better performance is generally observed if attributes closest to the junction are used (middle). In the StatLog version (180 variables), this means using attributes A61 to A120 only.
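The two binary encodings and the selection of the middle attributes can be sketched as follows. The assignment of particular nucleotides to particular bit patterns is an assumption made for illustration (the text above lists only the set of patterns), and the example sequence is made up.

```python
# Assumed 3-bit mapping; only the set of bit patterns is given in the text.
CODE3 = {"a": (1, 0, 0), "c": (0, 1, 0), "g": (0, 0, 1), "t": (0, 0, 0)}
# 4-bit alternative: one indicator variable per nucleotide.
CODE4 = {"a": (1, 0, 0, 0), "c": (0, 1, 0, 0), "g": (0, 0, 1, 0), "t": (0, 0, 0, 1)}

def encode(seq, code=CODE3):
    """Concatenate the bit pattern of every nucleotide in the sequence."""
    bits = []
    for base in seq.lower():
        bits.extend(code[base])
    return bits

def middle(bits, first=61, last=120):
    """Attributes A<first>..A<last> (1-based, inclusive) of the 180-bit version."""
    return bits[first - 1:last]

seq = "acgt" * 15  # made-up sequence of 60 nucleotides
bits180 = encode(seq)
bits240 = encode(seq, CODE4)
print(len(bits180), len(bits240), len(middle(bits180)))  # 180 240 60
```

In the 180-variable version, attributes A61 to A120 cover nucleotides 21 to 40, that is the 20 positions around the junction.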

Method % in training % on test Time train Time test
RBF, 720 nodes 98.5 95.9
k-NN GM, p(X|C), k=6, Euclid, raw 96.8 95.5 0 short
Dipol92 99.3 95.2 213 10
Alloc80 93.7 94.3 14394 --
QuaDisc 100.0 94.1 1581 809
LDA, Discrim 96.6 94.1 929 31
FSM, 8 Gaussians, 180 binary 95.4 94.0
Log DA, Disc 99.2 93.9 5057 76
SSV Tree, p(X|C), opt node, 4CV 94.8 93.4 short short
Naive Bayes 94.8 93.2 52 15
Castle, middle 90 binary var 93.9 92.8 397 225
IndCart, 180 binary 96.0 92.7 523 516
C4.5, on 60 features 96.0 92.4 9 2
CART, middle 90 binary var 92.5 91.5 615 9
MLP+BP 98.6 91.2 4094 9
Bayesian Tree 99.9 90.5 82 11
CN2 99.8 90.5 869 74
New ID 100.0 90.0 698 1
Ac2 100.0 90.0 12378 87
Smart 96.6 88.5 79676 16
Cal5 89.6 86.9 1616 8
Itrule 86.9 86.5 2212 6
k-NN 91.1 85.4 2428 882
Kohonen 89.6 66.1 - -
Default, majority 52.5 50.8

kNN GM - GhostMiner version of kNN (our group)
SSV Decision Tree - our results

Włodzisław Duch, last modification 26.08.2012