Datasets used for classification: comparison of results 
Before using any new dataset, it should be described here!
Results from the Statlog project are collected on a separate page.
Logical rules derived for the data are collected on a separate page.
Medical: Appendicitis  Breast cancer (Wisconsin)  Breast Cancer (Ljubljana)  Diabetes (Pima Indian)  Heart disease (Cleveland)  Heart disease (Statlog version)  Hepatitis  Hypothyroid  Hepatobiliary disorders 
Other datasets:
Ionosphere 
Satellite image dataset (Statlog version) 
Sonar 
Telugu Vowel 
Vowel 
Wine 
Other data: Glass, DNA 
More results for the Statlog datasets are collected on a separate page.
A note of caution: comparison of different classifiers is not an easy task. Before you start ranking methods using the numbers presented in the tables below, please note the following facts. Relatively small datasets are analyzed here, and simple classification methods are used; not all tasks require large deep learning systems, and sometimes simpler is better.
Many results we have collected give only a single number (even results from the StatLog project!), without a standard deviation. Since most classifiers may give results that differ by several percent on slightly different data partitions, single numbers do not mean much.
Leave-one-out tests have been criticized as a basis for accuracy evaluation; the conclusion is that cross-validation is safer, cf.:
Kohavi, R. (1995). A study of cross-validation and bootstrap for accuracy estimation and model selection. In: Proc. of the 14th Int. Joint Conference on Artificial Intelligence, Morgan Kaufmann, pp. 1137-1143.
Cross-validation (CV) tests are also not ideal. Theoretically, about 2/3 of the results should fall within one standard deviation of the average, and 95% within two standard deviations, so in a 10-fold cross-validation you should only very rarely see results that are better or worse than the mean by more than two standard deviations. Running CV several times may also give different answers. The search for the best estimator continues; cf.:
Dietterich, T. (1998). Approximate statistical tests for comparing supervised classification learning algorithms. Neural Computation 10(7), 1895-1924.
Nadeau, C., Bengio, Y. (1999). Inference for the Generalization Error. Tech. Rep. 99s-25, CIRANO; J. Machine Learning (Kluwer, in print).
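The fold-to-fold scatter is easy to see in a small simulation (a sketch, not tied to any package used on this page): even a classifier with a fixed "true" accuracy produces per-fold accuracies that differ by several percent, so a single CV number without a standard deviation says little.

```python
import random

def simulate_cv(true_acc, n_samples=270, n_folds=10, seed=0):
    """Simulate per-fold accuracies of a classifier whose probability of
    classifying any single vector correctly is `true_acc`."""
    rng = random.Random(seed)
    fold_size = n_samples // n_folds
    fold_accs = []
    for _ in range(n_folds):
        # Each vector is classified correctly with probability true_acc.
        correct = sum(rng.random() < true_acc for _ in range(fold_size))
        fold_accs.append(correct / fold_size)
    mean = sum(fold_accs) / n_folds
    var = sum((a - mean) ** 2 for a in fold_accs) / (n_folds - 1)
    return mean, var ** 0.5

mean, std = simulate_cv(0.85)
# Even with a fixed underlying accuracy the folds scatter by a few
# percent, which is why results reported without a standard deviation
# should be treated with caution.
```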
Even the best accuracy and variance estimation is not sufficient, since performance cannot be characterized by a single number. It would be much better to provide full Receiver Operating Characteristic (ROC) curves. Combining ROC curves with variance estimation would be ideal.
Unfortunately this still remains to be done. All we can do now is to collect some numbers in tables.
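For illustration only, a minimal sketch of how ROC points and the area under the curve (AUC) are computed from classifier scores; the labels and scores below are made up.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points, one per threshold step.
    labels: 1 = positive class, 0 = negative; higher score = more positive."""
    pairs = sorted(zip(scores, labels), reverse=True)
    pos = sum(labels)
    neg = len(labels) - pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _score, label in pairs:
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# Perfectly separated scores give AUC = 1.0:
pts = roc_points([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1])
```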
Our results are usually obtained with the GhostMiner package, developed in our group.
Some publications with results are on my page.
TuneIT: Testing Machine Learning & Data Mining Algorithms. Automated Tests, Repeatable Experiments, Meaningful Results.
Appendicitis
106 vectors, 8 attributes, two classes (85 acute appendicitis + 21 other, i.e. 80.2% + 19.8%);
data from Shalom Weiss.
Results obtained with the leave-one-out test, % of accuracy given.
Attribute names: WBC1, MNEP, MNEA, MBAP, MBAA, HNEP, HNEA
Method  Accuracy %  Reference
PVM (logical rules)  89.6  Weiss, Kapouleas 
C-MLP2LN (logical rules)  89.6± ?  our 
kNN, stand. Manhattan, k=8,9,22-25 or k=4,5, stand. Euclid, f2+f4 removed  88.7± 6.0  our (WD/KG) 
9NN, stand. Euclidean  87.7  our (KG) 
RIAC (prob. inductive)  86.9  Hamilton et al. 
1NN, stand. Euclidean, f2+f4 removed  86.8  our (WD/KG) 
MLP+backpropagation  85.8  Weiss, Kapouleas 
CART, C4.5 (dec. trees)  84.9  Weiss, Kapouleas 
FSM  84.9  our (RA) 
Bayes rule (statistical)  83.0  Weiss, Kapouleas 
For 90% accuracy and p=0.95 confidence level, the 2-tailed bounds are: [82.8%, 94.4%].
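Bounds like these can be reproduced with the Wilson score interval for a binomial proportion; the page does not state which formula was used, but Wilson with z = 1.96 matches the quoted numbers.

```python
def wilson_interval(acc, n, z=1.96):
    """Two-tailed Wilson score confidence interval for an observed
    accuracy `acc` on a test of `n` vectors (z = 1.96 for p = 0.95)."""
    center = acc + z * z / (2 * n)
    margin = z * (acc * (1 - acc) / n + z * z / (4 * n * n)) ** 0.5
    denom = 1 + z * z / n
    return (center - margin) / denom, (center + margin) / denom

lo, hi = wilson_interval(0.90, 106)
# For 90% accuracy on 106 vectors this gives [82.8%, 94.4%] after
# rounding, matching the bounds quoted above.
```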
S.M. Weiss, I. Kapouleas, "An empirical comparison of pattern recognition, neural nets and machine learning classification methods", in: J.W. Shavlik and T.G. Dietterich, Readings in Machine Learning, Morgan Kaufmann, CA 1990.
H.J. Hamilton, N. Shan, N. Cercone, RIAC: a rule induction algorithm based on approximate classification, Tech. Rep. CS 96-06, Regina University, 1996.
For C-MLP2LN (logical rules) only an estimated leave-one-out result is given, since the rules are like those of PVM.
3 crisp logical rules, overall 91.5% accuracy
Results for 10-fold stratified cross-validation:
Method  Accuracy %  Reference 
NBC+WX+G(WX)  ??.5± 7.7  TMGM 
NBC+G(WX)  ??.2± 6.7  TMGM 
kNN auto+G(WX), Eucl.  ??.2± 6.7  TMGM 
C-MLP2LN  89.6  our logical rules 
20NN, stand. Eucl., f. 4,1,7  89.3± 8.6  our (KG); feature sel. from CV on the whole data set 
SSV beam leaves  88.7± 8.5  WD 
SVM linear C=1  88.1± 8.6  WD 
6NN, stand. Eucl.  88.0± 7.9  WD 
SSV default  87.8± 8.7  WD 
SSV beam pruning  86.9± 9.8  WD 
kNN, k=auto, Eucl  86.7± 6.6  WD 
FSM, a=0.9, Gauss, cluster  86.1± 8.8  WDGM 
NBC  85.9± 10.2  TMGM 
VSS 1 neuron, 4 it  84.9± 7.4  WD/MK 
SVM Gauss C=32, s=0.1  84.4± 8.2  WD 
MLP+BP (Tooldiag)  83.9  Rafał Adamczak 
RBF (Tooldiag)  80.2  Rafał Adamczak 
Maszczyk T, Duch W, Support Feature Machine, WCCI 2010 (submitted).
Breast cancer (Wisconsin)
From UCI repository: 699 cases, 9 attributes, two classes, 458 (65.5%) & 241 (34.5%).
Results obtained with the leave-one-out test, % of accuracy given.
F6 has 16 missing values; removing these vectors leaves 683 examples.
Method  Accuracy %  Reference
FSM  98.3  our (RA) 
3NN, stand. Manhattan  97.1  our (KG) 
21NN, stand. Euclidean  96.9  our (KG) 
C4.5 (decision tree)  96.0  Hamilton et al. 
RIAC (prob. inductive)  95.0  Hamilton et al. 
H.J. Hamilton, N. Shan, N. Cercone, RIAC: a rule induction algorithm based on approximate classification, Tech. Rep. CS 96-06, Regina University, 1996.
Results obtained with 10-fold cross-validation; 16 vectors with missing F6 values removed, 683 samples left; % of accuracy given.
Method  Accuracy %  Reference
Naive MFT  97.1  Opper, Winther, L1O est. 97.3 
SVM Gauss, C=1,s=0.1  97.0± 2.3  WDGM 
SVM (10xCV)  96.9  Opper, Winther 
SVM lin, opt C  96.9± 2.2  WDGM; same with Minkowski kernel 
Cluster means, 2 prototypes  96.5± 2.2  MB 
Default, majority  65.5   
Results obtained with 10-fold cross-validation, % of accuracy given; all data used, missing values handled in different ways.
Method  Accuracy %  Reference
NB + kernel est  97.5± 1.8  WD, WEKA, 10X10CV 
SVM (5xCV)  97.2  Bennet and Blue 
kNN with DVDM distance  97.1  our (KG) 
GM kNN, k=3, raw, Manh  97.0± 2.1  WD, 10X10CV 
GM kNN, k=opt, raw, Manh  97.0± 1.7  WD, 10CV only 
VSS, 8 it/2 neurons  96.9± 1.8  WD/MK; 98.1% train 
FSMFeature Space Mapping  96.9± 1.4  RA/WD, a=.99 Gaussian 
Fisher linear discr. anal  96.8  Ster, Dobnikar 
MLP+BP  96.7  Ster, Dobnikar 
MLP+BP (Tooldiag)  96.6  Rafał Adamczak 
LVQ  96.6  Ster, Dobnikar 
kNN, Euclidean/Manhattan f.  96.6  Ster, Dobnikar 
SNB, seminaive Bayes (pairwise dependent)  96.6  Ster, Dobnikar 
SVM lin, opt C  96.4± 1.2  WDGM, 16 missing with 10 
VSS, 8 it/1 neuron!  96.4± 2.0  WD/MK, train 98.0% 
GM IncNet  96.4± 2.1  NJ/WD; FKF, max. 3 neurons 
NB  naive Bayes (completely independent)  96.4  Ster, Dobnikar 
SSV opt nodes, 3CV int  96.3± 2.2  WD/GM; training 96.6± 0.5 
IB1  96.3± 1.9  Zarndt 
DBCART (decision tree)  96.2  Shang, Breiman 
GM SSV Tree, opt nodes BFS  96.0± 2.9  WD/KG (beam search 94.0) 
LDA  linear discriminant analysis  96.0  Ster, Dobnikar 
OC1 DT (5xCV)  95.9  Bennet and Blue 
RBF (Tooldiag)  95.9  Rafał Adamczak 
GTO DT (5xCV)  95.7  Bennet and Blue 
ASI  Assistant I tree  95.6  Ster, Dobnikar 
MLP+BP (Weka)  95.4± 0.2  TW/WD 
OCN2  95.2± 2.1  Zarndt 
IB3  95.0± 4.0  Zarndt 
MML tree  94.8± 1.8  Zarndt 
ASR  Assistant R (RELIEF criterion) tree  94.7  Ster, Dobnikar 
C4.5 tree  94.7± 2.0  Zarndt 
LFC, Lookahead Feature Constr binary tree  94.4  Ster, Dobnikar 
CART tree  94.4± 2.4  Zarndt 
ID3  94.3± 2.6  Zarndt 
C4.5 (5xCV)  93.4  Bennet and Blue 
C 4.5 rules  86.7± 5.9  Zarndt 
Default, majority  65.5   
QDA  quadratic discriminant analysis  34.5  Ster, Dobnikar 
For 97% accuracy and p=0.95 confidence level, the 2-tailed bounds are: [95.5%, 98.0%].
K.P. Bennett, J. Blue, A Support Vector Machine Approach to Decision Trees, R.P.I. Math Report No. 97-100, Rensselaer Polytechnic Institute, Troy, NY, 1997.
N. Shang, L. Breiman, ICONIP'96, p.133
B. Ster and A. Dobnikar, Neural networks in medical diagnosis: Comparison with other methods. In: A. Bulsari et al. (eds.), Proceedings of the International Conference EANN '96, pages 427-430, 1996.
F. Zarndt, A Comprehensive Case Study: An Examination of Machine Learning and Connectionist Algorithms, MSc Thesis, Dept. of Computer Science, Brigham Young University, 1995
Breast Cancer (Ljubljana)
From UCI repository (restricted): 286 instances, 201 no-recurrence-events (70.3%),
85 recurrence-events (29.7%);
9 attributes with 2-13 values each, 9 missing values.
Results are mostly 10xCV, but sometimes the methodology was unclear;
difficult, noisy data, and some methods fall below the base rate (70.3%).
Method  Accuracy %  Reference
C-MLP2LN/SSV single rule  76.2± 0.0  WD/K. Grabczewski, stable rule 
SSV Tree rule  75.7± 1.1  WD, av. from 10x10CV 
MML Tree  75.3± 7.8  Zarndt 
SVM Gauss, C=1, s =0.1  73.8± 4.3  WD, GM 
MLP+backprop  73.5± 9.4  Zarndt 
SVM Gauss, C, s opt  72.4± 5.1  WD, GM 
IB1  71.8± 7.5  Zarndt 
CART  71.4± 5.0  Zarndt 
ODT trees  71.3± 4.2  Blanchard 
SVM lin, C=opt  71.0± 4.7  WD, GM 
UCN 2  70.7± 7.8  Zarndt 
SFC, Stack filters  70.6± 4.2  Porter 
Default, majority  70.3± 0.0  ============ 
SVM lin, C=1  70.0± 5.6  WD, GM 
C 4.5 rules  69.7± 7.2  Zarndt 
Bayes rule  69.3± 10.0  Zarndt 
C 4.5  69.2± 4.9  Blanchard 
Weighted networks  68-73.5  Tan, Eshelman 
IB3  67.9± 7.7  Zarndt 
ID3 rules  66.2± 8.5  Zarndt 
AQ15  66-72  Michalski et al. 
Inductive  65-72  Clark, Niblett 
For 78% accuracy and p=0.95 confidence level, the 2-tailed bounds are: [72.9%, 82.4%].
Hepatitis
From UCI repository: 155 vectors, 19 attributes;
two classes: die, 32 cases (20.6%), and live, 123 cases (79.4%).
Many missing values!
F18 has 67 missing values, F15 has 29, F17 has 16, and the other features have between 0 and 11.
Results obtained with the leave-one-out test, % of accuracy given.
Method  Accuracy %  Reference
21NN, stand. Manhattan  90.3  our (KG) 
FSM  90.0  our (RA) 
14NN, stand. Euclid  89.0  our (KG) 
LDA  86.4  Weiss & K 
CART (decision tree)  82.7  Weiss & K 
MLP+backprop  82.1  Weiss & K 
The MLP, CART and LDA results are from (check it?): S.M. Weiss, I. Kapouleas, "An empirical comparison of pattern recognition, neural nets and machine learning classification methods", in: J.W. Shavlik and T.G. Dietterich, Readings in Machine Learning, Morgan Kaufmann, CA 1990.
Other results are our own.
Results obtained with 10-fold cross-validation, % of accuracy given;
our results use stratified cross-validation, while the methodology of the other results is unknown.
Differences for this dataset are rather small, 0.1-0.2%.
Method  Accuracy %  Reference
Weighted 9NN  92.9± ?  Karol Grudziński 
18NN, stand. Manhattan  90.2± 0.7  Karol Grudziński 
FSM with rotations  89.7± ?  Rafał Adamczak 
15NN, stand. Euclidean  89.0± 0.5  Karol Grudziński 
FSM without rotations  88.5  Rafał Adamczak 
VSS 4 neurons, 5 it  86.5± 8.8  WD/MK, train 97.1 
LDA, linear discriminant analysis  86.4  Ster & Dobnikar 
Naive Bayes and Semi-NB  86.3  Ster & Dobnikar 
IncNet  86.0  Norbert Jankowski 
QDA, quadratic discriminant analysis  85.8  Ster & Dobnikar 
1NN  85.3± 5.4  Ster & Dobnikar, std added by WD 
VSS 2 neurons, 5 it  85.1± 7.4  WD/MK, train 95.0 
ASR  85.0  Ster & Dobnikar 
Fisher discriminant analysis  84.5  Ster & Dobnikar 
LVQ  83.2  Ster & Dobnikar 
CART (decision tree)  82.7  Ster & Dobnikar 
MLP with BP  82.1  Ster & Dobnikar 
ASI  82.0  Ster & Dobnikar 
LFC  81.9  Ster & Dobnikar 
RBF (Tooldiag)  79.0  Rafał Adamczak 
MLP+BP (Tooldiag)  77.4  Rafał Adamczak 
Results on BP, LVQ, ..., SNB are from: B. Ster and A. Dobnikar, Neural networks in medical diagnosis: Comparison with other methods. In: A. Bulsari et al. (eds.), Proceedings of the International Conference EANN '96, pages 427-430, 1996.
Do our good results reflect superior handling of missing values?
Duch W, Grudziński K (1998). A framework for similarity-based methods. Second Polish Conference on Theory and Applications of Artificial Intelligence, Lodz, 28-30 Sept. 1998, pp. 33-60.
Weighted kNN: Duch W, Grudziński K and Diercksen G.H.F. (1998). Minimal distance neural methods. World Congress of Computational Intelligence, May 1998, Anchorage, Alaska, IJCNN'98 Proceedings, pp. 1299-1304.
Heart disease (Statlog version)
13 attributes (extracted from 75), no missing values.
270 = 150 + 120 observations selected from the 303 cases (Cleveland Heart).
Attribute Information:
1. age 
2. sex 
3. chest pain type (4 values) 
4. resting blood pressure 
5. serum cholesterol in mg/dl 
6. fasting blood sugar > 120 mg/dl 
7. resting electrocardiographic results (values 0, 1, 2) 
8. maximum heart rate achieved 
9. exercise induced angina 
10. oldpeak = ST depression induced by exercise relative to rest 
11. the slope of the peak exercise ST segment 
12. number of major vessels (0-3) colored by fluoroscopy 
13. thal: 3 = normal; 6 = fixed defect; 7 = reversible defect 
Attribute types: Real: 1, 4, 5, 8, 10, 12; Ordered: 11; Binary: 2, 6, 9; Nominal: 3, 7, 13.
Classes: Absence (1) or presence (2) of heart disease;
In the Statlog experiments on the heart data a cost (risk) matrix was used with 9-fold cross-validation, and only cost values are given.
The results below are obtained with 10-fold cross-validation, % of accuracy given, no risk matrix.
Method  Accuracy %  Reference
Lin SVM 2D QCP  85.9± 5.5  MG, 10xCV 
kNN auto+WX  ??.8± 5.6  TM GM 10xCV 
SVM Gauss+WX+G(WX), C=1 s=2^{5}  ??.8± 6.4  TM GM 10xCV 
SVM lin, C=0.01  84.9± 7.9  WD, GM 10x(9xCV) 
SFM, G(WX), default C=1  ??± 5.1  TM, GM 10xCV 
NaiveBayes  84.5± 6.3  TM, GM 10xCV 
NaiveBayes  83.6  RA, WEKA 
SVML default C=1  82.5± 6.4  TM, GM 10xCV 
K*  76.7  WEKA, RA 
IB1c  74.0  WEKA, RA 
1R  71.4  WEKA, RA 
T2  68.1  WEKA, RA 
MLP+BP  65.6  ToolDiag, RA 
FOIL  64.0  WEKA, RA 
RBF  60.0  ToolDiag, RA 
InductH  58.5  WEKA, RA 
Base rate (majority classifier)  55.7  
IB1-4  50.0  ToolDiag, RA 
Results for Heart and other Statlog datasets are collected here.
Heart disease (Cleveland)
From UCI repository:
303 cases, 13 attributes (4 continuous, 9 nominal), 7 vectors with missing values?
2 (no, yes) or 5 classes (no; degree 1, 2, 3, 4).
Class distribution: 164 (54.1%) no, 55+36+35+13 yes (45.9%) with disease degrees 1-4.
Results obtained with the leave-one-out test, % of accuracy given, 2 classes used.
Method  Accuracy %  Reference
LDA   Weiss ? 
25NN, stand, Euclid   WD/KG repeat?? 
CMLP2LN   RA, estimated? 
FSM   Rafał Adamczak 
MLP+backprop   Weiss ? 
CART   Weiss ? 
Where are the MLP, CART and LDA results from???
Other results are our own.
Results obtained with 10-fold cross-validation, % of accuracy given.
Ster & Dobnikar reject 6 vectors with missing values (leaving 297).
We use all 303 vectors, replacing missing values by the mean for their class; in kNN we have used the Statlog convention, 297 vectors.
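The class-mean imputation described above can be sketched as follows (hypothetical data layout: feature rows with None for missing values, plus a class label per row).

```python
def impute_class_means(X, y):
    """Replace None entries in X by the mean of that feature computed
    over the non-missing values of vectors from the same class."""
    classes = set(y)
    n_feat = len(X[0])
    means = {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        means[c] = []
        for j in range(n_feat):
            vals = [r[j] for r in rows if r[j] is not None]
            # Fall back to 0.0 if a feature is missing in a whole class.
            means[c].append(sum(vals) / len(vals) if vals else 0.0)
    return [[means[label][j] if v is None else v
             for j, v in enumerate(row)]
            for row, label in zip(X, y)]

X = [[1.0, None], [3.0, 4.0], [None, 8.0]]
y = ["no", "no", "yes"]
X_filled = impute_class_means(X, y)
# The None in row 0 becomes 4.0, the mean of feature 2 over class "no".
```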
Method  Accuracy %  Reference
IncNet+transformations  90.0  Norbert Jankowski; check again! 
28NN, stand. Euclid, 7 features  85.1± 0.5  WD/KG 
LDA  84.5  Ster & Dobnikar 
Fisher discriminant analysis  84.2  Ster & Dobnikar 
k=7, Euclid, std  84.2± 6.6  WD, GhostMiner 
16NN, stand. Euclid  84.0± 0.6  WD/KG 
FSM, 82.484% on test only  84.0  Rafał Adamczak 
k=1:10, Manhattan, std  83.8± 5.3  WD, GhostMiner 
Naive Bayes  82.5-83.4  Rafał; Ster, Dobnikar 
SNB  83.1  Ster & Dobnikar 
LVQ  82.9  Ster & Dobnikar 
GTO DT (5xCV)  82.5  Bennet and Blue 
kNN, k=19, Euclidean  82.1± 0.8  Karol Grudziński 
k=7, Manhattan, std  81.8± 10.0  WD, GhostMiner 
SVM (5xCV)  81.5  Bennet and Blue 
kNN (k=1? raw data?)  81.5  Ster & Dobnikar 
MLP+BP (standardized)  81.3  Ster, Dobnikar, Rafał Adamczak 
Cluster means, 2 prototypes  80.8± 6.4  MB 
CART  80.8  Ster & Dobnikar 
RBF (Tooldiag, standardized)  79.1  Rafał Adamczak 
Gaussian EM, 60 units  78.6  Stensmo & Sejnowski 
ASR  78.4  Ster & Dobnikar 
C4.5 (5xCV)  77.8  Bennet and Blue 
IB1c (WEKA)  77.6  Rafał Adamczak 
QDA  75.4  Ster & Dobnikar 
LFC  75.1  Ster & Dobnikar 
ASI  74.4  Ster & Dobnikar 
K* (WEKA)  74.2  Rafał Adamczak 
OC1 DT (5xCV)  71.7  Bennet and Blue 
1 R (WEKA)  71.0  Rafał Adamczak 
T2 (WEKA)  69.0  Rafał Adamczak 
FOIL (WEKA)  66.4  Rafał Adamczak 
InductH (WEKA)  61.3  Rafał Adamczak 
Default, majority  54.1  == base rate == 
C4.5 rules  53.8± 5.9  Zarndt 
IB1-4 (WEKA)  46.2  Rafał Adamczak 
For 85% accuracy and p=0.95 confidence level, the 2-tailed bounds are: [80.5%, 88.6%].
Results obtained with BP, LVQ, ..., SNB are from: B. Ster and A. Dobnikar, Neural networks in medical diagnosis: Comparison with other methods. In: A. Bulsari et al. (eds.), Proceedings of the International Conference EANN '96, pages 427-430, 1996.
Magnus Stensmo and Terrence J. Sejnowski, A Mixture Model System for Medical and Machine Diagnosis. Advances in Neural Information Processing Systems 7 (1995) 1077-1084.
K.P. Bennett, J. Blue, A Support Vector Machine Approach to Decision Trees, R.P.I. Math Report No. 97-100, Rensselaer Polytechnic Institute, Troy, NY, 1997.
Other results for this dataset (methodology sometimes uncertain):
D. Wettschereck: averaging 25 runs with 70% train and 30% test, variants of kNN with different metric functions and scaling.
David Aha & Dennis Kibler: from the UCI repository "past usage" notes.
Method  Accuracy %  Reference 
kNN, Value Distance Metric (VDM)   D. Wettschereck 
kNN, Euclidean   D. Wettschereck 
kNN, Variable Similarity Metric   D. Wettschereck 
kNN, Modified VDM   D. Wettschereck 
Other kNN variants   D. Wettschereck 
kNN, Mutual Information   D. Wettschereck 
CLASSIT (hierarchical clustering)   Gennari, Langley, Fisher 
NTgrowth (instancebased)   Aha & Kibler 
C4   Aha & Kibler 
Naive Bayes   Friedman et al., 5xCV, 296 vectors 
Gennari, J.H., Langley, P., Fisher, D. (1989). Models of incremental concept formation. Artificial Intelligence 40, 11-61.
Friedman N, Geiger D, Goldszmidt M (1997). Bayesian network classifiers. Machine Learning 29: 131-163.
Diabetes (Pima Indian)
From the UCI repository, dataset "Pima Indian diabetes":
2 classes, 8 attributes, 768 instances; 500 (65.1%) negative tests for diabetes (class 1) and 268 (34.9%) positive (class 2).
All patients were females of Pima Indian heritage, at least 21 years old.
Attributes used:
1. Number of times pregnant
2. Plasma glucose concentration at 2 hours in an oral glucose tolerance test
3. Diastolic blood pressure (mm Hg)
4. Triceps skin fold thickness (mm)
5. 2-hour serum insulin (mu U/ml)
6. Body mass index (weight in kg/(height in m)^2)
7. Diabetes pedigree function
8. Age (years)
Results obtained with 10-fold cross-validation, % of accuracy given; the Statlog results used 12-fold cross-validation.
Method  Accuracy %  Reference 
Logdisc  77.7  Statlog 
IncNet  77.6  Norbert Jankowski 
DIPOL92  77.6  Statlog 
Linear Discr. Anal.  77.5-77.2  Statlog; Ster & Dobnikar 
SVM, linear, C=0.01  77.5± 4.2  WDGM, 10XCV averaged 10x 
SVM, Gauss, C, sigma opt  77.4± 4.3  WDGM, 10XCV averaged 10x 
SMART  76.8  Statlog 
GTO DT (5xCV)  76.8  Bennet and Blue 
kNN, k=23, Manh, raw, W  76.7± 4.0  WDGM, feature weighting 3CV 
kNN, k=1:25, Manh, raw  76.6± 3.4  WDGM, most cases k=23 
ASI  76.6  Ster & Dobnikar 
Fisher discr. analysis  76.5  Ster & Dobnikar 
MLP+BP  76.4  Ster & Dobnikar 
MLP+BP  75.8± 6.2  Zarndt 
LVQ  75.8  Ster & Dobnikar 
LFC  75.8  Ster & Dobnikar 
RBF  75.7  Statlog 
NB  75.5-73.8  Ster & Dobnikar; Statlog 
kNN, k=22, Manh  75.5  Karol Grudziński 
MML  75.5± 6.3  Zarndt 
FSM stand. 5 feat.  75.4± 4.9  WD, 10x10 test, CC>0.15 
SNB  75.4  Ster & Dobnikar 
BP  75.2  Statlog 
SSV DT  75.0± 3.6  WDGM, SSV BS, node 5CV MC 
kNN, k=18, Euclid, raw  74.8± 4.8  WDGM 
CART DT  74.7± 5.4  Zarndt 
CART DT  74.5  Statlog 
DBCART  74.4  Shang & Breiman 
ASR  74.3  Ster & Dobnikar 
FSM standard  74.1± 1.1  WD, 10x10 test 
ODT, dyadic trees  74.0± 2.3  Blanchard 
Cluster means, 2 prototypes  73.7± 3.7  MB 
SSV DT  73.7± 4.7  WDGM, SSV BS, node 10CV strat 
SFC, stacking filters  73.3± 1.9  Porter 
C4.5 DT  73.0  Statlog 
C4.5 DT  72.7± 6.6  Zarndt 
Bayes  72.2± 6.9  Zarndt 
C4.5 (5xCV)  72.0  Bennet and Blue 
CART  72.8  Ster & Dobnikar 
Kohonen  72.7  Statlog 
C4.5 DT  72.1± 2.6  Blanchard (averaged over 100 runs) 
kNN  71.9  Ster & Dobnikar 
ID3  71.7± 6.6  Zarndt 
IB3  71.7± 5.0  Zarndt 
IB1  70.4± 6.2  Zarndt 
kNN, k=1, Euclidean, raw  69.4± 4.4  WDGM 
kNN  67.6  Statlog 
C4.5 rules  67.0± 2.9  Zarndt 
OCN2  65.1± 1.1  Zarndt 
Default, majority  65.1  
QDA  59.5  Ster, Dobnikar 
For 77.7% accuracy and p=0.95 confidence level, the 2-tailed bounds are: [74.6%, 80.5%].
Results on BP, LVQ, ..., SNB are from: B. Ster and A. Dobnikar, Neural networks in medical diagnosis: Comparison with other methods. In: A. Bulsari et al. (eds.), Proceedings of the International Conference EANN '96, pages 427-430, 1996.
Other results (with different tests):
Method  Accuracy %  Reference
SVM (5xCV)  77.6  Bennet and Blue 
C4.5  76.0± 0.9  Friedman, 5xCV 
SemiNaive Bayes  76.0± 0.8  Friedman, 5xCV 
Naive Bayes  74.5± 0.9  Friedman, 5xCV 
Default, majority  65.1 
Friedman N, Geiger D, Goldszmidt M (1997). Bayesian network classifiers. Machine Learning 29: 131-163.
Opper/Winther use 200 training and 332 test examples (following Ripley), with TAP MFT giving 81% on the test set, SVS 80.1%, and the best NN 77.4%.
Hypothyroid
From UCI repository, dataset "ann-train.data": a thyroid database suited for training ANNs.
3772 learning and 3428 testing examples; classes: primary hypothyroid, compensated hypothyroid, normal.
Training: 93+191+3488, i.e. 2.47%, 5.06%, 92.47%.
Test: 73+177+3178, i.e. 2.13%, 5.16%, 92.71%.
21 attributes (15 binary, 6 continuous); 3 classes.
The problem is to determine whether a patient referred to the clinic is hypothyroid; three classes are therefore built: normal (not hypothyroid), hyperfunction, and subnormal functioning. Because 92 percent of the patients are not hypothyroid, a good classifier must be significantly better than 92%.
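The base-rate argument can be made concrete with the test-set class counts given above: a classifier that always answers "normal" reaches about 92.7% accuracy yet detects no hypothyroid cases, which is why per-class (balanced) accuracy is more informative. A small sketch:

```python
# Test-set class counts quoted above: primary and compensated
# hypothyroid, plus normal.
counts = {"primary": 73, "compensated": 177, "normal": 3178}
total = sum(counts.values())  # 3428 test examples

# A classifier that always answers "normal" is correct on every normal
# case and wrong on all hypothyroid ones: high accuracy, useless model.
majority_acc = counts["normal"] / total

# Balanced accuracy averages the per-class recalls; the majority answer
# scores 1.0 on "normal" and 0.0 on the two disease classes.
balanced_acc = (0.0 + 0.0 + 1.0) / 3
```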
Note: these are the data Quinlan used in the case study in the article "Simplifying Decision Trees" (International Journal of Man-Machine Studies (1987) 221-234).
Names: I (W.D.) have investigated this issue and, after some mail exchange with Chris Merz, who maintained the UCI repository, here is the conclusion:
1. age: continuous 
2. sex: {M, F} 
3. on thyroxine: logical 
4. maybe on thyroxine: logical 
5. on antithyroid medication: logical 
6. sick (patient reports malaise): logical 
7. pregnant: logical 
8. thyroid surgery: logical 
9. I131 treatment: logical 
10. test hypothyroid: logical 
11. test hyperthyroid: logical 
12. on lithium: logical 
13. has goitre: logical 
14. has tumor: logical 
15. hypopituitary: logical 
16. psychological symptoms: logical 
17. TSH: continuous 
18. T3: continuous 
19. TT4: continuous 
20. T4U: continuous 
21. FTI: continuous 
Method  Training %  Test %  Reference
C-MLP2LN rules+ASA  99.90  99.36  Rafał/Krzysztof/Grzegorz 
CART  99.80  99.36  Weiss 
PVM  99.80  99.33  Weiss 
SSV beam search  99.80  99.33  WD 
IncNet  99.68  99.24  Norbert 
SSV opt leaves or pruning  99.7  99.1  WD 
MLP init+ a,b opt.  99.5  99.1  Rafał 
C-MLP2LN rules  99.7  99.0  Rafał/Krzysztof 
Cascade correlation  100.0  98.5  Schiffmann 
Local adapt. rates  99.6  98.5  Schiffmann 
BP+genetic opt.  99.4  98.4  Schiffmann 
Quickprop  99.6  98.3  Schiffmann 
RPROP  99.6  98.0  Schiffmann 
3NN, Euclidean, with 3 features  98.7  97.9  W.D./Karol 
1NN, Euclidean, with 3 features  98.4  97.7  W.D./Karol 
Best backpropagation  99.1  97.6  Schiffmann 
1NN, Euclidean, 8 features used    97.3  Karol/W.D. 
SVM Gauss, C=8 s=0.1  98.3  96.1  WD 
Bayesian classif.  97.0  96.1  Weiss? 
SVM Gauss, C=1 s=0.1  95.4  94.7  WD 
BP+conj. gradient  94.6  93.8  Schiffmann 
1NN Manhattan, std data    93.8  Karol G./WD 
SVM lin, C=1  94.1  93.3  WD 
SVM Gauss, C=8 s=5  100  92.8  WD 
Default, majority (250 test errors)    92.7  
1NN Manhattan, raw data  92.2  Karol G./WD 
For 99.90% accuracy on training and p=0.95 confidence level, the 2-tailed bounds are: [99.74%, 99.96%].
Most NN results are from W. Schiffmann, M. Joost, R. Werner, 1993; the MLP2LN and Init+a,b results are ours.
kNN, PVM and CART results are from: S.M. Weiss, I. Kapouleas, "An empirical comparison of pattern recognition, neural nets and machine learning classification methods", in: J.W. Shavlik and T.G. Dietterich, Readings in Machine Learning, Morgan Kaufmann, CA 1990.
SVM with linear and Gaussian kernels gives quite poor results on this data.
3 crisp logical rules using TSH, FTI, T3, on_thyroxine, thyroid_surgery, TT4 give 99.3% of accuracy on the test set.
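The thresholds of these three rules are not reproduced here; the sketch below only illustrates the form such crisp rules take. All cut-off values and the exact feature combinations are placeholders, not the published rules.

```python
def classify(p):
    """Hypothetical crisp-rule classifier for the thyroid data.
    `p` is a dict of feature values; every threshold below is a
    PLACEHOLDER, not a published rule value."""
    # Rule 1 (placeholder cut-offs): very high TSH with low FTI.
    if p["TSH"] > 30.0 and p["FTI"] < 65.0:
        return "primary hypothyroid"
    # Rule 2 (placeholder cut-offs): elevated TSH, normal FTI,
    # no thyroxine treatment or surgery, TT4 not too high.
    if (p["TSH"] > 6.0 and p["FTI"] > 65.0
            and not p["on_thyroxine"] and not p["thyroid_surgery"]
            and p["TT4"] < 150.0):
        return "compensated hypothyroid"
    # Default rule: everything else is normal.
    return "normal"
```

Each rule is a conjunction of simple interval conditions on a few features, which is what makes a rule set with 99.3% test accuracy interpretable.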
Hepatobiliary disorders
Contains medical records of 536 patients admitted to a university-affiliated Tokyo-based hospital, with four types of hepatobiliary disorders: alcoholic liver damage, primary hepatoma, liver cirrhosis and cholelithiasis. The records include the results of 9 biochemical tests and the sex of the patient. The same 163 cases as in [Hayashi et al.] were used as the test data.
FSM gives about 60 Gaussian or triangular membership functions, achieving an accuracy of 75.5-75.8%. Rotation of these functions (i.e. introducing linear combinations of inputs into the rules) does not improve this accuracy. 10-fold cross-validation tests on the mixed, training plus test, data give similar results. The best results were obtained with the K* method, based on algorithmic complexity optimization, giving 78.5% on the test set, and with kNN using the Manhattan distance function, k=1 and feature selection (using the leave-one-out method on the training data, features 2, 5, 6 and 9 were removed), giving 80.4% accuracy. Simulated annealing optimization of the scaling factors for the remaining 5 features gives 81.0%, and optimizing the scaling factors using all input features gives 82.8%. The scaling factors are: 0.92, 0.60, 0.91, 0.92, 0.07, 0.41, 0.55, 0.86, 0.30. Similar accuracy is obtained using the multi-simplex method for optimization of the scaling factors.
Method  Training set  Test set  Reference 
IB2-IB4  81.2-85.5  43.6-44.6  WEKA, our calculation 
Naive Bayes    46.6  WEKA, our calculation 
1R (rules)  58.4  50.3  WEKA, our calculation 
T2 (rules from decision tree)  67.5  53.3  WEKA, our calculation 
FOIL (inductive logic)  99  60.1  WEKA, our calculation 
FSM, initial 49 crisp logical rules  83.5  63.2  FSM, our calculation 
LDA (statistical)  68.4  65.0  our calculation 
DLVQ (38 nodes)  100  66.0  our calculation 
C4.5 decision rules  64.5  66.3  our calculation 
Best fuzzy MLP model  75.5  66.3  Mitra et al. 
MLP with RPROP    68.0  our calculation 
Cascade Correlation    71.0  our calculation 
Fuzzy neural network  100  75.5  Hayashi 
C4.5 decision tree  94.4  75.5  our calculation 
FSM, Gaussian functions  93  75.6  our calculation 
FSM, 60 triangular functions  93  75.8  our calculation 
IB1c (instancebased)    76.7  WEKA, our calculation 
K* method    78.5  WEKA, our calculation 
kNN, k=1, Canberra, raw  76.1  80.4  WD/SBL 
1NN, 4 features removed, Manhattan  76.9  80.4  our calculation, KG 
1NN, Canberra, raw, removed f2, 6, 8, 9  77.2  83.4  our calculation, KG 
Y. Hayashi, A. Imura, K. Yoshida, Fuzzy neural expert system and its application to medical diagnosis. In: 8th International Congress on Cybernetics and Systems, New York City 1990, pp. 54-61.
S. Mitra, R. De, S. Pal, Knowledge-based fuzzy MLP for classification and rule generation. IEEE Transactions on Neural Networks 8, 1338-1350, 1997; a knowledge-based fuzzy MLP system gives results on the test set in the range from 33% to 66.3%, depending on the actual fuzzy model used.
W. Duch and K. Grudziński, "Prototype Based Rules - New Way to Understand the Data," Int. Joint Conference on Neural Networks, Washington D.C., pp. 1858-1863, 2001. Contains the best results with 1NN, Canberra and feature selection: 83.4% on the test set.
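The scaled Manhattan distance used in the kNN experiments above (per-feature scaling factors multiplying |x_j - y_j|) can be sketched as follows; the toy data and the knn_predict helper are made up for illustration only.

```python
def scaled_manhattan(a, b, scale):
    """Manhattan distance with per-feature scaling factors, the kind of
    distance optimized by simulated annealing in the text above."""
    return sum(s * abs(x - y) for x, y, s in zip(a, b, scale))

def knn_predict(query, X, y, scale, k=1):
    """Classify `query` by majority vote among the k nearest training
    vectors under the scaled Manhattan distance (hypothetical helper)."""
    dists = sorted((scaled_manhattan(query, x, scale), label)
                   for x, label in zip(X, y))
    votes = [label for _, label in dists[:k]]
    return max(set(votes), key=votes.count)

# Toy data: with feature 2 scaled down to 0, only feature 1 matters,
# so the query is assigned to class "a" (distance 0.1 vs 0.9).
X = [(0.0, 9.0), (1.0, 0.0)]
y = ["a", "b"]
pred = knn_predict((0.1, 0.0), X, y, scale=(1.0, 0.0))
```

Setting a scaling factor to zero removes a feature entirely, so feature selection is just a special case of this weighting.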
Satellite image dataset (Statlog version)
Training 4435, test 2000 cases; 36 semi-continuous [0 to 255] attributes (= 4 spectral bands x 9 pixels in a neighborhood) and 6 decision classes: 1, 2, 3, 4, 5 and 7 (class 6 has been removed because of doubts about the validity of this class).
The StatLog database consists of the multispectral values of pixels in 3x3 neighborhoods in a satellite image, and the classification associated with the central pixel in each neighborhood. The aim is to predict this classification, given the multispectral values. In the sample database, the class of a pixel is coded as a number.
Method  Train %  Test %  Comments (for Statlog entries: training and test time, s)
MLP+SCG  96.0  91.0  reg. alpha=0.5, 36 hidden nodes, 1400 it.  fast; WD 
kNN    90.9  auto, k=3, Manhattan, std data  GM 2.0 
kNN  91.1  90.6  2105, Statlog, 944; parameters? 
kNN    90.4  auto, k=5, Euclidean, std data  GM 2.0 
kNN    90.0  k=1, Manhattan, std data, no training  fast, GM 2.0 
FSM  95.1  89.7  std data, a=0.95  fast, GM 2.0; best NN result 
LVQ  95.2  89.5  1273  44 
kNN    89.4  k=1, Euclidean, std data, no training  fast, GM 2.0 
Dipol92  94.9  88.9  746  111 
MLP+SCG  94.4  88.5  5000 it.; active learning + reg. a=0.5, 8-12 hidden  fast; WD 
SVM  91.6  88.4  std data, Gaussian kernel  fast, GM 2.0; unclassified 4.3% 
Radial  88.9  87.9  564  74 
Alloc80  96.4  86.8  63840  28757 
IndCart  97.7  86.2  2109  9 
CART  92.1  86.2  330  14 
MLP+BP  88.8  86.1  72495  53 
Bayesian Tree  98.0  85.3  248  10 
C4.5  96.0  85.0  434  1 
New ID  93.3  85.0  226  53 
QuaDisc  89.4  84.5  157  53 
SSV  90.9  84.3  default par.  very fast, GM 2.0 
Cascade  88.8  83.7  7180  1 
Log DA, Disc  88.1  83.7  4414  41 
LDA, Discrim  85.1  82.9  68  12 
Kohonen  89.9  82.1  12627  129 
Bayes  69.2  71.3  75  17 
The original database was generated from Landsat Multi-Spectral Scanner image data. The sample database was generated by taking a small section (82 rows and 100 columns) from the original data. One frame of Landsat MSS imagery consists of four digital images of the same scene in different spectral bands. Two of these are in the visible region (corresponding approximately to the green and red regions of the visible spectrum) and two are in the (near) infrared. Each pixel is an 8-bit binary word, with 0 corresponding to black and 255 to white. The spatial resolution of a pixel is about 80m x 80m. Each image contains 2340 x 3380 such pixels.
The database is a (tiny) subarea of a scene, consisting of 82 x 100 pixels. Each line of data corresponds to a 3x3 square neighborhood of pixels completely contained within the 82x100 subarea. Each line contains the pixel values in the four spectral bands (converted to ASCII) of each of the 9 pixels in the 3x3 neighborhood and a number indicating the classification label of the central pixel. In each line of data the four spectral values for the topleft pixel are given first followed by the four spectral values for the topmiddle pixel and then those for the topright pixel, and so on with the pixels read out in sequence lefttoright and toptobottom. Thus, the four spectral values for the central pixel are given by attributes 17,18,19 and 20. If you like you can use only these four attributes, while ignoring the others. This avoids the problem which arises when a 3x3 neighborhood straddles a boundary.
All results are from the Statlog book, except GM: GhostMiner calculations by W. Duch.
N  Description  Train  Test 
1  red soil  1072 (24.17%)  461 (23.05%) 
2  cotton crop  479 (10.80%)  224 (11.20%) 
3  grey soil  961 (21.67%)  397 (19.85%) 
4  damp grey soil  415 (09.36%)  211 (10.55%) 
5  veg. stubble  470 (10.60%)  237 (11.85%) 
6  mixture class  0  0 
7  very damp grey soil  1038 (23.40%)  470 (23.50%) 
Machine Learning, Neural and Statistical Classification, D. Michie, D.J. Spiegelhalter, C.C. Taylor (eds); the Statlog project book!
Ionosphere
351 data records, with class division 224 (63.8%) + 126 (35.9%).
Usually the first 200 vectors are taken for training and the last 151 for the test, but this split is very unbalanced: in the training set 101 (50.5%) and 99 (49.5%) vectors are from classes 1/2,
in the test set 123 (82%) and 27 (18%) are from classes 1/2.
34 attributes, but f2=0 always and should be removed; f1 is binary, and the remaining 32 attributes are continuous.
2 classes: different types of radar signals reflected from the ionosphere.
Some vectors (8, 18, 20, 22, 24, 30, 38, 52, 76, 78, 80, 82, 103, 163, 169, 171, 183, 187, 189, 191, 201, 215, 219, 221, 223, 225, 227, 229, 231, 233, 249) are either binary 0, 1 or have only the 3 values -1, 0, +1.
For example, vector 169 has only one component = 1; all others are 0.
Method  Accuracy %  Reference
3NN + simplex  98.7  Our own weighted kNN 
VSS 2 epochs  96.7  MLP with numerical gradient 
3NN  96.7  KG, GM with or without weights 
IB3  96.7  Aha, 5 errors on test 
1NN, Manhattan  96.0  GM kNN (our) 
MLP+BP  96.0  Sigillito 
SVM Gaussian  94.9± 2.6  GM (our), defaults, similar for C=1-100 
C4.5  94.9  Hamilton 
3NN Canberra  94.7  GM kNN (our) 
RIAC  94.6  Hamilton 
C4 (no windowing)  94.0  Aha 
C4.5  93.7  Bennet and Blue 
SVM  93.2  Bennet and Blue 
FSM + rotation  92.8  our 
1NN, Euclidean  92.1  Aha, GM kNN (our) 
Nonlin perceptron  92.0  Sigillito 
DBCART  91.3  Shang, Breiman 
Linear perceptron  90.7  Sigillito 
OC1 DT  89.5  Bennet and Blue 
CART  88.9  Shang, Breiman 
SVM linear  87.1± 3.9  GM (our), defaults 
GTO DT  86.0  Bennet and Blue 
Perceptron+MLP results:
Sigillito, V.G., Wing, S.P., Hutton, L.V., & Baker, K.B. (1989).
Classification of radar returns from the ionosphere using neural networks.
Johns Hopkins APL Technical Digest, 10, 262-266.
N. Shang, L. Breiman, ICONIP'96, p. 133.
David Aha: kNN+C4+IB3, from Aha, D.W., & Kibler, D. (1989). Noise-tolerant instance-based learning algorithms. In: Proceedings of the Eleventh International Joint Conference on Artificial Intelligence (pp. 794-799). Detroit, MI: Morgan Kaufmann.
IB3 parameter settings: 70% and 80% for acceptance and dropping respectively.
RIAC, C4.5 from: H.J. Hamilton, N. Shan, N. Cercone, RIAC: a rule induction algorithm based on approximate classification, Tech. Rep. CS 96-06, University of Regina, 1996.
K.P. Bennett, J. Blue, A Support Vector Machine Approach to Decision Trees, R.P.I. Math Report No. 97-100, Rensselaer Polytechnic Institute, Troy, NY, 1997.
The training/test division is not very good in this case; the class distributions in the two sets differ somewhat.
In 10xCV the results are:
Method  Accuracy (%)  Reference 
SFM+G+G(WX)  ??± 2.6  GM (our), C=1, s=2^{5} 
kNN auto+WX+G(WX)  ??.4± 3.6  GM (our) 
SVM Gaussian  94.8± 3.5  GM (our), C=1, s=0.1, 10x10CV, std 
SVM Gaussian  94.6± 4.3  GM (our), C=1, s=2^{5} 
VSSMKNN  91.5± 4.3  MK, 12 neurons (similar for 8-17) 
SVM lin  89.5± 3.8  GM (our), C=1, s=2^{5} 
SSV tree  87.8± 4.5  GM (our), default 
1NN  85.8± 4.9  GM std, Euclid 
3NN  84.0± 5.4  GM std, Euclid 
VSS is an MLP trained with search, implemented by Mirek Kordos, used here with 3 epochs;
neurons may be sigmoidal or stepwise (64 values).
Maszczyk T, Duch W,
Support Feature Machine, WCCI 2010 (submitted).
208 cases, 60 continuous attributes, 2 classes, 111 metal, 97 rock.
From the CMU benchmark repository.
This dataset has been used in two kinds of experiments:
1. The "aspect-angle independent" experiments use all 208 cases with 13-fold cross-validation, averaged over 10 runs to obtain the standard deviation.
2. The "aspect-angle dependent" experiments use training / test sets with 104 vectors each. The class distribution in training is 49 + 55, in test 62 + 42.
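The averaging scheme in point 1 can be sketched as follows (a stdlib-only illustration; `evaluate` is a hypothetical stand-in for training and testing any classifier):

```python
import random
import statistics

def repeated_cv(n_cases, evaluate, k=13, runs=10, seed=0):
    """Repeat k-fold CV `runs` times; return mean accuracy and its std.

    `evaluate(train_idx, test_idx)` is a stand-in for training a
    classifier on `train_idx` and returning its accuracy on `test_idx`.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(runs):
        idx = list(range(n_cases))
        rng.shuffle(idx)
        for fold in range(k):
            test = idx[fold::k]
            train = sorted(set(idx) - set(test))
            scores.append(evaluate(train, test))
    return statistics.mean(scores), statistics.pstdev(scores)

# toy evaluator: 208 cases split 13-fold gives 16-case test folds
mean_acc, std_acc = repeated_cv(208, lambda train, test: len(test) / 16.0)
```

With 208 sonar cases, 13-fold partitions come out exactly equal (16 cases per fold), which is presumably why 13-fold CV was chosen for this dataset.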
The L1O estimate on the whole dataset (Opper and Winther) gives only 78.2%; is the test set really that easy? Some of these results were obtained without standardization of the data, which is very important here!
The "aspect-angle dependent" experiments with training / test sets.
Method  Accuracy (%)  Reference 
1NN, 5D from MDS, Euclid, std  97.1  our, GM (WD)  
1NN, Manhattan std  97.1  our, GM (WD)  
1NN, Euclid std  96.2  our, GM (WD)  
TAP MFT Bayesian    92.3  Opper, Winther 
Naive MFT Bayesian    90.4  Opper, Winther 
SVM    90.4  Opper, Winther 
MLP+BP, 12 hidden, best MLP    90.4  Gorman, Sejnowski 
1NN, Manhattan raw  92.3  our, GM (WD)  
1NN, Euclid raw  91.3  our, GM (WD)  
FSM  methodology ?  83.6  our (RA) 
The "aspect-angle independent" experiments, 13-fold CV on all data.
Method  Training  Test  Reference 
1NN Euclid on 5D MDS input  88.0± 7.8  our GM (WD) av 10x10CV  
1NN Euclidean, std data  87.7± 6.8  our GM (WD), 10x10CV av  
1NN Manhattan, std data  86.7± 8.6  our GM (WD) av 10x10CV  
MLP+BP, 12 hidden  99.8± 0.1  84.7± 5.7  Gorman, Sejnowski 
1NN Manhattan, raw data  84.8± 8.3  our GM (WD) av 10x10CV  
MLP+BP, 24 hidden  99.8± 0.1  84.5± 5.7  Gorman, Sejnowski 
MLP+BP, 6 hidden  99.7± 0.2  83.5± 5.6  Gorman, Sejnowski 
SVM linear, C=0.1  82.7± 8.5  our GM (WD), std data  
1NN Euclidean, raw data  82.4± 10.7  our GM (WD) av 10x10CV  
SVM Gauss, C=1, s=0.1  77.4± 10.1  our GM (WD), std data  
SVM linear, C=1  76.9± 11.9  our GM (WD), raw data  
SVM linear, C=1  76.0± 9.8  our GM (WD), std data  
DBCART, 10xCV  81.8  Shang, Breiman  
CART, 10xCV  67.9  Shang, Breiman 
M. Opper and O. Winther, Gaussian Processes and SVM: Mean Field Results and Leave-One-Out. In: Advances in Large Margin Classifiers, Eds. A.J. Smola, P. Bartlett, B. Schölkopf, D. Schuurmans, MIT Press, 311-326, 2000; same methodology as Gorman and Sejnowski.
N. Shang, L. Breiman, ICONIP'96, p. 133, 10xCV.
Gorman, R.P., and Sejnowski, T.J. (1988). "Analysis of Hidden Units in a Layered Network Trained to Classify Sonar Targets", Neural Networks 1, pp. 75-89, 13xCV.
Our results: kNN results from 10xCV and from 13xCV are quite similar, so the Shang and Breiman numbers should not differ much from 13-fold CV.
WD leave-one-out (L1O) estimates on standardized data:
L1O with k=1 and Euclidean distance on all data gives 87.50%; other k and distance functions give no significant improvement.
SVM linear, C=1: L1O 75.0%; Gaussian kernel, C=1: L1O 78.8%.
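The L1O procedure behind these 1-NN estimates can be sketched as follows (a stdlib-only illustration on toy data, not the actual sonar set):

```python
import math

def loo_1nn(X, y):
    """Leave-one-out accuracy of 1-NN with Euclidean distance:
    each case is classified by its nearest neighbor among the rest."""
    correct = 0
    for i, xi in enumerate(X):
        best, pred = float("inf"), None
        for j, xj in enumerate(X):
            if i == j:
                continue
            d = math.dist(xi, xj)
            if d < best:
                best, pred = d, y[j]
        correct += pred == y[i]
    return correct / len(X)

# toy check: two well-separated clusters are classified perfectly
X = [(0, 0), (0, 1), (5, 5), (5, 6)]
y = [0, 0, 1, 1]
```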
Other L1O results taken from C. Domeniconi, J. Peng, D. Gunopulos, "An adaptive metric for pattern classification".
Discriminant Adaptive NN, DANN  92.3  
Adaptive metric NN  90.9  
kNN  87.5  
SVM Gauss C=1  78.8  
C4.5  76.9  
SVM linear C=1  75.0 
528 training and 462 test cases, 10 continuous attributes, 11 classes.
From the UCI benchmark repository.
Speaker-independent recognition of the eleven steady-state vowels of British English, using a specified training set of LPC-derived log area ratios.
Results on the total set
Method  Accuracy (%)  Reference 
DBCART, 10xCV on total set !!!  90.0  Shang, Breiman  
CART, 10xCV on total set  78.2  Shang, Breiman 
Method  Training  Test  Reference 
Square node network, 88 units  54.8  UCI  
Gaussian node network, 528 units  54.6  UCI  
1NN, Euclidean, raw  99.24  56.3  WD/KG 
Radial Basis Function, 528 units  53.5  UCI  
Gaussian node network, 88 units  53.5  UCI  
FSM Gauss, 10CV on the training set  92.60  51.94  our (RA) 
Square node network, 22  51.1  UCI  
Multilayer perceptron, 88 hidden  50.6  UCI  
Modified Kanerva Model, 528 units  50.0  UCI  
Radial Basis Function, 88 units  47.6  UCI  
Single-layer perceptron, 88 hidden  33.3  UCI 
N. Shang, L. Breiman, ICONIP'96, p. 133, used 10xCV instead of the test set.
871 patterns, 6 overlapping vowel classes (Indian Telugu vowel sounds), 3 features (formant frequencies).
Method  Accuracy (%)  Reference 
10xCV tests below  
3NN, Manhattan  87.8± 4.0  Kosice 
3NN, Canberra  87.8± 4.2  WD/GM 
FSM, 65 Gaussian nodes  87.4± 4.5  Kosice 
3NN, Euclid  87.3± 3.9  WD/GM 
SSV dec. tree, 22 rules  86.0± ??  Kosice 
SVM Gauss opt C~1000, s~1  85.0± 4.0  WD, GhostMiner 
SVM Gauss C=1000, s=1  83.5± 4.1  WD, GhostMiner 
SVM, Gauss, C=1, s=0.1  76.6± 2.5  WD, GhostMiner 
2xCV tests below  
3NN, Euclidean  86.1± 0.6  Kosice 
FSM, 40 Gaussian nodes  85.2± 1.2  Kosice 
MLP  84.6  Pal 
Fuzzy MLP  84.2  Pal 
SSV dec. tree, beam search  83.3± 0.9  Kosice 
SSV dec. tree, best first  83.0± 1.0  Kosice 
Bayes Classifier  79.2  Pal 
Fuzzy SOM  73.5  Pal 
Parameters of the SVM were optimized, i.e. different parameters were selected in each CV fold, so only an approximate value can be quoted. If they are fixed at C=1000, s=1, the results are slightly worse.
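The per-fold selection protocol described above can be sketched like this (a stdlib-only illustration; `score` is a hypothetical stand-in for training and testing an SVM with a given C):

```python
import random
import statistics

def cv_with_inner_tuning(n_cases, grid, score, k=10, seed=0):
    """k-fold CV where a parameter (a stand-in for the SVM's C) is
    re-selected on each fold's training part, so different folds may
    report different 'optimal' values."""
    rng = random.Random(seed)
    idx = list(range(n_cases))
    rng.shuffle(idx)
    accs, chosen = [], []
    for fold in range(k):
        test = idx[fold::k]
        train = sorted(set(idx) - set(test))
        best = max(grid, key=lambda c: score(train, c))  # inner selection
        chosen.append(best)
        accs.append(score(test, best))
    return statistics.mean(accs), chosen

# toy score that peaks at C=1000, independent of the sample subset
score = lambda samples, c: 1.0 - abs(c - 1000) / 1000.0
mean_acc, chosen = cv_with_inner_tuning(100, [1, 100, 1000], score)
```

Because each fold may pick a different parameter, only the list of chosen values (here all 1000) and the mean accuracy can be reported, which is why the table quotes C~1000, s~1.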
Source: UCI, described in Forina, M. et al, PARVUS  An Extendible Package for Data Exploration, Classification and Correlation. Institute of Pharmaceutical and Food Analysis and Technologies, Via Brigata Salerno, 16147 Genoa, Italy.
These data are the results of a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars. The analysis determined the quantities of 13 constituents found in each of the three types of wines.
Class distribution: 178 cases = [59, 71, 48] in classes 1-3.
13 continuous attributes: alcohol, malic acid, ash, alkalinity of ash, magnesium, total phenols, flavanoids, nonflavanoid phenols, proanthocyanins, color intensity, hue, OD280/OD315, proline.
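Standardization ("std data" and "z-transformed data" in the tables) matters for the distance-based methods here, since attributes like proline and hue live on very different scales; a minimal stdlib-only sketch of the z-transform:

```python
import statistics

def standardize(rows):
    """Scale each attribute to zero mean and unit variance
    (population std; constant columns are left unscaled)."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)]
            for row in rows]

# toy rows standing in for (alcohol, malic acid) measurements
scaled = standardize([[12.0, 1.5], [13.0, 2.5], [14.0, 3.5]])
```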
Method  Accuracy (%)  Reference 
Leaveoneout test results  
RDA  100  [1] 
QDA  99.4  [1] 
LDA  98.9  [1] 
kNN, Manhattan, k=1  98.7  GMWD, std data 
1NN  96.1  [1], z-transformed data 
kNN, Euclidean, k=1  95.5  GMWD, std data 
kNN, Chebyshev, k=1  93.3  GMWD, std data 
10xCV tests below  
kNN, Manhattan, auto k=1-10  98.9± 2.3  GMWD, 2D data, after MDS/PCA 
IncNet, 10CV, def, Gauss  98.9± 2.4  GMWD, std data, up to 3 neurons 
10 CV SSV, opt prune  98.3± 2.7  GMWD, 2D data, after MDS/PCA 
10 CV SSV, node count 7  98.3± 2.7  GMWD, 2D data, after MDS/PCA 
kNN, Euclidean, k=1  97.8± 2.8  GMWD, 2D data, after MDS/PCA 
kNN, Manhattan, k=1  97.8± 2.9  GMWD, 2D data, after MDS/PCA 
kNN, Manhattan, auto k=1-10  97.8± 3.9  GMWD 
kNN, Euclidean, k=3, weighted features  97.8± 4.7  GMWD 
IncNet, 10CV, def, bicentral  97.2± 2.9  GMWD, std data, up to 3 neurons 
kNN, Euclidean, auto k=1-10  97.2± 4.0  GMWD 
10 CV SSV, opt node  97.2± 5.4  GMWD, 2D data, after MDS/PCA 
FSM a=.99, def  96.1± 3.7  GMWD, 2D data, after MDS/PCA 
FSM 10CV, Gauss, a=.999  96.1± 4.7  GMWD, std data, 811 neurons 
FSM 10CV, triang, a=.99  96.1± 5.9  GMWD, raw data 
kNN, Euclidean, k=1  95.5± 4.4  GMWD 
10 CV SSV, opt node, BFS  92.8± 3.7  GMWD 
10 CV SSV, opt node, BS  91.6± 6.5  GMWD 
10 CV SSV, opt prune, BFS  90.4± 6.1  GMWD 
UCI past usage:
[1] S. Aeberhard, D. Coomans and O. de Vel, Comparison of Classifiers in High Dimensional Settings, Tech. Rep. no. 92-02 (1992), Dept. of Computer Science and Dept. of Mathematics and Statistics, James Cook University of North Queensland (submitted to Technometrics).
[2] S. Aeberhard, D. Coomans and O. de Vel, "The classification performance of RDA", Tech. Rep. no. 92-01 (1992), Dept. of Computer Science and Dept. of Mathematics and Statistics, James Cook University of North Queensland (submitted to Journal of Chemometrics).
Shang, Breiman: CART 71.4% accuracy, DBCART 70.6%.
Leave-one-out results taken from C. Domeniconi, J. Peng, D. Gunopulos, "An adaptive metric for pattern classification".
Adaptive metric NN  75.2  
Discriminant Adaptive NN, DANN  72.9  
kNN  72.0  
C4.5  68.2 
StatLog data: splice junctions are points on a DNA sequence at which 'superfluous' DNA is removed during the process of protein creation in higher organisms.
The problem posed in this dataset is to recognize, given a sequence of DNA, the boundaries between exons (the parts of the DNA sequence retained after splicing) and introns (the parts of the DNA sequence that are spliced out).
This problem consists of two subtasks: recognizing exon/intron boundaries (referred to as EI sites), and recognizing intron/exon boundaries (IE sites).
(In the biological community, IE borders are referred to as "acceptors" while EI borders are referred to as "donors".)
Number of Instances: 3190. Class distribution:
Class  Train  Test 
1  464 (23.20%)  303 (25.55%) 
2  485 (24.25%)  280 (23.61%) 
3  1051 (52.55%)  603 (50.84%) 
Total  2000 (100%)  1186 (100%) 
Number of attributes: originally 60 attributes with values {a,c,t,g}, usually converted to 180 binary indicator variables {(0,0,0), (0,0,1), (0,1,0), (1,0,0)}, or to 240 binary variables.
Much better performance is generally observed if the attributes closest to the junction (the middle ones) are used. In the StatLog version (180 variables) this means using attributes A61 to A120 only.
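A sketch of this encoding; the four 3-bit codes are quoted above, but which letter gets which pattern is an assumption for illustration only:

```python
# Assumed letter-to-code assignment (the original mapping is not
# specified above; only the four bit patterns are).
CODE = {"a": (0, 0, 1), "c": (0, 1, 0), "g": (1, 0, 0), "t": (0, 0, 0)}

def encode(seq):
    """Map a 60-base window to the 180 binary indicator variables."""
    bits = []
    for base in seq:
        bits.extend(CODE[base])
    return bits

def middle_window(bits):
    """Keep only attributes A61-A120 (1-based), the StatLog middle window."""
    return bits[60:120]

bits = encode("acgt" * 15)      # synthetic 60-base sequence
window = middle_window(bits)
```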
Method  Train  Test  Train time  Test time 
RBF, 720 nodes  98.5  95.9  
kNN GM, p(X|C), k=6, Euclid, raw  96.8  95.5  0  short 
Dipol92  99.3  95.2  213  10 
Alloc80  93.7  94.3  14394   
QuaDisc  100.0  94.1  1581  809 
LDA, Discrim  96.6  94.1  929  31 
FSM, 8 Gaussians, 180 binary  95.4  94.0  
Log DA, Disc  99.2  93.9  5057  76 
SSV Tree, p(X|C), opt node, 4CV  94.8  93.4  short  short 
Naive Bayes  94.8  93.2  52  15 
Castle, middle 90 binary var  93.9  92.8  397  225 
IndCart, 180 binary  96.0  92.7  523  516 
C4.5, on 60 features  96.0  92.4  9  2 
CART, middle 90 binary var  92.5  91.5  615  9 
MLP+BP  98.6  91.2  4094  9 
Bayesian Tree  99.9  90.5  82  11 
CN2  99.8  90.5  869  74 
New ID  100.0  90.0  698  1 
Ac2  100.0  90.0  12378  87 
Smart  96.6  88.5  79676  16 
Cal5  89.6  86.9  1616  8 
Itrule  86.9  86.5  2212  6 
kNN  91.1  85.4  2428  882 
Kohonen  89.6  66.1     
Default, majority  52.5  50.8 
kNN GM: GhostMiner version of kNN (our group).
SSV Decision Tree: our results.
Maintained by Wlodzislaw Duch.