Datasets used for classification: comparison of results
Before using any new dataset it should be described here!
Results from the Statlog project are here.
Logical rules derived for data are here.
Medical:
Appendicitis |
Breast cancer (Wisconsin) |
Breast Cancer (Ljubljana) |
Diabetes (Pima Indian) |
Heart disease (Cleveland) |
Heart disease (Statlog version) |
Hepatitis |
Hypothyroid |
Hepatobiliary disorders |
Other datasets:
Ionosphere |
Satellite image dataset (Statlog version) |
Sonar |
Telugu Vowel |
Vowel |
Wine |
Other data: Glass, DNA |
More results for Statlog datasets.
A note of caution: comparing different classifiers is not an easy task. Before ranking methods using the numbers presented in the tables below, please note the following facts.
Many results we have collected give only a single number (even results from the StatLog project!), without a standard deviation. Since most classifiers may give results that differ by several percent on slightly different data partitions, single numbers do not mean much.
Leave-one-out tests have been criticized as a basis for accuracy evaluation; the conclusion is that crossvalidation is safer, cf.:
Kohavi, R. (1995). A study of cross-validation and bootstrap for accuracy estimation and model selection. In: Proc. of the 14th Int. Joint Conference on Artificial Intelligence, Morgan Kaufmann, pp. 1137-1143.
Crossvalidation (CV) tests are also not ideal. Theoretically about 2/3 of the results should be within a single standard deviation of the average, and 95% of the results should be within two standard deviations, so in a 10-fold crossvalidation you should very rarely see results that are more than two standard deviations better or worse than the average. Running CV several times may also give you different answers. The search for the best estimator continues. Cf.:
Dietterich, T. (1998) Approximate statistical tests for comparing supervised classification learning algorithms. Neural Computation, 10 (7), 1895-1924;
Nadeau C, Bengio Y. (1999) Inference for the Generalization Error. Tech. rep. 99s-25, CIRANO, J. Machine Learning (Kluwer, in print).
Even the best accuracy and variance estimation is not sufficient, since performance cannot be characterized by a single number. It would be much better to provide full Receiver Operating Characteristic (ROC) curves. Combining ROC with variance estimation would be ideal.
Unfortunately this still remains to be done. All we can do now is to collect some numbers in tables.
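The partition dependence described in the notes above can be seen in a small experiment. The sketch below is not the procedure used for the tables on this page: it assumes synthetic one-dimensional two-class data with the same 85+21 class sizes as the appendicitis set, and a hypothetical nearest-class-mean classifier chosen only because it is short. It repeats stratified 10-fold crossvalidation on the same data with ten different shuffles and prints the mean and standard deviation of the fold accuracies for each run.

```python
import random
import statistics


def stratified_folds(labels, k, rng):
    """Split example indices into k folds, keeping class proportions roughly equal."""
    folds = [[] for _ in range(k)]
    position = 0
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        # Deal indices round-robin, continuing where the previous class stopped.
        for i in members:
            folds[position % k].append(i)
            position += 1
    return folds


def nearest_mean_accuracy(x, labels, train, test):
    """Accuracy of a 1-D nearest-class-mean classifier (illustrative only)."""
    means = {}
    for cls in set(labels[i] for i in train):
        means[cls] = statistics.fmean(x[i] for i in train if labels[i] == cls)
    hits = sum(min(means, key=lambda c: abs(x[i] - means[c])) == labels[i] for i in test)
    return hits / len(test)


def cross_validate(x, labels, k, seed):
    """Return (mean, standard deviation) of the fold accuracies for one k-fold CV run."""
    rng = random.Random(seed)
    folds = stratified_folds(labels, k, rng)
    accuracies = []
    for test in folds:
        held_out = set(test)
        train = [i for i in range(len(labels)) if i not in held_out]
        accuracies.append(nearest_mean_accuracy(x, labels, train, test))
    return statistics.fmean(accuracies), statistics.stdev(accuracies)


# Synthetic data (an assumption): 85 + 21 points from two overlapping Gaussians.
data_rng = random.Random(0)
labels = [0] * 85 + [1] * 21
x = [data_rng.gauss(0.0 if y == 0 else 2.0, 1.0) for y in labels]

# Same data, same classifier, ten different partitions.
runs = [cross_validate(x, labels, k=10, seed=s) for s in range(10)]
for mean, std in runs:
    print(f"{100 * mean:.1f} ± {100 * std:.1f}")
```

Only the partition changes between runs, so any difference between the printed lines comes from the split alone.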
Our results are usually obtained with the GhostMiner package, developed in our group.
Some publications with results are on my page.
TuneIT, Testing Machine Learning & Data Mining Algorithms - Automated Tests, Repeatable Experiments, Meaningful Results.
Results of hand-written signs and numbers classification are here.
Appendicitis
106 vectors, 8 attributes, two classes (85 acute appendicitis + 21 other, or 80.2% + 19.8%), data from Shalom Weiss.
Results obtained with the leave-one-out test, % of accuracy given.
Attribute names: WBC1, MNEP, MNEA, MBAP, MBAA, HNEP, HNEA
Method | Accuracy % | Reference |
PVM (logical rules) | 89.6 | Weiss, Kapouleas |
C-MLP2LN (logical rules) | 89.6±? | our |
k-NN, stand. Manhattan, k=8,9,22-25 k=4,5, stand. Euclid, f2+f4 removed | 88.7 | our (WD/KG) |
9-NN, stand. Euclidean | 87.7 | our (KG) |
RIAC (prob. inductive) | 86.9 | Hamilton et al. |
1-NN, stand. Euclidean, f2+f4 rem | 86.8 | our (WD/KG) |
MLP+backpropagation | 85.8 | Weiss, Kapouleas |
CART, C4.5 (dec. trees) | 84.9 | Weiss, Kapouleas |
FSM | 84.9 | our (RA) |
Bayes rule (statistical) | 83.0 | Weiss, Kapouleas |
For 90% accuracy and p=0.95 confidence level 2-tailed bounds are: [82.8%,94.4%]
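The page does not say how these bounds were computed. The numbers are consistent with the Wilson score interval for a binomial proportion, so the sketch below uses that formula as an assumption; `wilson_interval` is a name introduced here for illustration.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(accuracy, n, confidence=0.95):
    """Two-tailed Wilson score interval for a proportion observed on n cases."""
    # Normal quantile for the requested two-tailed confidence level (about 1.96 for 0.95).
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    denom = 1.0 + z * z / n
    center = (accuracy + z * z / (2.0 * n)) / denom
    half = z * sqrt(accuracy * (1.0 - accuracy) / n + z * z / (4.0 * n * n)) / denom
    return center - half, center + half


# 90% accuracy on the 106 appendicitis vectors.
low, high = wilson_interval(0.90, 106)
print(f"[{100 * low:.1f}%, {100 * high:.1f}%]")  # prints [82.8%, 94.4%]
```

The interval is asymmetric around 90% because the formula pulls the centre towards 50%, which matters most for small samples such as this one.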
S.M. Weiss, I. Kapouleas, "An empirical comparison of pattern recognition, neural nets and machine learning classification methods", in: J.W. Shavlik and T.G. Dietterich, Readings in Machine Learning, Morgan Kaufmann Publ, CA 1990.
H.J. Hamilton, N. Shan, N. Cercone, RIAC: a rule induction algorithm based on approximate classification, Tech. Rep. CS 96-06, Regina University 1996.
C-MLP2LN (logical rules): the leave-one-out accuracy is only estimated, since the rules are similar to those of PVM.
3 crisp logical rules, overall 91.5% accuracy
Results for 10-fold stratified crossvalidation
Method | Accuracy % | Reference |
NBC+WX+G(WX) | ??.5±7.7 | TM-GM |
NBC+G(WX) | ??.2±6.7 | TM-GM |
kNN auto+G(WX) Eukl | ??.2±6.7 | TM-GM |
C-MLP2LN | 89.6 | our logical rules |
20-NN, stand. Eukl f 4,1,7 | 89.3±8.6 | our (KG); feature sel. from CV on the whole data set |
SSV beam leaves | 88.7±8.5 | WD |
SVM linear C=1 | 88.1±8.6 | WD |
6-NN, stand. Eukl. | 88.0±7.9 | WD |
SSV default | 87.8±8.7 | WD |
SSV beam pruning | 86.9±9.8 | WD |
kNN, k=auto, Eucl | 86.7±6.6 | WD |
FSM, a=0.9, Gauss, cluster | 86.1±8.8 | WD-GM |
NBC | 85.9±10.2 | TM-GM |
VSS 1 neuron, 4 it | 84.9±7.4 | WD/MK |
SVM Gauss C=32, s=0.1 | 84.4±8.2 | WD |
MLP+BP (Tooldiag) | 83.9 | Rafał Adamczak |
RBF (Tooldiag) | 80.2 | Rafał Adamczak |
Maszczyk T, Duch W, Support Feature Machine, WCCI 2010 (submitted).
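The large standard deviations in the table above are what one would expect from the size of the test folds. As a rough check, assuming each test case is an independent Bernoulli trial (a simplification that ignores correlations between folds), the standard deviation of the accuracy measured on a single fold can be estimated as follows; `fold_accuracy_std` is a name introduced here for illustration.

```python
from math import sqrt


def fold_accuracy_std(accuracy, n_cases, n_folds):
    """Binomial standard deviation of the accuracy measured on one test fold."""
    fold_size = n_cases / n_folds  # about 10.6 test vectors per fold for 106 cases
    return sqrt(accuracy * (1.0 - accuracy) / fold_size)


# 106 vectors, 10 folds, true accuracy assumed to be 88%.
spread = fold_accuracy_std(0.88, 106, 10)
print(f"{100 * spread:.1f}")  # prints 10.0
```

With only 10 or 11 vectors in each test fold, a single misclassification shifts the fold accuracy by about 9 to 10 percentage points, which is of the same order as the ±6.6 to ±10.2 values reported.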
Breast cancer (Wisconsin)
From UCI repository, 699 cases, 9 attributes, two classes, 458 (65.5%) & 241 (34.5%).
Results obtained with the leave-one-out test, % of accuracy given.
F6 has 16 missing values, removing these vectors leaves 683 examples.
Method | Accuracy % | Reference |
FSM | 98.3 | our (RA) |
3-NN stand. Manhattan | 97.1 | our (KG) |
21-NN stand. Euclidean | 96.9 | our (KG) |
C4.5 (decision tree) | 96.0 | Hamilton et al. |
RIAC (prob. inductive) | 95.0 | Hamilton et al. |
H.J. Hamilton, N. Shan, N. Cercone, RIAC: a rule induction algorithm based on approximate classification, Tech. Rep. CS 96-06, Regina University 1996.
Results obtained with the 10-fold crossvalidation, 16 vectors with F6 values missing removed, 683 samples left, % of accuracy given.
method | Accuracy % | Reference |
Naive MFT | 97.1 | Opper, Winther, L-1-O est. 97.3 |
SVM Gauss, C=1,s=0.1 | 97.0±2.3 | WD-GM |
SVM (10xCV) | 96.9 | Opper, Winther |
SVM lin, opt C | 96.9±2.2 | WD-GM, same with Minkowski kernel |
Cluster means, 2 prototypes | 96.5±2.2 | MB |
Default, majority | 65.5 | -- |
Results obtained with the 10-fold crossvalidation, % of accuracy given, all data, missing values handled in different ways.
method | Accuracy % | Reference |
NB + kernel est | 97.5±1.8 | WD, WEKA, 10X10CV |
SVM (5xCV) | 97.2 | Bennett and Blue |
kNN with DVDM distance | 97.1 | our (KG) |
GM k-NN, k=3, raw, Manh | 97.0±2.1 | WD, 10X10CV |
GM k-NN, k=opt, raw, Manh | 97.0±1.7 | WD, 10CV only |
VSS, 8 it/2 neurons | 96.9±1.8 | WD/MK; 98.1% train |
FSM-Feature Space Mapping | 96.9±1.4 | RA/WD, a=.99 Gaussian |
Fisher linear discr. anal | 96.8 | Ster, Dobnikar |
MLP+BP | 96.7 | Ster, Dobnikar |
MLP+BP (Tooldiag) | 96.6 | Rafał Adamczak |
LVQ | 96.6 | Ster, Dobnikar |
kNN, Euclidean/Manhattan f. | 96.6 | Ster, Dobnikar |
SNB, semi-naive Bayes (pairwise dependent) | 96.6 | Ster, Dobnikar |
SVM lin, opt C | 96.4±1.2 | WD-GM, 16 missing with -10 |
VSS, 8 it/1 neuron! | 96.4±2.0 | WD/MK, train 98.0% |
GM IncNet | 96.4±2.1 | NJ/WD; FKF, max. 3 neurons |
NB - naive Bayes (completely independent) | 96.4 | Ster, Dobnikar |
SSV opt nodes, 3CV int | 96.3±2.2 | WD/GM; training 96.6±0.5 |
IB1 | 96.3±1.9 | Zarndt |
DB-CART (decision tree) | 96.2 | Shang, Breiman |
GM SSV Tree, opt nodes BFS | 96.0±2.9 | WD/KG (beam search 94.0) |
LDA - linear discriminant analysis | 96.0 | Ster, Dobnikar |
OC1 DT (5xCV) | 95.9 | Bennett and Blue |
RBF (Tooldiag) | 95.9 | Rafał Adamczak |
GTO DT (5xCV) | 95.7 | Bennett and Blue |
ASI - Assistant I tree | 95.6 | Ster, Dobnikar |
MLP+BP (Weka) | 95.4±0.2 | TW/WD |
OCN2 | 95.2±2.1 | Zarndt |
IB3 | 95.0±4.0 | Zarndt |
MML tree | 94.8±1.8 | Zarndt |
ASR - Assistant R (RELIEF criterion) tree | 94.7 | Ster, Dobnikar |
C4.5 tree | 94.7±2.0 | Zarndt |
LFC, Lookahead Feature Constr binary tree | 94.4 | Ster, Dobnikar |
CART tree | 94.4±2.4 | Zarndt |
ID3 | 94.3±2.6 | Zarndt |
C4.5 (5xCV) | 93.4 | Bennett and Blue |
C 4.5 rules | 86.7±5.9 | Zarndt |
Default, majority | 65.5 | -- |
QDA - quadratic discr anal | 34.5 | Ster, Dobnikar |
For 97% accuracy and p=0.95 confidence level 2-tailed bounds are: [95.5%,98.0%]
K.P. Bennett, J. Blue, A Support Vector Machine Approach to Decision Trees, R.P.I Math Report No. 97-100, Rensselaer Polytechnic Institute, Troy, NY, 1997
N. Shang, L. Breiman, ICONIP'96, p.133
B. Ster and A. Dobnikar, Neural networks in medical diagnosis: Comparison with other methods. In A. Bulsari et al., editor, Proceedings of the International Conference EANN '96, pages 427-430, 1996.
F. Zarndt, A Comprehensive Case Study: An Examination of Machine Learning and Connectionist Algorithms, MSc Thesis, Dept. of Computer Science, Brigham Young University, 1995
Breast Cancer (Ljubljana data)
From UCI repository (restricted): 286 instances, 201 no-recurrence-events (70.3%), 85 recurrence-events (29.7%);
9 attributes, between 2-13 values each, 9 missing values
Results - 10xCV? Sometimes methodology was unclear;
difficult, noisy data, some methods are below the base rate (70.3%).
For 78% accuracy and p=0.95 confidence level 2-tailed bounds are: [72.9%,82.4%]
Hepatitis
From UCI repository, 155 vectors, 19 attributes.
Two classes: die, 32 cases (20.6%); live, 123 cases (79.4%).
Many missing values! F18 has 67 missing values, F15 has 29, F17 has 16 and other features between 0 and 11.
Results obtained with the leave-one-out test, % of accuracy given
MLP, CART, LDA results from (to be checked) S.M. Weiss, I. Kapouleas, "An empirical comparison of pattern recognition, neural nets and machine learning classification methods", in: J.W. Shavlik and T.G. Dietterich, Readings in Machine Learning, Morgan Kaufmann Publ, CA 1990.
Other results are our own.
Results obtained with the 10-fold crossvalidation, % of accuracy given; our results are with stratified crossvalidation, while the methodology behind the other results is unknown. Differences for this dataset are rather small, 0.1-0.2%.
Method | Accuracy % | Reference |
Weighted 9-NN | 92.9±? | Karol Grudziński |
18-NN, stand. Manhattan | 90.2±0.7 | Karol Grudziński |
FSM with rotations | 89.7±? | Rafał Adamczak |
15-NN, stand. Euclidean | 89.0±0.5 | Karol Grudziński |
VSS 4 neurons, 5 it | 86.5±8.8 | WD/MK, train 97.1 |
FSM without rotations | 88.5 | Rafał Adamczak |
LDA, linear discriminant analysis | 86.4 | Ster & Dobnikar |
Naive Bayes and Semi-NB | 86.3 | Ster & Dobnikar |
IncNet | 86.0 | Norbert Jankowski |
QDA, quadratic discriminant analysis | 85.8 | Ster & Dobnikar |
1-NN | 85.3±5.4 | Ster & Dobnikar, std added by WD |
VSS 2 neurons, 5 it | 85.1±7.4 | WD/MK, train 95.0 |
ASR | 85.0 | Ster & Dobnikar |
Fisher discriminant analysis | 84.5 | Ster & Dobnikar |
LVQ | 83.2 | Ster & Dobnikar |
CART (decision tree) | 82.7 | Ster & Dobnikar |
MLP with BP | 82.1 | Ster & Dobnikar |
ASI | 82.0 | Ster & Dobnikar |
LFC | 81.9 | Ster & Dobnikar |
RBF (Tooldiag) | 79.0 | Rafał Adamczak |
MLP+BP (Tooldiag) | 77.4 | Rafał Adamczak |
Results on BP, LVQ, ..., SNB are from: B. Ster and A. Dobnikar, Neural networks in medical diagnosis: Comparison with other methods. In A. Bulsari et al., editor, Proceedings of the International Conference EANN '96, pages 427-430, 1996.
Do our good results reflect superior handling of missing values?
Duch W, Grudziński K (1998) A framework for similarity-based methods. Second Polish Conference on Theory and Applications of Artificial Intelligence, Lodz, 28-30 Sept. 1998, pp. 33-60
Weighted kNN: Duch W, Grudzinski K and Diercksen G.H.F (1998) Minimal distance neural methods. World Congress of Computational Intelligence, May 1998, Anchorage, Alaska, IJCNN'98 Proceedings, pp. 1299-1304
Statlog version of Cleveland Heart disease.
13 attributes (extracted from 75), no missing values.
270=150+120 observations selected from the 303 cases (Cleveland Heart).
Attribute Information:
1. age | 2. sex | 3. chest pain type (4 values) | 4. resting blood pressure | 5. serum cholesterol in mg/dl |
6. fasting blood sugar > 120 mg/dl | 7. resting electrocardiographic results (values 0,1,2) | 8. maximum heart rate achieved | 9. exercise induced angina | 10. oldpeak = ST depression induced by exercise relative to rest |
11. the slope of the peak exercise ST segment | 12. number of major vessels (0-3) colored by fluoroscopy | 13. thal: 3 = normal; 6 = fixed defect; 7 = reversible defect |
Attribute types: Real: 1,4,5,8,10,12; Ordered: 11; Binary: 2,6,9; Nominal: 7,3,13
Classes: Absence (1) or presence (2) of heart disease;
In the Statlog experiments on the heart data a cost (risk) matrix was used with 9-fold crossvalidation, and only cost values are given.
Results below are obtained with the 10-fold crossvalidation, % of accuracy given, no risk matrix
Method | Accuracy % | Reference |
Lin SVM 2D QCP | 85.9±5.5 | MG, 10xCV |
kNN auto+WX | ??.8±5.6 | TM GM 10xCV |
SVM Gauss+WX+G(WX), C=1 s=2-5 | ??.8±6.4 | TM GM 10xCV |
SVM lin, C=0.01 | 84.9±7.9 | WD, GM 10x(9xCV) |
SFM, G(WX), default C=1 | ??±5.1 | TM, GM 10xCV |
Naive-Bayes | 84.5±6.3 | TM, GM 10xCV |
Naive-Bayes | 83.6 | RA, WEKA |
SVML default C=1 | 82.5±6.4 | TM, GM 10xCV |
K* | 76.7 | WEKA, RA |
IB1c | 74.0 | WEKA, RA |
1R | 71.4 | WEKA, RA |
T2 | 68.1 | WEKA, RA |
MLP+BP | 65.6 | ToolDiag, RA |
FOIL | 64.0 | WEKA, RA |
RBF | 60.0 | ToolDiag, RA |
InductH | 58.5 | WEKA, RA |
Base rate (majority classifier) | 55.7 | |
IB1-4 | 50.0 | ToolDiag, RA |
Results for Heart and other Statlog datasets are collected here.
Heart disease (Cleveland)
From UCI repository, 303 cases, 13 attributes (4 cont, 9 nominal), 7 vectors with missing values?
2 (no, yes) or 5 classes (no, degree 1, 2, 3, 4).
Class distribution: 164 (54.1%) no, 55+36+35+13 yes (45.9%) with disease degree 1-4.
Results obtained with the leave-one-out test, % of accuracy given, 2 classes used.
Method | Accuracy % | Reference |
LDA | 84.5 | Weiss ? |
25-NN, stand, Euclid | 83.6±0.5 | WD/KG repeat?? |
C-MLP2LN | 82.5 | RA, estimated? |
FSM | 82.2 | Rafał Adamczak |
MLP+backprop | 81.3 | Weiss ? |
CART | 80.8 | Weiss ? |
MLP, CART, LDA: where are these results from?
Other results - our own.
Results obtained with the 10-fold crossvalidation, % of accuracy given.
Ster & Dobnikar reject 6 vectors (leaving 297) with missing values.
We use all 303 vectors, replacing missing values by the means for their class; in kNN we have used the Statlog convention, 297 vectors.
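Replacing missing values by class means can be sketched as follows. This is a minimal illustration under stated assumptions, not the GhostMiner implementation: missing entries are assumed to be marked with `None`, and the function name `impute_by_class_mean` and the toy rows are made up for the example.

```python
from collections import defaultdict
from statistics import fmean


def impute_by_class_mean(rows, labels, missing=None):
    """Replace missing entries with the mean of that attribute within the same class."""
    values = defaultdict(list)  # (class, attribute index) -> observed values
    for row, cls in zip(rows, labels):
        for j, v in enumerate(row):
            if v is not missing:
                values[(cls, j)].append(v)
    means = {key: fmean(vals) for key, vals in values.items()}
    # A class with no observed value for some attribute would raise KeyError here.
    return [
        [means[(cls, j)] if v is missing else v for j, v in enumerate(row)]
        for row, cls in zip(rows, labels)
    ]


# Toy data (hypothetical values): two attributes, two classes.
rows = [[1.0, None], [3.0, 4.0], [None, 10.0], [5.0, 20.0]]
labels = ["no", "no", "yes", "yes"]
filled = impute_by_class_mean(rows, labels)
print(filled)
```

The class label is needed to pick the mean, so this sketch applies to labelled vectors only.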
Method | Accuracy % | Reference |
IncNet+transformations | 90.0 | Norbert Jankowski; check again! |
28-NN, stand, Euclid, 7 features | 85.1±0.5 | WD/KG |
LDA | 84.5 | Ster & Dobnikar |
Fisher discriminant analysis | 84.2 | Ster & Dobnikar |
k=7, Euclid, std | 84.2±6.6 | WD, GhostMiner |
16-NN, stand, Euclid | 84±0.6 | WD/KG |
FSM, 82.4-84% on test only | 84.0 | Rafał Adamczak |
k=1:10, Manhattan, std | 83.8±5.3 | WD, GhostMiner |
Naive Bayes | 82.5-83.4 | Rafał; Ster, Dobnikar |
SNB | 83.1 | Ster & Dobnikar |
LVQ | 82.9 | Ster & Dobnikar |
GTO DT (5xCV) | 82.5 | Bennett and Blue |
kNN, k=19, Euclidean | 82.1±0.8 | Karol Grudziński |
k=7, Manhattan, std | 81.8±10.0 | WD, GhostMiner |
SVM (5xCV) | 81.5 | Bennett and Blue |
kNN (k=1? raw data?) | 81.5 | Ster & Dobnikar |
MLP+BP (standardized) | 81.3 | Ster, Dobnikar, Rafał Adamczak |
Cluster means, 2 prototypes | 80.8±6.4 | MB |
CART | 80.8 | Ster & Dobnikar |
RBF (Tooldiag, standardized) | 79.1 | Rafał Adamczak |
Gaussian EM, 60 units | 78.6 | Stensmo & Sejnowski |
ASR | 78.4 | Ster & Dobnikar |
C4.5 (5xCV) | 77.8 | Bennett and Blue |
IB1c (WEKA) | 77.6 | Rafał Adamczak |
QDA | 75.4 | Ster & Dobnikar |
LFC | 75.1 | Ster & Dobnikar |
ASI | 74.4 | Ster & Dobnikar |
K* (WEKA) | 74.2 | Rafał Adamczak |
OC1 DT (5xCV) | 71.7 | Bennett and Blue |
1 R (WEKA) | 71.0 | Rafał Adamczak |
T2 (WEKA) | 69.0 | Rafał Adamczak |
FOIL (WEKA) | 66.4 | Rafał Adamczak |
InductH (WEKA) | 61.3 | Rafał Adamczak |
Default, majority | 54.1 | baserate |
C4.5 rules | 53.8±5.9 | Zarndt |
IB1-4 (WEKA) | 46.2 | Rafał Adamczak |
For 85% accuracy and p=0.95 confidence level 2-tailed bounds are: [80.5%,88.6%]
Results obtained with BP, LVQ, ..., SNB are from: B. Ster and A. Dobnikar, Neural networks in medical diagnosis: Comparison with other methods. In: A. Bulsari et al., editor, Proceedings of the International Conference EANN '96, pages 427-430, 1996.
Magnus Stensmo and Terrence J. Sejnowski, A Mixture Model System for Medical and Machine Diagnosis, Advances in Neural Information Processing Systems 7 (1995) 1077-1084
Kristin P. Bennett, J. Blue, A Support Vector Machine Approach to Decision Trees, R.P.I Math Report No. 97-100, Rensselaer Polytechnic Institute, Troy, NY, 1997
Other results for this dataset (methodology sometimes uncertain):
D. Wettschereck, averaging 25 runs with 70% train and 30% test, variants of k-NN with different metric functions and scaling.
David Aha & Dennis Kibler - From UCI repository past usage
Method | Accuracy % | Reference |
k-NN, Value Distance Metric (VDM) | 82.6 | D. Wettschereck |
k-NN, Euclidean | 82.4±0.8 | D. Wettschereck |
k-NN, Variable Similarity Metric | 82.4 | D. Wettschereck |
k-NN, Modified VDM | 83.1 | D. Wettschereck |
Other k-NN variants | < 82.4 | D. Wettschereck |
k-NN, Mutual Information | 81.8 | D. Wettschereck |
CLASSIT (hierarchical clustering) | 78.9 | Gennari, Langley, Fisher |
NTgrowth (instance-based) | 77.0 | Aha & Kibler |
C4 | 74.8 | Aha & Kibler |
Naive Bayes | 82.8±1.3 | Friedman et al., 5xCV, 296 vectors |
Gennari, J.H., Langley, P, Fisher, D. (1989). Models of incremental concept formation. Artificial Intelligence, 40, 11-61.
Friedman N, Geiger D, Goldszmidt M (1997). Bayesian network classifiers. Machine Learning 29: 131-163
From the UCI repository, dataset "Pima Indian diabetes":
2 classes, 8 attributes, 768 instances: 500 (65.1%) negative (class 1) and 268 (34.9%) positive (class 2) tests for diabetes.
All patients were females at least 21 years old of Pima Indian heritage.
Attributes used:
1. Number of times pregnant
2. Plasma glucose concentration at 2 hours in an oral glucose tolerance test
3. Diastolic blood pressure (mm Hg)
4. Triceps skin fold thickness (mm)
5. 2-Hour serum insulin (mu U/ml)
6. Body mass index (weight in kg/(height in m)^2)
7. Diabetes pedigree function
8. Age (years)
Results obtained with the 10-fold crossvalidation, % of accuracy given; Statlog results are with 12-fold crossvalidation
Method | Accuracy % | Reference |
Logdisc | 77.7 | Statlog |
IncNet | 77.6 | Norbert Jankowski |
DIPOL92 | 77.6 | Statlog |
Linear Discr. Anal. | 77.5-77.2 | Statlog; Ster & Dobnikar |
SVM, linear, C=0.01 | 77.5±4.2 | WD-GM, 10XCV averaged 10x |
SVM, Gauss, C, sigma opt | 77.4±4.3 | WD-GM, 10XCV averaged 10x |
SMART | 76.8 | Statlog |
GTO DT (5xCV) | 76.8 | Bennett and Blue |
kNN, k=23, Manh, raw, W | 76.7±4.0 | WD-GM, feature weighting 3CV |
kNN, k=1:25, Manh, raw | 76.6±3.4 | WD-GM, most cases k=23 |
ASI | 76.6 | Ster & Dobnikar |
Fisher discr. analysis | 76.5 | Ster & Dobnikar |
MLP+BP | 76.4 | Ster & Dobnikar |
MLP+BP | 75.8±6.2 | Zarndt |
LVQ | 75.8 | Ster & Dobnikar |
LFC | 75.8 | Ster & Dobnikar |
RBF | 75.7 | Statlog |
NB | 75.5-73.8 | Ster & Dobnikar; Statlog |
kNN, k=22, Manh | 75.5 | Karol Grudziński |
MML | 75.5±6.3 | Zarndt |
SNB | 75.4 | Ster & Dobnikar |
BP | 75.2 | Statlog |
SSV DT | 75.0±3.6 | WD-GM, SSV BS, node 5CV MC |
kNN, k=18, Euclid, raw | 74.8±4.8 | WD-GM |
CART DT | 74.7±5.4 | Zarndt |
CART DT | 74.5 | Statlog |
DB-CART | 74.4 | Shang & Breiman |
ASR | 74.3 | Ster & Dobnikar |
ODT, dyadic trees | 74.0±2.3 | Blanchard |
Cluster means, 2 prototypes | 73.7±3.7 | MB |
SSV DT | 73.7±4.7 | WD-GM, SSV BS, node 10CV strat |
SFC, stacking filters | 73.3±1.9 | Porter |
C4.5 DT | 73.0 | Statlog |
C4.5 DT | 72.7±6.6 | Zarndt |
Bayes | 72.2±6.9 | Zarndt |
C4.5 (5xCV) | 72.0 | Bennett and Blue |
CART | 72.8 | Ster & Dobnikar |
Kohonen | 72.7 | Statlog |
C4.5 DT | 72.1±2.6 | Blanchard (averaged over 100 runs) |
kNN | 71.9 | Ster & Dobnikar |
ID3 | 71.7±6.6 | Zarndt |
IB3 | 71.7±5.0 | Zarndt |
IB1 | 70.4±6.2 | Zarndt |
kNN, k=1, Euclidean, raw | 69.4±4.4 | WD-GM |
kNN | 67.6 | Statlog |
C4.5 rules | 67.0±2.9 | Zarndt |
OCN2 | 65.1±1.1 | Zarndt |
Default, majority | 65.1 | |
QDA | 59.5 | Ster, Dobnikar |
For 77.7% accuracy and p=0.95 confidence level 2-tailed bounds are: [74.6%,80.5%]
Results on BP, LVQ, ..., SNB are from: B. Ster and A. Dobnikar, Neural networks in medical diagnosis: Comparison with other methods. In A. Bulsari et al., editor, Proceedings of the International Conference EANN '96, pages 427-430, 1996.
Other results (with different tests):
Method | Accuracy % | Reference |
SVM (5xCV) | 77.6 | Bennett and Blue |
C4.5 | 76.0±0.9 | Friedman, 5xCV |
Semi-Naive Bayes | 76.0±0.8 | Friedman, 5xCV |
Naive Bayes | 74.5±0.9 | Friedman, 5xCV |
Default, majority | 65.1 |
Friedman N, Geiger D, Goldszmidt M (1997). Bayesian network classifiers. Machine Learning 29: 131-163
Opper/Winther use 200 training and 332 test examples (following Ripley), with TAP MFT results on test 81%, SVS at 80.1% and best NN at 77.4%.
Thyroid, From UCI repository, dataset "ann-train.data": A Thyroid database suited for training ANNs.
3772 learning and 3428 testing examples; primary hypothyroid, compensated hypothyroid, normal.
Training: 93+191+3488 or 2.47%, 5.06%, 92.47%
Test: 73+177+3178 or 2.13%, 5.16%, 92.71%
21 attributes (15 binary, 6 continuous); 3 classes
The problem is to determine whether a patient referred to the clinic is hypothyroid. Therefore three classes are built: normal (not hypothyroid), hyperfunction and subnormal functioning. Because 92 percent of the patients are not hyperthyroid, a good classifier must be significantly better than 92%.
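The 92% threshold is simply the accuracy of always predicting the majority class. A short check using the class counts quoted above (the function name `majority_baseline` is introduced here for illustration):

```python
def majority_baseline(class_counts):
    """Accuracy (in %) and number of errors when always predicting the largest class."""
    total = sum(class_counts)
    errors = total - max(class_counts)
    return 100.0 * max(class_counts) / total, errors


# Class counts from the training and test partitions listed above.
train_acc, train_err = majority_baseline([93, 191, 3488])
test_acc, test_err = majority_baseline([73, 177, 3178])
print(f"train {train_acc:.2f}% ({train_err} errors), test {test_acc:.2f}% ({test_err} errors)")
# prints: train 92.47% (284 errors), test 92.71% (250 errors)
```

The 250 test errors match the "Default, majority" row in the results table, so every useful method has to be judged against 92.7% rather than against zero.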
Note: this is the data Quinlan used in the case study in his article "Simplifying Decision Trees" (International Journal of Man-Machine Studies (1987) 221-234).
Attribute names: I (W.D.) have investigated this issue and, after some mail exchange with Chris Merz, who maintains the UCI repository, here is the conclusion:
1 age: continuous | 2 sex: {M, F} | 3 on thyroxine: logical |
4 maybe on thyroxine: logical | 5 on antithyroid medication: logical | 6 sick - patient reports malaise: logical |
7 pregnant: logical | 8 thyroid surgery: logical | 9 I131 treatment: logical |
10 test hypothyroid: logical | 11 test hyperthyroid: logical | 12 on lithium: logical |
13 has goitre: logical | 14 has tumor: logical | 15 hypopituitary: logical |
16 psychological symptoms: logical | 17 TSH: continuous | 18 T3: continuous |
19 TT4: continuous | 20 T4U: continuous | 21 FTI: continuous |
Results:
Method | % training | % test | Reference |
C-MLP2LN rules+ASA | 99.90 | 99.36 | Rafał/Krzysztof/Grzegorz |
CART | 99.80 | 99.36 | Weiss |
PVM | 99.80 | 99.33 | Weiss |
SSV beam search | 99.80 | 99.33 | WD |
IncNet | 99.68 | 99.24 | Norbert Jankowski |
MLP+SCG, 4 neurons | 99.81 | 99.24 | SVNT paper |
SVM Minkowski kernel | 100.0 | 99.18 | SVNT paper |
SSV opt leaves or pruning | 99.7 | 99.1 | WD |
MLP init+ a,b opt. | 99.5 | 99.1 | Rafał |
C-MLP2LN rules | 99.7 | 99.0 | Rafał/Krzysztof |
MLP+SCG, 4 neurons, 67 SV | 99.95 | 99.0 | SVNT paper |
MLP+SCG, 4 neurons, 45 SV | 100 | 98.9 | SVNT paper |
MLP+SCG, 12 neurons | 100 | 98.8 | SVNT paper |
Cascade correlation | 100.0 | 98.5 | Schiffmann |
Local adapt. rates | 99.6 | 98.5 | Schiffmann |
BP+genetic opt. | 99.4 | 98.4 | Schiffmann |
Quickprop | 99.6 | 98.3 | Schiffmann |
RPROP | 99.6 | 98.0 | Schiffmann |
3-NN, Euclidean, with 3 features | 98.7 | 97.9 | W.D./Karol |
1-NN, Euclidean, with 3 features | 98.4 | 97.7 | W.D./Karol |
Best backpropagation | 99.1 | 97.6 | Schiffmann |
1-NN, Euclidean, 8 features used | -- | 97.3 | Karol/W.D. |
SVM Gauss, C=8 s=0.1 | 98.3 | 96.1 | WD |
Bayesian classif. | 97.0 | 96.1 | Weiss? |
SVM Gauss, C=1 s=0.1 | 95.4 | 94.7 | WD |
BP+conj. gradient | 94.6 | 93.8 | Schiffmann |
1-NN Manhattan, std data | -- | 93.8 | Karol G./WD |
SVM lin, C=1 | 94.1 | 93.3 | WD |
SVM Gauss, C=8 s=5 | 100 | 92.8 | WD |
Default, majority (250 test errors) | -- | 92.7 | -- |
1-NN Manhattan, raw data | -- | 92.2 | Karol G./WD |
For 99.90% accuracy on training and p=0.95 confidence level 2-tailed bounds are: [99.74%,99.96%]
Most NN results from W. Schiffmann, M. Joost, R. Werner, 1993; MLP2LN and Init+a,b ours.
k-NN, PVM and CART from S.M. Weiss, I. Kapouleas, "An empirical comparison of pattern recognition, neural nets and machine learning classification methods", in: J.W. Shavlik and T.G. Dietterich, Readings in Machine Learning, Morgan Kaufmann Publ, CA 1990.
SVM with linear and Gaussian kernels gives quite poor results on this data.
3 crisp logical rules using TSH, FTI, T3, on_thyroxine, thyroid_surgery, TT4 give 99.3% of accuracy on the test set.
Hepatobiliary disorders
Contains medical records of 536 patients admitted to a university-affiliated Tokyo-based hospital, with four types of hepatobiliary disorders: alcoholic liver damage, primary hepatoma, liver cirrhosis and cholelithiasis. The records include results of 9 biochemical tests and the sex of the patient. The same 163 cases as in [Hayashi et al.] were used as the test data.
FSM gives about 60 Gaussian or triangular membership functions, achieving accuracy of 75.5-75.8%. Rotation of these functions (i.e. introducing linear combinations of inputs to the rules) does not improve this accuracy. 10-fold crossvalidation tests on the mixed training plus test data give similar results. The best results were obtained with the K* method based on algorithmic complexity optimization, giving 78.5% on the test set, and with kNN with the Manhattan distance function, k=1 and selection of features (using the leave-one-out method on the training data, features 2, 5, 6 and 9 were removed), giving 80.4% accuracy. Simulated annealing optimization of the scaling factors for the remaining 5 features gives 81.0%, and optimizing scaling factors using all input features gives 82.8%. The scaling factors are: 0.92, 0.60, 0.91, 0.92, 0.07, 0.41, 0.55, 0.86, 0.30. Similar accuracy is obtained using the multisimplex method for optimization of the scaling factors.
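A 1-NN classifier with per-feature scaling factors in the Manhattan distance can be sketched as below. The nine factors are the ones quoted above; how they map to the input features is not stated here, so the sketch simply assumes they apply in order. The function names and the three vectors are made up for illustration and are not taken from the dataset.

```python
def weighted_manhattan(a, b, scale):
    """Manhattan distance with one non-negative scaling factor per feature."""
    return sum(s * abs(u - v) for u, v, s in zip(a, b, scale))


def predict_1nn(train_x, train_y, query, scale):
    """Label of the training vector closest to the query under the scaled metric."""
    nearest = min(range(len(train_x)),
                  key=lambda i: weighted_manhattan(train_x[i], query, scale))
    return train_y[nearest]


# Scaling factors quoted in the text (nine values, assumed to be in feature order).
scale = [0.92, 0.60, 0.91, 0.92, 0.07, 0.41, 0.55, 0.86, 0.30]

# Hypothetical training vectors and query.
train_x = [[0.0] * 9, [1.0, 0.0, 0.0, 0.0, 8.0, 0.0, 0.0, 0.0, 0.0]]
train_y = ["A", "B"]
query = [1.0, 0.0, 0.0, 0.0, 3.0, 0.0, 0.0, 0.0, 0.0]

print(predict_1nn(train_x, train_y, query, [1.0] * 9))  # unscaled distances 4 and 5: prints A
print(predict_1nn(train_x, train_y, query, scale))      # scaled distances 1.13 and 0.35: prints B
```

The small factor 0.07 nearly switches off the fifth feature, which is enough to change the nearest neighbour in this toy case; optimizing such factors by simulated annealing or the multisimplex method is a separate step not shown here.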
Method | Training set | Test set | Reference |
IB2-IB4 | 81.2-85.5 | 43.6-44.6 | WEKA, our calculation |
Naive Bayes | -- | 46.6 | WEKA, our calculation |
1R (rules) | 58.4 | 50.3 | WEKA, our calculation |
T2 (rules from decision tree) | 67.5 | 53.3 | WEKA, our calculation |
FOIL (inductive logic) | 99 | 60.1 | WEKA, our calculation |
FSM, initial 49 crisp logical rules | 83.5 | 63.2 | FSM, our calculation |
LDA (statistical) | 68.4 | 65.0 | our calculation |
DLVQ (38 nodes) | 100 | 66.0 | our calculation |
C4.5 decision rules | 64.5 | 66.3 | our calculation |
Best fuzzy MLP model | 75.5 | 66.3 | Mitra et al. |
MLP with RPROP | -- | 68.0 | our calculation |
Cascade Correlation | -- | 71.0 | our calculation |
Fuzzy neural network | 100 | 75.5 | Hayashi |
C4.5 decision tree | 94.4 | 75.5 | our calculation |
FSM, Gaussian functions | 93 | 75.6 | our calculation |
FSM, 60 triangular functions | 93 | 75.8 | our calculation |
IB1c (instance-based) | -- | 76.7 | WEKA, our calculation |
kNN, k=1, Canberra, raw | 76.1 | 80.4 | WD/SBL |
K* method | -- | 78.5 | WEKA, our calculation |
1-NN, 4 features removed, Manhattan | 76.9 | 80.4 | our calculation, KG |
1-NN, Canberra, raw, removed f2, 6, 8, 9 | 77.2 | 83.4 | our calculation, KG |
Y. Hayashi, A. Imura, K. Yoshida, “Fuzzy neural expert system and its application to medical diagnosis”, in: 8th International Congress on Cybernetics and Systems, New York City 1990, pp. 54-61
S. Mitra, R. De, S. Pal, “Knowledge based fuzzy MLP for classification and rule generation”, IEEE Transactions on Neural Networks 8, 1338-1350, 1997. A knowledge-based fuzzy MLP system gives results on the test set in the range from 33% to 66.3%, depending on the actual fuzzy model used.
W. Duch and K. Grudzinski, “Prototype Based Rules - New Way to Understand the Data”, Int. Joint Conference on Neural Networks, Washington D.C., pp. 1858-1863, 2001. Contains best results with 1-NN, Canberra and feature selection, 83.4% on the test.
Other, non-medical data
Landsat Satellite image dataset (STATLOG version)
Training 4435, test 2000 cases; 36 semi-continuous [0 to 255] attributes (= 4 spectral bands x 9 pixels in neighbourhood) and 6 decision classes: 1, 2, 3, 4, 5 and 7 (class 6 has been removed because of doubts about the validity of this class).
The StatLog database consists of the multi-spectral values of pixels in 3x3 neighbourhoods in a satellite image, and the classification associated with the central pixel in each neighbourhood. The aim is to predict this classification, given the multi-spectral values. In the sample database, the class of a pixel is coded as a number.
Method | % training | % test | Time train | Time test |
MLP+SCG | 96.0 | 91.0 | reg alfa=0.5, 36 hidden nodes, 1400 it | fast; WD |
k-NN | -- | 90.9 | auto-k=3, Manhattan, std data | GM 2.0 |
k-NN | 91.1 | 90.6 | 2105, Statlog | 944; parameters? |
k-NN | -- | 90.4 | auto-k=5, Euclidean, std data | GM 2.0 |
k-NN | -- | 90.0 | k=1, Manhattan, std data, no training | fast, GM 2.0 |
FSM | 95.1 | 89.7 | std data, a=0.95 | fast, GM 2.0; best NN result |
LVQ | 95.2 | 89.5 | 1273 | 44 |
k-NN | -- | 89.4 | k=1, Euclidean, std data, no training | fast, GM 2.0 |
Dipol92 | 94.9 | 88.9 | 746 | 111 |
MLP+SCG | 94.4 | 88.5 | 5000 it; active learning+reg a=0.5, 8-12 hidden | fast; WD |
SVM | 91.6 | 88.4 | std data, Gaussian kernel | fast, GM 2.0; unclassified 4.3% |
Radial | 88.9 | 87.9 | 564 | 74 |
Alloc80 | 96.4 | 86.8 | 63840 | 28757 |
IndCart | 97.7 | 86.2 | 2109 | 9 |
CART | 92.1 | 86.2 | 330 | 14 |
MLP+BP | 88.8 | 86.1 | 72495 | 53 |
Bayesian Tree | 98.0 | 85.3 | 248 | 10 |
C4.5 | 96.0 | 85.0 | 434 | 1 |
New ID | 93.3 | 85.0 | 226 | 53 |
QuaDisc | 89.4 | 84.5 | 157 | 53 |
SSV | 90.9 | 84.3 | default par. | very fast, GM 2.0 |
Cascade | 88.8 | 83.7 | 7180 | 1 |
Log DA, Disc | 88.1 | 83.7 | 4414 | 41 |
LDA, Discrim | 85.1 | 82.9 | 68 | 12 |
Kohonen | 89.9 | 82.1 | 12627 | 129 |
Bayes | 69.2 | 71.3 | 75 | 17 |
The original database was generated from Landsat Multi-Spectral Scanner image data. The sample database was generated taking a small section (82 rows and 100 columns) from the original data. One frame of Landsat MSS imagery consists of four digital images of the same scene in different spectral bands. Two of these are in the visible region (corresponding approximately to green and red regions of the visible spectrum) and two are in the (near) infra-red. Each pixel is an 8-bit binary word, with 0 corresponding to black and 255 to white. The spatial resolution of a pixel is about 80m x 80m. Each image contains 2340 x 3380 such pixels.
The database is a (tiny) sub-area of a scene, consisting of 82 x 100 pixels. Each line of data corresponds to a 3x3 square neighbourhood of pixels completely contained within the 82x100 sub-area. Each line contains the pixel values in the four spectral bands (converted to ASCII) of each of the 9 pixels in the 3x3 neighbourhood and a number indicating the classification label of the central pixel. In each line of data the four spectral values for the top-left pixel are given first followed by the four spectral values for the top-middle pixel and then those for the top-right pixel, and so on with the pixels read out in sequence left-to-right and top-to-bottom. Thus, the four spectral values for the central pixel are given by attributes 17,18,19 and 20. If you like you can use only these four attributes, while ignoring the others. This avoids the problem which arises when a 3x3 neighbourhood straddles a boundary.
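The attribute layout described above (pixels read left-to-right and top-to-bottom, four spectral values per pixel) translates into a simple index calculation. The sketch below uses names introduced here for illustration and assumes a record is a list of the 36 attribute values without the class label.

```python
BANDS = 4   # spectral values per pixel
WIDTH = 3   # pixels per row of the 3x3 neighbourhood


def attribute_numbers(row, col):
    """1-based attribute numbers of the four spectral values of pixel (row, col), both 0-based."""
    first = (row * WIDTH + col) * BANDS + 1
    return list(range(first, first + BANDS))


def central_pixel(record):
    """Select the four central-pixel values from a 36-value record."""
    return [record[a - 1] for a in attribute_numbers(1, 1)]


print(attribute_numbers(0, 0))  # top-left pixel: prints [1, 2, 3, 4]
print(attribute_numbers(1, 1))  # central pixel: prints [17, 18, 19, 20]
```

Calling `central_pixel` on each record gives the reduced four-attribute version of the data mentioned in the text.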
All results from Statlog book, except GM - GhostMiner calculations, W. Duch.
N | Description | Train | Test |
1 | red soil | 1072 (24.17%) | 461 (23.05%) |
2 | cotton crop | 479 (10.80%) | 224 (11.20%) |
3 | grey soil | 961 (21.67%) | 397 (19.85%) |
4 | damp grey soil | 415 (09.36%) | 211 (10.55%) |
5 | veg. Stubble | 470 (10.60%) | 237 (11.85%) |
6 | Mixture class | 0 | 0 |
7 | very damp grey soil | 1038 (23.40%) | 470 (23.50%) |
Machine Learning, Neural and Statistical Classification, D. Michie, D.J. Spiegelhalter, C.C. Taylor (eds), the Statlog project book.
351 data records, with class division 224 (63.8%) + 126 (35.9%). Usually first 200 vectors are taken for training, and last 151 for the test, but this is very unbalanced: in the training set 101 (50.5%) and 99 (49.5%) are from 1/2 class, in the test set 123 (82%) and 27 (18%) are from class 1/2.
34 attributes, but f2=0 always and should be removed; f1 is binary, the remaining 32 attributes are continuous.
2 classes - different types of radar signals reflected from the ionosphere.
Some vectors (8, 18, 20, 22, 24, 30, 38, 52, 76, 78, 80, 82, 103, 163, 169, 171, 183, 187, 189, 191, 201, 215, 219, 221, 223, 225, 227, 229, 231, 233, 249) are either binary (0, 1) or take only the 3 values -1, 0, +1.
For example, vector 169 has only one component = 1, all others are 0.
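The checks described above (a constant attribute, vectors with only a few distinct values, class shares of a split) can be reproduced with a few helper functions. This is a sketch only: the helper names are hypothetical, the data is assumed to be loaded as a list of numeric lists, and vector numbers are assumed to be one-based.

```python
def constant_attributes(vectors):
    """Zero-based indices of attributes that take a single value in all vectors."""
    n_attr = len(vectors[0])
    return [j for j in range(n_attr) if len({v[j] for v in vectors}) == 1]


def low_cardinality_vectors(vectors, allowed=(-1.0, 0.0, 1.0)):
    """One-based numbers of vectors whose components all lie in `allowed`."""
    allowed_set = set(allowed)
    return [i for i, v in enumerate(vectors, start=1) if set(v) <= allowed_set]


def class_shares(labels):
    """Percentage of each class label in a list of labels."""
    total = len(labels)
    return {c: 100.0 * labels.count(c) / total for c in sorted(set(labels))}


# Toy data (not the real dataset): the second attribute is constant,
# and vectors 2 and 3 contain only values from {-1, 0, +1}.
toy = [[1, 0, 0.5, -0.3], [1, 0, 0.0, 1.0], [0, 0, -1.0, 0.0]]
print(constant_attributes(toy))
print(low_cardinality_vectors(toy))
print(class_shares([1] * 101 + [2] * 99))
```

The last call uses the training-set class counts quoted above (101 and 99 out of 200).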
Method | Accuracy % | Reference |
3-NN + simplex | 98.7 | Our own weighted kNN |
VSS 2 epochs | 96.7 | MLP with numerical gradient |
3-NN | 96.7 | KG, GM with or without weights |
IB3 | 96.7 | Aha, 5 errors on test |
1-NN, Manhattan | 96.0 | GM kNN (our) |
MLP+BP | 96.0 | Sigillito |
SVM Gaussian | 94.9±2.6 | GM (our), defaults, similar for C=1-100 |
C4.5 | 94.9 | Hamilton |
3-NN Canberra | 94.7 | GM kNN (our) |
RIAC | 94.6 | Hamilton |
C4 (no windowing) | 94.0 | Aha |
C4.5 | 93.7 | Bennett and Blue |
SVM | 93.2 | Bennett and Blue |
Non-lin perceptron | 92.0 | Sigillito |
FSM + rotation | 92.8 | our |
1-NN, Euclidean | 92.1 | Aha, GM kNN (our) |
DB-CART | 91.3 | Shang, Breiman |
Linear perceptron | 90.7 | Sigillito |
OC1 DT | 89.5 | Bennett and Blue |
CART | 88.9 | Shang, Breiman |
SVM linear | 87.1±3.9 | GM (our), defaults |
GTO DT | 86.0 | Bennett and Blue |
Perceptron+MLP results:
Sigillito, V. G., Wing, S. P., Hutton, L. V., & Baker, K. B. (1989) Classification of radar returns from the ionosphere using neural networks. Johns Hopkins APL Technical Digest, 10, 262-266.
N. Shang, L. Breiman, ICONIP'96, p.133
David Aha: k-NN+C4+IB3, from Aha, D. W., & Kibler, D. (1989). Noise-tolerant instance-based learning algorithms. In Proceedings of the Eleventh International Joint Conference on Artificial Intelligence (pp. 794-799). Detroit, MI: Morgan Kaufmann.
IB3 parameter settings: 70% and 80% for acceptance and dropping respectively.
RIAC, C4.5 from: H.J. Hamilton, N. Shan, N. Cercone, RIAC: a rule induction algorithm based on approximate classification, Tech. Rep. CS 96-06, Regina University 1996.
K.P. Bennett, J. Blue, A Support Vector Machine Approach to Decision Trees, R.P.I Math Report No. 97-100, Rensselaer Polytechnic Institute, Troy, NY, 1997
The training/test division is not very good in this case: the class distributions of the two sets differ.
In 10xCV results are:
Method | Accuracy % | Reference |
SFM+G+G(WX) | ??±2.6 | GM (our), C=1, s=2-5 |
kNN auto+WX+G(WX) | ??.4±3.6 | GM (our) |
SVM Gaussian | 94.6±4.3 | GM (our), C=1, s=2-5 |
VSS-MKNN | 91.5±4.3 | MK, 12 neurons (similar 8-17) |
SVM lin | 89.5±3.8 | GM (our), C=1, s=2-5 |
SSV tree | 87.8±4.5 | GM (our), default |
1-NN | 85.8±4.9 | GM std, Euclid |
3-NN | 84.0±5.4 | GM std, Euclid |
VSS is an MLP with search, implemented by Mirek Kordos, used with 3 epochs; neurons may be sigmoidal or step-wise (64 values).
Maszczyk T, Duch W, Support Feature Machine, WCCI 2010 (submitted).
208 cases, 60 continuous attributes, 2 classes, 111 metal, 97 rock.
From the CMU benchmark repository
This dataset has been used in two kinds of experiments:
1. The "aspect-angle independent" experiments use all 208 cases with 13-fold crossvalidation, averaged over 10 runs to get std.
2. The "aspect-angle dependent" experiments use training / test sets with 104 vectors each. Class distribution in training is 49 + 55, in test 62 + 42.
Estimation of L1O on the whole dataset (Opper and Winther) gives only 78.2%; is the test set so easy? Some of these results were obtained without standardization of the data, which is very important here!
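The protocol of repeating a 13-fold crossvalidation 10 times and reporting the spread over runs, with standardization fitted on the training part only, can be sketched as below. This is not the GhostMiner procedure: the random fold assignment, the 1-NN classifier and all function names are assumptions made for illustration.

```python
import math
import random
import statistics


def standardize(train, test):
    """Scale attributes to zero mean, unit variance using training data only."""
    cols = list(zip(*train))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard constant attributes

    def scale(rows):
        return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]

    return scale(train), scale(test)


def one_nn_predict(train_x, train_y, x):
    """Label of the nearest training vector under Euclidean distance."""
    best = min(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))
    return train_y[best]


def repeated_cv(xs, ys, folds=13, runs=10, seed=0):
    """Mean and standard deviation (over runs) of k-fold CV accuracy in percent."""
    rng = random.Random(seed)
    run_acc = []
    for _ in range(runs):
        order = list(range(len(xs)))
        rng.shuffle(order)  # assumed: folds are drawn at random in every run
        correct = 0
        for f in range(folds):
            test_idx = order[f::folds]
            held_out = set(test_idx)
            train_idx = [i for i in order if i not in held_out]
            tr_x, te_x = standardize([xs[i] for i in train_idx],
                                     [xs[i] for i in test_idx])
            tr_y = [ys[i] for i in train_idx]
            correct += sum(one_nn_predict(tr_x, tr_y, x) == ys[i]
                           for x, i in zip(te_x, test_idx))
        run_acc.append(100.0 * correct / len(xs))
    return statistics.fmean(run_acc), statistics.stdev(run_acc)
```

Fitting the scaling inside each fold keeps test vectors out of the mean and variance estimates; standardizing the whole dataset once is simpler but slightly optimistic.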
The "angle dependent experiments" with training / test sets.
Method | Train % | Test % | Reference |
1-NN, 5D from MDS, Euclid, std | -- | 97.1 | our, GM (WD) |
1-NN, Manhattan std | -- | 97.1 | our, GM (WD) |
1-NN, Euclid std | -- | 96.2 | our, GM (WD) |
TAP MFT Bayesian | -- | 92.3 | Opper, Winther |
Naive MFT Bayesian | -- | 90.4 | Opper, Winther |
SVM | -- | 90.4 | Opper, Winther |
MLP+BP, 12 hidden, best MLP | -- | 90.4 | Gorman, Sejnowski |
1-NN, Manhattan raw | -- | 92.3 | our, GM (WD) |
1-NN, Euclid raw | -- | 91.3 | our, GM (WD) |
FSM - methodology ? | -- | 83.6 | our (RA) |
The "angle independent experiments" with 13 CV on all data.
Method | Train % | Test % | Reference |
1-NN Euclid on 5D MDS input | -- | 87.5±0.8 | our GM (WD) |
1-NN Euclidean, std data | -- | 86.8±1.2 | our GM (WD) |
1-NN Manhattan, std data | -- | 86.3±0.3 | our GM (WD) |
MLP+BP, 12 hidden | 99.8±0.1 | 84.7±5.7 | Gorman, Sejnowski |
1-NN Manhattan, raw data | -- | 84.5±0.4 | our GM (WD) |
MLP+BP, 24 hidden | 99.8±0.1 | 84.5±5.7 | Gorman, Sejnowski |
MLP+BP, 6 hidden | 99.7±0.2 | 83.5±5.6 | Gorman, Sejnowski |
SVM linear, C=0.1 | -- | 82.7±8.5 | our GM (WD), std data |
1-NN Euclidean, raw data | -- | 82.1±0.9 | our GM (WD) |
SVM Gauss, C=1, s=0.1 | -- | 77.4±10.1 | our GM (WD), std data |
SVM linear, C=1 | -- | 76.9±11.9 | our GM (WD), raw data |
SVM linear, C=1 | -- | 76.0±9.8 | our GM (WD), std data |
DB-CART, 10xCV | -- | 81.8 | Shang, Breiman |
CART, 10xCV | -- | 67.9 | Shang, Breiman |
M. Opper and O. Winther, Gaussian Processes and SVM: Mean Field Results and Leave-One-Out. In: Advances in Large Margin Classifiers, Eds. A. J. Smola, P. Bartlett, B. Schölkopf, D. Schuurmans, MIT Press, 311-326, 2000; same methodology as Gorman with Sejnowski.
N. Shang, L. Breiman, ICONIP'96, p.133, 10xCV
Gorman, R. P., and Sejnowski, T. J. (1988). "Analysis of Hidden Units in a Layered Network Trained to Classify Sonar Targets", Neural Networks 1, pp. 75-89, 13xCV
Our results: kNN results from 10xCV and from 13xCV are quite similar, so Shang and Breiman should not differ much from 13 CV.
WD Leave-one-out (L1O) estimations on std data:
L1O with k=1, Euclidean distance, for all data gives 87.50%; other values of k and other distance functions do not give a significant improvement.
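A leave-one-out estimate of this kind can be sketched as follows, with k and the distance function as parameters. It is a minimal illustration rather than the GhostMiner implementation; the function names are hypothetical, the data is standardized once on the whole set (an assumption about what "std data" means here), and ties in the vote are broken arbitrarily.

```python
import math
import statistics


def standardize_all(xs):
    """Scale every attribute to zero mean and unit variance over the whole set."""
    cols = list(zip(*xs))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard constant attributes
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in xs]


def manhattan(a, b):
    """City-block distance between two vectors."""
    return sum(abs(p - q) for p, q in zip(a, b))


def loo_knn_accuracy(xs, ys, k=1, dist=math.dist):
    """Leave-one-out accuracy (percent) of k-NN with majority vote."""
    correct = 0
    for i, x in enumerate(xs):
        others = [j for j in range(len(xs)) if j != i]
        nearest = sorted(others, key=lambda j: dist(xs[j], x))[:k]
        votes = [ys[j] for j in nearest]
        predicted = max(set(votes), key=votes.count)  # ties broken arbitrarily
        correct += predicted == ys[i]
    return 100.0 * correct / len(xs)


# Toy one-dimensional data, not the sonar set.
toy_x = standardize_all([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
toy_y = [0, 0, 0, 1, 1, 1]
print(loo_knn_accuracy(toy_x, toy_y, k=1))
print(loo_knn_accuracy(toy_x, toy_y, k=3, dist=manhattan))
```

With 208 cases, an L1O accuracy of 87.50% corresponds to 182 correctly classified vectors.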
SVM linear, C=1, L1O 75.0%, for Gaussian kernel, C=1, L1O is 78.8%
Other L1O results taken from C. Domeniconi, J. Peng, D. Gunopulos, "An adaptive metric for pattern classification".
Discriminant Adaptive NN, DANN | 92.3 |
Adaptive metric NN | 90.9 |
kNN | 87.5 |
SVM Gauss C=1 | 78.8 |
C4.5 | 76.9 |
SVM linear C=1 | 75.0 |
528 training, 462 test cases, 10 continuous attributes, 11 classes
From the UCI benchmark repository.
Speaker-independent recognition of the eleven steady-state vowels of British English using a specified training set of LPC-derived log area ratios.
Results on the total set
Method | Train | Test | Reference |
CART-DB, 10xCV on total set !!! | -- | 90.0 | Shang, Breiman |
CART, 10xCV on total set | -- | 78.2 | Shang, Breiman |
Method | Train | Test | Reference |
Square node network, 88 units | -- | 54.8 | UCI |
Gaussian node network, 528 units | -- | 54.6 | UCI |
1-NN, Euclidean, raw | 99.24 | 56.3 | WD/KG |
Radial Basis Function, 528 units | -- | 53.5 | UCI |
Gaussian node network, 88 units | -- | 53.5 | UCI |
FSM Gauss, 10CV on the training set | 92.60 | 51.94 | our (RA) |
Square node network, 22 | -- | 51.1 | UCI |
Multi-layer perceptron, 88 hidden | -- | 50.6 | UCI |
Modified Kanerva Model, 528 units | -- | 50.0 | UCI |
Radial Basis Function, 88 units | -- | 47.6 | UCI |
Single-layer perceptron, 88 hidden | -- | 33.3 | UCI |
N. Shang, L. Breiman, ICONIP'96, p.133, used 10xCV instead of the test set.
871 patterns, 6 overlapping vowel classes (Indian Telugu vowel sounds), 3 features (formant frequencies).
Method | Test | Reference |
10xCV tests below | ||
3-NN, Manhattan | 87.8±4.0 | Kosice |
3-NN, Canberra | 87.8±4.2 | WD/GM |
FSM, 65 Gaussian nodes | 87.4±4.5 | Kosice |
3-NN, Euclid | 87.3±3.9 | WD/GM |
SSV dec. tree, 22 rules | 86.0±?? | Kosice |
SVM Gauss opt C~1000, s~1 | 85.0±4.0 | WD, Ghostminer |
SVM Gauss C=1000, s=1 | 83.5±4.1 | WD, Ghostminer |
SVM, Gauss, C=1, s=0.1 | 76.6±2.5 | WD, Ghostminer |
2xCV tests below | ||
3-NN, Euclidean | 86.1±0.6 | Kosice |
FSM, 40 Gaussian nodes | 85.2±1.2 | Kosice |
MLP | 84.6 | Pal |
Fuzzy MLP | 84.2 | Pal |
SSV dec. tree, beam search | 83.3±0.9 | Kosice |
SSV dec. tree, best first | 83.0±1.0 | Kosice |
Bayes Classifier | 79.2 | Pal |
Fuzzy SOM | 73.5 | Pal |
Parameters in SVM were optimized, that is, different parameters were used in each CV partition, so only an approximate value can be quoted. If they are fixed to C=1000, s=1, results are a bit worse.
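Optimizing parameters separately inside every CV partition can be sketched as a nested crossvalidation. The example below is an illustration under stated assumptions, not the GhostMiner procedure: a k-NN classifier with k as the tuned parameter stands in for the SVM with C and s, and all function names and fold counts are hypothetical.

```python
import math
import random
from collections import Counter


def knn_predict(k, train_x, train_y, test_x):
    """Stand-in classifier: k-NN with Euclidean distance and majority vote."""
    out = []
    for x in test_x:
        order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))
        out.append(Counter(train_y[i] for i in order[:k]).most_common(1)[0][0])
    return out


def cv_accuracy(param, xs, ys, folds, predict):
    """Plain k-fold CV accuracy (fraction correct) for one parameter value."""
    correct = 0
    for f in range(folds):
        test = list(range(f, len(xs), folds))
        held = set(test)
        train = [i for i in range(len(xs)) if i not in held]
        preds = predict(param, [xs[i] for i in train], [ys[i] for i in train],
                        [xs[i] for i in test])
        correct += sum(p == ys[i] for p, i in zip(preds, test))
    return correct / len(xs)


def nested_cv(xs, ys, grid, outer=10, inner=5, predict=knn_predict, seed=0):
    """Select a parameter inside every outer fold; return accuracies and choices."""
    order = list(range(len(xs)))
    random.Random(seed).shuffle(order)
    accs, chosen = [], []
    for f in range(outer):
        test = order[f::outer]
        held = set(test)
        train = [i for i in order if i not in held]
        tr_x, tr_y = [xs[i] for i in train], [ys[i] for i in train]
        # The inner CV sees only the training part of this outer fold.
        best = max(grid, key=lambda p: cv_accuracy(p, tr_x, tr_y, inner, predict))
        preds = predict(best, tr_x, tr_y, [xs[i] for i in test])
        accs.append(sum(p == ys[i] for p, i in zip(preds, test)) / len(test))
        chosen.append(best)
    return accs, chosen
```

The returned list of chosen parameters may differ from fold to fold, which is why only an approximate "optimal" value can be reported for such a run.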
Papers using this data:
Source: UCI, described in Forina, M. et al, PARVUS - An Extendible Package for Data Exploration, Classification and Correlation. Institute of Pharmaceutical and Food Analysis and Technologies, Via Brigata Salerno, 16147 Genoa, Italy.
These data are the results of a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars. The analysis determined the quantities of 13 constituents found in each of the three types of wines.
Class distribution: 178 cases = [59, 71, 48] in Class 1-3;
13 continuous attributes: alcohol, malic-acid, ash, alkalinity, magnesium, phenols, flavanoids, nonflavanoid phenols, proanthocyanins, color, hue, OD280/OD315, proline.
Method | Test | Reference |
Leave-one-out test results | ||
RDA | 100 | [1] |
QDA | 99.4 | [1] |
LDA | 98.9 | [1] |
kNN, Manhattan, k=1 | 98.7 | GM-WD, std data |
1NN | 96.1 | [1] z-transformed data |
kNN, Euclidean, k=1 | 95.5 | GM-WD, std data |
kNN, Chebyshev, k=1 | 93.3 | GM-WD, std data |
10xCV tests below | ||
kNN, Manhattan, auto k=1-10 | 98.9±2.3 | GM-WD, 2D data, after MDS/PCA |
IncNet, 10CV, def, Gauss | 98.9±2.4 | GM-WD, std data, up to 3 neurons |
10 CV SSV, opt prune | 98.3±2.7 | GM-WD, 2D data, after MDS/PCA |
10 CV SSV, node count 7 | 98.3±2.7 | GM-WD, 2D data, after MDS/PCA |
kNN, Euclidean, k=1 | 97.8±2.8 | GM-WD, 2D data, after MDS/PCA |
kNN, Manhattan, k=1 | 97.8±2.9 | GM-WD, 2D data, after MDS/PCA |
kNN, Manhattan, auto k=1-10 | 97.8±3.9 | GM-WD |
kNN, Euclidean, k=3, weighted features | 97.8±4.7 | GM-WD |
IncNet, 10CV, def, bicentral | 97.2±2.9 | GM-WD, std data, up to 3 neurons |
kNN, Euclidean, auto k=1-10 | 97.2±4.0 | GM-WD |
10 CV SSV, opt node | 97.2±5.4 | GM-WD, 2D data, after MDS/PCA |
FSM a=.99, def | 96.1±3.7 | GM-WD, 2D data, after MDS/PCA |
FSM 10CV, Gauss, a=.999 | 96.1±4.7 | GM-WD, std data, 8-11 neurons |
FSM 10CV, triang, a=.99 | 96.1±5.9 | GM-WD, raw data |
kNN, Euclidean, k=1 | 95.5±4.4 | GM-WD |
10 CV SSV, opt node, BFS | 92.8±3.7 | GM-WD |
10 CV SSV, opt node, BS | 91.6±6.5 | GM-WD |
10 CV SSV, opt prune, BFS | 90.4±6.1 | GM-WD |
UCI past usage:
[1] S. Aeberhard, D. Coomans and O. de Vel, Comparison of Classifiers in High Dimensional Settings, Tech. Rep. no. 92-02, (1992), Dept. of Computer Science and Dept. of Mathematics and Statistics, James Cook University of North Queensland (submitted to Technometrics).
[2] S. Aeberhard, D. Coomans and O. de Vel, "The classification performance of RDA" Tech. Rep. no. 92-01, (1992), Dept. of Computer Science and Dept. of Mathematics and Statistics, James Cook University of North Queensland (submitted to Journal of Chemometrics).
Shang, Breiman CART 71.4% accuracy, DB-CART 70.6%.
Leave-one-out results taken from C. Domeniconi, J. Peng, D. Gunopulos, "An adaptive metric for pattern classification".
Adaptive metric NN | 75.2 |
Discriminant Adaptive NN, DANN | 72.9 |
kNN | 72.0 |
C4.5 | 68.2 |
DNA-Primate splice-junction gene sequences, with associated imperfect domain theory.
StatLog data: splice junctions are points on a DNA sequence at which "superfluous" DNA is removed during the process of protein creation in higher organisms. The problem posed in this dataset is to recognize, given a sequence of DNA, the boundaries between exons (the parts of the DNA sequence retained after splicing) and introns (the parts of the DNA sequence that are spliced out).
This problem consists of two subtasks: recognizing exon/intron boundaries (referred to as EI sites), and recognizing intron/exon boundaries (IE sites). (In the biological community, IE borders are referred to as "acceptors" while EI borders are referred to as "donors".)
Number of Instances: 3190. Class distribution:
Class | Train | Test |
1 | 464 (23.20%) | 303 (25.55%) |
2 | 485 (24.25%) | 280 (23.61%) |
3 | 1051 (52.55%) | 603 (50.84%) |
All | 2000 (100%) | 1186 (100%) |
Number of attributes: originally 60 attributes {a,c,t,g}, usually converted to 180 binary indicator variables {(0,0,0), (0,0,1), (0,1,0), (1,0,0)}, or 240 binary variables.
Much better performance is generally observed if attributes closest to the junction are used (middle). In the StatLog version (180 variables), this means using attributes A61 to A120 only.
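The conversion to 180 binary variables and the selection of the middle attributes can be sketched as below. The assignment of the four codes to particular nucleotides is an assumption (the text lists the codes but not the mapping), and the names `CODES`, `encode_sequence` and `middle_attributes` are hypothetical.

```python
# Assumed assignment of the four 3-bit codes to nucleotides.
CODES = {"a": (1, 0, 0), "c": (0, 1, 0), "g": (0, 0, 1), "t": (0, 0, 0)}


def encode_sequence(seq):
    """Turn a 60-letter sequence into 180 binary indicator variables.

    Any character other than a, c, g, t raises KeyError in this sketch.
    """
    bits = []
    for base in seq.lower():
        bits.extend(CODES[base])
    return bits


def middle_attributes(bits):
    """Keep attributes A61..A120 (one-based), the 60 variables nearest the junction."""
    return bits[60:120]


# Made-up sequence of 60 nucleotides, for illustration only.
sequence = "acgt" * 15
encoded = encode_sequence(sequence)
print(len(encoded), len(middle_attributes(encoded)))
```

Since each nucleotide occupies three consecutive variables, attributes A61 to A120 correspond to sequence positions 21 to 40.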
Method | % in training | % on test | Time train | Time test |
RBF, 720 nodes | 98.5 | 95.9 | ||
k-NN GM, p(X|C), k=6, Euclid, raw | 96.8 | 95.5 | 0 | short |
Dipol92 | 99.3 | 95.2 | 213 | 10 |
Alloc80 | 93.7 | 94.3 | 14394 | -- |
QuaDisc | 100.0 | 94.1 | 1581 | 809 |
LDA, Discrim | 96.6 | 94.1 | 929 | 31 |
FSM, 8 Gaussians, 180 binary | 95.4 | 94.0 | ||
Log DA, Disc | 99.2 | 93.9 | 5057 | 76 |
SSV Tree, p(X|C), opt node, 4CV | 94.8 | 93.4 | short | short |
Naive Bayes | 94.8 | 93.2 | 52 | 15 |
Castle, middle 90 binary var | 93.9 | 92.8 | 397 | 225 |
IndCart, 180 binary | 96.0 | 92.7 | 523 | 516 |
C4.5, on 60 features | 96.0 | 92.4 | 9 | 2 |
CART, middle 90 binary var | 92.5 | 91.5 | 615 | 9 |
MLP+BP | 98.6 | 91.2 | 4094 | 9 |
Bayesian Tree | 99.9 | 90.5 | 82 | 11 |
CN2 | 99.8 | 90.5 | 869 | 74 |
New ID | 100.0 | 90.0 | 698 | 1 |
Ac2 | 100.0 | 90.0 | 12378 | 87 |
Smart | 96.6 | 88.5 | 79676 | 16 |
Cal5 | 89.6 | 86.9 | 1616 | 8 |
Itrule | 86.9 | 86.5 | 2212 | 6 |
k-NN | 91.1 | 85.4 | 2428 | 882 |
Kohonen | 89.6 | 66.1 | - | - |
Default, majority | 52.5 | 50.8 |
kNN GM - GhostMiner version of kNN (our group)
SSV Decision Tree - our results