EBSVM: A Sparsification Method from LP-Based SVM via Stochastic Functional ξ-Analysis

Rezaul Karim, Zakir Hossain, Amit Kumar Kundu, Ali Ahmed Ave

Abstract


While the SVM is a mathematically solid, state-of-the-art stochastic machine that is sparse yet powerful and delivers top accuracy, Vapnik's LP-based SVM is sparser than its QP-generated counterpart while offering similar accuracy. Still, further sparsity is essential for working efficiently with very large and complicated datasets. We propose a novel, efficient method, EBSVM, which imposes additional sparsification on this already sparse LPSVM by analyzing the stochastic auxiliary variable ξ after a compact deterministic mapping of the data, maintaining optimal machine complexity so as to further reduce classification cost while preserving similar accuracy. Experiments on the Gunnar Rätsch benchmark datasets as well as synthetic data reveal that our model demands a much smaller classification cost than prominent cutting-edge classifiers while achieving similar accuracy, despite being simpler to implement. On average, its classification cost is 12.4% of the standard QPSVM's, 50.3% of the VLPSVM's, 68.1% of the EESVM's [1], and 98.1% of the greedy SpSVM-2's [2]. Among these machines, it also attains the lowest MAC (Machine Accuracy Cost) and GFR (Generalization Failure Rate). A noise-sensitivity analysis based on numerical calculus demonstrates the model's potential for handling noisy data. The machine is also expected to work efficiently on very large and complex data under real-time constraints.
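For context, one standard form of Vapnik's LP-based SVM (see [45], [59]; this is background, not the EBSVM objective itself) replaces the QP machine's quadratic objective with an ℓ1 penalty on the kernel-expansion coefficients:

\[
\min_{\alpha,\,\xi,\,b}\ \sum_{i=1}^{\ell}\alpha_i + C\sum_{i=1}^{\ell}\xi_i
\qquad \text{s.t.}\qquad
y_i\Big(\sum_{j=1}^{\ell}\alpha_j\,y_j\,K(x_i,x_j)+b\Big) \ge 1-\xi_i,
\quad \alpha_j \ge 0,\ \xi_i \ge 0.
\]

Here the ξ_i are the stochastic auxiliary (slack) variables the abstract refers to, and the ℓ1 objective drives many α_j to zero, which is why the LP machine is sparser than the QP one and why its classification cost scales with the few surviving expansion terms.

The sketch below is a minimal, hedged illustration of this pipeline only: it trains the LP above with scipy.optimize.linprog on toy data, inspects ξ, and prunes near-zero coefficients. It is not the paper's EBSVM procedure (the compact deterministic mapping and the ξ-analysis are specified in the full text); the kernel, tolerance, and all helper names are our own assumptions.

import numpy as np
from scipy.optimize import linprog

def rbf(X, Z, gamma=1.0):
    # Gaussian kernel matrix between the rows of X and Z.
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def train_lpsvm(X, y, C=1.0, gamma=1.0):
    # Variables z = [alpha (n), xi (n), b_plus, b_minus], all >= 0.
    n = len(y)
    K = rbf(X, X, gamma)
    c = np.concatenate([np.ones(n), C * np.ones(n), [0.0, 0.0]])
    # Margin constraints rewritten for linprog as A @ z <= -1:
    # -y_i * sum_j alpha_j y_j K_ij - xi_i - y_i (b_plus - b_minus) <= -1
    A = np.hstack([-(y[:, None] * K * y[None, :]),
                   -np.eye(n), -y[:, None], y[:, None]])
    res = linprog(c, A_ub=A, b_ub=-np.ones(n),
                  bounds=[(0, None)] * (2 * n + 2), method="highs")
    alpha, xi = res.x[:n], res.x[n:2 * n]
    b = res.x[2 * n] - res.x[2 * n + 1]
    return alpha, xi, b

# Toy data: two Gaussian blobs in 2-D.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, (40, 2)), rng.normal(1.0, 1.0, (40, 2))])
y = np.concatenate([-np.ones(40), np.ones(40)])

alpha, xi, b = train_lpsvm(X, y, C=1.0, gamma=0.5)

# xi_i > 1 flags a misclassified training point; 0 < xi_i <= 1 a margin violator.
print("nonzero slacks:", int((xi > 1e-8).sum()))

# Sparsify: keep only expansion terms with non-negligible weight.
keep = alpha > 1e-6 * max(alpha.max(), 1e-12)
print("expansion terms kept:", int(keep.sum()), "of", len(alpha))

# Sparse decision function built from the kept terms only.
f = rbf(X, X[keep], 0.5) @ (alpha[keep] * y[keep]) + b
print("training accuracy:", float((np.sign(f) == y).mean()))

In this picture, the classification cost the abstract measures is proportional to the number of kept kernel terms, which is what the reported 12.4%/50.3%/68.1%/98.1% ratios compare.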


Full Text:

PDF

References


[1] A. K. Kundu, R. Karim, and A. A. Ave, "Escalating SVM-efficiency by sequential QP LP and slack variable analysis," Journal of Computer Science, vol. 19, no. 10, pp. 1253–1262, 2023. [Online]. Available: https://thescipub.com/abstract/jcssp.2023.1253.1262

[2] S. S. Keerthi, O. Chapelle, and D. DeCoste, "Building support vector machines with reduced classifier complexity," Journal of Machine Learning Research, vol. 7, pp. 1493–1515, 2006. [Online]. Available: https://www.jmlr.org/papers/v7/keerthi06a.html

[3] I. Steinwart, "Sparseness of support vector machines—some asymptotically sharp bounds," in Advances in Neural Information Processing Systems 16, S. Thrun, L. K. Saul, and B. Schölkopf, Eds., 2003, pp. 1069–1076. [Online]. Available: https://proceedings.neurips.cc/paper/2003/file/4c8c76b39d294759a9000cbda3a6571a-Paper.pdf

[4] T. Downs, K. E. Gates, and A. Masters, "Exact simplification of support vector solutions," Journal of Machine Learning Research, vol. 2, pp. 293–297, 2001. [Online]. Available: https://dl.acm.org/doi/10.5555/944790.944814

[5] T. Joachims and C.-N. J. Yu, "Sparse kernel SVMs via cutting-plane training," in Machine Learning and Knowledge Discovery in Databases, ser. Lecture Notes in Computer Science, vol. 5781. Springer, 2009, pp. 47–62. [Online]. Available: https://link.springer.com/chapter/10.1007/978-3-642-04180-8_8

[6] A. Cotter, S. Shalev-Shwartz, and N. Srebro, "Learning optimally sparse support vector machines," in Proceedings of the 30th International Conference on Machine Learning, 2013, pp. 266–274. [Online]. Available: https://proceedings.mlr.press/v28/cotter13.html

[7] S. Romdhani, P. Torr, B. Schölkopf, and A. Blake, "Efficient face detection by a cascaded support-vector machine expansion," Proceedings of the Royal Society of London A, vol. 460, no. 2051, pp. 3283–3297, 2004. [Online]. Available: https://royalsocietypublishing.org/doi/10.1098/rspa.2004.1333

[8] M. Wu, B. Schölkopf, and G. Bakır, "A direct method for building sparse kernel learning algorithms," Journal of Machine Learning Research, vol. 7, pp. 603–624, 2006. [Online]. Available: https://jmlr.org/papers/volume7/wu06a/wu06a.pdf

[9] M. Rätsch, S. Romdhani, G. Teschke, and T. Vetter, "Over-complete wavelet approximation of a support vector machine for efficient classification," in Pattern Recognition: 27th DAGM Symposium, ser. Lecture Notes in Computer Science, vol. 3663. Springer, 2005, pp. 351–360.

[10] B. Heisele, T. Serre, S. Prentice, and T. Poggio, "Hierarchical classification and feature reduction for fast face detection with support vector machines," Pattern Recognition, vol. 36, no. 9, pp. 2007–2017, 2003. [Online]. Available: https://www.sciencedirect.com/science/article/abs/pii/S0031320303000621

[11] R. Karim, M. Bergtholdt, J. H. Kappes, and C. Schnörr, "Greedy-based design of sparse two-stage SVMs for fast classification," in Pattern Recognition: 29th DAGM Symposium, ser. Lecture Notes in Computer Science, vol. 4713. Springer, 2007, pp. 395–404. [Online]. Available: https://link.springer.com/chapter/10.1007/978-3-540-74936-3_40

[12] K. Z. Arreola, J. Fehr, and H. Burkhardt, "Fast support vector machine classification using linear SVMs," in Proceedings of the 18th International Conference on Pattern Recognition (ICPR), 2006, pp. 366–369.

[13] H. Sahbi and D. Geman, "A hierarchy of support vector machines for pattern detection," Journal of Machine Learning Research, vol. 7, pp. 2087–2123, 2006. [Online]. Available: https://jmlr.org/papers/v7/sahbi06a.html

[14] S. Maji, A. C. Berg, and J. Malik, "Efficient classification for additive kernel SVMs," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 1, pp. 66–77, 2013. [Online]. Available: https://acberg.com/papers/mbm2012pami.pdf

[15] Ľ. Ladický and P. H. S. Torr, "Locally linear support vector machines," in Proceedings of the 28th International Conference on Machine Learning (ICML), 2011, pp. 985–992. [Online]. Available: https://icml.cc/Conferences/2011/papers/508_icmlpaper.pdf

[16] Z. Xu, J. Gardner, S. Tyree, and K. Weinberger, "Compressed support vector machines," arXiv preprint arXiv:1501.06478, 2015. [Online]. Available: https://arxiv.org/abs/1501.06478

[17] J. Li and Y. Zhang, "Learning SURF cascade for fast and accurate object detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013, pp. 3468–3475.

[18] V. C. Raykar, B. Krishnapuram, and S. Yu, "Designing efficient cascaded classifiers: Tradeoff between accuracy and cost," in Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '10). ACM, 2010, pp. 853–860. [Online]. Available: https://dl.acm.org/doi/10.1145/1835804.

[19] R. Karim, A. K. Kundu, and A. A. Ave, "Sequential mathematical programming with ζ-analysis for HESVM," in Mathematics and Computer Science: Contemporary Developments, vol. 7. BP International, pp. 144–169. [Online]. Available: https://stm.bookpi.org/MCSCD-V7/article/view/16162

[20] Z. Fu, A. Robles-Kelly, and J. Zhou, "Mixing linear SVMs for nonlinear classification," IEEE Transactions on Neural Networks, vol. 21, no. 12, pp. 1963–1975, 2010. [Online]. Available: https://doi.org/10.1109/TNN.2010.2080319

[21] R. Karim and A. K. Kundu, "Efficiency and performance analysis of a sparse and powerful second order SVM based on LP and QP," International Journal of Advanced Computer Science and Applications, vol. 9, no. 2, pp. 311–318, 2018. [Online]. Available: https://thesai.org/Publications/ViewPaper?Volume=9&Issue=2&Code=IJACSA&SerialNo=44

[22] ——, "Computational analysis to reduce classification cost keeping high accuracy of the sparser LPSVM," International Journal of Machine Learning and Computing, vol. 9, no. 6, pp. 728–733, 2019. [Online]. Available: https://www.ijmlc.org/vol9/865-L0294.pdf

[23] V. Franc and V. Hlaváč, "Greedy algorithm for a training set reduction in the kernel methods," in Computer Analysis of Images and Patterns (CAIP), ser. Lecture Notes in Computer Science, vol. 2756. Springer, 2003, pp. 426–433. [Online]. Available: https://link.springer.com/chapter/10.1007/978-3-540-45179-2_53

[24] Q. Gu and J. Han, "Clustered support vector machines," in Proceedings of the 16th International Conference on Artificial Intelligence and Statistics (AISTATS), 2013, pp. 307–315. [Online]. Available: https://proceedings.mlr.press/v31/gu13b.html

[25] R. Karim, M. Hasan, A. K. Kundu, and A. A. Ave, "LP SVM with a novel similarity function outperforms powerful LP-QP-kernel-SVM considering efficient classification," Informatica, vol. 47, no. 8, pp. 1–12, 2023. [Online]. Available: https://www.informatica.si/index.php/informatica/article/view/4767

[26] S. Zhou, "Sparse SVM for sufficient data reduction," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 9, pp. –4958, 2022. [Online]. Available: https://ieeexplore.ieee.org/document/9415153

[27] D. Geebelen, J. A. K. Suykens, and J. Vandewalle, "Reducing the number of support vectors of SVM classifiers using the smoothed separable case approximation," IEEE Transactions on Neural Networks and Learning Systems, vol. 23, no. 4, pp. 682–688, 2012.

[28] H. Wang, Y. Shao, S. Zhou, C. Zhang, and N. Xiu, "Support vector machine classifier via L0/1 soft-margin loss," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 10, pp. 7253–7265, 2022.

[29] J. Liu, L.-W. Huang, Y.-H. Shao, W.-J. Chen, and C.-N. Li, "Nonlinear kernel support vector machine with 0-1 soft margin loss," arXiv preprint arXiv:2203.00399, 2022. [Online]. Available: https://arxiv.org/abs/2203.00399

[30] M. Wu, Z. Yang, and J. Ye, "Nonlinear kernel-free quadratic hyper-surface support vector machine with 0-1 loss function," arXiv preprint arXiv:2404.10559, 2024. [Online]. Available: https://arxiv.org/abs/2404.10559

[31] M. Akhtar, M. Tanveer, and M. Arshad, "RoBoSS: A robust, bounded, sparse, and smooth loss function for supervised learning," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024.

[32] R. Lin, Y. Yao, and Y. Liu, "Kernel support vector machine classifiers with ℓ0-norm hinge loss," Neurocomputing, vol. 589, p. 127669, 2024. [Online]. Available: https://www.sciencedirect.com/science/article/abs/pii/S0925231224004405

[33] H. Wang and Y. Shao, "Sparse and robust SVM classifier for large scale classification," Applied Intelligence, vol. 53, no. 16, pp. 19647–19671, 2023. [Online]. Available: https://link.springer.com/article/10.1007/s10489-023-04511-w

[34] ——, "Fast truncated Huber loss SVM for large scale classification," Knowledge-Based Systems, vol. 260, p. 110074, 2023. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0950705122011674

[35] H. Wang, Z. Zhu, and Y. Shao, "Fast support vector machine with low-computational complexity for large-scale classification," IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 54, pp. 4151–4163, 2024.

[36] H. Wang, Y. Shao, and S. Zhou, "Fast SVM classifier for large-scale classification problems," Information Sciences, vol. 625, 2023.

[37] L. K. Hansen, "Stochastic linear learning: Exact test and training error averages," Neural Networks, vol. 6, no. 3, pp. 393–396, 1993. [Online]. Available: https://www.sciencedirect.com/science/article/pii/089360809390006I

[38] R. Karim, "Theoretical analysis for exact results in stochastic linear learning," Master's thesis, 2004.

[39] V. Kecman, "Support vector machines—an introduction," in Support Vector Machines: Theory and Applications, ser. Studies in Fuzziness and Soft Computing, vol. 177, L. Wang, Ed. Springer, 2005, pp. 1–47. [Online]. Available: https://link.springer.com/chapter/10.1007/10984697_1

[40] S. Geman, E. Bienenstock, and R. Doursat, "Neural networks and the bias/variance dilemma," Neural Computation, vol. 4, no. 1, pp. 1–58, 1992. [Online]. Available: https://doi.org/10.1162/neco.1992.4.1.1

[41] I. Guyon, V. N. Vapnik, B. E. Boser, L. Bottou, and S. A. Solla, "Structural risk minimization for character recognition," Advances in Neural Information Processing Systems, pp. 471–479, 1992. [Online]. Available: https://proceedings.neurips.cc/paper/1991/file/10a7cdd970fe135cf4f7bb55c0e3b59f-Paper.pdf

[42] D. C. Montgomery, E. A. Peck, and G. G. Vining, Introduction to Linear Regression Analysis, 6th ed. John Wiley & Sons, 2021.

[43] C. J. Burges, "A tutorial on support vector machines for pattern recognition," Data Mining and Knowledge Discovery, vol. 2, no. 2, pp. 121–167, 1998. [Online]. Available: https://doi.org/10.1023/A:1009715923555

[44] C. M. Bishop, Neural Networks for Pattern Recognition. Oxford: Clarendon Press, 1995. [Online]. Available: https://dl.acm.org/doi/10.5555/525960

[45] V. Vapnik, The Nature of Statistical Learning Theory. New York: Springer-Verlag, 1995. [Online]. Available: https://doi.org/10.1007/978-1-4757-2440-0

[46] V. N. Vapnik, Estimation of Dependences Based on Empirical Data. Springer, 1979.

[47] V. Vapnik and A. Chervonenkis, "A note on one class of perceptrons," Automation and Remote Control, vol. 25, 1964.

[48] M. Awad and R. Khanna, "Support vector machines for classification," in Efficient Learning Machines. Berkeley, CA: Apress, 2015, pp. 39–66. [Online]. Available: https://doi.org/10.1007/978-1-4302-5990-9_3

[49] T. M. Cover, "Geometrical and statistical properties of systems of linear inequalities with applications in pattern recognition," IEEE Transactions on Electronic Computers, vol. EC-14, no. 3, pp. 326–334, 1965. [Online]. Available: https://doi.org/10.1109/PGEC.1965.264137

[50] M. Aizerman, E. Braverman, and L. Rozonoer, "Theoretical foundations of the potential function method in pattern recognition learning," Automation and Remote Control, vol. 25, pp. 821–837, 1964. [Online]. Available: https://cs.uwaterloo.ca/~y328yu/classics/kernel.pdf

[51] K.-L. Du, B. Jiang, J. Lu, J. Hua, and M. N. S. Swamy, "Exploring kernel machines and support vector machines: Principles, techniques, and future directions," Mathematics, vol. 12, no. 24, p. 3935, 2024. [Online]. Available: https://doi.org/10.3390/math12243935

[52] J. Mercer, "Functions of positive and negative type, and their connection with the theory of integral equations," Philosophical Transactions of the Royal Society of London. Series A, Containing Papers of a Mathematical or Physical Character, vol. 209, pp. 415–446, 1909. [Online]. Available: https://royalsocietypublishing.org/doi/10.1098/rsta.1909.0016

[53] R. Bellman, Dynamic Programming. Princeton, NJ: Princeton University Press, 1957.

[54] N. Aronszajn, "Theory of reproducing kernels," Transactions of the American Mathematical Society, vol. 68, pp. 337–404, 1950.

[55] F. Girosi, "An equivalence between sparse approximation and support vector machines," Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Tech. Rep. AIM-1606, 1997. [Online]. Available: https://dspace.mit.edu/handle/1721.1/7289

[56] N. E. Heckman, "The theory and application of penalized least squares methods or reproducing kernel Hilbert spaces made easy," Department of Statistics, University of British Columbia, Tech. Rep. 216, 1997. [Online]. Available: https://www.stat.ubc.ca/technical-reports-archive/doc/216.pdf

[57] G. Wahba, Spline Models for Observational Data, ser. CBMS-NSF Regional Conference Series in Applied Mathematics, vol. 59. Philadelphia: Society for Industrial and Applied Mathematics, 1990. [Online]. Available: https://epubs.siam.org/doi/book/10.1137/1.9781611970128

[58] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2004. [Online]. Available: https://doi.org/10.1017/CBO9780511804441

[59] V. N. Vapnik, Statistical Learning Theory. Wiley, 1998.

[60] A. Nefedov, J. Ye, C. Kulikowski, I. Muchnik, and K. Morgan, "Experimental study of support vector machines based on linear and quadratic optimization criteria," DIMACS Technical Report 2009-18, 2009. [Online]. Available: https://doi.org/10.1109/icmla.2009.52

[61] T. Suttorp and C. Igel, "Multi-objective optimization of support vector machines," in Multi-Objective Machine Learning. Springer, 2006, pp. 199–220. [Online]. Available: https://doi.org/10.1007/11399346_9

[62] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge: Cambridge University Press, 2000. [Online]. Available: https://doi.org/10.1017/CBO9780511801389

[63] A. Tharwat, A. E. Hassanien, and B. E. Elnaghi, "A BA-based algorithm for parameter optimization of support vector machine," Pattern Recognition Letters, vol. 93, pp. 13–22, 2016.

[64] S. R. Gunn, "Support vector machines for classification and regression," School of Electronics and Computer Science, University of Southampton, ISIS Technical Report, 1998. [Online]. Available: https://eprints.soton.ac.uk/256459/

[65] G. Rätsch, "Benchmark datasets," re-packaged and available on GitHub. [Online]. Available: https://github.com/tdiethe/gunnar_raetsch_benchmark_datasets

[66] J. Kools, "6 functions for generating artificial datasets," 2013. [Online]. Available: https://www.mathworks.com/matlabcentral/fileexchange/41459-6-functions-for-generating-artificial-datasets




DOI: https://doi.org/10.31449/inf.v49i28.8742

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.