Date of Graduation


Document Type


Degree Type



Statler College of Engineering and Mineral Resources


Lane Department of Computer Science and Electrical Engineering

Committee Chair

Afzel Noore

Committee Co-Chair

Yaser Fallah

Committee Member

Xin Li


The Support Vector Machine (SVM) is a popular, high-performing classification model with good generalization ability across many classification applications. The method uses kernels to classify data that are not linearly separable. However, the added complexity of the kernels that map data to a higher-dimensional space degrades SVM performance on large datasets. Moreover, choosing an appropriate kernel and finding the best set of its parameters for a given dataset is challenging, and failing at this step can easily lead to overfitting.

In this thesis we propose the Piece-wise Linear SVM (PWLSVM), which uses MagKmeans clustering to address the complexity and computational cost of SVMs. We use a linear SVM to avoid the complexity of dealing with kernels, and MagKmeans clustering to partition the data into balanced groups. MagKmeans, a supervised clustering technique, places an approximately equal number of samples from each class in every cluster. This ensures that the linear SVM trained on each cluster sees balanced training samples for both classes and can attain an accurate model.

The detailed mathematical formulation and modeling of the proposed Distributed MagKmeans (D-MagKmeans) is presented. The algorithm uses a distributed MagKmeans clustering approach to extend the PWLSVM to a Distributed Piece-wise Linear SVM (D-PWLSVM). D-MagKmeans makes MagKmeans clustering work over a distributed network by passing each node's cluster centroids only to its one-hop neighbors. This property makes our approach well suited to distributed processing and decision making while preserving privacy and minimizing communication overhead.

The proposed algorithm was validated on four datasets of varying feature dimensionality and sample size. Pima Indian Diabetes, with 768 samples and 8 features, is the smallest of the four.
We also examined Abalone, with 4,177 samples and 8 features; Waveform, with 5,000 samples and 22 features; and EHarmony, with over half a million samples and 116 features. The results reveal that a reasonable trade-off is required when dealing with a large dataset. They also show that PWLSVM and D-PWLSVM outperform standard SVMs on relatively large datasets such as Abalone, Waveform, and EHarmony.
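The piecewise-linear scheme described above can be sketched in a few lines of Python. This is an illustrative toy, not the thesis's implementation: plain k-means stands in for MagKmeans (its class-balance constraint is omitted for brevity), a sub-gradient hinge-loss solver stands in for a full linear SVM trainer, and all function names (`pwlsvm_fit`, `pwlsvm_predict`) are hypothetical.

```python
import numpy as np

def kmeans(X, k, iters=50):
    """Plain k-means with deterministic seeding. The thesis's MagKmeans
    adds a class-balance term so each cluster holds roughly equal
    samples per class; that constraint is omitted in this sketch."""
    # Initialize centroids from evenly spaced sample points.
    C = X[np.linspace(0, len(X) - 1, k).astype(int)].copy()
    for _ in range(iters):
        assign = np.argmin(((X[:, None] - C[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(assign == j):
                C[j] = X[assign == j].mean(axis=0)
    return C, assign

def linear_svm(X, y, lam=0.01, lr=0.1, epochs=300):
    """Primal linear SVM via sub-gradient descent on the regularized
    hinge loss; labels y must be in {-1, +1}."""
    w, b = np.zeros(X.shape[1]), 0.0
    n = len(X)
    for _ in range(epochs):
        viol = y * (X @ w + b) < 1          # margin violators
        w -= lr * (lam * w - (y[viol, None] * X[viol]).sum(0) / n)
        b -= lr * (-y[viol].sum() / n)
    return w, b

def pwlsvm_fit(X, y, k):
    """Cluster the data, then fit one linear SVM per cluster."""
    C, assign = kmeans(X, k)
    models = [linear_svm(X[assign == j], y[assign == j]) for j in range(k)]
    return C, models

def pwlsvm_predict(C, models, X):
    """Route each point to the linear SVM of its nearest centroid."""
    assign = np.argmin(((X[:, None] - C[None]) ** 2).sum(-1), axis=1)
    out = np.empty(len(X))
    for j, (w, b) in enumerate(models):
        out[assign == j] = np.sign(X[assign == j] @ w + b)
    return out
```

On XOR-like data that no single linear separator can handle, routing points through per-cluster linear models recovers a piecewise-linear boundary, which is the effect the balanced clustering in PWLSVM is designed to enable.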