Maxmin Data Range Heuristic-Based Initial Centroid Method of Partitional Clustering for Big Data Mining

Kamlesh Kumar Pandey, Diwakar Shukla

Source Title: International Journal of Information Retrieval Research (IJIRR) 12(1)

ISSN: 2155-6377|EISSN: 2155-6385|EISBN13: 9781683182085|DOI: 10.4018/IJIRR.289954

MLA

Pandey, Kamlesh Kumar, and Diwakar Shukla. "Maxmin Data Range Heuristic-Based Initial Centroid Method of Partitional Clustering for Big Data Mining." IJIRR vol.12, no.1 2022: pp.1-22. http://doi.org/10.4018/IJIRR.289954

APA

Pandey, K. K. & Shukla, D. (2022). Maxmin Data Range Heuristic-Based Initial Centroid Method of Partitional Clustering for Big Data Mining. International Journal of Information Retrieval Research (IJIRR), 12(1), 1-22. http://doi.org/10.4018/IJIRR.289954

Chicago

Pandey, Kamlesh Kumar, and Diwakar Shukla. "Maxmin Data Range Heuristic-Based Initial Centroid Method of Partitional Clustering for Big Data Mining," International Journal of Information Retrieval Research (IJIRR) 12, no.1: 1-22. http://doi.org/10.4018/IJIRR.289954

Abstract

Centroid-based clustering algorithms depend on the number of clusters, the initial centroids, the distance measure, and the statistical measure of central tendency. The centroid initialization algorithm determines convergence speed, computing efficiency, execution time, scalability, memory utilization, and overall performance for big data clustering. Researchers have proposed various cluster initialization techniques; some reduce the number of iterations at the cost of lower cluster quality, while others improve cluster quality at the cost of more iterations. For these reasons, this study proposes the Maxmin Data Range Heuristic (MDRH) method of centroid initialization for K-Means (KM) clustering, which reduces execution time and iterations and improves cluster quality for big data clustering. The proposed MDRH method is compared against the classical KM and KM++ algorithms on four real datasets. The MDRH method achieves better effectiveness and efficiency on the RS, DB, CH, SC, IS, and CT quantitative measures.
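
The abstract does not describe the MDRH procedure itself, so the following is only a minimal sketch of the two baselines it names, classical KM (random seeding) and KM++ seeding, to illustrate how the choice of initial centroids can change the iteration count and the final within-cluster error. The function names, the synthetic three-blob data, and the use of the sum of squared errors (SSE) as a quality measure are all assumptions made for illustration and are not taken from the paper.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def random_init(points, k, rng):
    """Classical KM seeding: pick k distinct data points uniformly at random."""
    return rng.sample(points, k)

def kmeanspp_init(points, k, rng):
    """KM++ seeding: each further centroid is drawn with probability
    proportional to its squared distance from the nearest centroid so far.
    Assumes the data contain at least k distinct points."""
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        weights = [min(sq_dist(p, c) for c in centroids) for p in points]
        centroids.append(rng.choices(points, weights=weights, k=1)[0])
    return centroids

def kmeans(points, centroids, max_iter=100):
    """Lloyd's iterations from the given initial centroids.
    Returns (centroids, labels, iterations, sse)."""
    centroids = [tuple(c) for c in centroids]
    labels = None
    for iteration in range(1, max_iter + 1):
        # Assignment step: index of the nearest centroid for every point.
        new_labels = [
            min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments unchanged: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    sse = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, iteration, sse

# Hypothetical synthetic data: three Gaussian blobs of 50 points each.
rng = random.Random(0)
blob_centers = [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
points = [(rng.gauss(cx, 1.0), rng.gauss(cy, 1.0))
          for cx, cy in blob_centers for _ in range(50)]

for name, init in [("KM", random_init), ("KM++", kmeanspp_init)]:
    _, _, iters, sse = kmeans(points, init(points, 3, rng))
    print(f"{name}: iterations={iters}, SSE={sse:.1f}")
```

Results from a single seed on toy data say nothing about the relative merit of either baseline; a comparison like the one reported in the paper would repeat such runs over real datasets and over several quality and efficiency measures.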