MethodsX

High performance GPU implementation of KNN algorithm: A review.

Pooja Bidye, Pradnya Borkar, Nitin Rakesh

Published: 202510.1016/j.mex.2025.103633

Abstract

Open Access

With large volumes of complex data generated by different applications, Machine Learning (ML) algorithms alone may not yield significant performance benefits on a single or multi-core CPU. Applying optimization techniques to these ML algorithms in a High-Performance Computing (HPC) environment can give considerable speedups for high-dimensional datasets. One of the most widely used classification algorithms, with applications in various domains, is the K-Nearest Neighbor (KNN). Despite its simplicity, KNN poses several challenges while handling high-dimensional data. However, the algorithm's inherent nature presents an opportunity for parallelization. This paper reviews the optimization techniques employed by several researchers to accelerate the KNN algorithm on a GPU platform. The study reveals that techniques such as coalesced-memory access, tiling with shared memory, chunking, data segmentation, and pivot-based partitioning significantly contribute towards speeding up the KNN algorithm to leverage the GPU capabilities. The algorithms reviewed have performed exceptionally well on high-dimensional data with speedups up to 750x for a dual-GPU platform and up to 1840x for a multi-GPU platform. This study serves as a valuable resource for researchers examining KNN acceleration in high-performance computing environments and its applications in various fields.

View at DOI