Parkinson's disease severity clustering based on gait activity from mobile device.
Panyawut Sri-Iesaranusorn, Warisara Asawaponwiput, Pongsakorn Ajchariyasakchai, Roongroj Bhidayasiri, Decho Surangsrirat
Abstract
Parkinson's disease (PD) is a neurodegenerative disorder characterized by motor symptoms, including gait impairments, which significantly affect patient mobility and quality of life. Accurate assessment of PD severity is crucial for clinical management. This study investigates the utility of smartphone-derived gait data for objectively clustering PD severity using unsupervised machine learning, with the aim of improving precision in disease monitoring. We analyzed gait data from the mPower dataset, comprising 8779 accelerometer recordings from 1957 participants (PD patients and healthy controls). Stride cycles were segmented using frequency analysis and peak detection, followed by sequence padding to standardize input lengths. K-means clustering with dynamic time warping (DTW) was applied to identify gait patterns, while autoencoder embeddings and t-SNE were used to visualize the high-dimensional data. The clusters were correlated with MDS-UPDRS scores (Parts I and II) to assess severity. Four distinct gait clusters were identified, correlating with the severity of PD. The most severe group (Cluster 1) exhibited significantly higher MDS-UPDRS scores for balance/walking problems (2.43×) and freezing episodes (8.41×) compared to the least severe group (Cluster 4). The t-SNE visualization confirmed a clear separation of the clusters, with higher severity scores concentrated in Cluster 1. Sequence padding showed no significant impact on clustering outcomes (p > 0.05), supporting its use for handling variable-length data. This study demonstrates that smartphone-derived gait patterns, analyzed via unsupervised clustering and visualization techniques, effectively stratify PD severity. Gait features related to balance, freezing, and walking difficulties are critical biomarkers for disease progression.
A key advantage of our technique is the use of unsupervised learning to identify latent patterns without preconceived group assumptions, allowing subgroups to emerge organically and providing an unbiased exploration of how gait patterns relate to Parkinson's severity. Although limitations include the reliance on self-reported MDS-UPDRS data and the variability of the k-means algorithm, these findings highlight the potential of wearable sensors and machine learning to develop objective, scalable tools for PD assessment.
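To make the two sequence-handling steps named above concrete, the following is a minimal illustrative sketch of sequence padding and a DTW distance between two stride cycles. It is not the implementation used in this study: the function names `pad_sequence` and `dtw_distance`, the zero pad value, the right-side padding, and the absolute-difference local cost are all assumptions chosen for illustration, and the toy stride values are invented.

```python
from math import inf


def pad_sequence(seq, target_len, pad_value=0.0):
    """Right-pad one stride cycle to target_len samples.

    Assumption: zero padding on the right; the source does not state
    which pad value or side was used.
    """
    if len(seq) > target_len:
        raise ValueError("sequence is longer than target_len")
    return list(seq) + [pad_value] * (target_len - len(seq))


def dtw_distance(a, b):
    """Classic dynamic-programming DTW between two 1-D sequences.

    Assumption: the local cost is the absolute difference between samples.
    Runs in O(len(a) * len(b)) time.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b[j-1] is reused
                cost[i][j - 1],      # b advances, a[i-1] is reused
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Hypothetical toy strides: the second is a time-stretched copy of the first.
stride_a = [1.0, 2.0, 3.0]
stride_b = [1.0, 2.0, 2.0, 3.0]
padded_a = pad_sequence(stride_a, len(stride_b))
```

Because DTW allows one sample to align with several samples of the other sequence, `dtw_distance(stride_a, stride_b)` is zero for this time-stretched pair, whereas a point-wise distance would first require equal lengths. In a k-means setting, a distance of this kind would stand in for the Euclidean distance when assigning stride cycles to cluster centers.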