Developing and validating Parkinson’s disease subtypes and their motor and cognitive progression

Objectives To use a data-driven approach to determine the existence and natural history of subtypes of Parkinson’s disease (PD) using two large independent cohorts of patients newly diagnosed with this condition. Methods 1601 and 944 patients with idiopathic PD, from Tracking Parkinson’s and Discovery cohorts, respectively, were evaluated in motor, cognitive and non-motor domains at the baseline assessment. Patients were recently diagnosed at entry (within 3.5 years of diagnosis) and were followed up every 18 months. We used a factor analysis followed by a k-means cluster analysis, while prognosis was measured using random slope and intercept models. Results We identified four clusters: (1) fast motor progression with symmetrical motor disease, poor olfaction, cognition and postural hypotension; (2) mild motor and non-motor disease with intermediate motor progression; (3) severe motor disease, poor psychological well-being and poor sleep with an intermediate motor progression; (4) slow motor progression with tremor-dominant, unilateral disease. Clusters were moderately to substantially stable across the two cohorts (kappa 0.58). Cluster 1 had the fastest motor progression in Tracking Parkinson’s at 3.2 (95% CI 2.8 to 3.6) UPDRS III points per year while cluster 4 had the slowest at 0.6 (0.1–1.1). In Tracking Parkinson’s, cluster 2 had the largest response to levodopa 36.3% and cluster 4 the lowest 28.8%. Conclusions We have found four novel clusters that replicated well across two independent early PD cohorts and were associated with levodopa response and motor progression rates. This has potential implications for better understanding disease pathophysiology and the relevance of patient stratification in future clinical trials.


Patient evaluation
The Tracking Parkinson's cohort initially measured olfaction using the University of Pennsylvania Smell Identification Test (UPSIT), then changed to the Sniffin' sticks 16 item odour identification test when the UPSIT kits became difficult to obtain. The Discovery cohort measured olfaction using only the Sniffin' sticks 16 item odour identification test. We used item response theory (IRT) methods to convert the UPSIT scores to equivalent Sniffin' 16 scores [1]. We also used equipercentile methods to convert Leeds Anxiety and Depression scale scores into scores on the more commonly used Hospital Anxiety and Depression scale.
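As a rough illustration of the equipercentile idea, the sketch below maps a score on one scale to the score on another scale that has the same percentile rank. It is a simplified, unsmoothed version written for illustration only, not the procedure used in the study; the function names and the toy score lists are hypothetical.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(sorted_sample, x):
    """Mid-percentile rank (0-100) of x within an already sorted sample."""
    below = bisect_left(sorted_sample, x)         # scores strictly below x
    at_or_below = bisect_right(sorted_sample, x)  # scores less than or equal to x
    return 100.0 * (below + at_or_below) / (2 * len(sorted_sample))


def equipercentile_convert(x, source_scores, target_scores):
    """Return the target-scale score whose percentile rank first reaches that of x.

    Assumption: both score lists come from comparable populations. Practical
    equating would also interpolate and smooth the score distributions.
    """
    source = sorted(source_scores)
    target = sorted(target_scores)
    rank = percentile_rank(source, x)
    for candidate in sorted(set(target)):
        if percentile_rank(target, candidate) >= rank:
            return candidate
    return target[-1]


# Hypothetical toy data: a score of 2 on the source scale sits at the same
# percentile rank as a score of 4 on the target scale.
converted = equipercentile_convert(2, [0, 1, 2, 3, 4], [0, 2, 4, 6, 8])
```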
We expressed the response to the levodopa challenge as a percentage change, dividing the difference between the pre-dose and post-dose total MDS-UPDRS III scores by the pre-dose score. A pragmatic levodopa challenge test was performed only in consenting patients who were already taking levodopa medication by the time of their 18 month (Discovery) or 24 month (Tracking Parkinson's) visit. Patients were asked to omit their usual levodopa dose approximately 12 hours before the morning challenge test. Patients who were also taking dopamine agonist medication, MAO-B inhibitors or COMT inhibitors were asked to omit these 12 hours before the challenge test (or 24 hours before if taking once daily dopamine agonist formulations). During the levodopa challenge, the patient was given their usual dose of oral levodopa with peripheral dopa-decarboxylase inhibitor, rather than a supramaximal standard dose of levodopa, and the MDS-UPDRS III was performed by a trained neurologist at baseline and 1 hour later to assess response.
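The percentage response described above can be written as a one-line calculation. The sketch below is illustrative only; the function name is hypothetical, and the sign convention (pre-dose minus post-dose, so that an improvement is positive) is an assumption, since the text does not state the direction of the difference.

```python
def levodopa_response_percent(pre_dose_score, post_dose_score):
    """Percentage change in total MDS-UPDRS III score after the levodopa dose.

    Assumed convention: (pre - post) / pre * 100, so that a lower post-dose
    score (motor improvement) gives a positive percentage.
    """
    if pre_dose_score <= 0:
        raise ValueError("pre-dose score must be positive")
    return 100.0 * (pre_dose_score - post_dose_score) / pre_dose_score


# Hypothetical example: a fall from 40 to 30 points is a 25% response.
example_response = levodopa_response_percent(40, 30)
```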

Statistical Analysis
k-means cluster analysis is not a statistical model per se (it is an algorithm and does not measure the uncertainty in any model estimates), so it is not possible to use Rubin's rules to combine the 10 imputed datasets into one model. For simplicity, we therefore used the data from our 10 multiply imputed datasets to create one single dataset (after carrying out the confirmatory factor analysis, which is a statistical model) by taking the average of each variable across all 10 datasets. The amount of missing data was small and unlikely to bias our results in any way. After taking into account those individuals who answered 80-100% of a questionnaire, we had between 0.9% and 5.3% missing baseline data in Tracking Parkinson's (although the BFI and Sniffin' scores had ~10% missing data because they were collected six months after baseline and were hence affected by drop-out), whilst in Discovery we had between 0.4% and 4.8% missing data.
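The averaging step can be sketched as follows. This is a minimal illustration, assuming each imputed dataset is held as a list of rows with the same patients and variables in the same order; the function name and the toy values are hypothetical.

```python
def average_imputations(imputed_datasets):
    """Average each cell across multiply imputed datasets.

    imputed_datasets: list of datasets, each a list of rows, each row a list
    of numeric values. All datasets are assumed to share one shape and order.
    """
    m = len(imputed_datasets)
    n_rows = len(imputed_datasets[0])
    n_cols = len(imputed_datasets[0][0])
    return [
        [sum(d[i][j] for d in imputed_datasets) / m for j in range(n_cols)]
        for i in range(n_rows)
    ]


# Hypothetical example with two imputations of one patient and two variables:
# the observed value (2.0) is unchanged, the imputed value is averaged.
single_dataset = average_imputations([[[1.0, 2.0]], [[3.0, 2.0]]])
```

Observed values are identical in every imputed dataset, so averaging leaves them unchanged and only smooths the imputed cells.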
Our factor analysis consisted of an exploratory factor analysis in the Discovery cohort followed by a confirmatory factor analysis (CFA) in the Tracking Parkinson's cohort. In the CFA we examined the following goodness of fit statistics: Comparative Fit Index (CFI), Tucker-Lewis Index (TLI) and the Root Mean Square Error of Approximation (RMSEA). A model was considered to fit the data well if CFI was ≥ 0.90, TLI ≥ 0.90 and RMSEA ≤ 0.08. The same algorithm that produced the factor scores in Tracking Parkinson's was used in Discovery to ensure comparability between these variables across the cohorts.
Pattern-mixture models were estimated using a Structural Equation Modelling approach within Mplus. We constrained all the variances of the outcomes to be equal across the clusters and time-points. The variances and covariance of the latent (random) intercept and slope were also constrained to be equal across the clusters. Our pattern-mixture model was defined such that all patients withdrawing were considered the same, i.e., withdrawing after visit 2 was considered to have the same effect on the intercept and slope as withdrawing after baseline.

Exploratory factor analysis (EFA)
In Discovery, using the eigenvalue criterion, we found two factors in each of the ten imputed datasets. The two factors identified were identical to the psychological well-being and non-tremor motor factors from the EFA in the original paper [2] (ignoring variables not available in both datasets, that is the Purdue tests, the Get Up and Go Test and the flamingo test), except that the non-tremor motor factor also included the MoCA and semantic fluency variables.

Confirmatory factor analysis (CFA)
The CFA in the Tracking Parkinson's cohort using the variables from the EFA met only one of our pre-defined goodness of fit criteria, with a CFI of 0.83, a TLI of 0.87 and an RMSEA of 0.078. Although the cognitive and motor variables in the second factor might be highly correlated, we thought that clinically it did not make sense to include them within the same factor. Dropping the two cognitive variables from this factor improved the goodness of fit, giving a CFI of 0.91, a TLI of 0.93 and an RMSEA of 0.063. We therefore excluded the cognitive variables from this factor; Table 2 of the main manuscript displays the results from the resulting CFA. We named factor 1 "psychological well-being" and factor 2 "non-tremor motor", matching our original paper. The factor loadings ranged from 0.31 to 0.86 and from 0.42 to 0.77 in the two factors respectively. At this stage we excluded the other four BFI variables not loading into a factor and the UPDRS constipation variable for the sake of parsimony, which is the same approach used in our original paper. We also excluded the semantic fluency variable since we thought that the MoCA was a better measure of global cognitive function.
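The pre-defined thresholds (CFI ≥ 0.90, TLI ≥ 0.90, RMSEA ≤ 0.08) can be applied to the two sets of fit statistics reported above with a small check. This is an illustrative sketch only; the function name is hypothetical.

```python
def fit_criteria_met(cfi, tli, rmsea):
    """Return which pre-defined goodness of fit criteria a CFA model meets."""
    return {
        "CFI": cfi >= 0.90,
        "TLI": tli >= 0.90,
        "RMSEA": rmsea <= 0.08,
    }


# Fit statistics reported in the text.
initial_model = fit_criteria_met(cfi=0.83, tli=0.87, rmsea=0.078)
revised_model = fit_criteria_met(cfi=0.91, tli=0.93, rmsea=0.063)

# The initial model meets only the RMSEA criterion; the revised model
# (cognitive variables dropped) meets all three.
n_met_initial = sum(initial_model.values())
n_met_revised = sum(revised_model.values())
```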

Cluster Analysis: choosing number of clusters
Web table 1 shows the statistics we used to determine the optimum number of clusters, which pointed to both the two and the five cluster solutions in Tracking Parkinson's and in Discovery. Because these fit statistics did not identify a single solution, we used additional criteria to decide on the optimum number of clusters.
We first considered the agreement between our k-means clusters in the Discovery cohort and the clusters predicted from our Tracking Parkinson's discriminant model. The two cluster solution had excellent overall agreement (91.5%) and a kappa statistic consistent with "almost perfect" agreement (0.83). However, because it only stratified individuals into a good and a bad group, it was not regarded as clinically informative. The five cluster solution had the same overall agreement (67.9%) as the four cluster solution and a slightly higher kappa statistic (0.60 compared with 0.58). These kappa statistics are almost equivalent, and both are close to the border between what would be considered "moderate" and "substantial" agreement. We chose the four cluster solution because it is more parsimonious than the five cluster solution.
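For readers unfamiliar with these agreement measures, the sketch below computes overall agreement and Cohen's kappa for two sets of cluster labels. It is illustrative only: the function names and the toy labels are hypothetical, and the descriptive bands are assumed to follow the commonly used Landis and Koch convention, which the text quotes but does not name.

```python
from collections import Counter


def agreement_and_kappa(labels_a, labels_b):
    """Overall agreement and Cohen's kappa for two label assignments."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    # Agreement expected by chance from the marginal label frequencies.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1:
        return observed, 1.0  # both assignments use one identical label
    return observed, (observed - expected) / (1 - expected)


def kappa_band(kappa):
    """Descriptive label for kappa (assumed Landis and Koch bands)."""
    if kappa > 0.80:
        return "almost perfect"
    if kappa > 0.60:
        return "substantial"
    if kappa > 0.40:
        return "moderate"
    return "fair or worse"


# Hypothetical toy labels: 3 of 4 patients agree, chance agreement is 0.5.
observed, kappa = agreement_and_kappa([1, 1, 2, 2], [1, 1, 2, 1])
```

Under these bands, the reported kappa values of 0.58 and 0.60 both fall at the top of the "moderate" range, and 0.83 falls in the "almost perfect" range.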

Comparison of prognosis by clusters between Tracking Parkinson's and Discovery
We looked at what would happen if we relaxed the assumptions in our pattern-mixture model that the variances of the outcomes are equal across time-points and clusters and that the variances and covariance of the random effects are equal across clusters. We therefore also fitted the following models for UPDRS III and compared them using commonly used goodness of fit statistics such as AIC and BIC, as well as likelihood ratio tests.
1. Variances and covariance of random effects are different for the four clusters.
2. Variances of the outcomes are different at each time-point.
3. Combining the assumptions for models 1 and 2 above.
4. As model 3, but variances of the outcomes are also different within each cluster.

Using the goodness of fit statistics in Tracking Parkinson's we would select model 3 over our standard pattern-mixture model (PMM). However, across all models the largest difference in mean progression rate (compared with the standard PMM) in any cluster was 0.22 UPDRS III points per year, so the results were almost identical to those of our default model. Using the same model (which was also favoured by AIC and likelihood ratio tests) in Discovery, the largest difference in mean progression rate in any cluster was 0.12 UPDRS III points per year and the p-value for the difference between clusters increased from 0.04 to 0.10. However, if we had used the BIC to select a model in Discovery we would have selected the default model. Hence we are confident that the assumptions we made in our model have not materially affected our progression rate estimates. We will further explore these issues, along with non-linearity, in a future paper when we have more longitudinal data available.
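To illustrate how AIC and BIC can favour different models, as happened in Discovery, the sketch below computes both criteria and the likelihood ratio statistic from a model's log-likelihood and number of free parameters. All numerical inputs are hypothetical and chosen only to show the effect; they are not values from either cohort.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 log L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln(n) - 2 log L (lower is better)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


def likelihood_ratio_statistic(log_lik_restricted, log_lik_relaxed):
    """2 * (log L of relaxed model - log L of nested restricted model)."""
    return 2 * (log_lik_relaxed - log_lik_restricted)


# Hypothetical values: a default model with 10 parameters and a relaxed
# model with 20 parameters, both fitted to 900 patients.
default_fit = {"log_lik": -1000.0, "k": 10}
relaxed_fit = {"log_lik": -985.0, "k": 20}
n_patients = 900

aic_prefers_relaxed = (
    aic(relaxed_fit["log_lik"], relaxed_fit["k"])
    < aic(default_fit["log_lik"], default_fit["k"])
)
bic_prefers_default = (
    bic(default_fit["log_lik"], default_fit["k"], n_patients)
    < bic(relaxed_fit["log_lik"], relaxed_fit["k"], n_patients)
)
lr_statistic = likelihood_ratio_statistic(
    default_fit["log_lik"], relaxed_fit["log_lik"]
)
```

BIC charges ln(n) per parameter rather than 2, so with several hundred patients it penalises the extra variance parameters more heavily than AIC and can retain the simpler model even when AIC and the likelihood ratio test prefer the relaxed one.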