Objectives To use a data-driven approach to determine the existence and natural history of subtypes of Parkinson’s disease (PD) using two large independent cohorts of patients newly diagnosed with this condition.
Methods 1601 and 944 patients with idiopathic PD, from Tracking Parkinson’s and Discovery cohorts, respectively, were evaluated in motor, cognitive and non-motor domains at the baseline assessment. Patients were recently diagnosed at entry (within 3.5 years of diagnosis) and were followed up every 18 months. We used a factor analysis followed by a k-means cluster analysis, while prognosis was measured using random slope and intercept models.
Results We identified four clusters: (1) fast motor progression with symmetrical motor disease, poor olfaction, cognition and postural hypotension; (2) mild motor and non-motor disease with intermediate motor progression; (3) severe motor disease, poor psychological well-being and poor sleep with an intermediate motor progression; (4) slow motor progression with tremor-dominant, unilateral disease. Clusters were moderately to substantially stable across the two cohorts (kappa 0.58). Cluster 1 had the fastest motor progression in Tracking Parkinson’s at 3.2 (95% CI 2.8 to 3.6) UPDRS III points per year while cluster 4 had the slowest at 0.6 (95% CI 0.1 to 1.1). In Tracking Parkinson’s, cluster 2 had the largest response to levodopa (36.3%) and cluster 4 the lowest (28.8%).
Conclusions We have found four novel clusters that replicated well across two independent early PD cohorts and were associated with levodopa response and motor progression rates. This has potential implications for better understanding disease pathophysiology and the relevance of patient stratification in future clinical trials.
This is an Open access article distributed in accordance with the Creative Commons Attribution 4.0 Unported (CC BY 4.0) license, which permits others to copy, redistribute, remix, transform and build upon this work for any purpose, provided the original work is properly cited, a link to the licence is given, and indication of whether changes were made. See: https://creativecommons.org/licenses/by/4.0/.
Parkinson’s disease (PD) is a progressive neurodegenerative disorder characterised by a wide range of motor and non-motor features, for which there is no known cure. However, therapeutic strategies might soon be available with prolonged benefits that could affect the underlying pathogenesis, and hence delay or ultimately prevent the inexorable course of this disease. To date, none of the 16 drugs evaluated for PD disease modification have succeeded in phase III trials, with a further eight compounds currently in the discovery pipeline.1 PD is an inherently complex disorder with known heterogeneity in terms of clinical presentation as well as rate of progression and risk of disease complications. The basis for this is only now starting to be understood, in terms of the role of genetic factors, for example, glucocerebrosidase gene mutations. This has highly significant implications for future clinical trial design: if patient heterogeneity is ignored at baseline study selection, it may lead to confounding and misinterpretation of subsequent progression and complication rates.
Few naturalistic cohort studies in PD have been undertaken using large numbers of representative, community-ascertained patients, unselected on the basis of age or family history, and prospectively followed early from diagnosis. Such cohorts would more faithfully recapitulate disease evolution in the true-to-life populations encountered in clinical practice, where disease progression reflects both pathophysiology and any treatment effects, as reported in the CamPaIGN study.2
Data-driven approaches to delineate subtypes using cohorts of incident PD as well as cross-sectional studies3–7 have hypothesised that there are different PD subtypes. Better defining these subtypes will be important for understanding the aetiology of the disease, discovering biomarkers related to prognosis and for stratified medicine, including the discovery and response to new medications.8 In this study, we sought to better explore this aspect of PD using two large independent cohorts of newly diagnosed PD and in particular the number of distinct disease subtypes, their levodopa responsiveness and rate of motor and cognitive decline. This extends our previous work in this area using only one of the two cohorts (Discovery), without assessing levodopa responsiveness or the subsequent rate of motor and cognitive decline.9
Materials and methods
Tracking Parkinson’s is a prospective cohort of recently diagnosed patients with PD who were recruited from around the UK between February 2012 and May 2014. Full details of this cohort and full inclusion/exclusion criteria have been published elsewhere.10 The Oxford Parkinson’s Disease Centre Discovery cohort (hereafter referred to as Discovery) is also a prospective cohort of recently diagnosed patients with PD who were recruited from 11 hospitals in the Thames Valley region between September 2010 and January 2016. Full details of the Discovery cohort and full inclusion/exclusion criteria have also been published elsewhere.11 In both studies, patients were defined as recently diagnosed if recruited within 3.5 years of diagnosis. In order to exclude patients with similar conditions that may have been incorrectly diagnosed as PD, we only included individuals in both cohorts if they had a probability of PD ≥90% as rated by a research neurologist/movement disorder specialist at their latest visit. Patients have been (and are continuing to be) followed up every 18 months. Both studies were funded by Parkinson’s UK.
Assessments of patients were via self-completed questionnaires and from outpatient clinics using standardised and validated scales both at baseline and follow-up. Variables used in this analysis were those adopted in our original cluster analysis paper9 and which were also included in the Tracking Parkinson’s cohort, and these included the Movement Disorders Society (MDS) revised Unified Parkinson’s Disease Rating Scale (UPDRS), where part III was measured in the ‘on’ state; Big Five Inventory; Epworth Sleepiness Scale; REM Sleep Behaviour Disorder Screening Questionnaire; Hospital Anxiety and Depression Scale; Questionnaire for Impulsive-Compulsive Disorders in Parkinson’s Disease; Honolulu Asia Aging Study Constipation Questionnaire; Montreal Cognitive Assessment (MoCA) adjusted for education years; Semantic verbal fluency (animals); Orthostatic blood pressure measurement; and Sniffin’ 16 odour identification scores. The levodopa equivalent daily dose (LEDD) was calculated from medication use questionnaires using established formulae.12 In addition, a levodopa challenge was carried out giving us a quantitative measure of response to medication (see online supplementary e-appendix for more details on methods).
We imputed missing data using the mean score if 80% or more questions were answered in any given test. Additionally, missing baseline data were imputed using the chained equations approach separately in the two cohorts. Factor analysis was used as a variable reduction technique on all the baseline phenotypic variables (details in online supplementary e-appendix). We then derived the clusters by using a k-means analysis of the factor scores and other baseline phenotypic variables not loading into one of the factors. Variables were standardised separately within each cohort to ensure that each variable had the same weighting within the k-means analysis. Further details are described in our previous publication.9
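The scale-level imputation, within-cohort standardisation and k-means steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names and the default of 20 random starts are hypothetical, and the factor analysis and chained-equations imputation are not shown.

```python
import numpy as np

def prorated_score(items, min_answered=0.8):
    """Sum score for one questionnaire; None marks an unanswered item.

    Missing items are replaced by the mean of the answered items, but only
    if at least `min_answered` of the items were answered; otherwise the
    whole scale is treated as missing (None).
    """
    answered = [x for x in items if x is not None]
    if len(answered) / len(items) < min_answered:
        return None
    mean = sum(answered) / len(answered)
    return sum(mean if x is None else x for x in items)

def standardise(X):
    """z-score each column; call once per cohort so scaling is cohort-specific."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def kmeans(X, k, n_starts=20, n_iter=100, seed=0):
    """Plain Lloyd's k-means, keeping the best of several random starts."""
    rng = np.random.default_rng(seed)
    best_labels, best_inertia = None, np.inf
    for _ in range(n_starts):
        # Initialise centres at k distinct, randomly chosen observations.
        centres = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            dist = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
            labels = dist.argmin(axis=1)
            # Keep the old centre if a cluster happens to become empty.
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centres[j] for j in range(k)])
            if np.allclose(new, centres):
                break
            centres = new
        dist = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        inertia = dist[np.arange(len(X)), labels].sum()
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels, best_inertia
```

Standardising before clustering matters because k-means uses Euclidean distance, so a variable measured on a wider scale would otherwise dominate the solution. Repeating the algorithm from several random starts reduces the chance of reporting a local optimum.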
A discriminant analysis model was then fitted to the Tracking Parkinson’s clusters and used to predict clusters within the Discovery cohort. These predicted clusters were compared with the k-means clusters in the Discovery cohort to determine the stability of our approach. We used the kappa statistic to compute the extent of agreement and adopted accepted guidelines13 to determine the strength of this agreement.
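As an illustration of the agreement measure, Cohen's kappa can be computed from a cross-tabulation of the two cluster assignments (here, k-means cluster against discriminant-predicted cluster). The sketch below uses the standard unweighted formula; the function name and the example table are hypothetical and do not reproduce the study's data.

```python
def cohens_kappa(table):
    """Unweighted Cohen's kappa for a square contingency table.

    table[i][j] is the number of patients placed in cluster i by one
    method and in cluster j by the other.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Observed agreement: proportion on the diagonal.
    observed = sum(table[i][i] for i in range(k)) / n
    # Agreement expected by chance from the marginal totals.
    expected = sum(row_tot[i] * col_tot[i] for i in range(k)) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical 2x2 example: 70% raw agreement, 50% expected by chance.
example = [[20, 5], [10, 15]]
kappa = cohens_kappa(example)
```

Kappa corrects raw percentage agreement for the agreement expected by chance, which is why it is lower than the raw figure.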
We then carried out a between-cluster comparison of a range of clinical and demographic variables, which had not been used in the estimation of the clusters using analysis of variance and χ2 tests. We modelled important disease-related variables (UPDRS III and MoCA scores) longitudinally using multilevel random slope and intercept models to estimate disease progression by cluster. A sensitivity analysis using pattern-mixture models was carried out to determine whether patients lost to follow-up may potentially have biased our disease progression estimates.14
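A random slope and intercept model of this kind can be written in a generic form as below. This is a textbook formulation given for orientation only; the notation is ours, and the exact covariates and covariance structure used in the analysis are not specified in this section.

```latex
\begin{aligned}
y_{ij} &= \bigl(\beta_{0,c(i)} + u_{0i}\bigr) + \bigl(\beta_{1,c(i)} + u_{1i}\bigr)\, t_{ij} + \varepsilon_{ij},\\
(u_{0i},\, u_{1i})^{\top} &\sim \mathcal{N}(\mathbf{0}, \Sigma), \qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^{2})
\end{aligned}
```

Here $y_{ij}$ is the outcome (UPDRS III or MoCA) for patient $i$ at visit $j$, $t_{ij}$ is time since baseline in years and $c(i)$ is the cluster of patient $i$. The fixed slope $\beta_{1,c}$ is the mean annual change in cluster $c$, while $u_{0i}$ and $u_{1i}$ let each patient deviate from the cluster's average starting level and rate of change.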
Results
We analysed data on 1601 patients in Tracking Parkinson’s and 944 in Discovery (online web supplementary figure 1). Both cohorts had around 35% women, were predominantly white (>98%) and had an average age of diagnosis of about 66 years (see table 1). The disease duration from diagnosis was on average 1.2–1.3 years. Compared with Tracking Parkinson’s, the Discovery cohort had more severe motor disease as measured by the UPDRS III and disease severity as measured by the Hoehn and Yahr or the sum score of UPDRS parts I–IV (p<0.001), and slightly worse average cognition as measured by the MoCA. However, the Tracking Parkinson’s cohort had worse motor aspects of experiences of daily living (UPDRS II) and motor complications (UPDRS IV) and had fewer untreated patients.
In our factor analysis, we found two factors: a psychological well-being and a non-tremor motor factor (table 2), as we reported previously.9 This shows that within our baseline phenotypic variables, we had multiple variables related to psychological well-being and to non-tremor motor function that were highly correlated. Using the statistics in web supplementary table 1 helped us decide that four clusters gave us an optimal solution. Figure 1 highlights the important features of the clusters and figure 2 shows the average of each of the standardised variables within each cluster for the Tracking Parkinson’s and Discovery cohorts. The groups were arbitrarily ordered in terms of size for Tracking Parkinson’s, but for the Discovery cohort they were ordered by similarity to the Tracking Parkinson’s clusters. In general, the cluster patterns between the cohorts were fairly similar but with some differences (see below). Details of the clusters are available in web supplementary table 2, which shows all the scores from the different tests included in the cluster analysis and categorised scores using standard cut-points from the literature for easier clinical interpretation. More details of the factor and the cluster analysis can be found in the online supplementary e-appendix.
The following describes the clusters observed in Tracking Parkinson’s (unless otherwise stated). The fast motor progression (1) cluster had less advanced motor features and psychological well-being but worse than average non-motor features such as blood pressure postural drop, olfaction and cognition with more symmetrical motor disease. However, within the Discovery cohort, the non-tremor motor factor was worse, rather than better than average, for this cluster. The mild motor and non-motor disease (2) cluster showed a milder form of the disease, being better in most domains, and was similar in the Discovery cohort analysis. The severe motor disease, poor psychological well-being and poor sleep (3) cluster was similar in the two cohorts and showed a more severe form of PD, especially in non-tremor motor features (particularly bradykinesia and postural scores), with worse psychological well-being, poor sleep and excessive daytime somnolence. The slow motor progression (4) cluster had severe tremor with unilateral disease and was similar in Discovery, except that the non-tremor motor features were better than average in Discovery and worse than average in Tracking Parkinson’s.
Web supplementary table 3 shows the agreement between the k-means clusters in Discovery and those predicted by the Tracking Parkinson’s discriminant model. This reveals an overall agreement of 67.9% and a kappa value of 0.58 (95% CI 0.54 to 0.61) indicating moderate to substantial agreement.13 The major inconsistency comes in the mild motor and non-motor disease (2) cluster where 110 (34.5%) individuals are wrongly predicted to be in the fast motor progression (1) cluster.
Clinical and demographic correlates of the clusters
The focus for the rest of this paper is on the clusters predicted from the larger Tracking Parkinson’s model and applied to the Discovery cohort because future patients would be classified from their baseline measurements into predicted clusters. We found a modest difference in disease duration since diagnosis (maximum average difference 3.5 months) between the clusters in both cohorts (table 3) but did not regard this as being clinically important in terms of explaining differences in phenotype. There was evidence of differences in gender, age at diagnosis, motor phenotype, Hoehn and Yahr stage, and medication use at baseline across the four clusters in both cohorts (p<0.001 in all variables) (see table 3). The mild motor and non-motor disease (2) cluster had the highest proportion of women and youngest age at diagnosis, while the fast motor progression (1) cluster had the highest age at diagnosis. The severe motor disease, poor psychological well-being and poor sleep (3) cluster had the highest proportion with the postural instability gait difficulty (PIGD) phenotype while the slow motor progression (4) cluster had the highest proportion of tremor-dominant disease at baseline. The LEDD at baseline was highest in the severe motor disease, poor psychological well-being and poor sleep (3) cluster, which also had the smallest proportion of untreated patients.
Within the Tracking Parkinson’s cohort, the L-dopa challenge was completed by 1021 (77.8%) out of 1313 patients who have had their 24-month visit. In the Discovery cohort, only 273 (35.5%) out of 770 patients completed the 18-month L-dopa challenge indicating a lack of power and potential selection bias in this data set. The mean percentage decrease in UPDRS III comparing pre with post challenge was greater in Tracking Parkinson’s than in Discovery (32.1% vs 23.6%). The change was highest in the mild motor and non-motor disease (2) cluster and slightly lower than average in the slow motor progression (4) cluster within both cohorts. There was strong evidence of a difference in response to L-dopa across the clusters in Tracking Parkinson’s (p=0.002), but not so strong in the smaller sample and potentially biased Discovery cohort (p=0.06).
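The response measure used here, the percentage decrease in UPDRS III from before to after the challenge, can be sketched as follows. The function names and example scores are hypothetical, and averaging per-patient percentages (rather than taking the percentage change in mean scores) is an assumption, since the aggregation is not spelled out in this section.

```python
def percent_improvement(pre_score, post_score):
    """Relative decrease in UPDRS III from pre- to post-challenge, in percent."""
    if pre_score <= 0:
        raise ValueError("pre-challenge score must be positive")
    return 100.0 * (pre_score - post_score) / pre_score

def mean_response(pairs):
    """Mean of the per-patient percentage improvements (assumed aggregation)."""
    values = [percent_improvement(pre, post) for pre, post in pairs]
    return sum(values) / len(values)

# Hypothetical (pre, post) UPDRS III scores for three patients.
example = [(40, 30), (20, 15), (30, 15)]
average = mean_response(example)
```

Because the measure is relative to the pre-challenge score, the same absolute improvement counts for more in a patient with milder baseline motor disease.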
Comparison of prognosis by clusters between Tracking Parkinson’s and Discovery
In Tracking Parkinson’s, 1421 (88.8%), 1154 (72.1%) and 204 (12.7%) have had 18-month, 36-month and 54-month assessment visits, respectively, with a median follow-up time of 3.0 years (IQR 1.8–3.2). In Discovery, 770 (81.6%), 490 (51.9%), 230 (24.4%) and 39 (4.1%) have had 18-month, 36-month, 54-month and 72-month assessment visits, respectively, with a median follow-up time of 3.0 years (IQR 1.5–4.4). All of the progression rates by cluster and cohort are shown in table 4. There was evidence of a significant difference in progression rates for the UPDRS III across clusters in Tracking Parkinson’s (p<0.001) and in Discovery (p=0.007). The same pattern was seen in both cohorts. The fast motor progression (1) cluster had the fastest progression: 3.2 UPDRS III points per year in Tracking Parkinson’s and 2.8 points per year in Discovery, while the slow motor progression (4) cluster had the slowest. However, the estimate for the slow motor progression (4) cluster was markedly slower in Tracking Parkinson’s (0.6 UPDRS III points per year) than in Discovery (1.6 points per year), with hardly any overlap between the 95% CIs (see figure 3). Repeating the analysis using the UPDRS part II score (web supplementary figure 2), we found the same clusters in Tracking Parkinson’s with the fastest and slowest progression; however, in the Discovery cohort, we found no evidence of difference in progression rates.
Cognitive decline, as measured by the MoCA, was fastest in the severe motor disease, poor psychological well-being and poor sleep (3) cluster in both cohorts (figure 4), but overall there was no significant difference in cognitive progression rates across clusters (Tracking Parkinson’s p=0.04; Discovery p=0.41).
Repeating our analyses using pattern-mixture models showed little difference in progression rate estimates (table 4), providing evidence that withdrawal has not biased our estimates. When we adjusted the slope and intercept for baseline LEDD in our UPDRS III models, in an attempt to see the effect that treatment had on progression rates, we found very similar rates (results not included).
No significant differences in motor UPDRS III progression were found between conventional tremor, PIGD and mixed clusters (web supplementary figure 3), although in Tracking Parkinson’s there was some evidence (p<0.001) to suggest that those in the PIGD cluster have faster cognitive decline (web supplementary figure 4).
Discussion
Our analyses identified four phenotypic subgroups among patients recently diagnosed with PD: (1) fast motor progression with symmetrical motor disease, poor olfaction, cognition and postural hypotension; (2) mild motor and non-motor disease with intermediate motor progression; (3) severe motor disease (prominent bradykinesia/postural impairment), poor psychological well-being (mood, apathy, pain, fatigue) and poor sleep with intermediate motor progression; (4) slow motor progression with tremor-dominant, unilateral disease. The kappa statistic showed that the clusters calculated within the Discovery cohort were relatively stable compared with those predicted using the Tracking Parkinson’s cohort model even though some baseline characteristics differed significantly between the cohorts.
Our analysis has taken into account the five points recommended for studies using cluster analysis.6 (1) Our sample of patients with PD were all recently diagnosed and hence had more similar disease duration than other cross-sectional studies. (2) We used two sample populations of patients who have been well phenotyped across a wide range of important domains. (3) We have taken into account the limitations of k-means by (a) using hierarchical clustering prior to the analysis to determine the number of clusters, (b) standardising all the variables so they have equal weighting and (c) using 500 random starts to prevent the selection of local optima. (4) We have looked at independent associations between our clusters and clinically meaningful variables such as response to L-dopa challenge and disease progression. (5) We have validated our approach using a second cohort collected using a nearly identical methodology.
Our previous paper reported five clusters in the Discovery cohort. The clusters in our new analysis are qualitatively similar although two of the original clusters (a) poor psychological well-being, rapid eye movement sleep behaviour disorder and sleep, and (b) severe motor and non-motor disease with poor psychological well-being have now merged into a single cluster (cluster 3). Our clusters are consistent with other similar studies in PD, which generally find a group with milder symptoms and a younger age at onset3 5 15–22 (our second cluster). Three studies also found a tremor-dominant group17 18 20 (our fourth cluster) and most studies find a group with more severe symptoms or rapid disease progression3 4 15–22 (our first and third clusters). Importantly, we have now demonstrated different rates of motor progression across our baseline-defined clusters, with a mean annual deterioration in UPDRS III scores varying significantly from 0.6 to 3.2 points (in Tracking Parkinson’s) between those with slowest and fastest progression. Interestingly, we also found, in keeping with another study3, that poor cognition and postural hypotension predicted faster motor progression.
What is the clinical relevance of these findings?
Stratification, or defining different subcategories, is key to better understanding disease mechanisms and kinetics in PD, predicting disease course and ultimately delivering personalised management strategies. The emerging focus of PD trial design is on early motor disease, including novel immunomodulatory therapies that require intensive and invasive monitoring. Traditionally, little account has been taken of disease heterogeneity in early PD when selecting patients for randomised, placebo-controlled studies. However, our results show that baseline phenotype is associated with variable rates of subsequent motor progression, although confounded by potential medication response effects. The mean difference in UPDRS motor scores between the fastest and slowest motor progression subtypes in Tracking Parkinson’s was 2.6 points, equivalent to the primary hierarchical endpoint of several studies, including the ADAGIO study.23 Recruitment without taking into account heterogeneity and potential sources of recruitment bias may lead to less efficient designs, though there are various trade-offs between the cost of selecting patient subgroups, the sample size required for demonstrating a reduction in disease progression and increasing the length of follow-up.
Strengths and limitations
This study has used two of the largest PD incidence cohorts worldwide. In addition, the methods were designed collaboratively with similar variables being collected using almost identical inclusion criteria, though the source of recruitment differed. While this may impact on the frequency of the subtypes of PD, it should not influence the consistency of the clusters or the within-cluster progression rates. It is possible that some patients will turn out to have other parkinsonian disorders, such as multiple system atrophy, despite only including those with a diagnostic probability of ≥90% at the latest visit, especially in the fast progression cluster. We had little missing data and we used imputation methods to reduce any bias. The association we found with levodopa response (which was analysed as relative change) may simply reflect the fact that the second cluster has milder disease, and since our estimates of motor function are carried out in the ‘on’ state, we would expect those with mild motor disease to be those who respond well to the medication. We are also limited by the proportion who completed the L-dopa challenge, although the vast majority of those missing these data in the Tracking Parkinson’s cohort were either not taking levodopa as part of their normal medication regime or had not reached the 24-month time point. Levodopa response is also composed of both short-duration and long-duration responses.24 Our levodopa challenge only measures the short-duration response and our pre-dose scores are largely determined by the long-duration response. Also, the long-duration response is typically much larger than the short-duration response.
We used non-statistical criteria to help judge the best number of clusters, as the optimal number of clusters can differ depending on which statistic is the primary focus. Each cohort has its strengths and weaknesses. Tracking Parkinson’s is larger with more centres from a wider area of the UK population. The Discovery cohort used fewer clinicians to assess participants and had lower inter-rater variability. Discovery had more disabling disease and slightly worse cognitive function at baseline. Each cohort may have a slightly different mix of patients, but this will also occur in patients recruited for different clinical trials.
The major limitation in this analysis is that most of our data are restricted to the first 3 years of follow-up due to the studies being ongoing and patients not yet reaching 4.5 years of follow-up. We suspect this has reduced our power to detect differences between the clusters. The associations we saw between clusters and progression rates could be due to non-linearity of growth rates; however, non-linearity cannot be tested until the vast majority of patients have four or more observations.
We took a pragmatic perspective where disease progression estimates reflected both pathophysiology and treatment effects. An alternative approach is measurement of the untreated (underlying) progression of subtypes, which reduces potential confounding effects of dopaminergic therapy in modifying disease progression, and has been applied elsewhere.25 26 Accordingly the generalisability of our method may be limited if different treatment regimes were used in other clinical settings.
Neuropathological characterisation of the patient clusters at post mortem would help to address the question of the distribution and loads of α-synuclein, tau, vascular and amyloid pathology in driving both baseline clinical phenotype and subsequent motor and cognitive progression throughout the disease evolution of PD.27 It is intriguing to speculate whether patients in cluster 1, who have the fastest motor progression, prominent baseline non-motor symptoms, more symmetrical disease and a poor levodopa response, are defined by prominent cerebrovascular or amyloid pathologies. Clear delineation of patient subtypes is likely to introduce other potential therapeutic targets and lifestyle interventions to the clinical trials arena that look beyond pure α-synuclein-driven pathology. To date, a total of 345 subjects with PD (195 Tracking Parkinson’s, 150 Discovery cohort) have signed up to the nationally funded Parkinson’s UK Brain Donation programme, with six brains now available for neuropathological characterisation to begin to address these issues.
Conclusion
We have found four clusters that replicate across two large independent cohorts of newly diagnosed patients with PD and which are associated with different responses to levodopa and motor progression rates. Future work should examine the reasons for these differences, and with longer follow-up and using growth mixture models, we should be able to identify more easily patient groups with different progression rates and how this relates to their baseline characteristics. This will also allow us to determine the robustness and clinical use of stratifying patients early in the disease course with better defined endpoints.
We would like to thank the anonymous reviewers for their useful comments and all patients who have participated in this study.
DGG and MTMH are joint senior authors.
DGG and MTMH contributed equally.
Contributors ML: analysis and interpretation of the data, writing of the manuscript. YB-S: study concept and design, analysis and interpretation of the data, revision of the manuscript. MTY: analysis and interpretation of the data, revision of the manuscript. FB: acquisition of data, revision of the manuscript. TRB: acquisition of data, revision of the manuscript. JCK: acquisition of data, revision of the manuscript. DMAS: acquisition of data, revision of the manuscript. NM: acquisition of data, revision of the manuscript. KAG: study concept and design, acquisition of data, revision of the manuscript. NB: study concept and design, acquisition of data, revision of the manuscript. RAB: study concept and design, acquisition of data, revision of the manuscript. NW: study concept and design, revision of the manuscript. DJB: study concept and design, acquisition of data, revision of the manuscript. TF: study concept and design, acquisition of data, revision of the manuscript. HRM: study concept and design, acquisition of data, revision of the manuscript. NWW: study concept and design, revision of the manuscript. DGG: study concept and design, acquisition of data, revision of the manuscript. MT-MH: study concept and design, acquisition of data, revision of the manuscript.
Funding The Oxford Discovery study was funded by the Monument Trust Discovery Award from Parkinson’s UK and supported by the National Institute for Health Research (NIHR) Oxford Biomedical Research Centre based at Oxford University Hospitals NHS Trust and University of Oxford, and the NIHR Clinical Research Network: Thames Valley and South Midlands. The Tracking Parkinson’s study was funded by Parkinson’s UK and supported by the National Institute for Health Research (NIHR) DeNDRoN network, the NIHR Newcastle Biomedical Research Unit based at Newcastle upon Tyne Hospitals NHS Foundation Trust and Newcastle University, and the NIHR-funded Biomedical Research Centre in Cambridge.
Disclaimer The views expressed are those of the authors and not necessarily those of the NHS, the NIHR or the Department of Health.
Competing interests NB has received payment for advisory board attendance from UCB, Teva Lundbeck, Britannia, GSK, Boehringer and honoraria from UCB Pharma, GE Healthcare, Lily Pharma and Medtronic. He has received research grant support from GE Healthcare, Wellcome Trust, MRC and Parkinson’s UK and royalties from Wiley. RAB received grants from Parkinson’s UK, NIHR, Cure Parkinson’s Trust, Evelyn Trust, Rosetrees Trust, MRC and EU along with payment for advisory board attendance from Oxford Biomedica and LCT, and honoraria from Wiley and Springer. DJB received grants from NIHR, Wellcome Trust, GlaxoSmithKline Ltd, Parkinson’s UK and Michael J Fox Foundation. TF received payment for advisory board meetings for Abbvie and Oxford Biomedica, and honoraria for presentations at meetings sponsored by Medtronic, St Jude Medical, Britannia and Teva pharmaceuticals. HRM has received grants from Parkinson’s UK and Medical Research Council UK during the conduct of the study; grants from Welsh Assembly Government, personal fees from Teva, personal fees from Abbvie, personal fees from UCB, personal fees from Boehringer-Ingelheim, personal fees from GSK, non-financial support from Teva, grants from Ipsen Fund, non-financial support from Medtronic, grants from MNDA, grants from PSP Association, grants from CBD Solutions, grants from Drake Foundation and personal fees from Acorda, outside the submitted work. In addition, HRM has a patent and is a co-applicant on a patent application related to C9ORF72: Method for diagnosing a neurodegenerative disease (PCT/GB2012/052140) pending. DGG received payment for advisory board attendance from AbbVie and honoraria from UCB Pharma, GE Healthcare and Acorda.
Patient consent Not required.
Provenance and peer review Not commissioned; externally peer reviewed.