Dementia prevention: the Mendelian randomisation perspective
  1. Emma Louise Anderson1,
  2. Neil M Davies2,3,
  3. Roxanna Korologou-Linden4,
  4. Mika Kivimäki1
  1. 1 Mental Health of Older People, Division of Psychiatry, University College London, London, UK
  2. 2 Epidemiology & Applied Clinical Research, Division of Psychiatry, University College London, London, UK
  3. 3 Department of Statistical Sciences, University College London, London, UK
  4. 4 Integrative Epidemiology Unit, Bristol Medical School, University of Bristol, Bristol, UK
  1. Correspondence to Dr Emma Louise Anderson, Mental Health of Older People, Division of Psychiatry, University College London, London, W1T 7NF, UK; emma.anderson{at}


Understanding the causes of Alzheimer’s disease and related dementias remains a challenge. Observational studies investigating dementia risk factors are limited by the pervasive issues of confounding, reverse causation and selection biases. Conducting randomised controlled trials for dementia prevention is often impractical due to the long prodromal phase and the inability to randomise many potential risk factors. In this essay, we introduce Mendelian randomisation as an alternative approach to examine factors that may prevent or delay Alzheimer’s disease. Mendelian randomisation is a causal inference method that has successfully identified risk factors and treatments in various other fields. However, applying this method to dementia risk factors has yielded unexpected findings. Here, we consider five potential explanations and provide recommendations to enhance causal inference from Mendelian randomisation studies on dementia. By employing these strategies, we can better understand factors affecting dementia risk.




Despite significant investments in research of Alzheimer’s disease and related dementias over the past decades, the development of effective treatments has been challenging and the underlying causes of these diseases remain elusive. Many hypothesised risk factors and biomarkers show correlations with the risk of Alzheimer’s disease in observational studies. However, most of the randomised controlled trials (RCTs) developed to halt or delay Alzheimer’s disease progression, often motivated by this observational evidence, have been unsuccessful.1 Dementia prevention RCTs are not always feasible. The extended prodromal phase of dementia, which can last up to 20–30 years, poses challenges in randomising participants to interventions before the onset of significant neurodegeneration and tracking them until a clinical diagnosis of dementia is made. It is also not possible to randomise many exposures (eg, air pollution and educational attainment). Thus, at present, we only have a group of risk factors that associate with dementia; some of which may cause dementia, some of which may be caused by dementia or genetic liability to dementia (eg, APOE), and some which may spuriously associate with dementia due to various biases. In this essay, we introduce Mendelian randomisation (MR) as an alternative approach to examine factors that may prevent or delay dementia, discuss the strengths and weaknesses of MR for dementia research, and recommend tangible steps to improve our ability to make causal inferences from MR studies of dementia.

Mendelian randomisation

Observational epidemiology often faces challenges of confounding and reverse causation, which hinder our ability to draw unbiased inferences about whether an exposure causes an outcome. Many instances exist where apparently robust associations between various exposures and diseases in observational studies have failed to deliver the expected health benefits in RCTs (eg, beta-carotene, vitamins and hormone replacement therapy in relation to cardiovascular disease).2

MR is a technique that uses genetic variants as instrumental variables, to obtain less biased estimates of the causal effects of an exposure on an outcome, including both direction and magnitude. MR exploits the random inheritance of genetic variants from parents to offspring that occurs during meiosis. This random inheritance means variants should not correlate with potential confounders or be influenced by subsequent disease. Three assumptions underlie MR and must be satisfied for the causal estimation to be valid (figure 1). First, (IV1) the genetic variants are robustly associated with the exposure. Second, (IV2) there is no confounding of the relationship between the genetic variant and the outcome (eg, by population stratification). Third, (IV3) the genetic variants should not exert effects on the outcome that do not operate through the exposure (ie, there is no horizontal pleiotropy, explained in detail below). The ability to make a causal conclusion from an MR analysis depends on the plausibility of these assumptions. IV1 is the only assumption that can be empirically tested. The remaining two assumptions cannot be tested, but can be falsified. MR can be performed using individual-level data (sometimes called ‘one-sample MR’), where the genotype, risk factor and disease measurements are taken from the same sample, or using summary-level data (sometimes called ‘two-sample MR’), relying on summary statistics from separate genome-wide association studies (GWASs) for the exposure and the outcome.
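As a rough illustration of how genetic variants act as instrumental variables in a summary-data setting, the sketch below computes the per-variant ratio (Wald) estimate and combines variants with a fixed-effect inverse-variance weighted (IVW) average. These estimators are standard in the MR literature but are not named in the text above; the function names and summary statistics are hypothetical, and the standard errors use a first-order approximation that ignores uncertainty in the variant-exposure associations.

```python
import math


def wald_ratio(beta_gx, beta_gy, se_gy):
    """Per-variant causal estimate: variant-outcome effect / variant-exposure effect."""
    estimate = beta_gy / beta_gx
    # First-order approximation: treats beta_gx as known without error (assumption).
    se = se_gy / abs(beta_gx)
    return estimate, se


def ivw(beta_gx, beta_gy, se_gy):
    """Fixed-effect inverse-variance weighted average of the per-variant ratios.

    Assumes the variants are independent (uncorrelated) and valid instruments.
    """
    weights = [bx ** 2 / s ** 2 for bx, s in zip(beta_gx, se_gy)]
    ratios = [by / bx for bx, by in zip(beta_gx, beta_gy)]
    estimate = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return estimate, se


# Hypothetical summary statistics for three independent variants.
beta_gx = [0.10, 0.20, 0.15]    # variant-exposure associations
beta_gy = [0.05, 0.10, 0.075]   # variant-outcome associations (log odds)
se_gy = [0.01, 0.02, 0.015]     # standard errors of the variant-outcome associations

ivw_estimate, ivw_se = ivw(beta_gx, beta_gy, se_gy)
```

In a one-sample setting the same quantities would come from a single dataset; in a two-sample setting the exposure and outcome associations would be read from two separate GWASs.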

Figure 1

An illustration of Mendelian randomisation including the three core underlying assumptions. G represents a genetic variant, X represents an exposure, Y represents an outcome and C represents confounders. IV1 assumes genetic variants are robustly associated with the exposure; IV2 assumes no confounding of the relationship between the genetic variant and the outcome; and IV3 assumes no effect of the genetic variants on the outcome that do not go through the exposure (ie, no horizontal pleiotropy).

MR has already been successfully applied to several other diseases to: (a) confirm known epidemiological findings (eg, the effects of smoking on lung cancer),3 (b) highlight novel causal risk factors or biomarkers for disease (eg, using hypothesis-free phenome-wide approaches for Alzheimer’s disease4), (c) identify drugs for repurposing to reduce risk of diseases other than those for which they were originally approved (eg, interleukin 6 receptor inhibitors to treat severe COVID-19 infection)5 6 and (d) challenge observational associations which were previously believed to be causal, where emerging evidence from both MR and RCTs has shown them not to be (eg, the effect of C reactive protein on coronary heart disease7 and selenium supplementation on prostate cancer).8

Analogies between MR and RCTs

MR and RCTs share several similarities (figure 2). The random segregation of alleles at conception, which separates a group with the ‘risk’ (or exposure increasing) allele from another group with the ‘non-risk’ (or reference) allele, is analogous to the randomisation of treatment and placebo in RCTs. The randomisation aims to ensure that, on average, confounders are balanced between the groups, allowing for meaningful outcome comparison. Clearly, MR cannot and should not replace RCTs for reasons which have previously been discussed by Swanson et al.9 However, MR can offer insights into the potential success of an RCT; drugs with genetically supported targets are more than twice as likely to be approved.10 MR is also not constrained by the logistical challenges that often accompany, for example, long-term lifestyle RCTs. This makes it valuable in scenarios where conducting RCTs is impractical. As with any other study, however, the validity of an MR study depends on how rigorously it has been conducted. Box 1 provides a set of questions that are useful for evaluating the robustness of any given MR study.

Box 1

Questions to ask when assessing if a Mendelian randomisation (MR) study is robust

IV1: genetic variants are robustly associated with the exposure

  1. Do the instruments have biological plausibility? If this is not known, are they robustly associated with the exposure in several independent cohorts?

  2. Are the instruments associated with the exposure at the level of genome-wide significance, and are they independent (i.e., uncorrelated)?

  3. Are the instruments strong (i.e., is the F statistic above 10, and what is the R²)?

  4. Are the instrument weights taken from a discovery genome-wide association study (GWAS) and therefore at risk of bias due to Winner’s curse? Ideally, they should be taken from a replication GWAS or a meta-analysis of both.

  5. Are any of the instruments outliers with undue influence on the estimate? This can be evaluated using leave-one-out plots or radial MR.

  6. Has a test for genetic colocalisation been conducted to test whether there are two distinct variants for the exposure and the outcome?

  7. If a one-sample MR, were the weights used in the MR from an external dataset (i.e., was the GWAS conducted in a different sample to the MR analysis)?

  8. If a two-sample MR, do the two GWAS datasets capture the same underlying population? Is there any sample overlap between the two GWAS (for example, do they both contain UK Biobank)? Note that bias due to sample overlap is less of a concern with strong instruments.

  9. Are your instruments truly instruments for your exposure and not your outcome (i.e., do they explain more variance in your exposure than your outcome)? This can be examined with Steiger testing.

IV2: no confounding of the genetic variant–outcome relationship

  1. Has the MR been conducted in a homogeneous population (e.g., just one ancestral group) to avoid bias due to population stratification?

  2. Have principal components been adjusted for to account for population stratification?

  3. Is assortative mating likely to cause bias for this exposure and outcome? If so, has a method that reduces bias from assortative mating been used (e.g., within-family MR)?

IV3: no effect of the genetic variant on the outcome that does not go through the exposure (i.e., no horizontal pleiotropy)

  1. Is there any evidence of heterogeneity across genetic variants (e.g., assessed using I², Q statistic or E-value)?

  2. Have pleiotropy robust sensitivity analyses been conducted (e.g., MR-Egger, weighted median and mode, radial MR)?

  3. If MR-Egger has been conducted, is there evidence that the intercept differs from zero (i.e., evidence of horizontal pleiotropy)?

  4. If there is evidence of pleiotropy, have methods like multivariable MR been used to adjust for the pleiotropic pathways?

Other considerations

  1. Is the MR study well powered?

  2. Has multiple testing been accounted for (e.g., using a false discovery rate or Bonferroni correction)?
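The instrument-strength question in Box 1 (F statistic above 10, and R²) can be approximated from GWAS summary statistics. The sketch below uses commonly cited approximations, which are assumptions here rather than formulas given in the box: a per-variant F of (beta/SE)², a per-variant R² of 2·EAF·(1 − EAF)·beta² for an exposure measured in standard deviation units, and an overall F computed from the total R², the sample size n and the number of variants k. All function names and inputs are hypothetical.

```python
def per_variant_f(beta_gx, se_gx):
    """Approximate F statistic for one variant from its exposure association."""
    return (beta_gx / se_gx) ** 2


def variance_explained(beta_gx, effect_allele_freq):
    """Approximate R² for one variant.

    Assumes an additive model and an exposure standardised to unit variance.
    """
    return 2.0 * effect_allele_freq * (1.0 - effect_allele_freq) * beta_gx ** 2


def overall_f(r2, n, k):
    """Overall F statistic for k variants jointly explaining r2 in a sample of n."""
    return (r2 / (1.0 - r2)) * ((n - k - 1) / k)


# Hypothetical variant: beta = 0.1 SD per allele, SE = 0.02, allele frequency 0.3.
f_single = per_variant_f(0.1, 0.02)
r2_single = variance_explained(0.1, 0.3)
f_overall = overall_f(0.02, 1000, 1)
```

Values well above the conventional threshold of 10 suggest that weak-instrument bias is less of a concern, although the threshold is a rule of thumb rather than a guarantee.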

Figure 2

A comparison of Mendelian randomisation and randomised controlled trials.

WHO guidelines and evidence from MR studies of dementia

Based on the quality of available observational and interventional evidence, the WHO provides ‘strong’ recommendations for interventions related to physical activity, smoking and dietary intake for dementia prevention.11 Additionally, the WHO gives ‘conditional’ recommendations for interventions targeting hypertension, diabetes, high alcohol intake and adiposity. In their proposed actions for member states, the WHO emphasises the importance of formal education, as low education is considered a potential modifiable risk factor for dementia.12 Table 1 presents a summary of MR findings for each of these risk factors (for the purpose of this discussion, MR studies on dietary intake were not included due to the broad and heterogeneous nature of the literature in this area). On the basis of WHO recommendations (that is, that targeting these risk factors should reduce dementia risk), it would be expected that a significant proportion of the published MR studies would fall into the red ‘harmful’ column. However, with the exception of low education, most MR studies do not support a causal effect for any of the WHO risk factors. Some MR evidence even goes in the opposite direction to what we would expect (table 1). For example, there is evidence from MR studies suggesting that higher levels of physical activity may increase the risk of dementia, contrary to the expected protective effect. Additionally, MR evidence has suggested a potential protective effect of smoking on dementia risk, which contradicts the well-established harmful effects of smoking on overall health. This discrepancy is surprising, because for many other disease outcomes, MR studies examining these same exposures using the same genetic instruments have produced expected associations that align with RCT evidence. For example, MR studies support a causal effect of higher systolic blood pressure on greater cardiovascular disease risk,13 and of smoking on higher cancer risk.3 Here, we consider five potential explanations for these unexpected findings and propose strategies to enhance the reliability of MR studies on dementia risk factors moving forwards.

Table 1

Summary of findings from Mendelian randomisation (MR) studies on Alzheimer’s disease risk factors recommended by the WHO

Addressing sources of bias in MR studies on dementia

Heterogeneity in the outcome

Diagnosing the cause of dementia in living patients is challenging and misdiagnosis rates are high. The current gold-standard method for diagnosis is postmortem autopsy. Studies have shown that between 15% and 30% of patients diagnosed with Alzheimer’s disease do not have sufficient Alzheimer’s pathology at autopsy to account for the presence of dementia.14 Over 70% of patients receiving a clinical diagnosis of Alzheimer’s disease will also show significant additional pathology on autopsy (eg, cerebrovascular pathology or Lewy bodies), suggesting that most dementia cases are actually mixed.15 This is problematic for MR, as current GWASs are based on cases in whom diagnosis of a specific dementia subtype has been made largely based on clinical signs and symptoms in living patients.16 For example, the published Alzheimer’s disease GWASs, which are used for most two-sample MR studies, include patients who were clinically diagnosed using a variety of methods (primarily neuroimaging and cognitive test batteries) that are notoriously unreliable for distinguishing underlying causes of dementia. Attempts were made by some cohorts to reduce heterogeneity by excluding patients with a history of cardiovascular disease (in whom cerebrovascular pathology is more likely). Thus, such GWASs are likely to be enriched for Alzheimer’s disease pathology but will inevitably comprise a large proportion of cases with mixed pathology. Indeed, the Alzheimer’s disease polygenic risk score generated from the Alzheimer’s disease GWAS summary statistics has been shown to be predictive for Alzheimer’s disease, vascular dementia and all-cause dementia.17 More recently, the inherent heterogeneity that is present across cases when including clinically diagnosed (rather than neuropathologically diagnosed) patients is exacerbated in GWASs that additionally include ‘Alzheimer’s disease by-proxy’ cases. These are UK Biobank participants who have not themselves been diagnosed with Alzheimer’s disease, but have reported either of their parents to have had ‘dementia’. The adverse consequences of heterogeneity in GWAS have been described in detail by Escott-Price and Hardy.18 Briefly, imprecise diagnosis and the resulting heterogeneity in the disease outcome are problematic because risk factors and causal pathways may differ for each dementia subtype. If diverse subtypes are grouped into one outcome, the direction and magnitude of the estimated causal effect for any given risk factor on dementia will depend on the relative proportions of dementia subtypes included in the study sample. For example, it is plausible that high blood pressure causes vascular dementia, but not necessarily Alzheimer’s disease. This might explain why we generally obtain null findings for blood pressure MR studies, because the outcome GWAS is enriched for Alzheimer’s pathology. For risk factors that affect multiple dementia subtypes in the same direction, heterogeneity is less likely to cause bias. In addition, heterogeneity in the GWAS samples means that genetic markers with small effect sizes that are specific to a single dementia subtype will be harder to detect than variants which affect all causes of dementia. Thus, we may currently be examining risk factors for the most common ‘general’ dementia pathways in MR studies, rather than risk factors for any specific cause of dementia.

Overcoming this issue at present requires a trade-off between statistical power and precision in the outcome. Performing individual-level data MR in samples with better characterisation of the outcome (eg, in postmortem samples or samples with more detailed imaging) would enable better understanding of risk factors for specific dementia subtypes. However, the availability of these data remains limited compared with clinical diagnoses, so sample sizes are much smaller and the resulting estimates correspondingly imprecise. That said, efforts are currently underway to increase genotyping of brain bank tissue samples to facilitate the examination of any bias introduced by this heterogeneity on both GWAS and MR findings.19

Survival bias

Selection bias due to censoring by death (or survival bias) can induce spurious exposure–outcome associations that are not due to the causal effect of the exposure on the outcome. It is sometimes referred to as collider bias, because the bias arises from conditioning on a collider (ie, a common effect, which in this case is survival/participation in study, figure 3) of both the exposure (dementia risk factor, for example, smoking) and the outcome (dementia). Indeed, many people do not live long enough to know whether they would have received a dementia diagnosis or not, making dementia risk factor studies prone to survival bias. There are currently no empirical studies comparing the relative impact of survival bias for different diseases, but the average age at diagnosis for dementia is over 80 years, which is older than the average life expectancy in the USA. The average age at first cardiovascular disease event is around 65 years. Thus, the impact of survival bias is likely to be greater for dementia studies than it is for studies of other diseases. It is also likely that preclinical dementia affects recruitment into studies.20

Figure 3

Directed acyclic graph representing survival bias. X is an exposure or risk factor; Y is an outcome.

A known limitation of MR is that it can be affected by collider bias.21 The intuition is simple; for example, people with variants predisposing them to higher levels of smoking are likely to die prematurely, before developing Alzheimer’s disease. Thus, people with variants for heavier smoking will appear to have lower risk of disease, and indeed several MR studies to date report this direction of effect (table 1). Most MR dementia studies published to date have not examined the effect of survival bias, despite the availability of several methods to interrogate this in an MR framework, which we will summarise briefly here.

In an individual-level data MR setting, at least the following selection/survival bias checks can be applied:

  1. Independent (ie, uncorrelated) genetic variants, selected from a GWAS of a given risk factor, are typically used as instruments in an MR study. Check whether the independent genetic variants identified by the GWAS remain uncorrelated within the sample selected for MR analysis. If selection bias is present, correlations between otherwise independent variants may be found.

  2. In an unselected sample, there should be no associations between the genetic instruments and age, sex or other predictors of study participation.22 Studies have previously reported many loci to be spuriously associated with sex (a non-heritable trait) in the presence of sex-differential participation bias.23 In cross-sectional studies, check whether genetic instruments are associated with these variables within the selected sample. Any associations observed indicate that selection bias may affect the results.

  3. In longitudinal studies, check whether genetic instruments are associated with study participation across time, or with survival if those data are available. Any associations observed indicate selection bias may affect the results.20
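The first two checks in the list above can be sketched as simple correlation tests. The example below assumes individual-level allele counts held in a dictionary keyed by hypothetical variant identifiers, and uses the Fisher z-transformation as one convenient way to test a correlation against zero; neither the data layout nor the choice of test comes from the text above. With many instruments, the threshold should be adjusted for multiple testing.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length numeric lists (neither constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cross = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((y - mean_b) ** 2 for y in b)
    return cross / math.sqrt(ss_a * ss_b)


def fisher_z(r, n):
    """Approximate z-score for testing a correlation of r against zero."""
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    return math.atanh(r) * math.sqrt(n - 3)


def flag_dependent_pairs(genotypes, threshold=1.96):
    """Return (variant_a, variant_b, r) for pairs correlated within the sample.

    genotypes: dict mapping a variant name to a list of allele counts (0, 1, 2).
    """
    names = sorted(genotypes)
    flagged = []
    for i, name_a in enumerate(names):
        for name_b in names[i + 1:]:
            r = pearson(genotypes[name_a], genotypes[name_b])
            if abs(fisher_z(r, len(genotypes[name_a]))) > threshold:
                flagged.append((name_a, name_b, r))
    return flagged


# Hypothetical allele counts for three supposedly independent instruments.
genotypes = {
    "rsA": [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2],
    "rsB": [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 2, 1],
    "rsC": [0, 0, 0, 1, 1, 1, 2, 2, 2, 1, 1, 1],
}
flagged_pairs = flag_dependent_pairs(genotypes)
```

The same `pearson` and `fisher_z` helpers could be applied to an instrument and a 0/1 coded sex variable, or to age, to carry out the second check.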

To address any potential selection bias, correction methods are now available. In an individual-level data setting, inverse probability weighting can be applied to reweight selected samples back to a more representative sampling population, thus reducing selection bias. This has been previously applied to the highly selected UK Biobank sample, which was reweighted to UK census data and shown to reduce selection bias in risk factor–outcome associations by around 78%. These inverse probability weights are publicly available.24
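A minimal sketch of how inverse probability weights might enter an individual-level MR estimate is given below. It assumes each participant has an estimated probability of being selected into the sample (hypothetical values here, not the published UK Biobank weights) and uses a weighted ratio-of-covariances estimator as a simple stand-in for a full weighted instrumental variable analysis.

```python
def ipw_weights(selection_probabilities):
    """Weight each participant by the inverse of their probability of selection."""
    return [1.0 / p for p in selection_probabilities]


def weighted_cov(a, b, w):
    """Weighted covariance of a and b using weights w."""
    total = sum(w)
    mean_a = sum(wi * x for wi, x in zip(w, a)) / total
    mean_b = sum(wi * y for wi, y in zip(w, b)) / total
    return sum(wi * (x - mean_a) * (y - mean_b) for wi, x, y in zip(w, a, b)) / total


def weighted_wald_ratio(g, x, y, w):
    """Weighted IV estimate: cov_w(G, Y) / cov_w(G, X)."""
    return weighted_cov(g, y, w) / weighted_cov(g, x, w)


# Hypothetical data: allele count, exposure, outcome and selection probability.
g = [0, 1, 2, 1]
x = [0.1, 1.0, 2.2, 0.9]
y = [0.05, 0.5, 1.1, 0.45]
weights = ipw_weights([0.8, 0.5, 0.25, 0.5])

ipw_estimate = weighted_wald_ratio(g, x, y, weights)
```

Participants who were unlikely to be selected receive larger weights, so the weighted sample resembles the target population more closely; standard errors would need to account for the weighting, which this sketch does not attempt.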

In a summary data MR setting (ie, when using GWAS summary statistics for the exposure and the outcome, rather than individual-level data), relatively simple simulations can be performed to gauge the extent to which observational and MR associations could be induced artificially by survival bias. It can then be considered whether the estimated magnitude of bias is large enough to fully or partially explain the causal effect observed between an exposure and an outcome in the MR analysis. This has been done previously for the causal effect of body mass index on Parkinson’s disease risk.25 The methods for conducting those simulations are described in detail in that paper.
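The survival-bias mechanism described in this section can be illustrated with a toy simulation. This is not the simulation procedure from the cited paper; it is a deliberately simple sketch in which every parameter (allele frequency, effect sizes, survival rule) is an assumption. The exposure has no true effect on the outcome, but survival depends on both the exposure and an unmeasured frailty that also raises outcome liability, so restricting the analysis to survivors is expected to induce an apparent protective effect.

```python
import random


def wald_ratio(rows):
    """Simple IV estimate cov(G, Y) / cov(G, X) from (g, x, y) tuples."""
    n = len(rows)
    mean_g = sum(r[0] for r in rows) / n
    mean_x = sum(r[1] for r in rows) / n
    mean_y = sum(r[2] for r in rows) / n
    cov_gx = sum((r[0] - mean_g) * (r[1] - mean_x) for r in rows) / n
    cov_gy = sum((r[0] - mean_g) * (r[2] - mean_y) for r in rows) / n
    return cov_gy / cov_gx


def simulate(n=200_000, seed=1):
    """Return the IV estimate in everyone and in survivors only."""
    rng = random.Random(seed)
    everyone, survivors = [], []
    for _ in range(n):
        g = (rng.random() < 0.3) + (rng.random() < 0.3)  # allele count 0, 1 or 2
        frailty = rng.gauss(0, 1)           # unmeasured cause of death and outcome
        x = 0.3 * g + rng.gauss(0, 1)       # exposure, eg, smoking heaviness
        y = frailty + rng.gauss(0, 1)       # outcome liability: TRUE effect of x is zero
        row = (g, x, y)
        everyone.append(row)
        # Higher exposure and higher frailty both make early death more likely.
        if x + frailty + rng.gauss(0, 1) < 0.0:
            survivors.append(row)
    return wald_ratio(everyone), wald_ratio(survivors)


full_estimate, survivor_estimate = simulate()
```

Comparing the size of the induced estimate with the effect reported in a real MR analysis gives a rough sense of whether survival bias alone could plausibly explain the finding.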

Horizontal pleiotropy

MR estimates can be biased by horizontal pleiotropy (IV3 in figure 1).26 Horizontal pleiotropy occurs when there is at least one causal pathway from the genetic variant to the outcome that does not go via the risk factor of interest. This happens because genes often have multiple functions and can simultaneously affect multiple traits. For example, the APOE gene is pleiotropic and has known effects on multiple diseases including Alzheimer’s disease, cardiovascular disease and leprosy.27 A plethora of MR methods now exist to identify and correct for horizontal pleiotropy.26 MR studies can seek to interrogate whether pleiotropy is a likely source of bias by reporting Cochran’s Q heterogeneity statistics, the MR-Egger intercept and any pleiotropy-adjusted causal effect estimates. Guidance for doing so can be found in the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE)-MR guidelines.28 In addition to checking for horizontal pleiotropy, it is advisable to assess genetic colocalisation.26 Colocalisation is a sensitivity analysis aimed at distinguishing between two scenarios: (1) the causal variant for the exposure and the outcome is shared (ie, colocalised) and (2) the causal variant for the exposure is distinct from the causal variant for the outcome, while being at the same locus (no colocalisation). The first scenario is a necessary, but not sufficient, condition for establishing a causal relationship. Although colocalisation helps to rule out the possibility that there are two distinct causal variants in the same region, it cannot exclude the presence of horizontal pleiotropy.
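Two of the diagnostics named above, Cochran's Q and the MR-Egger intercept, can be computed directly from summary statistics. The sketch below is a simplified version under stated assumptions: first-order weights that ignore uncertainty in the variant-exposure associations, variants oriented so that the exposure association is positive, and no standard errors or p values. Function names and inputs are hypothetical.

```python
def cochran_q(beta_gx, beta_gy, se_gy):
    """Cochran's Q: weighted squared deviations of per-variant ratios from the IVW estimate."""
    weights = [bx ** 2 / s ** 2 for bx, s in zip(beta_gx, se_gy)]
    ratios = [by / bx for bx, by in zip(beta_gx, beta_gy)]
    ivw = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    return sum(w * (r - ivw) ** 2 for w, r in zip(weights, ratios))


def mr_egger(beta_gx, beta_gy, se_gy):
    """Weighted regression of outcome on exposure associations with an intercept.

    Returns (intercept, slope). An intercept away from zero suggests
    directional horizontal pleiotropy; the slope is the adjusted causal estimate.
    """
    # Orient every variant so that its exposure association is positive.
    x = [abs(bx) for bx in beta_gx]
    y = [by if bx >= 0 else -by for bx, by in zip(beta_gx, beta_gy)]
    w = [1.0 / s ** 2 for s in se_gy]
    total = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxy = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical variants with a constant pleiotropic offset of 0.02 on the outcome.
egger_intercept, egger_slope = mr_egger(
    [0.1, 0.2, 0.3, 0.4], [0.07, 0.12, 0.17, 0.22], [0.01, 0.01, 0.01, 0.01]
)
```

Under the null of no heterogeneity, Q is compared with a chi-squared distribution with (number of variants − 1) degrees of freedom; a large Q indicates that the variants do not agree on a single causal estimate.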

Statistical power

MR studies typically have much lower statistical power than other observational study designs of a similar sample size. For some risk factors, studies report that there is no evidence of a causal effect on dementia risk, but with wide CIs which are consistent with a large causal effect in either direction. It is important to distinguish between risk factors for which there remains uncertainty about their causal effects and for which we may need larger GWAS (or power-boosting GWAS and MR methods)—that is, absence of evidence—versus those risk factors for which precisely estimated null effects suggest that they are unlikely to causally affect dementia risk—that is, evidence of absence. For example, for physical activity, some CIs are very wide and cannot exclude large effects in either direction (eg, Zhang et al 29, OR: 0.62, 95% CI: 0.17 to 2.32; Desai et al 30, OR: 0.48, 95% CI: 0.10 to 2.30). Statistical power is particularly critical for detecting small effect sizes which may not be important at an individual level, but have wide public health benefits at a population level.31
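To make the point about precision concrete, the standard error of a log odds ratio can be recovered from a reported 95% CI. The sketch below applies this to the Zhang et al interval quoted above (0.17 to 2.32) and then computes the smallest log odds ratio that would be detectable with roughly 80% power at a two-sided alpha of 0.05, using the usual normal approximation. The function names and the power calculation are illustrative assumptions, not part of the cited studies.

```python
import math


def se_from_ci(lower, upper, z=1.96):
    """Standard error of log(OR) implied by a 95% CI reported on the OR scale."""
    return (math.log(upper) - math.log(lower)) / (2.0 * z)


def min_detectable_log_or(se, z_alpha=1.96, z_power=0.84):
    """Smallest |log OR| detectable with about 80% power at two-sided alpha 0.05."""
    return (z_alpha + z_power) * se


se_log_or = se_from_ci(0.17, 2.32)
detectable = min_detectable_log_or(se_log_or)
detectable_or_protective = math.exp(-detectable)  # OR below which an effect is detectable
detectable_or_harmful = math.exp(detectable)      # OR above which an effect is detectable
```

With a standard error of about 0.67 on the log odds scale, only very large effects could be detected reliably, which is why such an estimate should be read as absence of evidence rather than evidence of absence.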

Researchers should take into account the precision with which causal effects are estimated, and refrain from concluding that there is no evidence of a causal effect for a particular risk factor when CIs are wide. Another method for increasing statistical power is to use continuous proxies of preclinical disease, such as cerebrospinal fluid levels of beta-amyloid and tau, rather than the binary case–control outcomes. GWASs of these biomarkers are increasing in size and will hopefully continue to do so.

Timing of causal effects

Some causal effects may be specific to a particular life stage. One example involves blood pressure. High blood pressure in midlife is known to accelerate atherosclerosis and arterial stiffening, increasing the risk of non-fatal strokes, microbleeds and infarcts in the brain, which can all cause dementia. For this reason, lowering blood pressure in midlife could plausibly reduce dementia risk. In old age, when atherosclerosis in cerebral arteries is common, any benefits from blood pressure lowering may be smaller. This is because low blood pressure can lead to insufficient cerebral perfusion and hypoxia in parts of the ageing brain, potentially contributing to a decrease in brain volume.

At least two MR methods can be used to investigate timing of causal effects. First, age-stratified GWASs for the risk factor can be used to estimate the relative effects of the risk factor at different points in the life course. This depends on the genetic aetiology of the risk factor varying sufficiently with age to allow the causal effects at different points in the life course to be separated. Recently, this method was applied to identify causal effects of childhood body mass index on health outcomes, independently of adult body mass index.32 Second, we can investigate the association between genetic liability to the disease of interest, such as dementia, in people who have not been clinically diagnosed with the disease, and phenotypes and risk factors across the life course. This phenome-wide approach may allow identification of risk factors at the earliest manifestations of disease, and when these occur.4
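One way the first approach is often implemented is multivariable MR, in which variant-outcome associations are regressed on the variant associations with the exposure at two life stages simultaneously. That this is the estimator intended above is an assumption on our part, and the sketch below is a bare-bones version: a weighted regression through the origin with two exposures, no standard errors, and hypothetical inputs.

```python
def multivariable_ivw(beta_gx1, beta_gx2, beta_gy, se_gy):
    """Weighted regression of outcome associations on two sets of exposure associations.

    Returns (effect_1, effect_2): the effect of each exposure on the outcome,
    holding the other fixed. No intercept is fitted.
    """
    w = [1.0 / s ** 2 for s in se_gy]
    s11 = sum(wi * a * a for wi, a in zip(w, beta_gx1))
    s22 = sum(wi * b * b for wi, b in zip(w, beta_gx2))
    s12 = sum(wi * a * b for wi, a, b in zip(w, beta_gx1, beta_gx2))
    s1y = sum(wi * a * y for wi, a, y in zip(w, beta_gx1, beta_gy))
    s2y = sum(wi * b * y for wi, b, y in zip(w, beta_gx2, beta_gy))
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("exposure associations are collinear; effects cannot be separated")
    effect_1 = (s22 * s1y - s12 * s2y) / det
    effect_2 = (s11 * s2y - s12 * s1y) / det
    return effect_1, effect_2


# Hypothetical associations with an early-life and a later-life measure of a risk factor.
early = [0.1, 0.2, 0.05, 0.3]
late = [0.2, 0.1, 0.3, 0.1]
outcome = [0.05, 0.07, 0.045, 0.10]   # constructed as 0.3 * early + 0.1 * late
early_effect, late_effect = multivariable_ivw(early, late, outcome, [0.01] * 4)
```

If the variant associations at the two life stages are nearly proportional, the determinant approaches zero and the two effects cannot be distinguished, which mirrors the requirement above that the genetic aetiology varies sufficiently across the life course.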


No single study design can claim to reveal the absolute truth, and this applies to MR as well as any other approach. To reliably identify modifiable risk factors for dementia, it is imperative to triangulate evidence from multiple study designs. MR offers a promising tool to address the limitations of observational dementia epidemiology and the practical constraints of conducting RCTs for dementia prevention. While MR studies have their own biases, many of these biases have been recognised and are increasingly well understood. There are now guidelines (STROBE-MR) for designing, conducting and interpreting a robust MR study, which should be adhered to. To further advance MR studies on dementia risk factors going forward, concerted efforts are needed to scrutinise and account for potential distortions in MR findings. Fortunately, as described here, there are now a variety of methods available to accomplish this task.

Ethics statements

Patient consent for publication

Ethics approval

Not applicable.



  • Contributors ELA, NMD and MK conceptualised the idea for the manuscript. ELA wrote the first draft. NMD, MK and RK-L provided critical feedback and contributed to subsequent drafts of the manuscript.

  • Funding UK Research and Innovation (MR/W011581/1).

  • Competing interests None declared.

  • Provenance and peer review Not commissioned; externally peer reviewed.