Neurogenetics: single gene disorders
- Division of Neurology, Cedars-Sinai Medical Center, Departments of Medicine and Neurobiology, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
- Correspondence to: Dr S-M Pulst, 8631 West Third Street, Suite 1145, Los Angeles, CA 90048, USA
The advent of molecular biology has changed the way in which neurological illnesses are classified, and the single genes causing a number of disorders have been identified. In addition, techniques such as linkage analysis and DNA sequencing have resulted in greater understanding of multi-gene diseases. This review covers some of the molecular tools and animal models used for genetic analysis and for DNA based diagnosis, and a brief survey of information available on the internet.
- ddNTPs, dideoxynucleotide triphosphates
- dNTPs, deoxynucleotide triphosphates
- PCR, polymerase chain reaction
- RFLPs, restriction fragment length polymorphisms
For centuries, neurology and the classification of neurological illnesses have relied mainly on the neurological phenotype. Subsequent classifications and subclassifications were aided by increased understanding provided by neuropathology and neurophysiology, and more recently by neuroimaging. Grouping disorders based on inheritance patterns did not necessarily result in an improved classification. Diseases such as the dominant ataxias or the dominant neuropathies share a similar inheritance pattern, but phenotypes do not necessarily breed true within families, and overlap between families is significant. We now know that mutations in several different genes can cause these phenotypes. A truly novel understanding and grouping of disorders was only possible with the advent of linkage analysis and the eventual isolation of the causative disease genes.
The first neurological illness mapped to a specific chromosome was the disorder now known as spinocerebellar ataxia type 1; mapping was accomplished with polymorphic protein markers.1 An inherited neuropathy, now known as CMT1B, was mapped to chromosome 1 by linkage to the Duffy blood group.2 These chromosome assignments were contemporaneous with the suggestion that restriction fragment length polymorphisms (RFLPs) could be used to establish a genetic map of the human genome.3 Another movement disorder, Huntington disease, was the first neurological illness to be genetically mapped using DNA based RFLP markers.4
In the following sections, we will highlight some of the theoretical approaches to neurogenetic diseases and introduce the reader to a small selection of the techniques used for genetic analysis and for DNA based diagnosis. We will also provide a brief survey of information available on the worldwide web that can aid the clinician in phenotype based diagnosis as well as the choice and interpretation of DNA based test results. A glossary of the terms used can be seen in box 1.
GENETIC LINKAGE ANALYSIS
Genetic linkage analysis has been one of the most important tools for the identification of disease genes. A recent brief review of these concepts for neurologists is provided elsewhere.5 Genetic linkage refers to the observation that genes that are physically close on a chromosome tend to be inherited together. With increasing physical distance between two genes, the probability that they are separated by recombination (chiasma formation) during meiosis increases.
By comparing the inheritance pattern of the disease phenotype with that of DNA marker alleles, a chromosomal location for the causative gene can be assigned. A marker very close to the disease gene shows no recombination with the disease trait, but markers further distant or on other chromosomes show an inheritance pattern that is completely different. This is illustrated in fig 1 for three different markers in a pedigree with autosomal dominant inheritance.
The significance of observed linkage depends on the number of meioses in which the two loci remain linked. It is intuitively obvious that the observation of linkage in four meioses is less significant than the observation of linkage in 20 meioses. A measure for the likelihood of linkage is the logarithm of the odds (lod) score. The lod score Z is the logarithm (base 10) of the ratio of the likelihood that the loci are linked at a given recombination fraction to the likelihood that they are unlinked.6 The concept of genetic linkage can also be applied to polygenic diseases that are caused by changes in more than one gene and to complex diseases where several genetic and environmental factors interact.
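The difference between four and 20 meioses can be made concrete with a minimal sketch. It assumes the simplest textbook situation: phase-known, fully informative meioses, each of which can be scored as recombinant or non-recombinant. The function name and the example counts are illustrative assumptions and are not taken from the cited references; real pedigrees require dedicated linkage software.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """Lod score Z(theta) for phase-known, fully informative meioses.

    Z(theta) = log10(L(theta) / L(0.5)), with
    L(theta) = theta**r * (1 - theta)**(n - r),
    where r is the number of recombinants among n meioses.
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie between 0 and meioses")
    non_recombinants = meioses - recombinants
    likelihood_linked = theta ** recombinants * (1.0 - theta) ** non_recombinants
    likelihood_unlinked = 0.5 ** meioses
    if likelihood_linked == 0.0:
        # A recombinant has been observed, so theta = 0 is impossible.
        return float("-inf")
    return math.log10(likelihood_linked / likelihood_unlinked)


# No recombinants observed, evaluated at theta = 0 (complete linkage).
print(f"{lod_score(0, 4, 0.0):.2f}")   # 1.20
print(f"{lod_score(0, 20, 0.0):.2f}")  # 6.02
```

By convention, a lod score of 3 or more is usually taken as evidence for linkage, so four fully informative meioses without recombination would not suffice under this simplified model, whereas 20 would.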
LINKAGE DISEQUILIBRIUM AND ASSOCIATION
Linkage and association should not be confused with one another. Linkage relates to the physical location of genetic loci (or disease traits) and refers to the relationship between loci. Association describes the concurrence of a specific allele with another trait at a frequency greater than predicted by chance, and thus refers to the relationship between alleles. Linkage disequilibrium refers to the occurrence of specific alleles at two loci with a frequency greater than that expected by chance.
If the alleles at locus A are a1 and a2 with frequencies of 70% and 30%, and alleles at the locus B are b1 and b2 with frequencies of 60% and 40%, the expected frequencies of haplotypes would be a1b1 = 0.42, a1b2 = 0.28, a2b1 = 0.18, and a2b2 = 0.12. Even if the two loci were closely linked, unrestricted recombination would result in allelic combinations that are close to the frequencies given above. When a particular combination occurs at a higher frequency, for example a2b2 at a frequency of 25%, this is called linkage disequilibrium, and is a powerful tool for genetic mapping. When a disease mutation arises on a founder chromosome and not much time has elapsed since the mutational event, the disease mutation will be in linkage disequilibrium with alleles from loci close to the gene. At the population level, linkage disequilibrium can be used to investigate processes such as mutation, recombination, admixture, and selection.
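The arithmetic of this example can be reproduced with a short sketch. The allele and haplotype frequencies are those given in the text; the function names and the use of the coefficient D (observed haplotype frequency minus the product of the allele frequencies) as a summary measure are assumptions added for illustration.

```python
def expected_haplotypes(locus_1: dict, locus_2: dict) -> dict:
    """Haplotype frequencies expected if alleles at the two loci combine at random."""
    return {
        allele_1 + allele_2: freq_1 * freq_2
        for allele_1, freq_1 in locus_1.items()
        for allele_2, freq_2 in locus_2.items()
    }


def disequilibrium(observed: float, freq_1: float, freq_2: float) -> float:
    """Coefficient D: observed haplotype frequency minus its random expectation."""
    return observed - freq_1 * freq_2


locus_a = {"a1": 0.7, "a2": 0.3}
locus_b = {"b1": 0.6, "b2": 0.4}

for haplotype, freq in expected_haplotypes(locus_a, locus_b).items():
    print(f"{haplotype}: {freq:.2f}")  # a1b1 0.42, a1b2 0.28, a2b1 0.18, a2b2 0.12

# Haplotype a2b2 observed at 25% instead of the expected 12%.
d = disequilibrium(0.25, locus_a["a2"], locus_b["b2"])
print(f"D = {d:.2f}")  # D = 0.13
```

A value of D close to zero would indicate that the alleles at the two loci occur together no more often than expected by chance.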
Box 1 Glossary of genetic terms
Allele: one of two (or more) forms of a gene; differing DNA sequence at a given locus.
Exon: coding part of a gene; those parts of a gene that are found in the mature messenger RNA (mRNA).
Genotype: genetic constitution of an individual in contrast to the visible features (phenotype). Genotype at a given autosomal locus describes the two alleles (see also fig 1).
Haplotype: string of alleles on a single chromosome.
Homozygous: presence of identical alleles at a given locus.
Heterozygous: Presence of two different alleles at a given locus.
Intron: non-coding parts of a gene; introns are located between exons and are spliced out during formation of the mature mRNA.
Polymorphism: DNA sequence variation between individuals in a given region of the genome.
Splicing: process that removes introns (non-coding parts of a gene) from transcribed RNA; can also give rise to formation of different mRNA isoforms by selective splicing of specific exons.
In order to study association, allele frequencies in unrelated cases have to be determined and compared with the allele frequencies found in controls. For association studies, it is imperative to repeat the analyses in different patient populations to minimise effects attributable to population stratification, particularly when individuals with the disease belong to a genetically distinct subset of the population. The use of parents as controls can circumvent this problem (transmission disequilibrium test). The transmission of specific parental alleles to affected offspring is scored and statistically analysed.7
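The scoring step of the transmission disequilibrium test can be sketched as follows. The sketch assumes the commonly used McNemar-type statistic for a biallelic marker, in which only heterozygous parents are informative; the function name and the example counts are hypothetical and are not data from the cited study.

```python
import math


def tdt(transmitted: int, not_transmitted: int) -> tuple:
    """Transmission disequilibrium test for one allele of a biallelic marker.

    transmitted:     heterozygous parents who passed the allele of interest
                     to an affected child
    not_transmitted: heterozygous parents who passed the other allele
    Returns the chi-square statistic (1 degree of freedom) and its p value.
    """
    informative = transmitted + not_transmitted
    if informative == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi_square = (transmitted - not_transmitted) ** 2 / informative
    # Upper tail probability of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return chi_square, p_value


# Hypothetical counts: 60 transmissions versus 40 non-transmissions.
chi_square, p_value = tdt(60, 40)
print(f"chi-square = {chi_square:.2f}, p = {p_value:.3f}")
```

Under the null hypothesis of no linkage or association, a heterozygous parent transmits either allele with equal probability, so a marked excess of transmissions of one allele argues against chance.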
Association studies may point to genetic factors involved in the pathogenesis or susceptibility of a disease. Well known examples of genetic association studies are the involvement of certain apolipoprotein E (ApoE) alleles in Alzheimer disease and of tau alleles in progressive supranuclear palsy.8,9 Association studies are very powerful for the detection of alleles that have a relatively small effect, but also require that the tested polymorphism be relatively close to the genetic variant that is responsible for the phenotypic variation. This often means that the polymorphism itself has a causative effect or is at least located within the gene of interest.
Mutations (heritable changes) and polymorphisms in DNA are the basis of genetic variation. Types of mutations range from the change of a single base pair in a gene to deletions or duplications of exons or entire genes. Some mutations involve large alterations of parts of a chromosome or involve an entire chromosome such as a trisomy for chromosome 21. Chromosomal abnormalities cause a significant proportion of genetic disease. They are the leading cause of pregnancy loss and mental retardation. Structural chromosomal abnormalities usually affect several genes and cause phenotypes involving malformations and dysfunction of several organ systems. With the exception of chromosomal translocations, structural chromosomal abnormalities would be a rare cause of a phenotype dominated by a dysfunction in a single neurological system without dysmorphological features or involvement of other organ systems.
Mutations that affect single genes are most commonly located in the coding region of genes, at intron-exon boundaries, or in regulatory sequences. These changes can be due to changes of single base pairs, or insertions/deletions of one or multiple base pairs. Single base pair substitutions, also called point mutations, result when a single base pair is replaced by another. Many single base pair mutations, even when located in the coding region of a gene, may not change the amino acid sequence owing to the redundancy of the genetic code. These are often located in the third position of a codon, and are called silent substitutions. Although a base pair substitution may not change the amino acid sequence, it may nevertheless cause disease by introducing cryptic splice signals.
Most amino acid changes are not pathogenic. In particular, when the amino acid change is conservative (substitution with a similar amino acid), the resulting protein may represent a normal variant and have normal function. At times, it may be difficult to distinguish a normal variant from a sequence change causing disease, especially when the function of a protein is poorly understood.
Deletions or insertions of one or several base pairs of DNA represent another mutation type. If the change involves three base pairs or a multiple thereof, amino acids are added or deleted from the protein, but the reading frame and the remainder of the protein remain intact. Deletion of a single GAG codon in torsin is sufficient to cause dystonia type I (DYT1).10 Deletions/insertions that are not a multiple of three will alter the reading frame and the resulting amino acid sequence downstream of the deletion/insertion. This change in the reading frame (a frameshift mutation) usually results in a shortened polypeptide, because the shifted frame soon contains a premature stop codon.
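The different consequences of a silent substitution, an in-frame codon deletion, and a frameshift can be illustrated with a short sketch that translates a DNA sequence using the standard genetic code. The 24 base pair sequence is an invented toy example, not the torsin sequence, and the function name is an assumption made for illustration.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a coding sequence codon by codon until the first stop codon."""
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[pos:pos + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


# Invented toy sequence: ATG GCT GAG GAG CTG ATC AAA TGA
wild_type = "ATGGCTGAGGAGCTGATCAAATGA"
silent = wild_type[:5] + "C" + wild_type[6:]  # GCT -> GCC, third codon position
in_frame = wild_type[:6] + wild_type[9:]      # deletion of one GAG codon
frameshift = wild_type[:6] + wild_type[7:]    # deletion of a single base pair

print(translate(wild_type))   # MAEELIK
print(translate(silent))      # MAEELIK (amino acid sequence unchanged)
print(translate(in_frame))    # MAELIK  (one glutamate lost, frame intact)
print(translate(frameshift))  # MARS    (altered sequence, premature stop)
```

In the frameshift example the single base pair deletion changes every downstream codon and brings a TGA stop codon into frame after only four amino acids.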
MOLECULAR GENETIC TOOLS
The array of tools and techniques used in molecular genetics could fill many book volumes.11,12 For the purpose of this review, we will focus on two basic techniques that are regularly used for DNA based diagnosis, the polymerase chain reaction (PCR) and DNA sequencing. Genetic tests are certainly not limited to the analysis of DNA; metabolic disorders have long been diagnosed by determination of enzyme activity.
Polymerase chain reaction
Molecular biology was revolutionised by the introduction of PCR.13,14 In an ingeniously simple procedure, PCR combines the sequence specificity of restriction enzymes with amplification of DNA fragments, previously only possible by cloning of restriction fragments. Sequence specificity is provided by the annealing of short oligonucleotides (also called DNA oligonucleotide primers) that are complementary to the DNA sequence of interest. Amplification is achieved by repeated rounds of oligonucleotide primed DNA synthesis (fig 2). The specificity of PCR is primarily determined by the annealing temperature. When the annealing temperature is lowered, the primers will anneal to DNA sequences that are not perfectly matched, and other fragments may be amplified.
Almost all DNA based testing involves PCR. Deletion or insertion mutations can be detected as a variation in amplicon length. This approach is typically used for disorders involving DNA repeats such as the polyglutamine (polyQ) disorders, for Friedreich ataxia, or for myotonic dystrophy. PCR is also used for mutation analysis that involves sequencing of the gene of interest.
Although PCR is a highly specific and sensitive method, it has technical limitations that need to be recognised when interpreting DNA test results. Any setting that results in non-amplification of the mutant allele may result in a false negative DNA test, because amplification from the wild type allele will proceed normally. Thus, non-amplification of the mutant allele may appear as homozygosity for the wild type allele. For example, elongation of the PCR product may be impeded by secondary structure of the intervening DNA sequence or by large insertions. This is particularly relevant for mutations caused by large DNA repeats. If a mutation disrupts annealing of the oligonucleotide primer, an amplicon from the mutant allele is not generated, and thus will escape detection. Similarly, if a genomic deletion involves the entire length of DNA that is to be amplified (such as an entire exon) it will not be detected, because the homologous DNA on the other chromosome will generate an amplicon.
DNA sequencing
Several sequencing methods exist. Only the chain termination method will be further described, because it is the method currently used for automated sequencing. In this method, a DNA fragment is provided as template for the synthesis of new DNA strands using a DNA polymerase.15 The reaction is primed by a sequencing primer, usually 17–22 nucleotides in length, that specifically binds to the region of interest. In addition to regular deoxynucleotide triphosphates (dNTPs), smaller amounts of dideoxynucleotide triphosphates (ddNTPs) are added to the reaction mix. Although ddNTPs are incorporated into the newly synthesised DNA chain, they cause abrupt termination of the chain due to lack of the 3′ hydroxyl group, preventing formation of the next phosphodiester bond. Chain termination will occur randomly whenever a ddNTP is incorporated into the growing chain instead of a dNTP. The length of the fragments can be determined by polyacrylamide gel or capillary electrophoresis, and the fragments visualised by labelling with radioisotopes or fluorophores.
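How a sequence is read from chain terminated fragments can be shown with a simplified sketch. It assumes an idealised reaction in which a terminated fragment of every possible length is present, and it counts fragment length from the end of the primer; the template and the function names are invented for illustration.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesised_strand(template_3_to_5: str) -> str:
    """New strand (5' to 3') copied from a template written 3' to 5'."""
    return "".join(COMPLEMENT[base] for base in template_3_to_5)


def terminated_fragments(new_strand: str) -> list:
    """All ddNTP terminated fragments as (length, terminal base) pairs.

    A fragment of length n ends with the ddNTP incorporated at position n.
    """
    return [(pos + 1, base) for pos, base in enumerate(new_strand)]


def read_sequence(fragments: list) -> str:
    """Order fragments by length, as electrophoresis does, and read the end labels."""
    return "".join(base for _, base in sorted(fragments))


template = "TACGGTCA"  # invented template, written 3' to 5'
fragments = terminated_fragments(synthesised_strand(template))
mixture = list(reversed(fragments))  # the reaction yields an unordered mixture
print(read_sequence(mixture))  # ATGCCAGT
```

Sorting by length stands in for electrophoretic separation, and the terminal base of each fragment stands in for the label carried by the incorporated ddNTP.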
Although sequence based mutation detection may appear straightforward, technical difficulties can occur. DNA structure may interfere with the abundance of a particular fragment (seen as a weak band in a radioactive gel or a decreased peak in automated sequencing). As most sequencing for mutations occurs on genomic DNA (with two alleles present except for X chromosomal sequences in males), detection of sequence differences occurs in the heterozygous state. A weak peak marking a single base mutation may be masked by the presence of a normal sized peak from the normal allele. This can be circumvented by adjustments in sequence analysis software, by sequencing in forward and reverse directions, and by use of different sequencing primers.
MOLECULAR GENETIC TESTING
DNA based diagnosis falls into two broad categories: direct mutation detection and indirect detection of a mutated gene using the analysis of polymorphic genetic markers that flank the gene of interest or lie within it. When the disease gene has been identified and the types of mutations are limited, direct mutational analysis is possible. For some diseases, however, a large number of different mutations exist without mutational "hot spots", so that indirect testing is technically more feasible than direct mutational analysis. A similar situation arises when the chromosomal location of a disease gene is known, but the gene itself has not been identified. In the above two scenarios, DNA diagnosis can be performed using polymorphic DNA markers, which are used to track the disease chromosome in a given family. With ever improving sequencing technology, even genes with multiple non-recurring mutations distributed over many exons, such as the CACNA1A gene, are now amenable to direct mutation analysis. Genetic diagnostic tests involving the sequence analysis of an entire gene (connexin 32 for CMTX, MECP2 for Rett syndrome) are now commercially available.
Direct mutation detection
For diseases caused by a specific mutation, direct detection of the mutation is simple and inexpensive. The direct genetic test more closely resembles a conventional laboratory test. A blood sample is taken from a patient and the test is used to determine whether the individual carries a specific mutation in a given gene, with a yes/no answer. Barring technical problems or a sample switch resulting in false positive results, a positive test indicates that a symptomatic individual has the disease. A positive result is independent of the accuracy of the clinical assessment.
Examples of direct mutation testing include diseases caused by DNA repeats. The repeat is amplified by PCR and the allele with the expansion can be detected as a larger than normal fragment by gel electrophoresis. The same is true for other mutations that result in deletion or additions of DNA codons. Despite the fact that all mutation detection is in the end sequence based, various technologies and strategies are employed depending on the nature and frequency of mutations, and the size of the gene to be analysed.
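Sizing of a repeat from the length of the PCR product can be sketched as follows. The length of the flanking sequence, the threshold that separates normal from expanded alleles, and the fragment sizes are hypothetical values chosen for illustration; they are not the parameters of any particular diagnostic assay.

```python
def repeat_count(amplicon_bp: int, flanking_bp: int, repeat_unit_bp: int = 3) -> int:
    """Estimate the number of repeat units from the measured amplicon length."""
    repeat_bp = amplicon_bp - flanking_bp
    if repeat_bp < 0:
        raise ValueError("amplicon is shorter than the flanking sequence")
    return round(repeat_bp / repeat_unit_bp)


def classify_allele(repeats: int, expansion_threshold: int) -> str:
    """Label an allele as expanded if it reaches the (assay specific) threshold."""
    return "expanded" if repeats >= expansion_threshold else "normal"


FLANKING_BP = 100         # hypothetical: primer binding sites plus flanking sequence
EXPANSION_THRESHOLD = 36  # hypothetical threshold, in repeat units

for amplicon_bp in (160, 226):  # hypothetical fragment sizes from one patient
    repeats = repeat_count(amplicon_bp, FLANKING_BP)
    print(amplicon_bp, repeats, classify_allele(repeats, EXPANSION_THRESHOLD))
# 160 20 normal
# 226 42 expanded
```

A result showing only a single normal sized fragment would not by itself exclude an expansion, because a very large expanded allele may fail to amplify.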
As with all laboratory tests, clinical judgement is the final arbiter. It is possible, for example, for a patient to have symptomatic multiple sclerosis and also to carry, as indicated by a DNA test, the mutation for an inherited ataxia for which she is still pre-symptomatic. Two recent examples may illustrate this point. A 5 year old boy presented with dysarthria and a very severe hypertrophic cardiomyopathy. On molecular analysis, he was found to have mutations in two distinct genes, the frataxin gene and the cardiac troponin T gene.16 In a family segregating an autosomal dominant trait for tremor, one child developed a dystonic tremor. This proband was found to have Wilson disease, which obviously had significant implications for therapy.17
A negative test in a symptomatic individual needs to be interpreted with its limitations in mind. Phenotype and gene test need to be correctly matched. If a patient with a dominant spinocerebellar ataxia (SCA) has a negative test for SCA1 or SCA3, this does not mean that this individual does not have a dominant SCA, because the mutation could be in one of the other SCA loci. Alternatively, the mutation could be in the respective gene, but it was not assayed by the gene test that was performed. For example, a progressive ataxia can be caused by expansion of a polyglutamine repeat in the α1A voltage dependent calcium channel subunit encoded by the CACNA1A gene on chromosome 19p. Specific missense mutations in this gene, which would not be detected by a PCR based test for CAG repeat length, can also be associated with a progressive ataxia.18,19
Direct sequence analysis is increasingly used to analyse genes for point mutations. It is particularly useful when several different mutations are clustered in one exon and can be analysed by sequencing of a single PCR product. The availability of newer sequencing technologies will undoubtedly broaden the applicability of sequence analysis to larger genes with multiple different mutations.20 Sequence analysis of entire genes, initially only available in research laboratories, is now also performed commercially.
As it is unknown where the mutation might occur, the quality of the sequence has to be high throughout the entire region that is being sequenced. Less than 100% sensitivity of mutation detection by sequencing has its causes in these technical limitations, but it can also be due to the presence of mutations outside the sequenced regions (such as intronic or promoter mutations or large scale deletions). Another cause of a false negative test is non-allelic heterogeneity (identical phenotypes caused by mutations in different genes).
The differentiation of non-disease causing DNA polymorphisms from pathogenic mutations may not be straightforward. This is illustrated by the fact that changes in the DRD2 receptor were found in patients with myoclonus dystonia, but it is not yet resolved whether these DNA changes are indeed disease causing.21 Similarly, some alleles may represent mutations associated with reduced penetrance. Examples include HD and SCA2 alleles with intermediate expansion, or Friedreich alleles associated with very late onset. Widespread application of sequence analysis to disease genes will undoubtedly result in the detection of sequence variants that at the time of detection may be of unknown biological significance.
Genetic technologies have provided the tools to create animal models resembling human diseases, even if the respective diseases do not naturally occur in these animals. In principle, these technologies can be applied to any animal, but the mouse (Mus musculus), zebrafish (Danio rerio), fruit fly (Drosophila melanogaster), and nematode (Caenorhabditis elegans) are most commonly used. For phenotypes involving cognitive dysfunction, we will increasingly see the use of the rat.22 Two main approaches are used to alter the genetic constitution of animals. The first involves insertion of a novel, often mutated gene into the germline, the second an alteration of endogenous genes by gene targeting or random mutagenesis.
Expression of foreign transgenes
Transgenes can be injected into fertilised oocytes or introduced into embryonic stem cells. The expression of transgenes in mice does not necessarily mirror the expression of a mutated allele in human disease conditions. Firstly, the transgene is expressed in addition to the two endogenous mouse alleles, often from multiple copies of the transgene. Secondly, unless the endogenous promoter is used, expression may be higher than under physiological conditions, may occur in different sets of cell types, and may not have the correct temporal profile. Even if the physiological promoter is used, the transgene construct may lack all the appropriate regulatory elements. Finally, by chance the transgene can integrate into an endogenous mouse gene and disrupt proper functioning of this gene (so-called insertional mutagenesis).
Despite these shortcomings, expression of transgenes with mutated human cDNAs can replicate important morphological and functional aspects of human movement disorders, as shown by animal models of polyglutamine diseases.23,24 It is important to compare phenotypes obtained by expression of the wild type gene with those obtained by expression of a mutant gene to distinguish effects of the mutation from those of overexpression of a normal protein.
Gene targeting (knock-out, knock-in)
Using homologous recombination, a cloned gene or gene segment can be exchanged for the endogenous gene.25,26 The cloned gene can contain inactivating mutations (knock-out) or missense mutations (knock-in). In this fashion, heterozygous animals can be generated that when mated can generate homozygous offspring. Knock-out animals are models for human recessive diseases. Knock-in mice are potentially closer models of human disease than mice expressing a transgene, because the proper endogenous promoter is used, and gene dosage is not disturbed.27,28
A problem with homozygous knock-outs for many genes is that embryonic lethality results, because the respective genes have a function not only in adult tissues but also during embryogenesis. This can be circumvented by generating chimeric animals that carry a mixture of wild type and homozygous deficient cells. For instance, by using the Cre-loxP recombination system, mice containing alleles with inserted loxP sites can be mated with mice expressing a Cre recombinase transgene under the control of a tissue specific promoter so that function of a gene is only abolished in a specific tissue.29
WEB BASED INFORMATION FOR GENETIC DIAGNOSIS AND TESTING
A large amount of information is contained in publicly available databases. Box 2 lists examples of different kinds of databases that in turn provide links to other relevant sites. For the clinician, a classic resource has existed in the form of Mendelian Inheritance in Man.30 It is now available as a web based version called Online Mendelian Inheritance in Man (OMIM). OMIM is easily searchable by disease name, symptom or sign, gene symbol, or name of a protein. Its advantages are that it is comprehensive and very well updated. Several editors regularly perform literature searches and add information to disease categories. This is also one of its disadvantages, in that information for some entries is lengthy and not revised in its entirety. Entries identified by searches of the PubMed database are often linked to corresponding entries in OMIM.
Box 2 Sites on the internet with molecular genetic databases
http://www.Geneclinics.org Online textbook of selected diseases and information on commercial and research laboratories
http://www3.ncbi.nlm.nih.gov/OMIM/ Online Mendelian Inheritance in Man
http://www.gene.ucl.ac.uk/hugo Human Genome Organization
http://www4.ncbi.nlm.nih.gov/PubMed/medline.html Online search of the medical literature
http://www.neuro.wustl.edu/neuromuscular Clinical and research data on neuromuscular syndromes including ataxias
http://www.uwcm.ac.uk/uwcm/mg/hgmd0.html Human Gene Mutation Database
http://hgbase.interactiva.de Human gene polymorphisms
http://www.botany.uwc.ac.za/mirrors/MIT-bio/bio/7001main MIT biology hypertext book
http://www.biology.arizona.edu The Biology Project (site for basics in biology and genetics)
http://www.orpha.net European site in several languages for rare diseases and orphan drugs
http://avery.rutgers.edu/index.php?topic=welcome A well illustrated genetics tutorial
A site primarily dedicated to neuromuscular diseases and inherited ataxias is maintained by the neurology department at Washington University. This site has useful tabular information for classifications and diagnosis based on typical phenotypic features. The site is not primarily focused on genetic causes of neuromuscular disorders and ataxia syndromes, and this feature makes the site useful for the evaluation of patients without a clear family history.
The GeneTests web site (formerly called GeneClinics) is a site maintained by the University of Washington through funding from the National Institutes of Health (USA). It provides information on genetic diseases and associated tests including academic and commercial laboratories. It has four main sections: GeneReviews, a laboratory directory, a clinic directory, and a section with educational materials that includes a Microsoft PowerPoint presentation. GeneReviews is an on-line review of currently 195 genetic syndromes and diseases, many of which are neurological. Disease reviews in the site are well organised, and updated on a frequent basis. It now provides several new features that facilitate the choice of a particular genetic test for the diagnosis of disease and links to genetic testing laboratories and specialty clinics for genetic diseases. Using the site search engine, a particular disease entry can be identified with links to "testing", "research", and "reviews". The diagnosis section has a clinical and a testing subsection. If applicable, the testing section provides a list of different molecular tests that can be used for the diagnosis of a particular disorder. In my opinion, GeneTests is rapidly becoming the most relevant online resource for clinicians to choose a genetic test and to aid in the interpretation.
Neuroscience has undergone an inexorable transition toward a molecular discipline.31 The availability of the human genome sequence will further accelerate this process. The conservation of many genes involved in specific cellular pathways throughout evolution will unite neuroscientists working in different model systems.32 Similarly, many human disease genes are conserved in lower species.33 This provides the opportunity to model the normal or mutant functions of these genes, making use of the unique advantages of each species. Genome comparisons will help to identify those DNA sequences that represent gene control regions as well as DNA domains important in alternative splicing, and will lead to novel ways to analyse development and differentiation of the brain. Comparisons of human and chimpanzee genomes, which differ by only 1%, will provide novel directions for cognitive neurosciences.
How will this impact be felt in the practice of neurology? Neuroscience and neurology have been transformed by the ability to chromosomally map and subsequently isolate disease genes. Both of these steps will be drastically accelerated. Once a gene has been mapped to a chromosomal region, the previously time consuming tasks of identifying and isolating genes in the candidate region are now greatly shortened. Novel mapping techniques and large scale sequencing can now be applied to the study of complex diseases.
The collection of DNA from participants in clinical trials will become commonplace, with the goal of genetic stratification, identification of risk alleles, and pharmacogenomic analysis. Once drugs tested in trials are approved for clinical use, the genomic data obtained during the clinical trial will probably follow the drug into neurological practice and will aid in patient selection and drug choice. Preventive medication choices will be tailored to a disease risk profile specific for the individual patient. It is likely that the decision matrix will be computer aided, but if counselling for Mendelian disorders is any guide, the physician-patient interaction will remain central to this decision making process in the end.
Advances on the technological front will have to be paralleled by major educational efforts for physicians and patients with regard to the genome project and genetic testing. These tasks have traditionally been assumed by medical geneticists and genetic counsellors. Unless major changes in the number of these professionals are made in the immediate future, it is probable that neurologists will have to deliver the majority of neurogenetic care. Curricula for medical students and neurology residents will need to reflect these changes.
In addition to dealing with single causative alleles that greatly increase susceptibility to a disorder, the neurologist will need to interpret the effect of alleles that each may have only a relatively small effect on a phenotype and often act in concert with environmental stressors. Both patients and physicians will have to adjust to dealing with this changed scenario. To keep the physician-patient interaction free and unencumbered, privacy of genetic data has to be guaranteed at the local and national levels. The use of information obtained through DNA testing by other family members, employers, insurance companies, and governmental agencies requires societal discussion. The realisation that we all carry deleterious alleles should foster compassion, and knowledge of these alleles could ideally lead to enlightened choices for lifestyles and medical interventions.
This review is based in part on previously published material.5,34 The author thanks Ms C Moran for assistance with references. Work for this review was supported by the Rose Moss Foundation, the Carmen and Louis Warschaw Endowment for Neurology, and F.R.I.E.N.D.s of Neurology.