The sequencing of the complete genome for many organisms, including man, has opened the door to a systematic understanding of how complex structures such as the brain integrate and function, not only in health but also in disease. This blueprint means, however, that the piecemeal analyses of the past are being rapidly superseded by methods that analyse not just tens of genes or proteins at a time, but thousands, if not the entire repertoire of the cell population or tissue under investigation. Choosing the most appropriate method of analysis, so as to make the most of the available data, is therefore vital if a complete picture is to be obtained of how a system or individual cell is affected by a treatment or disease. This review examines the methods currently available for the large scale analysis of gene and protein expression, and their limitations.
- DD-PCR, differential display polymerase chain reaction
- DIGE, difference gel electrophoresis
- ESI, electrospray ionisation
- ICAT, isotope coded affinity tagging
- MALDI, matrix assisted laser desorption/ionisation
- MGB, minor groove binder
- PMF, peptide mass fingerprint
- SAGE, serial analysis of gene expression
- SELDI, surface enhanced laser desorption/ionisation
- 2D-PAGE, two dimensional polyacrylamide gel electrophoresis
“The Blind Man and the Elephant”
And so these men of Indostan
Disputed loud and long,
Each in his own opinion
Exceeding stiff and strong,
Though each was partly in the right
And all were in the wrong!
John Godfrey Saxe
Many of the methods currently used to analyse function in a cell or tissue rely on measuring a single analyte at a time, based on a prior hypothesis about the biology or mechanism of action. Measuring only one analyte gives only a limited picture of what is often a very dynamic situation. As in the tale of the six blind men and the elephant, observing one facet at a time can give a distorted view, and the real answer to the problem remains hidden. With whole genome maps now available for several key organisms including man, and with methods for profiling all the changes that occur in cells or tissues, it is now possible to stand back and take an unbiased view of biological processes. This is particularly true of neuropsychiatric disease, where the complexity of the brain, its multitude of connections, and the dynamic interplay between neurones and glia make an unbiased view of disease processes especially important.1 Two emerging fields in which this global approach to brain function can be applied are now coming into general use. Functional genomics is the generic term for methods that analyse the genes expressed by a cell or tissue, while proteomics aims to define its protein complement. These technologies now allow an almost complete examination of gene and protein expression using single techniques. Our aim in this article is to give an overview of the specialised methodologies and to point out their potential pitfalls and limitations, with particular reference to studies of human postmortem brain samples.
Cellular function is mediated through gene expression, which involves the production of messenger RNA (mRNA). Several methods can be used to profile gene expression in neurological and psychiatric disorders, including differential display and microarrays, ideally cross validated by real time quantitative polymerase chain reaction (Q-PCR). These technologies, used independently or in parallel, measure mRNA transcripts in disease and control samples by amplifying the RNA and then detecting specific complementary DNA (cDNA) or antisense RNA (aRNA) species. It is hoped that these techniques will reveal pathologically relevant pathways and disease mechanisms in neurological and psychiatric disorders.
Essential to any gene expression study is RNA of the highest quality.2 While this is easily obtained from cells in tissue culture or from animal tissues, it can be a problem in human studies. For brain derived RNA in particular, there may be delays in obtaining the tissue sample; for postmortem brain these delays can be lengthy, and RNA quality can also be affected by the agonal state.3 It is nevertheless clear that high quality RNA can be obtained from postmortem brain and that it can give useful results in gene expression studies.1
Standard methods of RNA extraction from tissues or cells rely on rapid extraction into guanidine-phenol containing solutions, which yields total RNA.4 Further purification of mRNA is not usually necessary for gene expression studies, as the mRNA is readily converted to cDNA in a standard reverse transcription reaction. Column purification can help remove potential contaminants such as extraction buffer components and DNA, or concentrate small RNA samples.5,6 RNA quality is conventionally assessed by the A260/A280 absorbance ratio, which should be greater than 1.8. In addition, RNA should be examined visually by agarose gel electrophoresis or, ideally, in microfluidic systems,6 which use much smaller samples and give a better visual representation of RNA quality. On visual inspection, good quality RNA normally shows about twice as much 28S as 18S ribosomal RNA.
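The two numerical criteria above can be combined into a simple pass/fail check. This is a minimal sketch: the function name `assess_rna` is hypothetical, band intensities are assumed to come from gel densitometry or a microfluidic trace, and the 1.5 cut-off for the 28S:18S ratio is an illustrative assumption (the text gives only the >1.8 absorbance criterion and the roughly 2:1 band ratio expected of good RNA).

```python
from dataclasses import dataclass


@dataclass
class RnaQc:
    purity_ratio: float  # A260/A280 absorbance ratio
    rrna_ratio: float    # 28S:18S ribosomal RNA band intensity ratio
    passes: bool


def assess_rna(a260: float, a280: float, band_28s: float, band_18s: float,
               min_purity: float = 1.8, min_rrna_ratio: float = 1.5) -> RnaQc:
    """Apply the absorbance and rRNA band criteria to one RNA sample.

    min_rrna_ratio is an assumed, illustrative cut-off, not a published standard.
    """
    if a280 <= 0 or band_18s <= 0:
        raise ValueError("A280 and the 18S band intensity must be positive")
    purity = a260 / a280
    rrna = band_28s / band_18s
    # A260/A280 must exceed 1.8; intact RNA shows roughly twice as much 28S as 18S
    ok = purity > min_purity and rrna >= min_rrna_ratio
    return RnaQc(purity, rrna, ok)


qc = assess_rna(a260=1.0, a280=0.5, band_28s=2000.0, band_18s=1000.0)
print(qc)  # RnaQc(purity_ratio=2.0, rrna_ratio=2.0, passes=True)
```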
Production of cDNA is a prerequisite for any functional genomics study. High temperature reverse transcriptase (RT) can improve cDNA yields by melting RNA secondary structure, and for small amounts of RNA, yields can be improved further by the use of nucleotide binding proteins.7 The final assessment of RNA quality, however, is more readily obtained by polymerase chain reaction (PCR). Amplification of housekeeping transcripts such as β actin and GAPDH can be used to assess both transcript length and transcript amount. With standard PCR, transcript length indicates the degree of degradation in the mRNA pool: partly degraded RNA samples lack the 5′ portions of their transcripts, so fewer long transcripts are detected than in optimally prepared RNA. Real time PCR gives similar information, with reduced transcript levels indicating either RNA degradation or the presence of RT or PCR inhibitors.
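The comparison of long and short products from the same housekeeping transcript can be made quantitative with real time PCR threshold cycle (Ct) values. The sketch below assumes two amplicons of one transcript, one near the 3′ end and one further towards the 5′ end, amplified from 3′ anchored (for example oligo-dT primed) cDNA, with the same amplification efficiency for both (100%, that is doubling per cycle, by default). The function name is hypothetical. A ratio well below 1 suggests loss of 5′ sequence through degradation.

```python
def five_to_three_prime_ratio(ct_5prime: float, ct_3prime: float,
                              efficiency: float = 1.0) -> float:
    """Relative amount of a 5' amplicon versus a 3' amplicon of one transcript.

    efficiency is the fractional amplification efficiency (1.0 = doubling per
    cycle) and is assumed to be equal for both amplicons.
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    # Each extra cycle needed to reach threshold means (1 + E)-fold less template
    return (1.0 + efficiency) ** -(ct_5prime - ct_3prime)


# Hypothetical GAPDH Ct values: the 5' amplicon crosses threshold 2 cycles later
print(five_to_three_prime_ratio(ct_5prime=22.0, ct_3prime=20.0))  # 0.25
```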
Early approaches to studying differential gene expression include subtractive cloning,8 serial analysis of gene expression (SAGE),9 and subtractive hybridisation techniques.10 Differential display is a less laborious and more widely used approach,11 and has yielded promising results, particularly in an adapted form, indexing based differential display PCR (DD-PCR).12 Standard differential display involves conversion of RNA to cDNA, which is then divided between a series of reactions, each using a specific reverse primer and a random forward primer. Because only a limited subset of the transcripts present is amplified in each reaction, a given primer combination and tissue produce a characteristic ladder of PCR products. By comparing these ladders between test and control, differentially expressed transcripts can be identified. In indexing based DD-PCR, cDNAs from disease and control tissue are digested with class II restriction enzymes (for example, BbvI), and the internal fragments are amplified by adaptor primer PCR and visualised by non-denaturing polyacrylamide gel electrophoresis, giving a representation of the RNA transcripts present in the samples. The fragments of interest are then cloned and identified by comparison with database entries. The main advantages of DD-PCR are its sensitivity in detecting fragments of low expression and its ability to identify unknown differentially expressed genes, particularly intragenic fragments. Its main disadvantage is low sample throughput. This technique and its modified forms have been applied successfully to neurological and psychiatric disorders,13–15 revealing, for example, downregulation in Alzheimer’s disease of genes involved in synaptic formation and organisation,13 tau phosphorylation,13 and protein targeting.15
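The comparison of test and control ladders amounts to finding bands present in one lane but not the other. This is a minimal sketch assuming band sizes (in bp) have already been read off the gel for one primer combination; the function name and the 2 bp matching tolerance are hypothetical, and only presence or absence is compared, not band intensity.

```python
def differential_bands(test_bands, control_bands, tolerance_bp=2):
    """Return (test-only, control-only) band sizes in bp for one primer pair.

    Bands within tolerance_bp of each other are treated as the same product.
    """
    def unmatched(query, reference):
        return sorted(b for b in query
                      if not any(abs(b - r) <= tolerance_bp for r in reference))

    return unmatched(test_bands, control_bands), unmatched(control_bands, test_bands)


# Hypothetical band sizes (bp) read from test and control lanes
test_only, control_only = differential_bands([150, 220, 310, 405], [151, 220, 390, 405])
print(test_only, control_only)  # [310] [390]
```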
Serial analysis of gene expression (SAGE) is based on the principle that a short sequence tag of 10–14 base pairs (bp) is sufficient to identify a transcript, and that these short tags can be ligated together before PCR. SAGE combines cDNA library production with high throughput sequencing.9 Double stranded cDNA is prepared using poly-dT priming and digested with a restriction enzyme that cuts at relatively high frequency (for example, NlaIII). An adapter is then ligated to the cDNA; the adapter carries a type IIS restriction enzyme site (such as BsmFI), which allows the cDNA to be cut again, but at a site up to 20 bp away from the recognition sequence. After digestion, the short fragments are blunt ended and ligated, and the products are amplified by adaptor specific PCR. These PCR products are then cleaved with the original restriction enzyme and ligated to give concatemers of the short fragments, which are cloned. The clones contain tags from the various mRNA species initially present, and the frequency of each tag sequence is proportional to the frequency of the corresponding mRNA in the starting material.9 Sequencing and analysis of these clones therefore provides a measure of gene expression in the tissue, and comparing the results for test and control reveals differences in gene expression. Several major projects have been based on SAGE, most notably the analysis of gene expression in cancer (see cgap.nci.nih.gov/SAGE/). The drawbacks of SAGE, however, are the large scale sequencing required to build up sufficient expression information and the bioinformatics support needed to interpret the sequencing results, which make it suitable mainly for larger laboratories. The use of poly-dT to prime synthesis may also reduce the representation of certain mRNA species.
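The counting step at the heart of SAGE can be sketched as follows. This assumes idealised, error-free reads of concatemers in which ditags are separated by the NlaIII site CATG. Each ditag is taken to consist of one 10 bp tag read forwards followed by a second tag in reverse complement orientation, since the two tags are joined tail to tail. Real SAGE pipelines also handle sequencing errors, linker derived sequences and duplicate ditags, none of which is attempted here; `count_sage_tags` is a hypothetical name.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def count_sage_tags(concatemers, tag_len=10, anchor="CATG"):
    """Count SAGE tags in idealised concatemer sequences.

    Each stretch between anchor sites is treated as a ditag: the first tag is
    read directly; the second is the reverse complement of the last tag_len bases.
    """
    counts = Counter()
    for seq in concatemers:
        for ditag in seq.upper().split(anchor):
            if len(ditag) < 2 * tag_len:
                continue  # flanking sequence or truncated ditag
            counts[ditag[:tag_len]] += 1
            counts[reverse_complement(ditag[-tag_len:])] += 1
    return counts


# Hypothetical concatemer built from three tags; tag_a occurs in both ditags
tag_a, tag_b, tag_c = "AAAAACCCCC", "GGGTTTAAAC", "TTTTTGGGGA"
read = ("CATG" + tag_a + reverse_complement(tag_b)
        + "CATG" + tag_a + reverse_complement(tag_c) + "CATG")
print(count_sage_tags([read]).most_common(1))  # [('AAAAACCCCC', 2)]
```

Comparing such counts between test and control libraries, scaled to the total number of tags sequenced, gives the relative expression measure described above.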
It is probably fair to say that differential display and other methods have been superseded by microarrays as the technique of choice to investigate differential gene expression in large sample sets. Depending on the array used, microarrays can now give an almost global picture of gene expression status in one experiment. The key value of current microarray technology lies in its use as a high throughput, initial screen to identify potential disease related genes that can then be cross validated using more accurate semiquantitative methods, such as real time PCR.16
Microarrays encompass a range of technologies that have developed over the past few years. Two principal methods exist: cDNA arrays and oligonucleotide arrays. cDNA arrays were developed first; in these, the genes of interest, as cDNA clones or PCR products, are printed onto membranes (macroarrays) or microscope slides (microarrays) with a robotic arraying device. To compare the expression of each gene between samples, mRNA isolated from each sample is labelled by reverse transcription, either with a radioactive isotope (33P) or with different fluorescent dyes such as Cy3 (green) and Cy5 (red). Radioactively labelled samples are scanned with a phosphoimager, and results are compared after standard normalisation procedures (for example, to housekeeping controls). For fluorescent labelling, the two pools of labelled cDNA probes (test and control, each labelled with a different CyDye) are mixed and hybridised to a microarray. After hybridisation, a high resolution laser scanner illuminates each DNA spot at two wavelengths and measures the fluorescence intensity of each dye separately. The ratio of the two intensities gives the relative abundance of each transcript in the two samples. Numerous studies of neuropsychiatric diseases such as Alzheimer’s disease, Parkinson’s disease, and schizophrenia have produced promising results with these techniques.17–21
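The two-colour ratio described above is usually handled on a log scale, so that a twofold increase and a twofold decrease are symmetrical. This is a minimal sketch assuming Cy5 carries the test sample and Cy3 the control (the text says only that the two are labelled with different CyDyes), that local background is subtracted from each spot, and that a floor of 1 intensity unit stops zero or negative signals; the function name and floor value are hypothetical.

```python
import math


def log2_ratios(spots, floor=1.0):
    """Background-subtracted log2(Cy5/Cy3) ratio for each spot.

    spots maps a gene name to (cy5_foreground, cy5_background,
    cy3_foreground, cy3_background); floor stops zero or negative signals.
    """
    ratios = {}
    for gene, (r_fg, r_bg, g_fg, g_bg) in spots.items():
        red = max(r_fg - r_bg, floor)    # Cy5 channel, assumed test sample
        green = max(g_fg - g_bg, floor)  # Cy3 channel, assumed control sample
        ratios[gene] = math.log2(red / green)
    return ratios


# Hypothetical scanner intensities for two spots
print(log2_ratios({"geneA": (4100, 100, 1100, 100), "geneB": (600, 100, 600, 100)}))
# {'geneA': 2.0, 'geneB': 0.0}
```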
Oligonucleotide arrays are made either by printing gene specific single stranded oligonucleotides onto a glass slide or by synthesising the oligonucleotides directly on the substrate. These arrays are almost exclusively probed with fluorescently labelled cDNA or aRNA, and have become a popular choice because their specificity is superior to that of cDNA arrays.22 In addition, the cost of both commercially available and custom designed oligonucleotide arrays has fallen recently. Several oligonucleotide arrays representing part or most of the genes in the human genome have been designed and used to investigate gene expression in neurological and psychiatric disorders. Companies such as Affymetrix and MWG Biotech have developed high throughput screening procedures, including optimised protocols that minimise variation within experiments. The current Affymetrix array, for instance, represents approximately 39 000 transcripts, including 33 000 fully annotated genes (fig 1), on two GeneChips (HG-U133A and B). Each gene is represented by a probe set of 11 probe pairs, typically designed with a bias towards the 3′ end of the gene. The probe pairs are distributed across the array, and each consists of a perfect match and a mismatch oligonucleotide (each 25 bases long). The mismatch oligonucleotide carries a single base mismatch in the centre of the probe and is used to quantify and subtract non-specific hybridisation and background signal. Each chip also includes internal spike control probes (the E coli genes bioB, bioC, and bioD, and the bacteriophage P1 gene cre) to determine hybridisation efficiency. To investigate gene expression, total RNA is reverse transcribed into cDNA, from which labelled aRNA is prepared; this is then fragmented and hybridised to the array. After washing, the array is scanned and the data are analysed with dedicated software. The data can be mined further with a number of software tools, such as GeneSpring or Bioconductor.
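The way the mismatch probes are used can be illustrated by the simplest possible probe set summary, the average of perfect match minus mismatch intensity across the probe pairs. This is a deliberately simplified sketch, not the algorithm used by current vendor software, which relies on more robust statistics and outlier handling; the function name and intensities are hypothetical.

```python
from statistics import mean


def probe_set_signal(probe_pairs):
    """Average (PM - MM) difference over the probe pairs of one probe set.

    probe_pairs is a sequence of (perfect_match, mismatch) intensities,
    typically 11 pairs per probe set on the arrays described above.
    """
    if not probe_pairs:
        raise ValueError("a probe set needs at least one probe pair")
    # Subtracting MM removes an estimate of non-specific hybridisation and background
    return mean(pm - mm for pm, mm in probe_pairs)


# Hypothetical intensities for an 11-pair probe set
pairs = [(320, 120)] * 6 + [(250, 150)] * 5
print(probe_set_signal(pairs))
```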
A major consideration in microarray based gene expression profiling is data analysis. Because thousands of genes are analysed at once, the potential for false positives is large. Several approaches have therefore been applied to maximise the likelihood of identifying true changes without losing significant effects through overstringent statistical analysis. In general, once the array background and background hybridisation to non-specific genes (often plant genes on mammalian arrays) have been subtracted, the data are normalised either to the mean expression of all genes on the array or to housekeeping genes such as GAPDH. As a general rule, genes showing at least a 1.5-fold difference (up or down) in mean expression between test and control, with p < 0.005, are considered to be differentially expressed and are then analysed further by a more stringent method such as Q-PCR. All array analysis software packages provide these routines, and many are tailored to particular arrays.
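The normalisation and filtering rule above can be sketched in a few lines. This is a minimal sketch under stated assumptions: background has already been subtracted, each array is normalised to the mean of all its genes, "±1.5" is read as at least 1.5-fold up or down, the 0.005 threshold is applied as p < 0.005, and the p values are assumed to come from a statistical test run elsewhere. The function names are hypothetical.

```python
from statistics import mean


def normalise_to_array_mean(array):
    """Scale one array so that the mean expression of all its genes is 1."""
    m = mean(array.values())
    return {gene: value / m for gene, value in array.items()}


def call_differential(test_arrays, control_arrays, p_values,
                      min_fold=1.5, alpha=0.005):
    """Return {gene: fold change} for genes passing both fold and p cut-offs.

    p_values must come from a statistical test run elsewhere (assumed here).
    A gene passes if it changes at least min_fold up or down and p < alpha.
    """
    test = [normalise_to_array_mean(a) for a in test_arrays]
    ctrl = [normalise_to_array_mean(a) for a in control_arrays]
    calls = {}
    for gene, p in p_values.items():
        fold = mean(a[gene] for a in test) / mean(a[gene] for a in ctrl)
        if (fold >= min_fold or fold <= 1 / min_fold) and p < alpha:
            calls[gene] = fold
    return calls


# Hypothetical duplicate arrays; g2 changes twofold but misses the p cut-off
tests = [{"g1": 300, "g2": 100, "g3": 200}] * 2
ctrls = [{"g1": 100, "g2": 100, "g3": 100}] * 2
print(call_differential(tests, ctrls, {"g1": 0.001, "g2": 0.01, "g3": 0.001}))
# {'g1': 1.5}
```

Genes passing this screen would then go forward to Q-PCR, as described above.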
Various microarray experiments and other methodologies using RNA from postmortem human brain tissue have been published, including studies exploring expression profiles of complex neuropsychiatric disorders such as schizophrenia, bipolar affective disorder, and Alzheimer’s disease.19,20,23–25 For example, in order to understand some of the early biochemical changes in Parkinson’s disease, microarray based gene expression profiling has been applied to animal models of the disease, rather than studying end stage disease material where there may be changes in gene expression because of common pathological responses or neurone loss.26 Here, using a membrane array containing over 1000 expressed genes, the investigators identified changes in striatal gene expression associated with transcription factors (for example, downregulated Nur-77) indicating a coordinated change in gene expression, changes in cell–cell communication (such as upregulated synapsin 1A), and altered kinase dependent cell signalling (downregulation of c-src).26
There are several potential limitations to array based methods of expression profiling. In general, low abundance transcripts are often not detected, simply because the amount of target RNA is limiting, although linear amplification methods may improve sensitivity for low abundance transcripts.27 As with any measurement of transcript levels, RNA levels do not always correlate with protein levels, owing to the post-transcriptional regulation that affects many RNA species.28 Macroarrays almost exclusively use radiolabelled cDNA for interrogation, and only one membrane can be interrogated at a time, making experimental variation more likely. A further limitation of macroarrays is the number of probe sequences on any one array, usually no more than a few thousand; several macroarrays therefore need to be hybridised for each RNA sample, which may be difficult without RNA amplification when RNA is limited. Macroarrays can, however, be an advantage when a relatively focused problem is being investigated, for instance with an array carrying only genes involved in apoptosis. Here a limited “hypothesis driven” array experiment can be extremely powerful, providing a comprehensive overview of just the system under investigation (see, for example, Weinreb et al29). Macroarrays are also well suited to small laboratories, as their equipment requirements are modest.
REAL TIME Q-PCR
High throughput gene profiling techniques can at best be regarded as semiquantitative, as they determine only relative expression levels and have a limited dynamic range. Methods such as northern blotting or Q-PCR are therefore required to validate the results of gene expression profiling and to provide more reliable measures of gene expression levels. Q-PCR is now considered the method of choice for validating gene expression.30 It detects the PCR product by fluorescence, combining a thermal cycler with a fluorescence spectrophotometer.31 Two basic assay systems are available: as the PCR proceeds and double stranded DNA is generated, the product is detected either by a fluorescent dye binding to the double stranded DNA or by the release of a fluorescent reporter molecule. Q-PCR is relatively inexpensive and rapid, and also highly sensitive, allowing the measurement of low abundance transcripts that may not be detected by microarray.32 In most Q-PCR methods, levels of specific transcripts are expressed relative to levels of housekeeping transcripts as a means of normalisation, giving accurate relative quantitation (fig 2).
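One widely used way of expressing a target transcript relative to a housekeeping transcript is the comparative Ct (2^−ΔΔCt) method. It is shown here as an illustration, not as the procedure used in the studies cited. It assumes that the target and housekeeping amplicons both amplify with close to 100% efficiency, so that the product doubles each cycle; the function name and Ct values are hypothetical.

```python
def relative_expression_ddct(ct_target_test, ct_ref_test, ct_target_ctrl, ct_ref_ctrl):
    """Fold change of a target transcript in test versus control (2^-ddCt).

    Assumes target and housekeeping (reference) amplicons both amplify with
    ~100% efficiency, i.e. the product doubles every cycle.
    """
    d_ct_test = ct_target_test - ct_ref_test   # normalise to the housekeeping gene
    d_ct_ctrl = ct_target_ctrl - ct_ref_ctrl
    return 2.0 ** -(d_ct_test - d_ct_ctrl)


# Hypothetical Ct values: the target crosses threshold 2 cycles earlier in test tissue
print(relative_expression_ddct(24.0, 18.0, 26.0, 18.0))  # 4.0
```

Where the two efficiencies differ appreciably, an efficiency corrected calculation is needed instead.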
The least expensive Q-PCR methods rely on DNA intercalating dyes, perhaps the most popular being SYBR green I, a double stranded DNA intercalating dye that fluoresces on laser light excitation. An internal passive reference dye (ROX) is detected automatically by the instrument to normalise for pipetting inconsistencies. Amplification is measured in real time from the SYBR green I fluorescence emitted on binding to the PCR products after each cycle. By about 40 cycles, an end point or plateau is reached at which no further amplification takes place owing to competitive PCR effects.33–35 SYBR green assays are a cheaper alternative to the more expensive 5′ nuclease TAMRA probe (TaqMan™), minor groove binder (MGB) probe, and Molecular Beacon detection chemistries.17 These methods use oligonucleotides carrying a fluorescent reporter that is quenched either by secondary structure (as in Molecular Beacons) or by a separate quencher molecule (as in TaqMan and MGB probes).16 MGB probe assays are a newer type of 5′ nuclease assay that differ from the more traditional TAMRA probe assays in probe design and fluorescence detection. A non-specific MGB incorporated at the 3′ end of the probe raises its melting temperature (Tm),36 allowing shorter probes, typically 12–14 bp in length. A non-fluorescent quencher (NFQ) is also attached at the 3′ end of the MGB probe, in close proximity to the 6-FAM™ fluorophore. Q-PCR is clearly a robust method for verifying initial gene expression results from microarrays and for investigating novel genes that may be involved in neurological disease.30,37,38
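The threshold cycle on which relative quantitation depends can be read from a SYBR green run by normalising each reading to the ROX reference and finding where the curve first crosses a fluorescence threshold. This is a minimal sketch with assumptions: there is no baseline subtraction, the crossing point is found by linear interpolation between cycles, and the threshold is chosen by the user. Instrument software uses its own, more elaborate algorithms, and the function name is hypothetical.

```python
def threshold_cycle(sybr, rox, threshold):
    """Fractional cycle at which ROX-normalised SYBR fluorescence crosses threshold.

    sybr and rox hold one reading per cycle, starting at cycle 1. Returns None
    if the curve starts above the threshold or never reaches it.
    """
    rn = [s / r for s, r in zip(sybr, rox)]  # normalised reporter signal
    for i in range(1, len(rn)):
        if rn[i - 1] < threshold <= rn[i]:
            # Linear interpolation between cycle i (rn[i-1]) and cycle i+1 (rn[i])
            return i + (threshold - rn[i - 1]) / (rn[i] - rn[i - 1])
    return None


# Hypothetical readings for cycles 1-6 with a constant ROX signal
print(threshold_cycle([1.0, 1.0, 1.0, 2.0, 4.0, 8.0], [1.0] * 6, threshold=3.0))  # 4.5
```

Dividing by ROX means that a well with uniformly higher signal, for example from a larger reaction volume, gives the same Ct.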
While Q-PCR is a common approach to validating microarray data and is suited to high throughput, other methods of validation are also available. As many arrays are based on cloned cDNA sequences, these clones can be used for in situ hybridisation, which determines not only transcript levels but also cellular localisation. Localisation is vital in neuroscience, given the numerous cell types present in the CNS. Similarly, if antibodies to the encoded protein are available, western or slot blotting can be used for quantitation and immunocytochemistry for cellular localisation; such protein methods provide the ultimate validation that a gene change is accompanied by a protein change.
Proteomics is the analysis of the protein complement of a given cell or tissue at a given point in time, and as such represents the natural extension of functional genomic analysis. While there are about 30 000 genes in the human genome (see http://www.ncbi.nlm.nih.gov/genome/guide/human/ for a list of human gene resources), the protein complement of a cell or tissue, the proteome, is much larger and also much more dynamic in nature. This is because most mammalian genes show alternative splicing of transcripts, leading to different isoforms of a given protein. This, coupled with post-translational modification such as glycosylation, myristoylation, and phosphorylation, leads to two or more effectively different proteins per gene. Proteomics can therefore generate much larger datasets requiring more resources to handle and analyse the data effectively.
It is doubtful whether any single method, in a single pass, will be able to identify accurately all the proteins expressed by a cell or tissue, owing to the huge number of protein isoforms present and their highly variable physical and chemical properties. The biggest problem is undoubtedly the sheer number of proteins expressed, which poses enormous analytical problems, so tissues or cells need to be divided into fractions of manageable complexity. Proteomic approaches normally fractionate the sample into, for example, plasma membrane, nuclear, cytoplasmic, mitochondrial, lysosomal, and endoplasmic reticulum/Golgi fractions, to generate the maximum amount of protein information for a given cell or tissue. This can be achieved by various centrifugation methods, such as spinning at increasing centrifugal forces or centrifuging through media of different buoyant densities. Cocktails of protease and phosphatase/kinase inhibitors keep the constituent proteins intact for analysis. Detergents and chaotropes can also be used to extract insoluble proteins, such as those integrated into the cell membrane, for analysis. By this route, it is possible to generate a series of fractions that are sufficiently refined to allow analysis.
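Centrifugal force in fractionation protocols is specified as relative centrifugal force (RCF, in multiples of g), whereas centrifuges are often set in revolutions per minute (rpm). The conversion depends on the rotor radius. The standard relation RCF = 1.118 × 10⁻⁵ × r × rpm² (r in cm) follows from ω²r/g with ω = 2π × rpm/60. This minimal sketch uses that relation; the function names and example radii are hypothetical, and the text does not specify the speeds used for any particular fraction.

```python
import math

RCF_CONSTANT = 1.118e-5  # for radius in cm and speed in revolutions per minute


def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g) at a given rotor radius and speed."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_for_rcf(rcf: float, radius_cm: float) -> float:
    """Rotor speed needed to reach a target relative centrifugal force."""
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


print(round(rcf_from_rpm(3000, 10.0)))    # 1006, i.e. about 1000 x g
print(round(rpm_for_rcf(100_000, 8.0)))   # rpm needed for 100 000 x g at 8 cm
```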
TWO DIMENSIONAL GEL ELECTROPHORESIS
Two dimensional polyacrylamide gel electrophoresis (2D-PAGE) is the most popular technique in proteomic studies, as it can resolve thousands of proteins simultaneously in a single format. Although the technique was first described in the mid-1970s,39,40 its basic principle of separating proteins first according to charge (isoelectric point) and then according to mass remains unchanged. As isoelectric focusing can resolve about 70 proteins and SDS-PAGE on a gradient gel around 100 protein spots, 2D-PAGE should in principle resolve a “spot map” of around 7000 proteins, though the first use of 2D-PAGE resolved only 1100 E coli proteins on a single gel.39 Recent advances have greatly increased this number, and up to 10 000 individual protein spots can now be distinguished on a single large format gel.41
The technologies employed in 2D-PAGE have evolved rapidly. The first stage is isoelectric focusing, in which proteins are separated according to their pI using immobilised pH gradients (IPG), formed by charged buffering molecules incorporated into a thin gel strip. Proteins applied to these IPG strips migrate in an electric field until they reach the point in the pH gradient at which their net charge is zero (their pI), where they stop migrating. IPG strips are widely available in various formats, from wide ranges covering many pH units to narrow ranges covering just one or two pH units. Narrow range strips give much greater resolution and can be overlapped; this stretching of the protein pattern allows proteins spanning the same pH range as a standard wide range gel to be visualised at much greater resolution.40 This was demonstrated by the use of eight overlapping pH gradients to define the proteome of Mycoplasma genitalium.41 Studies of Saccharomyces cerevisiae proteins have compared overlapping narrow IPG strips between pH 4 and pH 9 with a single pH 3–10 strip, producing patterns of 2286 and 755 distinct spots, respectively.42 Despite these advances in first dimension technology, the method is unsuitable for highly acidic or basic proteins, as separation below pH 3 and above pH 11 is poor for lack of good ampholytes. Membrane bound proteins, which are often very acidic, have also often separated poorly because they are poorly soluble in the absence of detergents. Additional preseparation methods are therefore needed to solubilise membrane proteins before separation.
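Where a protein comes to rest on an IPG strip can be predicted approximately from its sequence. The net charge at each pH is summed over the ionisable groups using the Henderson–Hasselbalch relation, and the pH at which it reaches zero is then found by bisection. This is a minimal sketch. The pKa values are approximate, assumed figures, and published sets differ and give somewhat different estimates. The calculation also ignores post-translational modifications, which, as noted earlier, can shift the observed pI. The function names are hypothetical.

```python
# Approximate pKa values (assumed; published sets differ and shift the estimate)
POSITIVE_PKA = {"N_term": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
NEGATIVE_PKA = {"C_term": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(sequence: str, ph: float) -> float:
    """Net charge of a peptide at a given pH (Henderson-Hasselbalch)."""
    seq = sequence.upper()
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["N_term"] = counts["C_term"] = 1
    positive = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in POSITIVE_PKA.items())
    negative = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in NEGATIVE_PKA.items())
    return positive - negative


def isoelectric_point(sequence: str, tol: float = 1e-4) -> float:
    """Bisect for the pH at which the net charge is zero (charge falls as pH rises)."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(sequence, mid) > 0:
            low = mid   # still positively charged: the pI lies at higher pH
        else:
            high = mid
    return (low + high) / 2


# Hypothetical acidic and basic peptides
print(round(isoelectric_point("DDDDE"), 2), round(isoelectric_point("KKKKR"), 2))
```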
After first dimension separation by pI, proteins are separated by molecular weight in polyacrylamide gels containing sodium dodecyl sulphate (SDS-PAGE). SDS is an anionic detergent that denatures proteins, unfolding them into linear molecules, and, because it is anionic, gives them a net negative charge. When it is combined with a reducing agent such as dithiothreitol (DTT), which breaks disulphide bonds, proteins are separated essentially by mass alone. The separated protein spots can then be visualised with a variety of stains. Silver and Coomassie stains are relatively simple and need little specialist equipment, and so are the most frequently used. Silver staining is the more common of the two and is more sensitive than traditional radiolabelling or Coomassie brilliant blue staining: Coomassie staining typically detects 8–10 ng of protein, while silver staining can be 100 times more sensitive.43 Both methods, however, depend on the stain reacting with functional groups on the proteins, so some polypeptides are not stained effectively. Recent developments in fluorescence technology have produced fluorescent protein dyes such as SYPRO and CyDyes, with sensitivity similar to that of silver stains, though specialist equipment is needed to detect them. These fluorescent dyes are also more compatible with peptide identification by mass spectrometry, whereas silver and Coomassie stained gels often require destaining before mass spectrometry.43–45
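Because SDS-PAGE separates by mass, the apparent molecular weight of a spot is commonly estimated from a standard curve of marker proteins. Over much of a gel, log10(molecular weight) is approximately linear in relative mobility (Rf). This is a minimal sketch assuming that linear relation holds over the range of interest (gradient gels and extreme sizes can deviate). The marker mobilities below are hypothetical, and the function names are illustrative.

```python
import math


def fit_standard_curve(markers):
    """Least-squares fit of log10(MW) against relative mobility (Rf).

    markers is a list of (rf, molecular_weight_in_daltons) for marker proteins.
    Returns (slope, intercept) of log10(MW) = intercept + slope * rf.
    """
    if len(markers) < 2:
        raise ValueError("need at least two marker proteins")
    xs = [rf for rf, _ in markers]
    ys = [math.log10(mw) for _, mw in markers]
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def estimate_mw(rf, slope, intercept):
    """Molecular weight (Da) of a spot from its relative mobility."""
    return 10 ** (intercept + slope * rf)


# Hypothetical marker ladder: (relative mobility, molecular weight in Da)
ladder = [(0.15, 97_000), (0.30, 66_000), (0.50, 45_000), (0.70, 30_000), (0.90, 20_000)]
slope, intercept = fit_standard_curve(ladder)
print(round(estimate_mw(0.6, slope, intercept)))  # estimated Da for a spot at Rf 0.6
```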
Silver and Coomassie stains are routinely used in many laboratories and provide a simple way of building up proteome databases for various organisms and tissues (see http://ca.expasy.org/ch2d/). For example, two dimensional gels of protein samples from Alzheimer’s disease and control brains have recently been described, using silver staining to visualise a spot map of over 1500 proteins and quadrupole time of flight tandem mass spectrometry (Q-TOF MS/MS) for protein identification (see the section on mass spectrometry below).43 The proteome of human cerebrospinal fluid (CSF) has also been analysed, producing a list of over 480 proteins.44,45 Such studies have identified variations in CSF apolipoprotein E between patients with sporadic and variant Creutzfeldt-Jakob disease,46 and between patients with Alzheimer’s disease and schizophrenia.47 The method has also been used to identify nuclear proteins from human blood lymphocytes48 and to study protein expression in the mouse cerebellum,49 showing that it can be applied to almost any tissue.
A new development in two dimensional technology is fluorescence 2D difference gel electrophoresis (DIGE), in which up to three different samples are separated on a single gel, each having been labelled before electrophoresis with a different fluorescent cyanine dye (Cy2, Cy3, Cy5). Each dye is visualised at a different wavelength and the images are overlaid, giving a combined image that can be analysed with various software packages (fig 3).50,51 This allows samples, for instance a test sample and a control, to be compared directly for the presence of a particular protein, reducing the effects of inter-gel variation and improving reproducibility, which has traditionally been one of the major problems of 2D-PAGE.52 For example, a control sample is labelled with Cy3 and a disease sample with Cy5; a pool of all the samples is labelled with the third dye, Cy2, and included in every gel as an internal standard; the three labelled samples are then combined and run on a single gel. The three spot maps can be overlaid directly to compare the two protein samples, and reference to the internal standard distinguishes genuine protein changes from experimental artefacts.52,53
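The role of the Cy2 internal standard can be shown with a small calculation. Each spot volume is divided by the Cy2 volume of the same spot on the same gel, so that gels run with different overall signal can be compared, and the standardised values are then averaged across gels. This is a minimal sketch assuming the dye assignment of the example above (Cy3 control, Cy5 disease, Cy2 pooled standard), one control and one disease sample per gel, and spots already matched between gels. Real DIGE software performs spot matching and more elaborate normalisation, and the function name is hypothetical.

```python
from statistics import mean


def dige_fold_changes(gels):
    """Disease/control fold change per spot, standardised to the Cy2 pool.

    gels is a list with one dict per gel, mapping spot id to
    (cy2_volume, cy3_volume, cy5_volume); Cy3 = control, Cy5 = disease,
    Cy2 = pooled internal standard (assumed dye assignment).
    """
    control, disease = {}, {}
    for gel in gels:
        for spot, (cy2, cy3, cy5) in gel.items():
            # Dividing by the in-gel standard removes gel-to-gel differences
            control.setdefault(spot, []).append(cy3 / cy2)
            disease.setdefault(spot, []).append(cy5 / cy2)
    return {spot: mean(disease[spot]) / mean(control[spot]) for spot in control}


# Hypothetical spot volumes from two gels with different overall signal intensities
gels = [
    {"spot1": (100.0, 100.0, 200.0), "spot2": (100.0, 100.0, 100.0)},
    {"spot1": (200.0, 200.0, 400.0), "spot2": (50.0, 50.0, 50.0)},
]
print(dige_fold_changes(gels))  # {'spot1': 2.0, 'spot2': 1.0}
```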
2D-PAGE is also a flexible technique in terms of the format of the gels, as in some cases mini 2D gels may be preferable to large format gels—for example, in experiments involving large numbers of samples. Thus 7 cm gels have been used to investigate protein oxidation in Alzheimer’s disease brain tissue,54 where proteins in the 2D gels were transferred to a PVDF membrane for western blotting and the spot map visualised by SYPRO ruby fluorescent stain, followed by probing with an antibody against oxidised protein groups. This method demonstrated over 100 proteins on these “oxyblots,” with many showing significant changes in Alzheimer’s disease compared with controls.54,55
ICAT—ISOTOPE CODED AFFINITY TAGS
Isotope coded affinity tagging (ICAT) is a relatively recent advance in proteomic analysis, first reported in 1999 by Gygi and colleagues.56 Coupled with mass spectrometry, the technique allows both identification and quantitation of proteins within complex mixtures and, as with DIGE, permits simultaneous analysis of two protein samples. In ICAT, one sample is labelled with a reagent containing normal hydrogen and the other with a reagent containing deuterium, and the two samples are then mixed. After separation, the samples are applied to a mass spectrometer, which resolves the mass difference of a few daltons between the two labelled forms of each peptide and allows both mass peaks to be quantified directly.
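The light and heavy forms of a labelled peptide appear as a pair of peaks whose spacing is predictable. This minimal sketch assumes the original ICAT design, with eight hydrogen (d0) or eight deuterium (d8) atoms in the linker, giving a shift of about 8.05 Da per labelled cysteine. In m/z units the shift is divided by the ion's charge. The function name, the 0.01 m/z matching tolerance and the peak list are hypothetical, and real software also uses chromatographic co-elution and isotope envelopes.

```python
H_MASS = 1.00782503207   # monoisotopic mass of 1H (Da)
D_MASS = 2.01410177812   # monoisotopic mass of 2H, deuterium (Da)
ICAT_SHIFT = 8 * (D_MASS - H_MASS)  # d8 versus d0 linker, about 8.05 Da per label


def find_icat_pairs(peaks, charge, n_labels=1, tol=0.01):
    """Pair light and heavy peaks and return (light_mz, heavy_mz, heavy/light).

    peaks is a list of (m/z, peak_area); charge is the ion charge state and
    n_labels the number of labelled cysteines in the peptide.
    """
    spacing = n_labels * ICAT_SHIFT / charge  # expected m/z gap within a pair
    pairs = []
    for light_mz, light_area in peaks:
        for heavy_mz, heavy_area in peaks:
            if abs((heavy_mz - light_mz) - spacing) <= tol:
                # heavy (d8) = disease, light (d0) = control in the scheme described
                pairs.append((light_mz, heavy_mz, heavy_area / light_area))
    return pairs


# Hypothetical doubly charged peptide with one labelled cysteine, plus an unpaired peak
peaks = [(500.0, 1000.0), (500.0 + ICAT_SHIFT / 2, 2000.0), (700.0, 50.0)]
print(find_icat_pairs(peaks, charge=2))  # one pair, heavy/light ratio 2.0
```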
The two isotopic forms of the ICAT labelling reagent each contain three parts: an isotopically light or heavy linker region, a protein reactive group, and a biotin affinity tag. In the reagents first described by Gygi and colleagues57, and in the commercially available versions, the reactive group is specific for the thiol side chains of cysteine residues. The linker region contains eight hydrogen atoms in the light reagent (d0) or eight deuterium atoms in the heavy reagent (d8). In a standard ICAT experiment two protein samples, such as control and disease, are labelled with the light and heavy reagents, respectively. The two are then combined and subjected to proteolytic digestion, typically with trypsin. The resulting peptide mixture is fractionated by avidin affinity chromatography, which binds the biotin moiety of the ICAT reagent and so isolates only the cysteine containing peptides. This step leaves about 10-fold fewer peptides than the original mixture, simplifying subsequent analysis. Peptides are then identified and quantified by liquid chromatography and tandem mass spectrometry.56,58,59 However, the original ICAT reagents are relatively large. The tag remaining on each peptide therefore causes a substantial shift in peptide mass and complicates identification of the peptide fragments. Current ICAT reagents have been improved by incorporating an acid cleavable linker. This allows the biotin affinity tag to be removed before mass spectrometry while leaving the peptide isotopically labelled (http://appliedbiosystems.com). The analysis is simplified, so that greater numbers of peptides can be identified and quantified.60
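The digestion and avidin enrichment steps can be mimicked in silico to show why selecting cysteine containing peptides simplifies the mixture. This is a minimal sketch under stated assumptions:

- Trypsin is modelled by the common simplified rule "cleave after K or R unless followed by P", with no missed cleavages.
- Avidin capture is modelled as "keep peptides containing C".
- The two toy sequences are hypothetical, not real proteins.

The toy example exaggerates the reduction. The roughly 10-fold figure quoted above refers to real proteomes.

```python
import re


def tryptic_peptides(sequence: str) -> list[str]:
    """Simplified trypsin rule: cleave after K or R unless the next residue is P.

    Uses a zero-width split (supported by re.split from Python 3.7 onwards);
    missed cleavages are ignored.
    """
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def avidin_enrich(peptides: list[str]) -> list[str]:
    """Keep only cysteine containing peptides, i.e. those that would carry the biotin tag."""
    return [p for p in peptides if "C" in p]


# Hypothetical toy sequences (one-letter amino acid code), not real proteins
proteins = {
    "toy_protein_1": "MAKCLLRGEEKPAFWR",
    "toy_protein_2": "GSTDEKLVNHARWQTK",
}

all_peptides = [p for seq in proteins.values() for p in tryptic_peptides(seq)]
retained = avidin_enrich(all_peptides)
print(all_peptides)
print(retained)
print(f"{len(retained)} of {len(all_peptides)} peptides retained after enrichment")
```

Note that toy_protein_2 contributes nothing after enrichment because it has no cysteine. This is the limitation of ICAT discussed below.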
ICAT labelling has been used to study protein changes induced in cultures of cortical neurones by the chemotherapeutic agent camptothecin. ICAT labelled peptides were purified on an avidin affinity column and analysed by liquid chromatography and mass spectrometry. In total, 129 proteins were identified and their relative abundance quantified. This study demonstrated the usefulness of ICAT in detecting low abundance proteins from many different subcellular compartments, including proteins involved in protein synthesis, transcription regulation, and signal transduction.61 The technique is also suited to the study of relatively insoluble proteins, such as membrane proteins, which are often incompatible with 2D-PAGE. These proteins can be extracted with strong ionic detergents and then labelled, and the digestion step produces peptides that are more soluble than the whole proteins.59
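Relative abundance in an ICAT experiment is read from the heavy and light peak areas of each labelled peptide. The values are then summarised per protein. The sketch below assumes, as in the standard experiment described above, that the disease sample carries the heavy label and the control the light label. It summarises each protein by the median log2(heavy/light) of its peptides. The median is one common robust choice, assumed here rather than taken from the cited study. The data and identifiers are hypothetical.

```python
import math
from statistics import median


def protein_log2_ratios(peptide_measurements):
    """Summarise peptide heavy/light pairs into one ratio per protein.

    peptide_measurements: iterable of (protein_id, light_area, heavy_area).
    Returns protein_id -> median log2(heavy/light) over its quantified peptides.
    """
    per_protein = {}
    for protein_id, light_area, heavy_area in peptide_measurements:
        if light_area > 0 and heavy_area > 0:  # skip pairs with a missing partner
            per_protein.setdefault(protein_id, []).append(math.log2(heavy_area / light_area))
    return {protein_id: median(ratios) for protein_id, ratios in per_protein.items()}


# Hypothetical peak areas (light = control, heavy = disease)
measurements = [
    ("protein_1", 100.0, 200.0),
    ("protein_1", 50.0, 110.0),
    ("protein_1", 80.0, 150.0),
    ("protein_2", 300.0, 290.0),
    ("protein_3", 120.0, 0.0),  # heavy partner not detected: excluded
]
ratios = protein_log2_ratios(measurements)
for protein_id, value in ratios.items():
    print(f"{protein_id}: log2(disease/control) = {value:+.2f}")
```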
A problem with ICAT is that it requires proteins not only to contain cysteine residues, but also to have those residues within peptides, generated by the proteolytic cleavage, that can be analysed. This was highlighted recently in a study of multi-subunit membrane proteins of E coli,62 which revealed that a high proportion (10–15%) of the proteins lack cysteine residues. This problem may be overcome by developing ICAT reagents reactive towards other amino acid residues, or by incorporating isotopic tags during the proteolytic step. For example, Glu-C proteolysis has been described in normal water (H₂¹⁶O) and in oxygen-18 labelled water (H₂¹⁸O). Two ¹⁶O or two ¹⁸O atoms are incorporated into the C-terminus of each peptide fragment, giving a 4 Da difference between the two samples. Likewise, different affinity matrices can be used to isolate different peptides. Examples are nickel immobilised metal affinity chromatography for histidine containing peptides, and a lectin affinity column to isolate glycoproteins selectively.58 ICAT also fails to identify post-translational modifications such as phosphorylation and glycosylation, unless these modifications occur on the cysteine containing peptide.
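The 4 Da figure for proteolytic ¹⁸O labelling can be checked from the isotope masses. Unlike the cysteine specific tag, this label is in principle introduced at the C-terminus of every peptide the protease generates, so proteins without cysteine are not excluded. The sketch assumes complete exchange of both carboxyl oxygens by default. It also shows the smaller shift that would result if only one oxygen were exchanged. The function name is hypothetical.

```python
O16 = 15.99491462  # monoisotopic mass of oxygen-16 (Da)
O18 = 17.99915961  # monoisotopic mass of oxygen-18 (Da)


def o18_label_shift(n_exchanged: int = 2, charge: int = 1) -> float:
    """m/z shift when n C-terminal carboxyl oxygens are exchanged for 18O during digestion."""
    return n_exchanged * (O18 - O16) / charge


print(f"two 18O, 1+: {o18_label_shift():.4f}")
print(f"one 18O, 1+: {o18_label_shift(1):.4f}")  # incomplete exchange
print(f"two 18O, 2+: {o18_label_shift(2, 2):.4f}")
```

The 4 Da gap is only half the 8 Da gap of the d0/d8 ICAT reagents. The light and heavy isotope envelopes therefore lie closer together.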
Recently a hybrid method using ICAT in conjunction with two dimensional electrophoresis has been described.63 Two protein samples were labelled with light and heavy ICAT reagents, pooled, and separated on the same 2D gel. The gel was stained to visualise the protein spots. These were excised and enzymatically digested to provide the peptide mixture for identification by mass spectrometry. The 8 Da difference, although not detectable on the 2D gel, was still identifiable in the peptide mass fingerprint obtained by mass spectrometry, providing quantitative data on the differences in protein expression. ICAT has the potential to become comparable with 2D-PAGE for identifying protein differences without the need for time consuming gel analysis. It is likely to emerge over the next few years as a technique for high throughput analysis of the complex protein patterns found in CNS samples.
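In the hybrid approach, quantitation depends on spotting light/heavy partners in the peptide mass fingerprint. The sketch below scans a peak list for pairs separated by the d8 minus d0 mass difference and reports the heavy/light intensity ratio. It assumes:

- singly charged ions, as is typical for MALDI peptide mass fingerprints;
- one labelled cysteine per peptide;
- a hypothetical matching tolerance.

The peak list is invented for illustration.

```python
# Mass difference between d8 and d0 tags: eight deuterium atoms replacing hydrogen
D8_SHIFT = 8 * (2.01410178 - 1.00782503)  # about 8.050 Da


def find_icat_pairs(peaks, charge=1, tol=0.02):
    """Find light/heavy peak pairs separated by one d8 - d0 spacing.

    peaks: list of (m/z, intensity). Assumes one labelled cysteine per peptide.
    Returns (light_mz, heavy_mz, heavy/light intensity ratio) tuples.
    """
    spacing = D8_SHIFT / charge
    ordered = sorted(peaks)
    pairs = []
    for i, (light_mz, light_int) in enumerate(ordered):
        for heavy_mz, heavy_int in ordered[i + 1:]:
            delta = heavy_mz - light_mz
            if delta > spacing + tol:
                break  # peaks are sorted, so no later peak can match
            if abs(delta - spacing) <= tol:
                pairs.append((light_mz, heavy_mz, heavy_int / light_int))
    return pairs


# Hypothetical MALDI peak list: (m/z, intensity)
spectrum = [(1500.70, 1.0e4), (1508.75, 2.0e4), (1622.81, 5.0e3), (1800.90, 3.0e3)]
for light, heavy, ratio in find_icat_pairs(spectrum):
    print(f"pair {light:.2f}/{heavy:.2f}: heavy/light = {ratio:.2f}")
```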
DNA and oligonucleotide based microarrays have been used routinely for many years, and more recently their protein counterparts have been developed. Protein microarrays, or “chips,” are gaining in popularity as miniaturised ligand binding assays for complex protein samples, because they allow simultaneous detection and quantitation of many biomolecules. In this format, capture molecules such as antibodies are immobilised at high density in a small area on a solid support, such as a treated glass microscope slide. When the array is exposed to a sample, for instance a cell lysate or serum, each antibody captures its target protein. The technique allows large scale, high throughput analysis using small sample volumes and relatively low protein concentrations. For these reasons microarrays are likely to find routine applications in basic research, disease diagnosis, and the identification of therapeutic targets.64,65 Protein microarrays effectively allow quantitation of several hundred to several thousand analytes in one system (fig 4).
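Quantitation on a capture array starts with turning each printed spot into a single number. The sketch below shows one simple, common approach, assumed here rather than taken from the cited work. Each spot's value is its median pixel intensity minus the median of its local background, floored at zero. Replicate spots of the same capture antibody are then averaged. The grid layout, analyte names and pixel values are all hypothetical.

```python
from statistics import mean, median


def corrected_signal(foreground_pixels, background_pixels):
    """Median spot intensity minus median local background, floored at zero."""
    return max(median(foreground_pixels) - median(background_pixels), 0.0)


def analyte_signals(spots, layout):
    """Average background-corrected signal over replicate spots of each analyte.

    spots: (row, col) -> (foreground_pixels, background_pixels)
    layout: (row, col) -> name of the capture antibody printed at that position
    """
    per_analyte = {}
    for position, (foreground, background) in spots.items():
        per_analyte.setdefault(layout[position], []).append(
            corrected_signal(foreground, background))
    return {name: mean(values) for name, values in per_analyte.items()}


# Hypothetical 2 x 2 array with duplicate spots for two analytes
layout = {(0, 0): "analyte_A", (0, 1): "analyte_A",
          (1, 0): "analyte_B", (1, 1): "analyte_B"}
spots = {
    (0, 0): ([500, 520, 510], [100, 110, 90]),
    (0, 1): ([490, 500, 505], [95, 100, 105]),
    (1, 0): ([120, 130, 125], [110, 120, 115]),
    (1, 1): ([100, 105, 110], [120, 125, 130]),
}
signals = analyte_signals(spots, layout)
print(signals)
```

Medians are used so that a few saturated or dust-affected pixels do not dominate a spot. Converting these signals to concentrations would additionally require calibration standards, which this sketch omits.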
Antibody microarrays are the most accessible format of protein microarray. In one of the first papers reporting the use of protein microarrays, 115 antibodies or antigens were immobilised using a robotic arrayer. They were probed with the corresponding ligands in mixtures of varying but known concentrations. Interactions were visualised by labelling the protein mixtures with Cy3 or Cy5 fluorescent dyes, and the relative intensities provided data on relative abundance.64 For producing antibody based microarrays there is a vast library of antibodies that are relatively stable, well characterised, and already used routinely in various techniques. The disadvantages of antibodies include their large molecular size and, for polyclonal antibodies, a possible lack of specificity and limited supply. Commercial companies are, however, developing methods to overcome these problems using antibody fragments or phage technology. The latter uses phage (bacterial viruses) genetically modified to express immunoglobulin fragments on their surface.66 Antibody performance may also vary between assay types, so in a clinical setting it is important that it can be validated. A recent assessment of protein microarrays reports that, of over 100 commercially available antibodies tested, as few as 5% were suitable for microarray based analyses.67 To overcome this problem it may be necessary to select antibodies specifically for a particular protocol, which in turn requires some knowledge of the system being studied.68 This approach will, however, allow specific systems such as apoptosis or inflammation to be studied. It will also provide a better understanding of particular pathways, for example signal transduction.
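In a two-colour antibody array, each analyte's relative abundance is read as the ratio of its Cy5 and Cy3 signals. The sketch below computes log2(Cy5/Cy3) per antibody. It then subtracts the array-wide median, which corrects for an overall difference in labelling or scanning between the two dyes. This median centring assumes that most analytes do not change between the two samples. It is one common convention, not necessarily the one used in the study described above. Antibody names and intensities are hypothetical.

```python
import math
from statistics import median


def normalised_log_ratios(cy3, cy5):
    """Median-centred log2(Cy5/Cy3) ratio for each antibody.

    cy3, cy5: dict antibody -> background-corrected intensity.
    Antibodies with a zero or missing signal in either channel are skipped.
    """
    raw = {
        ab: math.log2(cy5[ab] / cy3[ab])
        for ab in cy3
        if cy3[ab] > 0 and cy5.get(ab, 0) > 0
    }
    centre = median(raw.values())  # global dye bias, assuming most analytes are unchanged
    return {ab: ratio - centre for ab, ratio in raw.items()}


# Hypothetical intensities: Cy5 is globally about 1.2x brighter, and ab3 is truly up fourfold
cy3 = {"ab1": 1000.0, "ab2": 500.0, "ab3": 200.0}
cy5 = {"ab1": 1200.0, "ab2": 600.0, "ab3": 960.0}
result = normalised_log_ratios(cy3, cy5)
for ab, value in result.items():
    print(f"{ab}: {value:+.2f}")
```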
Alternatively, complex protein samples such as cell lysates can be simplified using ion exchange or affinity chromatography, liquid phase isoelectric focusing, or 1D-PAGE.69 Synthetic alternatives to antibodies that show high specificity and affinity and are stable have also been described, and may be a future route for protein microarrays.70 Antibody based microarrays are nevertheless likely to have potential for clinical applications, such as detection of diagnostic proteins in serum or CSF, as well as being useful in a research setting. The microarray format has been described for high throughput detection of clinical analytes, albeit in a low density 6×6 (36 analyte) format.71
Variations on the antibody array format include tissue arrays, peptide arrays, and carbohydrate arrays. In these, the molecules (or tissues) are arrayed and probed with single or multiple analytes to determine their binding partners on the array. For instance, tissue arrays carry small sections of normal or pathological tissue from various organs gridded onto their surface. They are probed with antibodies to novel proteins to determine expression patterns. In a study of protein interactions in yeast, 80% of the proteome was cloned, purified, and spotted onto a glass slide to form a proteome array. Probing with protein extracts then provided information on protein–protein and protein–phospholipid interactions.72 Such protein arrays may well become the method of choice for determining protein binding partners to identify cellular pathways. In future it should be possible to use the human genome database as a resource to make cDNA expression libraries and to spot all known proteins, producing a proteome microarray. For example, if a particular neuronal protein is being studied, it could be labelled with a suitable reporter and applied to the array as a recombinant or isolated protein. This would reveal its potential interacting partners. Sequential analysis of these binding partners would then allow protein–protein pathways to be built up.
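The "sequential analysis" of binding partners amounts to a breadth-first expansion of an interaction network. The seed protein is used as the first probe. Each partner it binds becomes the next probe, and so on for a chosen number of rounds. The sketch below assumes the array results are already available as a hypothetical lookup from each probe protein to the set of proteins it bound. All protein names are invented placeholders.

```python
from collections import deque


def expand_interactions(seed, partners, max_depth=2):
    """Breadth-first expansion of binding partners starting from a seed protein.

    partners: protein -> set of proteins it bound on the array (hypothetical data).
    Each protein found is in turn used as a probe, for up to max_depth rounds.
    Returns the undirected interactions found, as sorted (a, b) tuples.
    """
    depth = {seed: 0}
    queue = deque([seed])
    interactions = set()
    while queue:
        probe = queue.popleft()
        if depth[probe] >= max_depth:
            continue  # do not probe beyond the requested number of rounds
        for partner in partners.get(probe, set()):
            interactions.add(tuple(sorted((probe, partner))))
            if partner not in depth:
                depth[partner] = depth[probe] + 1
                queue.append(partner)
    return sorted(interactions)


# Hypothetical array results: X is the neuronal protein of interest
partners = {
    "X": {"A", "B"},
    "A": {"X", "C"},
    "B": {"X"},
    "C": {"A", "D"},
    "D": {"C"},
}
print(expand_interactions("X", partners))               # two rounds of probing
print(expand_interactions("X", partners, max_depth=3))  # one further round
```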
The use of microarrays in proteomic studies is still very much in its infancy. However, it is clear from the few publications on the subject that they will one day be invaluable tools in the clinical setting, particularly where protein concentration or sample quantity is limited. A recent report identified five serum proteins that differed significantly between prostate cancer and control samples.73 A second study used small amounts of oral cancer material obtained by laser capture microdissection,74 and quantified differences in the expression of a number of proteins during tumour progression.
Mass spectrometry is the preferred method for identifying proteins and forms an essential part of proteomic analysis. A mass spectrometer measures the mass to charge ratio (m/z) of gas phase ions. The analyte, in this case a protein or peptide, is ionised, and the ions are accelerated through a high vacuum to a detector. The data provide structural information such as peptide mass and amino acid sequence, as well as information on protein modifications. They can then be used to identify a protein by searching the various databases available.
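Because the instrument reports m/z rather than mass, the same molecule appears at different positions depending on how many charges it carries. The sketch below assumes positive ion mode, in which ions are formed by adding protons ([M + zH]z+). It converts between neutral mass and observed m/z. Function names are hypothetical.

```python
PROTON = 1.00727646688  # mass of a proton (Da)


def mz_from_mass(neutral_mass: float, charge: int) -> float:
    """m/z of an [M + zH]z+ ion formed by adding z protons to a neutral molecule."""
    return (neutral_mass + charge * PROTON) / charge


def mass_from_mz(mz: float, charge: int) -> float:
    """Neutral mass of an [M + zH]z+ ion observed at a given m/z."""
    return mz * charge - charge * PROTON


for z in (1, 2, 3):
    print(f"1000.0 Da peptide at charge {z}+: m/z {mz_from_mass(1000.0, z):.4f}")
```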
For a large protein, such as albumin (~65 kDa), a given mass could be produced by many different combinations of amino acids. The accuracy of mass spectrometers also decreases at higher molecular mass. Methods are therefore needed that improve the accuracy of identification. The protein to be analysed is purified, often by 2D-PAGE although chromatographic methods can be used. It is then digested enzymatically (for example with trypsin or Lys-C), which cleaves the protein at specific bonds and gives a reproducible pattern of fragments. Mass spectrometry of the resulting peptide mixture gives highly accurate peptide masses, known as a peptide mass fingerprint (PMF) (fig 5). From this information the likely amino acid composition of each peptide in the digest can be derived. Search engines then compare these masses with databases of theoretical protein cleavage data, producing a list of the closest matching proteins. For this reason PMF is ideally suited to species whose genome has been completely sequenced, such as man and mouse. It is less useful where the genome sequence has not been fully determined. In such cases additional peptide sequence information can be obtained by traditional Edman sequencing. In this method, amino acids are removed sequentially from the peptide and analysed by high performance liquid chromatography (HPLC) with electrochemical detection, or by mass spectrometry. Sequence can be obtained more directly by tandem mass spectrometry (MS/MS), which combines two mass analysers, for example a quadrupole with a time of flight (TOF) analyser.75
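The core of a PMF search can be sketched in a few lines. Each candidate protein is digested in silico and the singly protonated mass of every peptide is computed. Each candidate is then scored by how many observed peaks fall within a tolerance of a theoretical peptide mass. This is a minimal illustration under stated assumptions:

- a simplified trypsin rule with no missed cleavages;
- no chemical modifications such as cysteine alkylation;
- a simple match count as the score, whereas real search engines use probability based scoring.

The two toy sequences and the peak list are hypothetical.

```python
import re

# Monoisotopic amino acid residue masses (Da)
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def tryptic_peptides(sequence):
    """Simplified trypsin rule: cleave after K or R unless followed by P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def mh_plus(peptide):
    """Singly protonated [M+H]+ mass, as typically observed by MALDI."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def pmf_score(observed, sequence, tol=0.1):
    """Count observed masses matching any theoretical tryptic peptide within tol (Da)."""
    theoretical = [mh_plus(p) for p in tryptic_peptides(sequence)]
    return sum(any(abs(o - t) <= tol for t in theoretical) for o in observed)


def rank_candidates(observed, database, tol=0.1):
    """Return (protein, score) pairs, best match first."""
    scores = {name: pmf_score(observed, seq, tol) for name, seq in database.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


database = {"toy_protein_1": "MAKCLLRGEEKPAFWR", "toy_protein_2": "GSTDEKLVNHARWQTK"}
observed = [349.19, 504.30, 1119.56, 1402.70]  # hypothetical peaks; the last matches nothing
print(rank_candidates(observed, database))
```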
All mass spectrometers have three main components: the ionisation source, the mass analyser, and the ion detector. Ionisation sources include matrix assisted laser desorption/ionisation (MALDI) and electrospray ionisation (ESI). Both are well suited to detecting low protein concentrations, and can be used with complex protein/peptide mixtures or with prefractionated samples. Both are “soft” ionisation techniques. They form ions without fragmenting the protein or peptide, thereby providing more accurate mass information.
MALDI requires picomoles or less of sample, is relatively insensitive to contaminants such as salts and non-ionic detergents, and samples can be in the solid, liquid, or gaseous phase. The analyte is co-crystallised with an ultraviolet absorbing matrix solution on a target plate. A laser beam fired at the target is absorbed by the matrix, which transfers energy to the analyte and causes it to pass into the gas phase.76 MALDI instruments are traditionally coupled to time of flight (TOF) mass analysers. These measure the time ions take to travel from the point at which they are accelerated to the ion detector; for ions of the same charge, smaller ions reach the detector before larger ones. Other analysers include the commonly used quadrupole. This consists of four parallel metal rods that act as a filter, allowing only ions with a certain m/z to pass. By placing multiple quadrupoles in series, the amino acid sequence of a peptide can be determined.76
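The principle of the TOF analyser follows from the ions' kinetic energy. Every ion of charge z accelerated through a potential U gains the same energy zeU, so its drift velocity is the square root of 2zeU/m. Heavier ions therefore travel more slowly. The sketch below is idealised: it ignores time spent in the source, the initial energy spread and reflectron optics. The 20 kV accelerating voltage and 1 m drift length are hypothetical round numbers chosen for illustration.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # C
DALTON = 1.66053906660e-27           # kg


def flight_time(mass_da, charge=1, accel_voltage=20_000.0, drift_length=1.0):
    """Idealised drift time (s) through a field-free tube of length drift_length (m).

    Every ion gains kinetic energy z*e*U, so v = sqrt(2 z e U / m) and t = L / v.
    """
    mass_kg = mass_da * DALTON
    velocity = math.sqrt(2 * charge * ELEMENTARY_CHARGE * accel_voltage / mass_kg)
    return drift_length / velocity


for mass in (1000.0, 2000.0, 4000.0):
    print(f"{mass:.0f} Da, 1+: {flight_time(mass) * 1e6:.2f} microseconds")
```

Flight time grows with the square root of m/z. A fourfold heavier singly charged ion arrives after twice the time, and ions with the same m/z arrive together regardless of their absolute mass.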
ESI requires the analyte to be in solution and is therefore ideally coupled to liquid chromatographic separation methods. As the sample is injected into the mass spectrometer it is sprayed across a high potential difference, forming a fine mist of charged droplets. More recent advances in ESI include nanospray ionisation, in which the microcapillary used to introduce the sample has a diameter as small as 1–2 μm. This allows flow rates as low as 5 nl/min,77 which greatly reduces the amount of sample needed for analysis.
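The saving from nanospray is simple arithmetic on flow rate and acquisition time. The sketch below compares the 5 nl/min nanospray flow quoted above with a hypothetical conventional flow of 5 µl/min. Both are assumed to run for a hypothetical 30 minutes at a hypothetical analyte concentration of 100 fmol/µl.

```python
def sample_consumed(flow_nl_per_min, minutes, conc_fmol_per_ul):
    """Volume (nl) and amount (fmol) of analyte consumed during continuous infusion."""
    volume_nl = flow_nl_per_min * minutes
    amount_fmol = volume_nl * conc_fmol_per_ul / 1000.0  # 1 µl = 1000 nl
    return volume_nl, amount_fmol


nano = sample_consumed(5, 30, 100)             # nanospray at 5 nl/min
conventional = sample_consumed(5000, 30, 100)  # hypothetical 5 µl/min flow for comparison
print(f"nanospray:    {nano[0]} nl, {nano[1]} fmol")
print(f"conventional: {conventional[0]} nl, {conventional[1]} fmol")
```

Under these assumptions the same 30 minute acquisition consumes a thousandth of the sample.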
As previously mentioned, MALDI-TOF is a commonly used combination for PMF. However, the MALDI-Q-TOF hybrid also allows the amino acid sequence to be determined for any peptide not identified by PMF.77 These newer instruments offer greater sensitivity and high mass accuracy. They also allow high throughput analysis when coupled to automated systems for either robotic sampling of 2D gels or direct capillary based separation of protein/peptide mixtures. They can also detect and characterise post-translational modifications and identify different isomers.78
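Sequencing by tandem mass spectrometry relies on fragmenting a selected peptide along its backbone. Fragments retaining the N-terminus are called b ions, and those retaining the C-terminus y ions. The mass difference between consecutive ions in either series equals the mass of one residue, so the sequence can be read off the ladder. The sketch below assumes singly charged ions and no modifications. It uses "PEPTIDE" purely as an example string. Leucine and isoleucine have identical masses and cannot be distinguished this way.

```python
# Monoisotopic amino acid residue masses (Da)
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def b_y_ions(peptide):
    """Singly charged b and y ion masses for each backbone cleavage (same index = same cleavage)."""
    b_ions, y_ions = [], []
    for i in range(1, len(peptide)):
        b_ions.append(sum(RESIDUE_MASS[aa] for aa in peptide[:i]) + PROTON)
        y_ions.append(sum(RESIDUE_MASS[aa] for aa in peptide[i:]) + WATER + PROTON)
    return b_ions, y_ions


def read_b_ladder(b_ions, tol=0.02):
    """Infer residues from mass differences between consecutive b ions.

    Residues of indistinguishable mass (I and L) are reported together, e.g. 'I/L'.
    """
    residues = []
    for lighter, heavier in zip(b_ions, b_ions[1:]):
        delta = heavier - lighter
        matches = sorted({aa for aa, m in RESIDUE_MASS.items() if abs(m - delta) <= tol})
        residues.append("/".join(matches) if matches else "?")
    return residues


b, y = b_y_ions("PEPTIDE")
print("b ions:", [round(m, 3) for m in b])
print("y ions:", [round(m, 3) for m in y])
print("residues read from b ladder:", read_b_ladder(b))
```

The ladder recovers the internal residues (E, P, T, I/L, D). The terminal residues would come from the smallest b and y ions.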
There are numerous examples of the use of mass spectrometry in neuroscience. Increased platelet activating factor in the plasma and CSF in multiple sclerosis was identified using HPLC with tandem mass spectrometry.79 The processing of neuropeptide Y in the CSF of patients with depression has been monitored using MALDI-TOF.80 The structural variants of β amyloid in brain tissue from patients with Alzheimer’s disease, compared with healthy controls, were determined by ESI-MS. Other neuropeptides such as substance P and dynorphin A have been analysed in plasma, CSF, and brain in various conditions using mass spectrometry techniques (see Nilsson et al for a review81).
SELDI (surface enhanced laser desorption/ionisation) has recently increased in popularity. First described in 1993,82 SELDI works on principles similar to MALDI, using a laser beam to desorb analyte ions from a solid surface for analysis by mass spectrometry.83 Sample preparation is simpler than for MALDI because proteins are captured directly onto a solid phase chromatographic surface.84 For example, the sample is applied to a strong cationic support and washed with an appropriate buffer, so that only proteins and peptides with affinity for the support are retained. SELDI analysis therefore produces a series of mass peaks for each affinity matrix, effectively a protein/peptide signature for each tissue or cell type. By using different types of support, different subsets of proteins can be analysed to build up a picture of the proteins present. The technique has been used in various studies and is particularly suited to the analysis of diagnostic biomarkers in plasma or CSF. For example, cystatin C, a secreted cysteine protease inhibitor, has been identified in CSF as a marker of chronic pain.85 β amyloid has been identified in the lens of patients with Alzheimer’s disease, suggesting that the pathological features of the disease overlap between brain and lens.86 SELDI has recently been used to analyse plasma samples from individuals with ovarian tumours, compared with normal individuals and individuals with benign ovarian cysts.87 Comparison of the peptide profiles identified a series of peptide peaks that together provided 99% sensitivity and 99% specificity in the diagnosis of ovarian tumour. The identification of prostate cancer associated biomarkers has also shown the value of SELDI for rapid discovery of potential clinical markers,88 suggesting that its application to samples such as CSF may have considerable utility.
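Diagnostic performance figures like those quoted above come from comparing a classifier's calls with the known diagnosis. Sensitivity is the proportion of true cases called positive. Specificity is the proportion of non-cases called negative. The sketch below computes both. The twenty sample labels are hypothetical and are not data from the cited study.

```python
def sensitivity_specificity(predicted_positive, truly_positive):
    """Return (sensitivity, specificity) from paired boolean calls and true diagnoses."""
    pairs = list(zip(predicted_positive, truly_positive))
    tp = sum(1 for p, t in pairs if p and t)          # cases correctly called positive
    fn = sum(1 for p, t in pairs if not p and t)      # cases missed
    tn = sum(1 for p, t in pairs if not p and not t)  # non-cases correctly called negative
    fp = sum(1 for p, t in pairs if p and not t)      # non-cases wrongly called positive
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical: 10 tumour samples (9 called positive), 10 non-tumour samples (8 called negative)
truth = [True] * 10 + [False] * 10
calls = [True] * 9 + [False] + [False] * 8 + [True] * 2
sens, spec = sensitivity_specificity(calls, truth)
print(f"sensitivity {sens:.0%}, specificity {spec:.0%}")
```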
One drawback of SELDI is that it is best suited to the analysis of peptides and proteins smaller than 20 kDa. These are more likely to be retained by the affinity support, and more likely to fly when hit by the laser. This can be overcome by simple prefractionation by size before affinity separation, so that only proteins larger than 20 kDa are applied to the affinity support. Also, because SELDI uses a mild ionisation procedure, it has limited capacity to identify a peptide unless it is smaller than about 10 kDa. Peptides detected in this way require further purification and analysis for precise identification. SELDI is therefore more suited to high throughput prescreening of large numbers of samples. It nonetheless provides an effective tool for detecting differentially expressed proteins. Advances in MALDI-TOF that provide high throughput, together with different isolation and enrichment strategies, may make the principles of SELDI-TOF far more amenable to protein identification.
One of the major hurdles to applying genomics in neuroscience is the complexity of the brain itself, with numerous different cell types even within a relatively small area. Genomics and proteomics aim to define the complement of genes or proteins in a given cell. This makes it very difficult to decipher which genes and proteins are associated with a particular cell type. To overcome this, techniques such as laser capture microdissection are available. With this technique an individual cell type, such as a pyramidal neurone in the hippocampus, can be isolated from its neighbours with a tightly focused laser.89 Using cells isolated in this way will allow these technologies to be applied to individual cell types. It will also enable comparison of, for instance, the proteomes of pools of neurones affected by degenerative pathology with those of their unaffected neighbours. With advances in the sensitivity of fluorescent detection and mass spectrometry, it may even become possible to analyse expression in a single cell.90 No single technique will be capable of analysing the entire genome or proteome of a given cell or tissue. However, with selective use of various methods it should be possible to determine the gene and protein expression patterns of key brain regions in health and disease. Ultimately, these global profiling technologies will help to unravel both the genetic and environmental factors that predispose to and precipitate complex neuropsychiatric disorders.
MMR and SB gratefully acknowledge the support of the National Alliance for Research on Schizophrenia and Depression, the BBSRC, and the Stanley Medical Research Institute. CM and KEW are supported by the MRC, the Alzheimer’s Research Trust, Amersham Biosciences plc, and the Newcastle Hospitals Special Trustees.
Competing interests: Mr Pashby and Drs Prime, Orange, O’Beirne, and Whateley are employees of Amersham Biosciences plc who manufacture and distribute reagents and equipment relating to some of the systems referenced in this paper, and as such may be regarded as having a potential interest. Dr Morris is in receipt of grant funding from Amersham Biosciences plc.