
Mapping the landscape of cerebral amyloid angiopathy research: an informetric analysis perspective
Andreas Charidimou (1,2), Zoe Fox (3), David J Werring (1), Min Song (4)

  1. Stroke Research Group, Department of Brain Repair and Rehabilitation, UCL Institute of Neurology and The National Hospital for Neurology and Neurosurgery, London, UK
  2. Hemorrhagic Stroke Research Program, Department of Neurology, Massachusetts General Hospital Stroke Research Center, Harvard Medical School, Boston, Massachusetts, USA
  3. UCL and the Education Unit, Biomedical Research Centre, UCL Institute of Neurology, London, UK
  4. Department of Library and Information Science, Yonsei University, Seoul, Korea

Correspondence to Dr Andreas Charidimou, Harvard Medical School, J Kistler Stroke Research Center, Massachusetts General Hospital, 175 Cambridge St, Suite 300, Boston, MA 02114, USA; andreas.charidimou.09{at}


We aimed to quantitatively analyse the research output and major trends in the field of cerebral amyloid angiopathy (CAA) over six decades, from 1954 to 2014, using advanced informetrics methods. We systematically identified CAA-related articles from PubMed, collected metadata and performed productivity analysis, copublication analysis, and network and content analysis over defined time periods. Linear regression was used to quantify trends over time. Changes in CAA research themes (2000–2014) were defined using a topic modelling technique. A total of 2340 CAA papers were published between 1954 and 2014. The mean number (3.03; 95% CI 2.62 to 3.45; p<0.0001) and mean rate (0.13%; 95% CI 0.11% to 0.15%; p<0.0001) of CAA publications increased yearly. Analysis of copublication networks over 5-year periods from 1990 to 2014 revealed a marked increase in the total number of connected investigators publishing on CAA (coefficient 16.74; 95% CI 14 to 19.49; p<0.0001) as well as the interactions between them (coefficient 73.53; 95% CI 52.03 to 89.03; p<0.0001). Further analysis of the network characteristics showed that in the past 15 years, copublication networks became not only larger, but also more connected and coherent. Content analysis identified 16 major CAA research themes and their differential evolution in the past 15 years, with the following main trends: (A) limited focus on vascular cognitive impairment; (B) a shift in emphasis towards neuroimaging, cerebral microbleeds and diagnostic aspects and away from pathological aspects; and (C) a reduced emphasis on basic biology apart from an increased focus on mouse models and perivascular drainage. Our study reveals the rapidly developing nature of the CAA research landscape, providing a novel quantitative and objective basis for identifying unmet needs and new directions. Our findings support the idea of a collaborative culture in the field, encouraging international research initiatives.




Cerebral amyloid angiopathy (CAA) is a common cerebral small-vessel disease resulting from progressive amyloid-β deposition in the media and adventitia of small arteries and capillaries of the leptomeninges and cerebral cortex.1 ,2 CAA is a major cause of lobar intracerebral haemorrhage in elderly people and an important contributor to vascular cognitive impairment.3 It is also almost invariably found in Alzheimer’s disease, albeit in a relatively mild form.4 In the era of modern neuroimaging, CAA is associated with a high prevalence of characteristic MRI markers of small vessel disease, including white matter hyperintensities, lobar cerebral microbleeds, cortical superficial siderosis and enlarged perivascular spaces in the cerebral white matter.1 ,2 Unlike most neurological disorders, the understanding of the underlying pathology of CAA preceded the associations with clinical syndromes by many years. Since the late 1960s, when CAA was first linked to clinical disease, researchers have strived to understand its underlying pathophysiological mechanisms and clinical consequences. Despite a growing interest in the field and remarkable recent advances, CAA remains largely untreatable. Nonetheless, there are now grounds for optimism as disease-modifying strategies (including immune-based amyloid-modulating therapies) may soon be possible in CAA; early-phase trials of candidate treatments are underway.5

As highlighted during the recent Fourth International CAA Conference hosted at the UCL Institute of Neurology in Queen Square, London, improving diagnosis and management, better understanding the disease and developing new treatments will require the combined expertise and resources of large-scale international projects and networks. Mapping the landscape of CAA research over time should help to show how this field is evolving and to elucidate various aspects of its development, including trends, bottlenecks, areas in need of re-examination and new avenues. Advances in informatics now make it possible to gather large amounts of ‘knowledge about knowledge’, a growing discipline termed ‘metaknowledge’.6 A specific form of metaknowledge, informetrics (including bibliometrics), uses quantitative and statistical methods to harvest and analyse publication records from online databases at a large scale, in an effort to objectively measure patterns of scientific productivity within a given field or body of literature.7

Despite its wide application in other research domains, including stroke8 and Alzheimer's disease,9 ,10 to the best of our knowledge, there have been no previous studies of informetrics or bibliometrics in the field of CAA. The aim of our study is to analyse the research output in the field of CAA over six decades from 1954 to 2014 using advanced informetrics methods, including productivity analysis, network and content analysis, and copublication networks.


Data collection

We used PubMed to identify all relevant publications on CAA (without language restriction) with the combination of search terms: ‘amyloid angiopathy’ OR ‘congophilic angiopathy’ OR ‘dyshoric angiopathy’ OR ‘dysphoric angiopathy’. The search covered the period from 1950 to 2014. The methodological approach for the present study is similar to that applied recently in an overview of Alzheimer's disease research,11 and originates from the fields of informetrics, text mining and information visualisation. In summary, to broadly examine the landscape of CAA research, we extracted data to be used for analyses at the macro level (including year, journal, authors, MeSH terms, etc), and meta-data for analyses at the micro level (eg, ‘core biological entities’ or ‘bio-entities’, including genes, pathways, disease, etc), using text-mining techniques. Text mining refers to the extraction of hidden and useful information or knowledge by processing unstructured text data from individual electronic publications.12 ,13
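The search strategy above can be written as a single Boolean query. The following is a minimal sketch, assuming the query was submitted through the NCBI E-utilities `esearch` endpoint with a publication-date filter; the article does not state how the search was actually run, and the function names are hypothetical.

```python
from urllib.parse import urlencode

# Search terms as listed in the Data collection section.
SEARCH_TERMS = [
    "amyloid angiopathy",
    "congophilic angiopathy",
    "dyshoric angiopathy",
    "dysphoric angiopathy",
]

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_query(terms):
    """Join quoted phrases with OR, mirroring the search string in the text."""
    return " OR ".join(f'"{term}"' for term in terms)


def build_esearch_url(terms, min_year=1950, max_year=2014):
    """Build (but do not send) an esearch request for the given period.

    Assumption: filtering on publication date (datetype=pdat); retmax=0
    asks only for the hit count rather than the list of identifiers.
    """
    params = {
        "db": "pubmed",
        "term": build_query(terms),
        "datetype": "pdat",
        "mindate": str(min_year),
        "maxdate": str(max_year),
        "retmax": "0",
    }
    return ESEARCH_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_query(SEARCH_TERMS))
    print(build_esearch_url(SEARCH_TERMS))
```

Building the URL separately from sending it keeps the search string easy to inspect and to report alongside the results.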

Productivity analysis, coauthors, network analysis and content analysis

Time-based productivity over prespecified periods was calculated using the total number of unique publications identified on a yearly basis. Productivity analysis was also conducted from the perspective of journals and MeSH terms used. For all three aspects, the ratios of published papers for each year to the total number of published papers throughout the study period were also calculated.
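The yearly count and the ratio described above are simple to compute once each unique paper has a publication year. A minimal sketch with made-up input follows; the function name is hypothetical.

```python
from collections import Counter


def yearly_productivity(pub_years):
    """Return {year: (count, ratio)} for a list with one year per unique paper.

    ratio = papers published in that year / papers over the whole period.
    """
    counts = Counter(pub_years)
    total = sum(counts.values())
    return {year: (n, n / total) for year, n in sorted(counts.items())}


if __name__ == "__main__":
    # Toy input, not data from the study.
    toy_years = [2012, 2012, 2013, 2014]
    for year, (n, ratio) in yearly_productivity(toy_years).items():
        print(year, n, f"{ratio:.1%}")
```

The same function applies to journals or MeSH terms if the input list holds one journal name or one term per paper instead of a year.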

Publications per author were collected and copublication networks were generated for defined 5-year time periods from 1990 to 2014, using Gephi, an open-source software package for network analysis and visualisation.14 Copublication of a research article can be visually represented as a link between two or more investigators. Expanding this abstraction to an entire scientific field thereby facilitates a global analysis using the power of network statistics.15 Gephi also enabled us to calculate network features, including diameter (ie, longest geodesic distance: the distance between two nodes in a graph based on the number of edges in a shortest path connecting them), density, centrality and the average clustering coefficient.
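To make the network abstraction concrete, the sketch below builds a copublication graph from author lists and computes density and the average clustering coefficient in plain Python. It is an illustration only, not the Gephi implementation used in the study; the function names are hypothetical, and it assumes that nodes with fewer than two neighbours contribute a clustering coefficient of zero.

```python
from itertools import combinations


def build_copublication_graph(papers):
    """papers: list of author lists. Returns {author: set of coauthors}."""
    adj = {}
    for authors in papers:
        unique = sorted(set(authors))
        for author in unique:
            adj.setdefault(author, set())
        # Every pair of coauthors on a paper is linked by an edge.
        for a, b in combinations(unique, 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def density(adj):
    """Observed edges divided by the number of possible edges."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    return 0.0 if n < 2 else 2 * m / (n * (n - 1))


def local_clustering(adj, node):
    """Fraction of a node's neighbour pairs that are themselves linked."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0  # assumed convention for isolated or degree-1 nodes
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2 * links / (k * (k - 1))


def average_clustering(adj):
    return sum(local_clustering(adj, v) for v in adj) / len(adj)


if __name__ == "__main__":
    # Toy author lists, not data from the study.
    graph = build_copublication_graph([["A", "B", "C"], ["C", "D"]])
    print(len(graph), density(graph), average_clustering(graph))
```

Papers with many authors produce fully connected groups of nodes, which is one reason coauthorship networks tend to show high clustering coefficients.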

To provide a more comprehensive understanding of the published literature related to CAA at the micro level, we examined the network of ‘core biological entities’ (ie, initial key-concept list representing a paper) within the field of CAA, as well as topic trends over time. To this end, first, we applied a concept graph-based network analysis approach to identify frequently mentioned entities and their interrelationships, as recently proposed.11 Based on this document representation technique used in text mining, we constructed graphic representations of a text document, starting from a small set of concepts and expanding it into a rich graph, as previously described.11 For each CAA article identified, we extracted biological entities by a Named Entity Recognition technique. Biological entities (ie, initial key-concept list representing a paper) from the CAA literature were then mapped into concepts defined in the Unified Medical Language System (UMLS) provided by the National Library of Medicine—referred to as key concepts.16 UMLS is a comprehensive biomedical ontology and thesaurus that contains definitions and semantic information on a wide range of concepts in medicine, biology and health sciences.16 During the mapping process, we aimed to find either a first-best match or n-gram matches (ie, a contiguous sequence of n items from a given sequence of text) of a key phrase into a UMLS concept. After extracted entities were mapped into unique UMLS concepts, the obtained list was used to generate the global concept graph, visualised using Gephi.14 The global concept graph includes a selection of concept nodes in which edges connecting them represent semantic relationships between different concepts.11 The final global concept graph was examined with the following three methods: co-occurrence frequency analysis, centrality analysis and community detection. 
Specifically, to explore the core biological entities in CAA research, we analysed the top 10 centrality nodes using five validated centrality measures as previously described:11 ,17 degree centrality (the number of edges connected to a given node), weighted degree centrality, closeness centrality, betweenness centrality and PageRank. Here, we only present weighted degree centrality, calculated by summing the frequency of every node pair for a specific node.11
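The weighted degree centrality described above can be sketched as follows. The example assumes that the "frequency" of a node pair is the number of papers in which both concepts co-occur; the concept names are illustrative and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(docs):
    """docs: one list of concepts per paper. Returns pair -> number of papers."""
    pair_counts = Counter()
    for concepts in docs:
        # Count each unordered pair at most once per paper.
        for a, b in combinations(sorted(set(concepts)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts


def weighted_degree(pair_counts):
    """Sum the co-occurrence frequency of every pair that involves a node."""
    centrality = Counter()
    for (a, b), weight in pair_counts.items():
        centrality[a] += weight
        centrality[b] += weight
    return centrality


if __name__ == "__main__":
    # Toy concept lists, not data from the study.
    toy_docs = [
        ["amyloid", "APOE"],
        ["amyloid", "APOE", "microbleed"],
        ["amyloid", "microbleed"],
    ]
    ranking = weighted_degree(cooccurrence_counts(toy_docs)).most_common()
    print(ranking)
```

Unlike plain degree centrality, which counts only distinct neighbours, this measure rewards concepts that repeatedly co-occur with the same partners.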

In order to examine trends of topics and themes in CAA research over time, we applied a topic modelling technique (content analysis). Topic models are based on the general idea that documents are mixtures of topics, where a topic is a probability distribution over words. A topic model is a generative model for documents: it specifies a simple probabilistic procedure by which documents can be generated. In this study we have used an adapted version of Dirichlet Multinomial Regression (DMR),18 an expansion of Latent Dirichlet Allocation.11 ,19 DMR is a topic model able to simultaneously generate both the words and metadata in a document, highlighting hidden topic variables. To track topical trends over time, DMR-based topic models were conditioned on year.11 Exploring trends of evolution of research themes over time allowed us to identify potential unmet needs in the field.
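The generative view of a topic model can be illustrated with a short simulation. The sketch below is not the adapted DMR used in the study, whose details are not given here; it assumes the standard DMR form in which the Dirichlet prior for topic k is exp(x · λ_k), where x is a document's metadata feature vector (for example, a year indicator), and it covers only document generation, not parameter inference. All names and numbers are hypothetical.

```python
import math
import random


def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet distribution by normalising gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def dmr_prior(features, lambdas):
    """Metadata-dependent prior: alpha_k = exp(x . lambda_k)."""
    return [
        math.exp(sum(x * lam for x, lam in zip(features, lambda_k)))
        for lambda_k in lambdas
    ]


def generate_document(features, lambdas, topic_word, n_words, rng):
    """Generate one document: topic mixture first, then one topic per word."""
    theta = sample_dirichlet(dmr_prior(features, lambdas), rng)
    words = []
    for _ in range(n_words):
        k = rng.choices(range(len(theta)), weights=theta)[0]
        vocab, probs = zip(*topic_word[k].items())
        words.append(rng.choices(vocab, weights=probs)[0])
    return theta, words


if __name__ == "__main__":
    rng = random.Random(0)
    # Two toy topics, each a probability distribution over words.
    topic_word = [
        {"microbleed": 0.6, "mri": 0.4},
        {"plaque": 0.5, "autopsy": 0.5},
    ]
    # Features: [intercept, recent-year indicator]; toy regression weights.
    lambdas = [[0.0, 1.5], [0.0, -1.5]]
    theta, words = generate_document([1.0, 1.0], lambdas, topic_word, 8, rng)
    print(theta, words)
```

In this toy setting, a document flagged as recent has a prior that favours the first (imaging) topic, which is the mechanism that lets a year-conditioned model express topical trends.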

Statistical analysis

Linear regression was used to investigate the relationship between: (1) the numbers and the percentage of publications on a year-by-year basis; (2) the total number of connected investigators publishing on CAA (ie, nodes in copublication networks) and the interactions between them (ie, edges connecting different nodes) for each year; and (3) the relative distribution ratio of different topics and themes in CAA research between 2000 and 2014, as defined using the DMR-based topic modelling technique described above. We specifically focused on this period because it provided sufficient data size and a fairly even distribution of themes. In all analyses we assessed whether linear regression was appropriate by examining the fitted values and the residuals. Linear regression analyses were carried out using STATA (V.12.1, StataCorp).
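For readers unfamiliar with the regression quantities reported in the results, the sketch below fits an ordinary least-squares line of a yearly quantity against year and returns the slope, R² and residuals. It is a plain-Python illustration rather than the STATA procedure used in the study, and the function name is hypothetical. A 95% CI for the slope would be slope ± t × SE, where t is the 0.975 quantile of the t distribution with n − 2 degrees of freedom, taken from tables or a statistics package.

```python
import math


def ols_fit(x, y):
    """Simple linear regression of y on x by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residuals are returned so the adequacy of a linear fit can be inspected.
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return {
        "slope": slope,
        "intercept": intercept,
        "r_squared": 1 - ss_res / ss_tot,
        "se_slope": math.sqrt(ss_res / (n - 2) / sxx),
        "residuals": residuals,
    }


if __name__ == "__main__":
    # Toy counts, not data from the study.
    years = [2000, 2001, 2002, 2003, 2004]
    papers = [30, 36, 35, 44, 47]
    fit = ols_fit(years, papers)
    print(fit["slope"], fit["r_squared"], fit["se_slope"])
```

Fitting the same function to the logarithm of the counts gives the exponential-growth check described in the figure legends.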


A total of 2340 unique papers directly or indirectly related to CAA were published from 1954 through 2014 (no relevant papers appeared in PubMed between 1950 and 1953). Figure 1A shows that the number of CAA publications per year has increased approximately linearly during this period (R2=0.83; p<0.0001). There was a mean increase of 3.03 (95% CI 2.62 to 3.45) papers published per year over the whole period and the results remained consistent when the number of papers was logarithmically transformed. A similar trend was also evident for the ratio of publications per year (ie, number of publications per year/total number of publications 1954–2014) (R2=0.84; p<0.0001) (figure 1B): the ratio of publications increased by a mean 0.13% (95% CI 0.11% to 0.15%) per year over the whole study period, with the most recent decade, 2004–2013, accounting for 52.6% (n=1171 papers) of the whole world literature in CAA. The most productive years have been 2012 and 2013, contributing about 7.09% and 7.05% of all articles in the analysis, respectively. Online supplementary table e-1 and figure 2 summarise the top 10 most productive journals publishing CAA-related research.

Figure 1

(A) Total number of publications and rates of published papers (B) calculated on a year-by-year basis, with data fitted to a linear regression line from 1980. The inset bar charts and regression lines show the same data fitted for exponential functions, that is, linear regression performed between the logarithm of the data and the year. The two models were consistent.

Figure 2

The contribution of the top 10 most productive journals in cerebral amyloid angiopathy (CAA)-related research, in terms of the total published papers (A) and relative to all the published literature in CAA (B).

We examined the total number of investigators who published an article related to CAA per year, the number of links between CAA authors over time (figure 3) and hence coauthorship interaction. Linear regression analysis specifically demonstrated the growth of copublication networks in terms of the number of authors as well as the collaborative interactions among them (figure 3). We visualised and further explored these networks over predefined time periods in the past 15 years, revealing their trends in evolution (figure 4), especially their overall growth and complexity. Not only did the total number of connected investigators publishing on CAA (ie, average number of nodes) and the interactions between them (ie, average number of edges connecting different nodes and number of edges per node) continue to increase (figure 4G), but so did the cohesiveness of copublication networks, as measured by the average degree centrality and average clustering coefficients over time (figure 4H–I). Based on degree centrality, in the past 5 years (from 2010 to 2014), one researcher tended to be a coauthor with an average of five different researchers, an increase compared with the 1990s (figure 4H). Overall, the coauthorship networks exhibited ‘small-world’ network properties (ie, most nodes, even if not neighbours of one another, can be reached from every other node by a small number of steps),20 since their clustering coefficients are fairly high (higher than 0.9) and the mean shortest path lengths are small (figure 4I, including inset). The network of the late 1990s demonstrates such properties to a lesser degree than the networks of the other time periods, especially the network of the early 2000s.
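The second ingredient of the ‘small-world’ description, a short mean path length, can be computed with a breadth-first search from every node. The sketch below is a plain-Python illustration, not the Gephi routine used in the study; it assumes an undirected graph stored as a dictionary of neighbour sets and averages over reachable pairs only. The function names are hypothetical.

```python
from collections import deque


def shortest_path_lengths(adj, source):
    """Breadth-first search: number of edges from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def mean_path_length_and_diameter(adj):
    """Mean and maximum shortest-path length over all reachable node pairs."""
    total = 0
    pairs = 0
    diameter = 0
    for source in adj:
        for target, d in shortest_path_lengths(adj, source).items():
            if target != source:
                total += d
                pairs += 1
                diameter = max(diameter, d)
    mean = total / pairs if pairs else 0.0
    return mean, diameter


if __name__ == "__main__":
    # Toy chain of four authors: A - B - C - D.
    chain = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
    print(mean_path_length_and_diameter(chain))
```

A network is usually called small-world when this mean path length stays close to that of a random graph of the same size while the clustering coefficient is much higher.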

Figure 3

(A) Total number of linked investigators who published an article related to cerebral amyloid angiopathy (CAA) per year. These data were fitted to a linear regression line from 1980. (B) Number of links between CAA coauthors per year. Data were fitted to a linear regression line from 1980. The inset bar charts and regression lines show the same data fitted for exponential functions, that is, linear regression performed between the logarithm of the data and the year.

Figure 4

(A–F) Copublication networks of authors publishing on cerebral amyloid angiopathy, 1990–2014. Each line links two nodes (ie, authors) and represents shared coauthorship of an article between two investigators. Different colours denote the community of nodes connected to each other. Representative networks are shown for 5-year periods: (A) 1990–1994; (B) 1995–1999; (C) 2000–2004; (D) 2005–2009; (E) 2010–2014; and (F) overall network 1990–2014. As is evident, during the past 10–15 years, the number of investigators and coauthorship networks grew rapidly. For visualisation of these networks, we have used the community detection function provided by Gephi, which specifically applies the ‘modularity algorithm’; hence the colours are used to distinguish the communities automatically detected by the algorithm. (G, H, I) Bar charts of number of nodes and edges, connected components and average clustering coefficients, respectively, for the copublication network 5-year periods.
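Modularity-based community detection scores a partition of the network by the modularity Q = (1/2m) Σ_ij [A_ij − k_i k_j/2m] δ(c_i, c_j), where A is the adjacency matrix, k_i the degree of node i, m the number of edges and c_i the community of node i. The sketch below only evaluates Q for a given partition; it does not search for the best partition and is not Gephi's implementation. The function name is hypothetical.

```python
def modularity(adj, community):
    """Newman modularity of a partition of an undirected, unweighted graph.

    adj: {node: set of neighbours}; community: {node: community label}.
    """
    two_m = sum(len(nbrs) for nbrs in adj.values())  # twice the edge count
    if two_m == 0:
        return 0.0
    q = 0.0
    for i in adj:
        for j in adj:
            if community[i] != community[j]:
                continue
            a_ij = 1.0 if j in adj[i] else 0.0
            # Observed link minus the link expected from the degrees alone.
            q += a_ij - len(adj[i]) * len(adj[j]) / two_m
    return q / two_m


if __name__ == "__main__":
    # Toy graph: two triangles (1-2-3 and 4-5-6) joined by the edge 3-4.
    graph = {
        1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4},
        4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5},
    }
    by_triangle = {1: "a", 2: "a", 3: "a", 4: "b", 5: "b", 6: "b"}
    print(modularity(graph, by_triangle))
```

Higher values of Q indicate that more links fall within communities than would be expected from node degrees alone, which is the basis for the automatic colouring of the networks.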

Figure 5 shows the final global concept graph, an integration of 2340 individual concept graphs from all identified articles, which provides biological entities involved in CAA research and their inter-relationships. Centrality measures are summarised in online supplementary table e-2.

Figure 5

Global concept graph illustrating biological entities involved in cerebral amyloid angiopathy research and the relationships among them, integrated from all identified papers. The node size is adjusted based on the weighted degree centrality (see online e-supplement and inset diagram for the Top 10 biological entities of the shown final global concept graph).

DMR-based topic modelling identified 16 relevant major topics in the field of CAA research across all articles (table 1): Alzheimer's disease autopsy studies, CAA related to prion disease, hereditary cystatin C CAA, haemorrhagic manifestations of CAA on pathology, vascular cognitive impairment, MRI and cerebral microbleeds, lobar intracerebral haemorrhage (ICH), perivascular drainage, Dutch-type CAA, mouse models, role of CAA in Alzheimer's disease, CAA diagnosis, amyloid-related imaging abnormalities (ARIA), transient focal neurological episodes, apolipoprotein E (APOE) and basic biology. To identify research areas with decreasing trends (and hence potential ‘unmet needs’) or rapid advances within the themes defined above, we examined the topic distribution over time using linear regression analysis (figure 6). We specifically focused on the period 2000–2014, which provided sufficient data size and a fairly even distribution of these themes. In the literature on CAA-associated clinical syndromes, there was a growing focus on ICH and transient focal neurological episodes, but a lack of information on cognitive decline in CAA, that is, vascular cognitive impairment (figure 6A). During this time, there was an increase in the emphasis on neuroimaging and cerebral microbleeds, coinciding with an increased interest in diagnostic aspects of CAA and ARIA (figure 6B). The relative distribution of publications dealing with pathological aspects of CAA, including the role of CAA in Alzheimer's disease pathology, remained broadly stable, although with decreasing trends (figure 6C). Over the same time period, there was an evolution in CAA pathophysiology-related themes, including a notable reduction in the basic biology topic, but an increased focus on mouse models and no change for the perivascular drainage theme (figure 6D).

Table 1

Dirichlet Multinomial Regression (DMR)-based topic modelling results

Figure 6

Evolution of different topics and themes in cerebral amyloid angiopathy (CAA) research between 2000 and 2014, fitted to linear regression models. These themes emerged from, and were defined by, the topic modelling analysis of the most clinically relevant CAA-related topics, as presented in table 1.

In a post hoc analysis, we investigated whether certain relatively new methods used for biomarker research in CAA—which did not appear frequently enough to define separate themes in our topic modelling analysis—were present as keywords within other related themes. These included cerebrospinal fluid (CSF) biomarkers and new imaging methods that might be used as markers of vascular pathophysiology in CAA,5 including C11-labelled Pittsburgh Compound B Positron Emission Tomography (PiB-PET) imaging, measures of abnormal vascular reactivity (ie, fMRI), as well as postmortem imaging and cortical microinfarcts. After reviewing the extended sets of keywords identified in DMR-based topic modelling, only PiB-PET and CSF were captured as keywords in the ‘MRI and cerebral microbleeds’ (ie, diagnostic) and the ‘Perivascular drainage’ topics, respectively, but with a low probability distribution.


To the best of our knowledge, this is the first study to systematically assess the rapidly developing field of CAA research using informetric analytic tools. This approach has potential value to predict trends, identify unmet needs, and suggest new research directions or priorities. Our main findings suggest that the number of publications related to CAA increased approximately linearly over the past 60 years. Over the same period, the number of coauthorship networks has also risen significantly. Advanced biological entities network and content analysis allowed us to define major themes emerging from CAA research, and explore their differential evolution and shift in emphasis in the past 15 years.

The history of CAA is remarkable in that, unlike most neurological disorders, the understanding of the underlying pathology preceded by many years the association with clinical syndromes and disease.21 In a landmark paper published in 1954, Stefanos Pantelakis from the Geneva Brain collection (University Psychiatric Clinic in Bel-Air, Switzerland)22 set out the modern concepts of CAA pathophysiology and described what are still considered the pathological hallmarks of the disease.23 Since these early descriptions, understanding of the pathophysiological mechanisms, neuroimaging features and clinical consequences of CAA has increased substantially, consistent with the quantitative data we provide illustrating the increase in CAA-related research output. Although direct causality is difficult to demonstrate, the growth in coauthorship networks, indicating a shift from single or small investigator groups towards larger and more complex collaborative teams during this time, might have contributed to an increased volume of CAA research. Furthermore, additional journals have become available, encouraging increased numbers of CAA publications and interactions.

Topic modelling analyses enable us to grasp the most prevalent topics in the CAA literature in an unbiased way. Most notable over the past 15 years has been a shift in emphasis towards neuroimaging rather than neuropathology. Advanced MRI has increased our ability to detect and define the consequences of CAA in routine clinical practice, including dynamic features of the disease, in contrast with the ‘snapshot’ obtained in neuropathological examination.1 ,5 Neuroimaging has also improved our ability to diagnose the disease accurately in vivo. However, an unmet need is the lack of strong support for MRI biomarkers from pathological-imaging correlations. For example, the world literature on cerebral microbleeds and their pathological correlates is based on 18 patients, with only a handful related to CAA.24 There are even fewer data for cortical superficial siderosis,25 acute cortical microinfarcts26 and MRI-visible white matter perivascular spaces,27 the most recent putative CAA biomarkers. Our data suggest another key area of need related to neuropathology: the identification of the exact role of CAA in the context of Alzheimer's disease. Is vascular amyloid a silent partner or a key player in the natural history of neurodegeneration?

A notable widening gap within the clinical syndromes theme is the lack of focus on the role CAA plays in vascular cognitive impairment, a crucial topic requiring future research efforts.28 Promising new early disease biomarkers of CAA, including PiB-PET imaging, as well as measures of vascular reactivity and high-field MRI, did not emerge from the topic modelling analysis, most likely because of their novelty and hence the very few published reports.

As one would expect, no treatment theme emerged in our analysis, since CAA is as yet an untreatable cause of spontaneous ICH and vascular cognitive impairment. However, there are now grounds for optimism as amyloid-modulating therapies, including immune-based therapies (amyloid-β immunisation), may soon be possible in CAA (ClinicalTrials.gov Identifier: NCT01821118). Given our ageing population, CAA represents an impending public health problem. The development and evaluation of novel disease-modifying therapeutics increase the urgency with which advances in neurobiology, biomarkers and diagnosis of the disease are needed.5 These, in turn, require the combined expertise and resources of many different large-scale research projects as part of multicentre collaborations and international networks, which are often beyond the reach of any single investigator or research team. Treatment of hypertension to prevent CAA progression also deserves further research.29

Our study provides a unique overview of the CAA research landscape focusing on productivity analysis, copublication networks, the relationships between bioentities and the distribution of main topics, to help researchers grasp the field in its entirety. Combined with new CAA-research technologies, this ‘metaknowledge’ can contribute to shaping the direction of research to identify areas in need of re-examination or point out new paths of interest. We suggest that future informetrics work in CAA should include citation network analyses (eg, MeSH-MeSH and MeSH-Citation-MeSH networks) and the change of these networks over time, in order to unravel further levels of complexity in CAA-related research.30

Our findings demonstrate the increasingly collaborative culture in an expanding CAA field, providing optimism for the successful implementation of large international research initiatives, endorsed by the recent Fourth International CAA Conference. International research networks could have a profound effect on driving CAA biomarker research forward, facilitating both early-phase and late-phase trials.5 A similar effort within the Alzheimer's Disease Centres (Alzheimer's Disease Neuroimaging Initiative) had a disproportionately large impact in the field through collaborative publications.9


Supplementary materials

  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.


  • Contributors MS, AC and ZF conducted statistical analysis. AC took part in study concept and design, data interpretation, write up and critical revisions. ZF conducted data analysis and undertook critical revisions. DJW was involved in critical revisions. MS took part in study concept and design, data collection and analysis, write up and critical revisions.

  • Competing interests AC receives research support from the Greek State Scholarship Foundation, the Stroke Association and the British Heart Foundation. ZF reports no disclosure. DJW receives funding from the Stroke Association, the British Heart Foundation, and the Rosetrees Trust and its partners. MS receives funding from the Bio-Synergy Research Project (2013M3A9C4078138) of the Ministry of Science, ICT and Future Planning through the National Research Foundation.

  • Provenance and peer review Not commissioned; externally peer reviewed.
