Background Extracting data from healthcare records is essential for clinical and research purposes but can be labour- and time-intensive. We developed a natural language processing (NLP) application to automatically extract meaningful data from routinely generated multiple sclerosis (MS) clinic letters.
Method We developed the system using the open-source platform GATE (General Architecture for Text Engineering) and a training set of 100 manually annotated MS clinic letters. The system extracts the following information from each clinic letter: MS diagnosis and type, Expanded Disability Status Scale (EDSS) score, current and previous disease-modifying therapies (DMTs), walking distance, and MRI information.
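To make the extraction task concrete, the following is a minimal sketch of rule-based extraction for one of the listed fields, the EDSS score. It is not the GATE pipeline described above: the pattern, the function name `extract_edss`, and the example sentences are hypothetical, and Python's `re` module is used only for illustration. The range check assumes the usual EDSS scale of 0 to 10 in half-point steps.

```python
import re

# Hypothetical pattern: the token "EDSS", up to 30 non-digit characters
# (e.g. " score today was "), then a number such as 6, 6.0 or 6.5.
EDSS_PATTERN = re.compile(
    r"\bEDSS\b[^0-9\n]{0,30}?(\d{1,2}(?:\.[05])?)",
    re.IGNORECASE,
)


def extract_edss(text):
    """Return the first plausible EDSS score found in text, or None."""
    match = EDSS_PATTERN.search(text)
    if match is None:
        return None
    score = float(match.group(1))
    # Reject values outside the assumed 0-10 range of the scale.
    return score if 0.0 <= score <= 10.0 else None


if __name__ == "__main__":
    print(extract_edss("Her EDSS score today was 6.5."))
    print(extract_edss("No disability score was recorded."))
```

A single regular expression like this would miss many phrasings found in real letters (negation, historical scores, scores written as words), which is part of why a system of this kind is developed and validated against manually annotated letters.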
For initial validation, we used 250 MS clinic letters. We compared the system's performance in extracting MS diagnosis, EDSS score, and current and previous DMTs against human annotation. We recorded precision (the proportion of extracted items that are correct), recall (the proportion of items identified by human annotation that the system extracted) and F1-score (the harmonic mean of precision and recall).
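The three metrics can be illustrated with a short sketch. It assumes each extracted item is represented as a `(letter_id, field, value)` tuple and that an item counts as correct only if it exactly matches a human annotation. This representation, the function name, and the example data are hypothetical and are not taken from the study.

```python
def precision_recall_f1(extracted, reference):
    """Compare system output with human annotation.

    extracted: items produced by the system
    reference: items produced by human annotators
    """
    extracted, reference = set(extracted), set(reference)
    true_positives = len(extracted & reference)

    # Precision: share of extracted items that match the annotation.
    precision = true_positives / len(extracted) if extracted else 0.0
    # Recall: share of annotated items that the system extracted.
    recall = true_positives / len(reference) if reference else 0.0
    # F1: harmonic mean of precision and recall.
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Made-up example items, for illustration only.
    human = {
        ("letter1", "edss", "6.5"),
        ("letter1", "current_dmt", "drug_a"),
        ("letter2", "edss", "2.0"),
        ("letter2", "current_dmt", "drug_b"),
        ("letter3", "edss", "4.0"),
    }
    system = {
        ("letter1", "edss", "6.5"),
        ("letter1", "current_dmt", "drug_a"),
        ("letter2", "edss", "2.0"),
        ("letter3", "edss", "3.0"),  # wrong value: counts against precision
    }
    p, r, f1 = precision_recall_f1(system, human)
    print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

In this made-up example, three of the four extracted items are correct (precision 0.75) and three of the five annotated items are found (recall 0.60).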
Results (see poster)
Conclusion NLP can be used to automatically extract specific information from MS clinic letters and has the potential to transform clinical practice and research at a large scale.