NfL reliability across laboratories, stage-dependent diagnostic performance and matrix comparability in genetic FTD: a large GENFI study

Background Blood neurofilament light chain (NfL) is increasingly considered a key trial biomarker in genetic frontotemporal dementia (gFTD). We aimed to facilitate the use of NfL in gFTD multicentre trials by testing its (1) reliability across labs; (2) reliability to stratify gFTD disease stages; (3) comparability between blood matrices and (4) stability across recruiting sites.
Methods Comparative analysis of blood NfL levels in a large gFTD cohort (GENFI) for (1)–(4), with n=344 samples (n=148 presymptomatic, n=11 converter, n=46 symptomatic subjects, with mutations in C9orf72, GRN or MAPT; and n=139 within-family controls), each measured in three different international labs by Simoa HD-1 analyzer.
Results NfL showed excellent consistency (intraclass correlation coefficient (ICC) 0.964) and high reliability across the three labs (maximal bias (pg/mL) in Bland-Altman analysis: 1.12±1.20). High concordance of NfL across laboratories was moreover reflected by high areas under the curve for discriminating the conversion stage from the (non-converting) presymptomatic stage across all three labs. Serum and plasma NfL were largely comparable (ICC 0.967). The robustness of NfL across 13 recruiting sites was demonstrated by a linear mixed effect model.
Conclusions Our results underline the suitability of blood NfL in gFTD multicentre trials, including cross-lab reliable stratification of the highly trial-relevant conversion stage, matrix comparability and cross-site robustness.


INTRODUCTION
Genetic frontotemporal dementias (gFTDs) are a group of neurodegenerative diseases characterised by a progressive decline of executive, behavioural and language functions, frequently resulting from mutations in the genes chromosome 9 open reading frame 72 (C9orf72), progranulin (GRN) or microtubule-associated protein tau (MAPT). 1 Neurofilament light chain (NfL), an intermediate filament that constitutes part of the neuronal cytoskeleton, is released after neuronal damage into the interstitial fluid, cerebrospinal fluid and blood. Blood-based NfL has an increasing impact as a trial biomarker in gFTD for multiple contexts of use, for example, patient stratification, 2-5 trial inclusion, 6 toxicity monitoring and treatment-response capture, 7 and has now been approved by the U.S. Food and Drug Administration as a surrogate endpoint contributing to approval of novel drugs (tofersen). 8 However, its wider use in multicentre trials, as well as in real-world clinical settings, has been questioned due to potential cross-laboratory heterogeneity in analytical approaches and blood sample matrices that might lead to different, non-comparable concentrations of blood NfL. 9 10 Leveraging a large gFTD cohort, we here aimed to facilitate the use of blood NfL in gFTD multicentre trials and real-world clinical settings by testing: (1) its reliability across laboratories, measured at different time points, by different end-user devices and kits; (2) cut-off values maximising stratification accuracy of the trial-relevant gFTD disease stages (conversion stage, symptomatic stage), with cut-off values validated across labs; (3) comparability between blood matrices and (4) robustness across recruiting sites.
The consistency of NfL measurements across the three different labs was quantified by intraclass correlation coefficients (ICC; two-way mixed effect model, single measures, absolute agreement 12 ). Bland-Altman analyses 13 were used to quantify between-lab bias, defined as the mean of the differences; limits of agreement, that is, the bias ± 1.96 times the SD of the differences; and 95% CIs for the bias with lower and upper limits of agreement. The diagnostic performance of NfL was assessed by receiver operating characteristic (ROC) analysis 14 and by calculating areas under the curve (AUCs), as well as optimal operating points, that is, cut-off values (assuming a cost ratio of 1 and a pretest probability of 0.5) maximising stratification accuracy for different gFTD disease stages. The predictive value of an NfL-based disease stage stratification was addressed by calculating positive and negative likelihood ratios (LR+ and LR−). 15 Linear mixed effect models were used to characterise the stability of log-transformed NfL levels across recruiting sites (with categorical factors of disease stage and genetic status, and the metric covariate of age, as fixed effects).
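The Bland-Altman quantities defined above (bias as the mean of the paired differences, limits of agreement as bias ± 1.96 times the SD of the differences) can be illustrated with a minimal Python sketch. The function name `bland_altman` and the paired values are hypothetical and serve illustration only; they are not study data, and the sketch does not reproduce the SPSS/SigmaPlot workflow used in the study.

```python
import statistics

def bland_altman(lab_a, lab_b):
    """Return bias, SD of differences and 95% limits of agreement for paired values."""
    if len(lab_a) != len(lab_b) or len(lab_a) < 2:
        raise ValueError("need two paired series with at least two samples")
    diffs = [a - b for a, b in zip(lab_a, lab_b)]
    bias = statistics.mean(diffs)   # between-lab bias = mean of the differences
    sd = statistics.stdev(diffs)    # sample SD (n - 1) of the differences
    return {
        "bias": bias,
        "sd": sd,
        "loa_lower": bias - 1.96 * sd,
        "loa_upper": bias + 1.96 * sd,
    }

# Hypothetical paired NfL values (pg/mL) from two labs, for illustration only
lab_1 = [8.0, 12.0, 20.0, 35.0, 50.0]
lab_2 = [7.0, 11.0, 18.0, 34.0, 47.0]
result = bland_altman(lab_1, lab_2)
print(result)
```

A bias close to zero with narrow limits of agreement indicates that two labs can be used interchangeably; a clearly non-zero bias points to a systematic offset between them.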

RESULTS
NfL levels showed excellent consistency across the three labs (ICC 0.964, 95% CI 0.946 to 0.974), as demonstrated by a two-way mixed effect model. Reliability of NfL levels was high and bias was low across all three labs, as shown by linear regressions and Bland-Altman analyses with a maximal bias±SD of 1.12±1.20 pg/mL (for summary, see figure 1A). 1C), with a maximal bias of 0.01±0.01 (AUC±SD). For a genotype-specific analysis (C9orf72, MAPT, GRN) of NfL cross-lab reliability and disease-stage AUC, see online supplemental figures 1 and 2 and online supplemental tables 3 and 4. The disease stage-specific stratification value of NfL levels, beyond dichotomising cut-offs, was demonstrated by LR (see figure 1D). For an exemplary illustration of the individual risk prediction of being a presymptomatic versus a symptomatic carrier at different NfL levels by LR+ and LR−, see figure 1D (NfL values from lab 1). An NfL z-value of 3 corresponded to an LR+ of 83 and an LR− of 0.5.
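For readers less familiar with likelihood ratios, the following minimal Python sketch shows how LR+ and LR− follow from sensitivity and specificity at a given cut-off, and how an LR converts a pretest probability into a post-test probability via odds. The function names and the sensitivity/specificity pair are hypothetical; applying a pretest probability of 0.5 to the reported LRs is an illustrative assumption, not an analysis performed in the study.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a dichotomous test at a single cut-off."""
    if not (0.0 <= sensitivity <= 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity must be in [0, 1] and specificity in (0, 1)")
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg

def post_test_probability(pretest_probability, likelihood_ratio):
    """Convert a pretest probability into a post-test probability using odds."""
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    post_odds = pretest_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)

# Hypothetical sensitivity and specificity, for illustration only
lr_pos, lr_neg = likelihood_ratios(sensitivity=0.9, specificity=0.8)

# LRs reported in the text for an NfL z-value of 3, combined with an
# assumed pretest probability of 0.5 (illustrative assumption)
p_if_positive = post_test_probability(0.5, 83.0)
p_if_negative = post_test_probability(0.5, 0.5)
print(lr_pos, lr_neg, p_if_positive, p_if_negative)
```

Unlike a single dichotomising cut-off, LRs can be evaluated at any NfL level, which is why they convey graded rather than binary stratification information.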
NfL values in serum and plasma (n=344 samples of each matrix) were largely comparable (ICC 0.967, 95% CI 0.894 to 0.977), as calculated by a two-way mixed effect model. The median ratio serum/plasma was 0.95.
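The ICC type used here (two-way model, single measures, absolute agreement) can be computed from the mean squares of a two-way ANOVA. The following Python sketch uses the standard single-measure absolute-agreement formula, ICC = (MSR − MSE) / (MSR + (k − 1)·MSE + k·(MSC − MSE)/n), with n samples and k labs or matrices. The function name and the toy data are hypothetical, and the sketch is not the SPSS routine used in the study.

```python
def icc_absolute_agreement_single(ratings):
    """ICC (two-way, single measures, absolute agreement).

    ratings: list of rows, one row per sample, each holding k measurements
    (eg, one value per lab or per blood matrix).
    """
    n = len(ratings)
    k = len(ratings[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need at least 2 samples and 2 measurements per sample")
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between samples
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between labs
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_error / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Hypothetical toy data: identical values vs a constant offset between two labs
icc_identical = icc_absolute_agreement_single([[1, 1], [2, 2], [3, 3]])
icc_offset = icc_absolute_agreement_single([[1, 2], [2, 3], [3, 4]])
print(icc_identical, icc_offset)
```

The toy comparison illustrates why absolute agreement is the stricter choice for cross-lab or cross-matrix comparisons: a constant offset between measurements lowers the ICC even though the ranking of samples is preserved.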

The high robustness of NfL across 13 recruiting sites was shown by a linear mixed effect model, as the categorical variable 'recruiting site' did not explain a significant share of the variance (estimate 0.001, SE 0.001, Wald Z 1.403, p=0.161).

DISCUSSION
Blood NfL has an increasing impact as a trial biomarker in gFTD for multiple contexts of use 5 7 and is now being increasingly acknowledged by the FDA as a surrogate endpoint in drug approval processes. 8 However, its wider use in multicentre trials and real-world clinical settings is limited by a lack of larger datasets demonstrating cross-lab reliability, cross-lab validated cut-off values and cross-lab validated comparability between blood matrices in gFTD. Leveraging a large gFTD cohort, our findings show that blood NfL is a biomarker in gFTD with high reliability across labs, even if assessed at different time points and by partly different kits (NF-Light Advantage Kit vs Neurology 4-Plex A Kit). This finding confirms and extends earlier findings showing a good cross-lab reliability of blood NfL, which so far, however, have been limited to smaller sample sets and non-gFTD cohorts. 16 Given, however, that all three labs in our study still used the same type of platform (Simoa HD-1), future studies need to investigate a potential decrease in cross-lab reliability if different measurement platforms are used for blood NfL (eg, Ella, 17 Uman, 18 Atellica 19 ). A pilot study on this showed promising results. 20
Reliable cut-off values of blood NfL for accurately stratifying different gFTD disease stages are key for its use as a molecular stratification marker of gFTD subjects into treatment trials. 3 5 7 In particular, reliable blood-based stratification of subjects close to conversion to the symptomatic phase of the disease will be of extremely high value to identify and recruit subjects into upcoming mechanistic treatment trials tailored to prevent neurodegeneration by early intervention. 5 21 Extending earlier findings on blood NfL cut-offs in gFTD, 3 our findings now indicate that these cut-off values can be provided by blood NfL for gFTD with high reliability across labs. In addition, they show that NfL levels in converting carriers are already more similar to those of symptomatic carriers than to those of (non-converting) presymptomatic carriers. Nevertheless, in the absence of a certified reference material with values assigned by a certified reference method, the reported cut-offs remain preliminary, and prospective laboratory-specific validation remains required.
Multicentre use of blood NfL, whether in trials or real-world clinical settings, is inherently characterised by cross-centre variability in preanalytical sample handling. Our data from a large set of different sites (n=13) suggest that this variability might not exert a substantial effect on multicentre blood NfL values, even though no strictly enforced cross-centre harmonised standard operating procedure or centralised biosampling monitoring had been employed across centres. These data corroborate blood NfL as a very stable biomarker that is resistant to most types of clinically relevant variation in preanalytical sample handling. 22 Future studies with larger sample batches per centre and testing more extreme variabilities in preanalytical sample handling are warranted to further investigate and specify the limits of this cross-centre comparability.
Real-world clinical multicentre use of blood NfL moreover often faces the challenge that samples come from different blood matrices (eg, serum vs plasma). 9 While our findings confirm differences in the absolute blood NfL concentrations between serum and plasma, they at the same time show a high consistency between both blood matrices, allowing comparability of both matrices. The calculated median ratio serum/plasma might be a first coarse help when comparing results derived from these different matrices. However, its use might be limited to Simoa-based blood NfL measurements, and further larger in-depth studies in independent cohorts are required to confirm this factor.
Our study has several limitations. First, although leveraging the largest gFTD cohort existing so far, the sample size is partly limited by the requirement to measure each sample in three labs, leading to limited sample sizes in particular for some gFTD subcohorts (eg, converters). Second, the construct and wording of 'cut-offs' suggest a separating dichotomy where in fact a biological continuum of NfL levels and disease progression exists.
Despite these limitations, our results underline the suitability of blood NfL as a fit-for-purpose biomarker in gFTD multicentre trials.


Figure 1
Cross-lab reliability, cross-lab disease-stage cut-offs and likelihood ratios (LR) and blood matrix comparability in genetic FTD. (A) Reliability of blood NfL measurements in genetic FTD (gFTD) across three labs (lab 1 and 2 serum, lab 3 plasma): linear regressions and Bland-Altman analyses of log-transformed NfL values. For detailed statistics, see online supplemental table 2. (B) Comparative across-lab analysis of ROC curves and AUC values for the condition 'presymptomatic versus symptomatic carriers'. Detailed values of AUC±SE and 95% CI are given in the Results section. (C) Reliability of AUC values across three labs: Bland-Altman analyses for all stage comparisons. For detailed statistics, see online supplemental table 2.
(D) Prediction of individual risk factors at different cut-offs for the condition 'presymptomatic versus symptomatic carriers' (age-corrected z-values, first lab) by positive (LR+) and negative (LR−) likelihood ratios. AUC, area under the curve; FTD, frontotemporal dementia; NfL, neurofilament light chain; ROC, receiver operating characteristic.
J Neurol Neurosurg Psychiatry: first published as 10.1136/jnnp-2023-332464 on 19 January 2024.
mutation carriers), and their respective first-degree relatives (ie, either presymptomatic mutation carriers or non-carriers serving as within-family controls), recruited by the international Genetic FTD Initiative (GENFI; www.genfi.org.uk) 11 at 13 sites. The comparative analysis included n=344 blood samples (n=148 from presymptomatic carriers; n=11 from carriers that converted during the observation period; n=46 from symptomatic carriers; n=139 from within-family controls; for characteristics of these subcohorts, see online supplemental table 1) that were independently measured for NfL levels by single molecule array (Simoa; HD-1 analyzer, Quanterix, Billerica, Massachusetts, USA) in three different laboratories (lab 1: Basel, Switzerland 5 ; lab 2: Rotterdam, the Netherlands 4 ; lab 3: London, UK 2 ), using different NfL kits (Basel and Rotterdam: NF-Light Advantage Kit 103186 (V.1); London: Neurology 4-Plex A Kit 102153), according to the manufacturer's instructions. The blood matrices for NfL analysis were serum (Basel and Rotterdam) and plasma (London). Further methodological details of NfL measurements, details of the GENFI protocol, participant demographics, clinical classification of the disease stages (ie, presymptomatic carriers, converters, symptomatic carriers) as well as NfL quantification were described elsewhere. 2 4 5 11
Statistical analyses
SPSS for Windows V.29.0 (IBM), SigmaPlot for Windows V.15 (Inpixon, Germany) and RStudio 2022.07.2 were used for statistical analyses. NfL values were not normally distributed and were therefore log-transformed. For age-corrected z-scores, which take into consideration the age-related NfL increase observed in controls, log-transformed NfL values were normalised relative to their distribution in controls.
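One way to obtain such age-corrected z-scores is sketched below in Python. The sketch assumes a linear relation between age and log10-transformed NfL in controls and scales each value by the residual SD of that control model; both the linear age model and the log base are assumptions made for illustration, as the exact normalisation procedure is not specified here. All function names and control values are hypothetical.

```python
import math
import statistics

def fit_control_model(ages, nfl_values):
    """Fit log10(NfL) = intercept + slope * age in controls.

    Returns (intercept, slope, residual_sd). A linear age effect is an
    assumption of this sketch.
    """
    if len(ages) != len(nfl_values) or len(ages) < 3:
        raise ValueError("need at least three paired control observations")
    logs = [math.log10(v) for v in nfl_values]
    mean_age = statistics.mean(ages)
    mean_log = statistics.mean(logs)
    sxx = sum((a - mean_age) ** 2 for a in ages)
    sxy = sum((a - mean_age) * (y - mean_log) for a, y in zip(ages, logs))
    slope = sxy / sxx
    intercept = mean_log - slope * mean_age
    residuals = [y - (intercept + slope * a) for a, y in zip(ages, logs)]
    # Residual SD with n - 2 degrees of freedom (two fitted parameters)
    residual_sd = math.sqrt(sum(r * r for r in residuals) / (len(ages) - 2))
    return intercept, slope, residual_sd

def age_corrected_z(age, nfl_value, model):
    """z-score of one log10 NfL value relative to the age-expected control value."""
    intercept, slope, residual_sd = model
    expected = intercept + slope * age
    return (math.log10(nfl_value) - expected) / residual_sd

# Hypothetical control data (age in years, NfL in pg/mL), for illustration only
control_ages = [30, 40, 50, 60, 70]
control_nfl = [5.0, 6.5, 8.0, 11.0, 13.5]
model = fit_control_model(control_ages, control_nfl)

control_z = [age_corrected_z(a, v, model) for a, v in zip(control_ages, control_nfl)]
z_elevated = age_corrected_z(50, 30.0, model)
print(control_z, z_elevated)
```

By construction, control z-scores centre on zero at every age, so a given z-value carries the same meaning for a 30-year-old and a 70-year-old carrier.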

Table 1
Receiver operating characteristic (ROC) curve analysis with areas under the curve (AUC) and optimal cut-offs for separating different gFTD stages and conditions