The accuracy of tuberculous meningitis diagnostic tests using Bayesian latent class analysis

Introduction: Tuberculous meningitis (TBM) is the most dangerous form of tuberculosis, with high mortality and disability rates. Diagnosis is often delayed because there is no gold standard test, which leaves the sensitivity and specificity of the available diagnostic tests poorly characterized. This study aimed to estimate the prevalence of TBM and to determine the performance of four diagnostic procedures: the mycobacterial growth culture test, the Gene Xpert assay, and the protein level and leukocyte count in cerebrospinal fluid. Methodology: We used Bayesian latent class analysis to estimate the prevalence of TBM, with 95% credible intervals (CI), and the sensitivity and specificity of the four diagnostic procedures. The area under the receiver operating characteristic curve (AUC) of the cerebrospinal fluid protein level and leukocyte count was also estimated and compared across different thresholds. Results: A total of 1,213 patients suspected of having TBM were included. The estimated TBM prevalence was 34.8% (95% CI: 28.8–41.3). The sensitivities of the culture test and the Gene Xpert assay were 62.7% (95% CI: 52.5–74.0) and 57.5% (95% CI: 51.0–64.0), respectively, and the specificity of Gene Xpert was 95.9% (95% CI: 92.0–99.8). The AUC was 76.0% for the leukocyte count and 73.4% for the protein level. Conclusions: This study provides better information about the performance of four routine diagnostic tests and the prevalence of TBM, which can support disease control and improve treatment outcomes.


Introduction
Tuberculosis (TB) is a leading global health concern. According to the Global Tuberculosis Report 2017 published by the World Health Organization (WHO), TB is one of the ten leading causes of death worldwide. In 2016, there were an estimated 10.4 million incident TB cases, more than half of them in the South-East Asia and Western Pacific Regions [1].
Tuberculous meningitis (TBM) is the most lethal manifestation of TB because it affects the central nervous system [2]. Its mortality rate has been reported to be nearly 20% in the first month of treatment, and 50% of patients are left living with severe neurological sequelae [3]. Because TBM is a heterogeneous neurological disease with complicated and non-specific symptoms, no single test or symptom can confirm it. Hence, diagnosis is often delayed by a lack of information about the accuracy of diagnostic tests and clinical symptoms.
Diagnosis commonly relies on composite reference standards based on clinical symptoms and laboratory features, the most common being the Thwaites diagnostic score [4] and the Lancet score [5]. Studies have found that these clinical scoring systems have high sensitivity but low specificity in patients without human immunodeficiency virus (HIV) infection [6,7].
The cerebrospinal fluid (CSF) culture test has been considered the gold standard for diagnosis and also plays an essential role in antimicrobial susceptibility testing. However, the laboratory analysis is time-consuming (4–6 weeks for negative results), and its sensitivity is only moderate. A study from Indonesia reported that the sensitivity of the culture test was less than 50% when compared with clinical diagnosis [8].
In 2012, the Gene-Xpert MTB/RIF assay (Cepheid, USA), a nucleic acid amplification test, was recommended by WHO as an initial test for diagnosing extrapulmonary tuberculosis, especially TBM. Its reported sensitivity and specificity are 59.3% and 99.5%, respectively [9]. However, most studies of the assay used the culture test as the gold standard, which introduces imperfect reference bias toward low sensitivity, a problem commonly encountered in diagnostic accuracy studies. This problem has been discussed in a meta-analysis [10].
Hence, better knowledge of the sensitivity and specificity of TBM diagnostic tests is crucial for developing prompt diagnostic algorithms and suitable treatments, which can decrease mortality and improve patient outcomes [11]. To our knowledge, there has been no report on the sensitivity and specificity of laboratory tests for TBM in a setting where a gold standard based on biopsy and culture cannot ethically be applied to all cases, especially those with negative test results. The main objectives of this study were to describe the characteristics and laboratory features of TBM patients, to evaluate the sensitivity and specificity of the TBM diagnostic tests, and to estimate the prevalence of TBM in Northern Vietnam, using Bayesian latent class analysis. The results could provide a better and more precise understanding for the development of an appropriate diagnostic algorithm and could help improve treatment outcomes.

Study design
We conducted a cross-sectional observational study from November 2016 to April 2019.

Study setting
We collected data from patients treated at Vietnam National Lung Hospital, the largest hospital for tuberculosis and respiratory diseases in Northern Vietnam. All patients suspected of having tuberculous meningitis (TBM) who had cerebrospinal fluid (CSF) collected by lumbar puncture immediately after hospital admission and before antibiotic use were included. Lumbar puncture was performed as soon as possible, usually right after admission and before any drug was given, and specimen quality was controlled carefully according to standard guidelines. A total of 1–3 ml of CSF was collected, and in several cases with ample CSF, the fluid was centrifuged to increase the positive yield. All specimens were transferred to the microbiology department immediately after the procedure and analyzed by experienced technicians for chemical composition, leukocyte count, Xpert, and Mycobacteria Growth Indicator Tube (MGIT) culture. The culture and Xpert tests were performed at the Tuberculosis National Reference Laboratory according to the manufacturers' instructions. As pointed out in a previous study, the experience of laboratory technicians is an essential factor in the result of the culture test [12]. We recognize the importance of this issue in diagnosing infectious diseases, especially severe ones such as TBM. In our routine hospital procedure, we try to minimize such bias by following standard procedures with experienced technicians and by providing regular training for laboratory workers. All specimen tests were performed at the National Reference Laboratory of Vietnam, where thousands of tests are done annually.
The laboratory is the reference laboratory for the whole country and is responsible for quality assurance monitoring and providing technical support to all tuberculosis (TB) laboratories in Vietnam. In 2017, this laboratory analyzed nearly 7,000 samples using Xpert tests and achieved the ISO 17043 qualification.

The tests under evaluation
Following the 2012 WHO guidelines for extrapulmonary tuberculosis diagnosis and treatment, the MGIT culture and Xpert tests were added to the Vietnamese standard protocol for the diagnosis of TBM. The Xpert test is regarded as reliable and is valued for being rapid and automated, and a positive MGIT culture result is considered crucial for confirming TBM. We considered two additional diagnostic parameters: the total cell count and the chemical analysis of CSF. Based on the study by Marais [5], CSF cytology and chemistry play significant roles in detecting TBM, with approximate cut-off values of 1 g/L for protein level and 500 cells/mm³ for leukocyte count. We also examined a range of thresholds to see whether they gave better interpretations of the clinical outcomes.
The chemical parameters included glucose, protein, and chloride levels. In a previous study [5], protein and glucose levels were also used as diagnostic parameters for TBM.
In this study, we used the CSF protein level as the chemical diagnostic marker of TBM. According to previous research, the normal protein concentration in CSF is below 0.45 g/L [5,13]; however, the cut-off values for protein level used to diagnose TBM in previous studies ranged from 0.4 g/L to 2 g/L. We therefore fitted the models with four thresholds (0.4, 1, 2, and 3 g/L). Similarly, in a previous study, the cut-off values for the total CSF cell count ranged from 10 to 500 cells/mm³ [5]. In this study, we chose 10 cells/mm³ as the cut-off value for the leukocyte count, and we also used a range of cut-off values (10, 20, 500, and 1000 cells/mm³) in four models.
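To make the dichotomization concrete, the sketch below converts continuous CSF values into one binary indicator per candidate cut-off, using the thresholds listed above. It is a minimal Python illustration, not the study's code; the function name `dichotomize`, the rule that a value at or above the cut-off counts as positive, and the use of `None` for missing results are assumptions, and the four protein values are hypothetical.

```python
from typing import Dict, List, Optional, Sequence

# Candidate cut-offs listed in the text above.
PROTEIN_CUTOFFS_G_PER_L = [0.4, 1.0, 2.0, 3.0]
LEUKOCYTE_CUTOFFS_PER_MM3 = [10, 20, 500, 1000]


def dichotomize(values: Sequence[Optional[float]],
                cutoffs: Sequence[float]) -> Dict[float, List[Optional[int]]]:
    """Map each cut-off to 0/1 indicators (assumed rule: 1 = value >= cut-off).

    Missing values (None) stay None so they can be imputed or excluded later.
    """
    return {c: [None if v is None else int(v >= c) for v in values] for c in cutoffs}


# Hypothetical CSF protein values (g/L) for four patients; None marks a missing result.
protein = [0.35, 1.2, 2.5, None]
protein_indicators = dichotomize(protein, PROTEIN_CUTOFFS_G_PER_L)
print(protein_indicators[1.0])
```

Each indicator column can then enter its own latent class model alongside the culture and Xpert results, one model per threshold.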
For the Xpert and culture tests, any result reporting the presence of Mycobacterium tuberculosis in the CSF specimen was defined as positive, regardless of bacterial load.

Statistical analysis
Statistical software
Data were analyzed using R software version 3.5.2 (R Foundation for Statistical Computing, Vienna, Austria). Descriptive statistics summarized patient characteristics as mean and standard deviation or median and interquartile range for continuous variables, and as frequencies and percentages for categorical variables.
Bayesian latent class analysis (BLCA) was used to estimate the performance of diagnostic tests and the prevalence of TBM. WinBUGS software version 1.4.3 (Medical Research Council Biostatistics Unit, United Kingdom) was used [14].

Bayesian latent class analysis
The principles of applying latent class analysis to evaluate the performance of diagnostic tests without a gold standard were first introduced by Hui and Walter [15] and were later extended to the Bayesian framework [16][17][18]. In the past decade, Bayesian latent class analysis (BLCA) has become increasingly popular for assessing the performance of diagnostic tests in infectious disease and cancer. The rationale for this method has been well explained by Limmathurotsakul et al. [19] and Dendukuri [17].
With this method, disease prevalence can be estimated without treating any diagnostic test as the reference standard. BLCA has been used successfully to investigate the performance of diagnostic tests for typhoid fever [20] and childhood tuberculosis [21]. In 2013, a web-based application for Bayesian latent class analysis using R and WinBUGS was developed to help researchers handle complex data sets [18]. In 2015, the Standards for Reporting of Diagnostic accuracy studies that use Bayesian Latent Class Models (STARD-BLCM) checklist was introduced to support the reporting quality of diagnostic test studies that use BLCM.
Briefly, instead of assuming that a perfect reference test exists, we assumed that all tests were imperfect and that the observed data, which were combinations of test results, arose from a mixture of true TBM (+) and TBM (-) groups. This assumption allowed us to estimate the sensitivity and specificity of each test.
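Assuming, as this study does, that the tests are conditionally independent given true disease status, the mixture described above can be written explicitly. In the sketch below, π is the TBM prevalence, Se_j and Sp_j are the sensitivity and specificity of test j, t = (t_1, ..., t_k) is a pattern of binary results, and n_t is the number of patients with that pattern; this notation is ours, not the paper's. The first expression is the probability of a result pattern; the second is the multinomial likelihood over all 2^k patterns.

```latex
P(\mathbf{T} = t) = \pi \prod_{j=1}^{k} Se_j^{\,t_j} (1 - Se_j)^{1 - t_j}
  + (1 - \pi) \prod_{j=1}^{k} (1 - Sp_j)^{t_j} \, Sp_j^{\,1 - t_j},
\qquad
L(\pi, Se, Sp \mid n) \propto \prod_{t \in \{0,1\}^k} P(\mathbf{T} = t)^{n_t}
```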

Model definition
A BLCA has three components: the prior distribution, the observed data, and the posterior distribution. The prior distribution is often based on expert experience or previous knowledge, and the data set we analyzed consisted of combinations of dichotomous test results. With k diagnostic tests, there are 2^k possible combinations. The combination counts follow a multinomial distribution, with the likelihood being the multinomial probability of the observed combinations [16]. From the prior distributions and the observed data, the posterior distributions of disease prevalence and of the sensitivity and specificity of each test are obtained by simulation using Markov chain Monte Carlo (MCMC) methods.
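The enumeration of the 2^k result patterns and the multinomial likelihood can be sketched in Python as follows. This is an illustrative implementation under the conditional-independence assumption the study adopts, not the study's WinBUGS model; `pattern_probabilities` and `multinomial_log_likelihood` are hypothetical names, the multinomial coefficient is dropped because it does not depend on the parameters, and the parameter values in the usage lines are made up for illustration.

```python
import itertools
import math
from typing import Dict, Sequence, Tuple

Pattern = Tuple[int, ...]  # e.g. (1, 0, 1, 1): test j positive if entry j == 1


def pattern_probabilities(prev: float, se: Sequence[float],
                          sp: Sequence[float]) -> Dict[Pattern, float]:
    """P(result pattern) for k conditionally independent tests (2**k patterns)."""
    probs: Dict[Pattern, float] = {}
    for pattern in itertools.product((0, 1), repeat=len(se)):
        p_pos, p_neg = prev, 1.0 - prev  # joint prob. of pattern with true TBM(+) / TBM(-)
        for t, se_j, sp_j in zip(pattern, se, sp):
            p_pos *= se_j if t else 1.0 - se_j
            p_neg *= 1.0 - sp_j if t else sp_j
        probs[pattern] = p_pos + p_neg
    return probs


def multinomial_log_likelihood(counts: Dict[Pattern, int],
                               probs: Dict[Pattern, float]) -> float:
    """Multinomial log-likelihood of observed pattern counts, up to a constant."""
    return sum(n * math.log(probs[pat]) for pat, n in counts.items() if n > 0)


# Illustrative parameter values only (culture, Xpert, protein, leukocytes).
probs = pattern_probabilities(0.35, [0.63, 0.58, 0.75, 0.70], [0.99, 0.96, 0.70, 0.65])
print(len(probs), round(sum(probs.values()), 12))
```

An MCMC sampler explores parameter values in proportion to this likelihood multiplied by the prior densities.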
A method for constructing prior distributions from expert knowledge was described in a previous publication [22]. In this study, we used an informative prior distribution only for the specificity of the culture test, because its specificity is widely accepted as perfect [8,21]; we assumed that it was near perfect (99%). Non-informative prior distributions were used for the other parameters. We also analyzed the data with prior distributions elicited from two experts at Vietnam National Lung Hospital, each with more than 20 years of experience in TB, including TBM: the head of the General Medical Department and the head of the National Tuberculosis Reference Laboratory. They provided their personal beliefs about the prevalence of the disease and the sensitivity and specificity of the MGIT culture test, the Xpert test, and the protein level and leukocyte count at the cut-off points of 1 g/L and 500 cells/mm³, respectively, which come from a well-known set of diagnostic criteria, the Lancet scoring system [5]. For each parameter, they were asked for the most probable value, or "best guess" (mode), and for their certainty about the maximum value (95th percentile). These elicitations were then transformed into beta distributions, Beta(a, b). For the other cut-off values of protein level and leukocyte count, which are not mentioned in the Lancet scoring system, non-informative prior distributions were used because of the lack of previous information.
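One way to turn a mode and a 95th percentile into Beta(a, b) is to fix the mode, which ties b to a through mode = (a - 1)/(a + b - 2), and then search for the value of a at which P(X <= upper) = 0.95. The Python sketch below does this with a numerical Beta CDF and bisection, using only the standard library. It is an illustration under stated assumptions, not necessarily the procedure of [22]: it restricts a and b to be at least 1, integrates the density with Simpson's rule, and assumes the CDF at the upper value increases as the distribution concentrates around the mode. All function names are hypothetical.

```python
import math


def beta_cdf(x: float, a: float, b: float, n: int = 4000) -> float:
    """P(X <= x) for Beta(a, b) with a, b >= 1, via composite Simpson's rule (n even)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

    def pdf(t: float) -> float:
        if t <= 0.0:  # density at 0 is finite only when a == 1 (assumes a >= 1)
            return math.exp(-log_norm) if a == 1.0 else 0.0
        return math.exp((a - 1) * math.log(t) + (b - 1) * math.log1p(-t) - log_norm)

    h = x / n
    total = pdf(0.0) + pdf(x)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return total * h / 3


def beta_from_mode_and_upper(mode: float, upper: float, prob: float = 0.95,
                             a_max: float = 5000.0) -> tuple:
    """Find Beta(a, b) with the given mode and P(X <= upper) = prob (bisection on a)."""
    if not 0.0 < mode < upper < 1.0:
        raise ValueError("need 0 < mode < upper < 1")

    def b_for(a: float) -> float:
        return 1.0 + (a - 1.0) * (1.0 - mode) / mode  # keeps the mode fixed

    lo, hi = 1.0, a_max
    if beta_cdf(upper, lo, b_for(lo)) >= prob:
        raise ValueError("upper value is too wide even for a flat prior")
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if beta_cdf(upper, mid, b_for(mid)) < prob:
            lo = mid  # not concentrated enough yet
        else:
            hi = mid
    a = 0.5 * (lo + hi)
    return a, b_for(a)


# Hypothetical elicitation: best guess 0.6, 95% sure the value is at most 0.75.
a, b = beta_from_mode_and_upper(mode=0.6, upper=0.75)
print(f"Beta({a:.1f}, {b:.1f})")
```

A prior for near-perfect specificity can be built the same way, with a mode such as 0.99 and an upper value closer to 1.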
Because the four diagnostic tests are based on different principles, we assumed that they were conditionally independent (that is, the result of one test was not influenced by the others). The models were built with the methods described in previous studies [17,18].

Model fitting
Data were fitted by Gibbs sampling, an MCMC method. Each model was fitted with three parallel chains, with a burn-in of 2,000 iterations per chain, and the simulation was run for 20,000 iterations. Convergence was examined using Gelman-Rubin-Brooks plots, and model fit was assessed using the deviance information criterion (DIC). The procedures were repeated for the various thresholds of protein concentration and leukocyte count. We calculated the area under the receiver operating characteristic curve (AUC) for the protein level from the models fitted at each of its thresholds and repeated the process for the leukocyte count.
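The study fitted its models in WinBUGS. As an illustration of what Gibbs sampling does for this model, the Python sketch below uses data augmentation: it imputes the latent number of truly diseased patients in each result pattern, then draws the prevalence, sensitivities, and specificities from their conjugate Beta full conditionals. It also includes the basic Gelman-Rubin potential scale reduction factor for comparing chains. The function names, starting values, and absence of thinning are assumptions of this sketch, which does not reproduce the study's WinBUGS code. Without an informative prior (for example, on culture specificity), latent class samplers can suffer from label switching.

```python
import random
from typing import Dict, List, Sequence, Tuple

Pattern = Tuple[int, ...]
Prior = Tuple[float, float]  # parameters (a, b) of a Beta prior
Draw = Tuple[float, List[float], List[float]]  # (prevalence, sensitivities, specificities)


def gibbs_lca(counts: Dict[Pattern, int], k: int, n_iter: int, burn_in: int,
              se_priors: Sequence[Prior], sp_priors: Sequence[Prior],
              prev_prior: Prior = (1.0, 1.0), seed: int = 0) -> List[Draw]:
    """Data-augmentation Gibbs sampler for k conditionally independent binary tests."""
    rng = random.Random(seed)
    prev = rng.uniform(0.2, 0.8)
    se = [rng.uniform(0.5, 0.9) for _ in range(k)]
    sp = [rng.uniform(0.5, 0.9) for _ in range(k)]
    n_total = sum(counts.values())
    draws: List[Draw] = []
    for it in range(n_iter):
        # Step 1: impute how many patients in each result pattern are truly TBM(+).
        diseased: Dict[Pattern, int] = {}
        for pat, n in counts.items():
            p_pos, p_neg = prev, 1.0 - prev
            for t, se_j, sp_j in zip(pat, se, sp):
                p_pos *= se_j if t else 1.0 - se_j
                p_neg *= 1.0 - sp_j if t else sp_j
            w = p_pos / (p_pos + p_neg)  # P(TBM+ | pattern, current parameters)
            diseased[pat] = sum(rng.random() < w for _ in range(n))  # Binomial(n, w)
        # Step 2: conjugate Beta updates given the imputed true classes.
        d_total = sum(diseased.values())
        prev = rng.betavariate(prev_prior[0] + d_total, prev_prior[1] + n_total - d_total)
        for j in range(k):
            tp = sum(d for pat, d in diseased.items() if pat[j] == 1)
            fp = sum(counts[pat] - d for pat, d in diseased.items() if pat[j] == 1)
            se[j] = rng.betavariate(se_priors[j][0] + tp, se_priors[j][1] + d_total - tp)
            sp[j] = rng.betavariate(sp_priors[j][0] + (n_total - d_total) - fp,
                                    sp_priors[j][1] + fp)
        if it >= burn_in:
            draws.append((prev, list(se), list(sp)))
    return draws


def gelman_rubin(chains: List[List[float]]) -> float:
    """Basic potential scale reduction factor (R-hat) for one scalar across chains."""
    m, n = len(chains), len(chains[0])
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    between = n * sum((mu - grand) ** 2 for mu in means) / (m - 1)
    within = sum(sum((x - mu) ** 2 for x in c) / (n - 1) for c, mu in zip(chains, means)) / m
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5
```

A run resembling the setup described above would use three chains with different seeds, a burn-in of 2,000 iterations, and the per-chain prevalence draws passed to `gelman_rubin`; values close to 1 suggest convergence.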
Identifiability and sensitivity analysis
A lack of identifiability occurs when the number of degrees of freedom in the data is less than the number of parameters to be estimated. Based on previous findings [23], at least three diagnostic tests are needed for the model to be identifiable in a single population. We therefore used models with four diagnostic tests in a single population to avoid non-identifiability.
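The counting argument behind this rule can be checked with simple arithmetic: in one population, k binary tests give 2^k - 1 degrees of freedom, while a conditional-independence model has 2k + 1 parameters (k sensitivities, k specificities, and the prevalence). The small Python sketch below, with the hypothetical helper `df_and_parameters`, compares the two; having at least as many degrees of freedom as parameters is necessary but not sufficient for identifiability.

```python
def df_and_parameters(k: int, populations: int = 1) -> tuple:
    """Degrees of freedom vs. parameters for k binary tests under conditional independence.

    Each population contributes 2**k - 1 free cell probabilities and its own prevalence;
    sensitivities and specificities are shared across populations.
    """
    degrees_of_freedom = populations * (2 ** k - 1)
    parameters = 2 * k + populations  # Se and Sp per test, one prevalence per population
    return degrees_of_freedom, parameters


for k in (2, 3, 4):
    df, p = df_and_parameters(k)
    print(f"{k} tests: {df} degrees of freedom, {p} parameters, counting check passed: {df >= p}")
```

With two tests there are 3 degrees of freedom for 5 parameters; three tests give 7 for 7; and the four tests used here give 15 for 9.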
A sensitivity analysis was performed to check the robustness of the BLCA by running parallel models with our chosen prior distributions and with the prior distributions elicited from each of the two experts.
To enhance the reporting quality of this Bayesian study, the STARD-BLCM checklist was followed [24].

Missing data
We used multiple imputation based on predictive mean matching to impute missing data [25]. The imputed datasets were analyzed in parallel and compared with one another to check their reliability.
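The paper does not state which software performed the imputation; in R, the mice package offers predictive mean matching as method "pmm". As a simplified illustration of the idea only, the Python sketch below imputes one variable from a single fully observed predictor. It fits a least-squares line on a bootstrap sample of observed cases, then, for each missing value, copies the observed value of a randomly chosen case among the `donors` cases with the closest predicted means. The function name, the single predictor, and the bootstrap step are assumptions; real analyses use several predictors and a dedicated package.

```python
import random
from typing import List, Optional, Sequence


def pmm_impute(x: Sequence[float], y: Sequence[Optional[float]], donors: int = 5,
               rng: Optional[random.Random] = None) -> List[float]:
    """Return one completed copy of y, filling None values by predictive mean matching."""
    rng = rng or random.Random()
    observed = [i for i, v in enumerate(y) if v is not None]
    # Bootstrap the observed cases so repeated calls reflect uncertainty in the regression fit.
    boot = [rng.choice(observed) for _ in observed]
    mean_x = sum(x[i] for i in boot) / len(boot)
    mean_y = sum(y[i] for i in boot) / len(boot)
    sxx = sum((x[i] - mean_x) ** 2 for i in boot)
    sxy = sum((x[i] - mean_x) * (y[i] - mean_y) for i in boot)
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = mean_y - slope * mean_x
    predicted = [intercept + slope * xi for xi in x]
    completed: List[float] = []
    for i, value in enumerate(y):
        if value is not None:
            completed.append(value)
            continue
        # Donor pool: observed cases whose predicted means are closest to case i's.
        pool = sorted(observed, key=lambda j: abs(predicted[j] - predicted[i]))[:donors]
        completed.append(y[rng.choice(pool)])  # impute an actually observed value
    return completed
```

Calling the function m times with different random generators yields m completed datasets, which can then be analyzed in parallel and compared.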

Ethical approval
Ethical approval was obtained from the Human Research Ethics Committee, Faculty of Medicine, Prince of Songkla University, Thailand (project number 61236181). The study was performed based on the principles stated in the Declaration of Helsinki.

Results
A total of 1,213 patients with suspected TBM were included: 140 from the prospective part and 993 from the retrospective part. There were 785 males (64.7%) and 428 females (35.3%), and the median age was 51 years. Table 1 summarizes the patients' characteristics and laboratory parameters.

Bayesian latent class analysis
The complete set of prior distributions based on the two experts' elicitations is shown in Table 2. Because the experts worked in different fields, their beliefs were inconsistent for several parameters; however, both believed that the culture test had a specificity of at least 99%. When we ran the models with the different prior distributions in parallel, the posterior estimates differed only slightly. The TBM prevalence was estimated at 34.8% (95% CI: 28.8–41.3). The sensitivity, specificity, negative predictive value, and positive predictive value of the four diagnostic tests are presented in Table 3.
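Predictive values follow from prevalence, sensitivity, and specificity by Bayes' theorem. The Python sketch below computes plug-in values; the function name is hypothetical. In a Bayesian analysis these quantities would normally be computed for every MCMC draw and then summarized, so the plug-in numbers here are illustrative and need not match Table 3.

```python
def predictive_values(prev: float, se: float, sp: float) -> tuple:
    """Plug-in positive and negative predictive values via Bayes' theorem."""
    ppv = se * prev / (se * prev + (1 - sp) * (1 - prev))  # P(TBM+ | test+)
    npv = sp * (1 - prev) / (sp * (1 - prev) + (1 - se) * prev)  # P(TBM- | test-)
    return ppv, npv


# Point estimates for Gene Xpert taken from the abstract (illustrative plug-in only).
ppv, npv = predictive_values(prev=0.348, se=0.575, sp=0.959)
print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

With these inputs the sketch gives a PPV of roughly 0.88 and an NPV of roughly 0.81, which illustrates why a negative Xpert result alone is not enough to rule out TBM.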
The latent class analysis (LCA)-derived sensitivity and specificity of the CSF protein level and leukocyte count at different thresholds are shown in Table 4. The AUC for the leukocyte count was higher than that for the protein level (76% versus 73%), indicating that the leukocyte count has slightly better discriminative ability than the protein concentration. The receiver operating characteristic curves for both parameters are illustrated in Figure 1.
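One common way to obtain an AUC from threshold-specific estimates is to plot each threshold as a point (1 - specificity, sensitivity), add the ROC corners (0, 0) and (1, 1), and integrate with the trapezoidal rule. The study does not spell out its AUC calculation, so the Python sketch below, with the hypothetical `auc_from_operating_points`, is an assumption about one reasonable approach. With only four thresholds per parameter, it yields a piecewise-linear approximation and assumes the points form a monotone curve.

```python
from typing import List, Tuple


def auc_from_operating_points(points: List[Tuple[float, float]]) -> float:
    """Trapezoidal AUC from (sensitivity, specificity) pairs, one pair per threshold.

    Each pair becomes an ROC point (1 - specificity, sensitivity); the corners
    (0, 0) and (1, 1) are added before integrating.
    """
    roc = sorted({(0.0, 0.0), (1.0, 1.0)} | {(1.0 - sp, se) for se, sp in points})
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(roc, roc[1:]))
```

Applied to each posterior draw of the threshold-specific sensitivity and specificity pairs, the same function would give a posterior distribution for the AUC rather than a single number.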

Discussion
In this study, we evaluated the sensitivity and specificity of the MGIT culture test, the Gene Xpert Mycobacterium tuberculosis (MTB)/resistance to rifampin (RIF) test, and the CSF protein level and leukocyte count in diagnosing TBM, and we estimated TBM prevalence. The study used BLCA to avoid assuming that any reference is perfect. This Bayesian approach builds on the application of latent class analysis to diagnostic tests introduced in previous studies [15,26], and it can assess the performance of a diagnostic test when a gold standard is unavailable or unfeasible [16,27]. By modeling the combinations of test results rather than treating a single test as the reference, Bayesian implementations help reduce bias in evaluating test performance in the absence of a gold standard [10]. Moreover, the Bayesian framework can incorporate prior knowledge from experts or from a systematic review [16,17]. Through simulation, the method provides posterior distributions for the performance of the diagnostic tests and for disease prevalence, which are crucial for disease control and treatment.
The diagnostic process for TBM remains a major challenge because the disease is severe, with a high mortality rate. Most previous studies used the culture test or clinical scoring to evaluate diagnostic performance, and both have limitations. The results of culture for Mycobacterium tuberculosis depend on factors such as antibiotic use, bacterial load, specimen quality, and technician experience. Composite reference standards, on the other hand, often rest on the judgment of local clinicians and can be biased by non-specific symptoms [12].
Furthermore, some studies were restricted to confirmed TBM populations or particular settings, so their results may not generalize to routine clinical practice. Likewise, many studies lacked autopsy confirmation of TBM. Hence, the performance of diagnostic tests and other clinical features is still not clearly understood [28].
Although the MGIT culture test was used as the gold standard in many previous studies, its accuracy is not 100%. When compared with clinical diagnosis, its sensitivity ranges from 31.8% to 48.8% [8,27,28]. Our results confirm that the test is not perfect, but our sensitivity estimate was higher than those reported in previous studies that used clinical symptoms as the reference [8]. This difference can be explained by inconsistent judgments among clinicians based on non-specific symptoms. Another advantage of the culture test is its ability to isolate Mycobacterium tuberculosis and distinguish it from nontuberculous mycobacteria. In our study, we found that  [29,30]. Further studies should be aware of these bacteria alongside the classical causal agent. In a systematic review, the pooled sensitivity of the Gene Xpert MTB/RIF assay compared with the culture test was 79.5% (95% CI: 62.0%–90.2%), and its specificity ranged from 96% to 100%, based on a 55% prevalence of clinical symptoms [31]. There has been some disagreement about the performance of the Gene Xpert assay compared with culture [32][33][34]. Interestingly, our estimate of the sensitivity of the Xpert test was higher than in earlier studies. Such disagreement could be explained by differences in study settings, sample sizes, and statistical methods. For instance, when the culture test is used as the gold standard, a positive Xpert result with a negative culture is counted as a false positive [28]. The main advantages of the Xpert test are its speed (results are available within 2 hours) and its ability to detect residual DNA from non-viable bacterial cells. In this study, the specificity of the Xpert test and of the other parameters was lower than that of the culture test; such differences could also be explained by study design and statistical methodology.
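The false-positive argument above can be made quantitative. Assuming that Xpert and culture are conditionally independent given true TBM status, the Python sketch below (hypothetical function `apparent_accuracy`) computes the sensitivity and specificity that a test T would appear to have if an imperfect reference R were treated as perfect. The input values in the example are illustrative assumptions loosely based on the abstract, not study results.

```python
def apparent_accuracy(prev: float, se_t: float, sp_t: float,
                      se_r: float, sp_r: float) -> tuple:
    """Apparent Se/Sp of test T judged against reference R, assuming T and R are
    conditionally independent given true disease status."""
    r_pos = prev * se_r + (1 - prev) * (1 - sp_r)  # P(R+)
    both_pos = prev * se_t * se_r + (1 - prev) * (1 - sp_t) * (1 - sp_r)  # P(T+, R+)
    r_neg = 1 - r_pos  # P(R-)
    both_neg = prev * (1 - se_t) * (1 - se_r) + (1 - prev) * sp_t * sp_r  # P(T-, R-)
    return both_pos / r_pos, both_neg / r_neg


# Illustrative values: T = Xpert, R = culture treated as the "gold standard".
se_app, sp_app = apparent_accuracy(prev=0.35, se_t=0.58, sp_t=0.96, se_r=0.63, sp_r=0.99)
print(f"apparent Se = {se_app:.3f}, apparent Sp = {sp_app:.3f}")
```

With a perfect reference the apparent values equal the true ones. With a culture sensitivity of 0.63, true TBM cases that culture misses but Xpert detects are counted as false positives, so the apparent specificity of Xpert falls well below its assumed true value of 0.96.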
Bayesian statistics combine the current state of knowledge with observed data to obtain posterior probabilities of an event, so posterior inferences rest on both the observed data and prior knowledge. For the culture test, prior knowledge from the literature and expert elicitation suggested a specificity of around 99% to 100%. The prior distributions for the Xpert test, on the other hand, varied. However, after running the models with different priors, the Xpert test showed consistently high specificity (92%–95%), indicating that it is reliable for diagnosing TBM compared with the culture test. Given its high specificity and a sensitivity similar to that of the culture test, we recommend using the Xpert assay to aid diagnosis while the culture result is pending. However, because its negative predictive ability is only moderate, the Xpert assay should be combined with clinical information and other CSF parameters rather than used alone to rule out TBM [25]. Further studies with the new Xpert MTB/RIF Ultra assay should be considered [35].
We also estimated the performance of two standard CSF parameters, the protein concentration and the leukocyte count. Both showed a reasonable ability to distinguish TBM patients, and our findings were consistent with previous studies reporting moderate performance of these parameters [36,37]. As routine CSF parameters, the leukocyte count and protein level remain useful indicators for TBM diagnosis, and, as in a previous study [4], we suggest using the CSF cell count as an indicator for diagnosing TBM. To our knowledge, this is the first study to estimate the performance of the culture test, Xpert, CSF protein level, and leukocyte count in the diagnosis of TBM without any reference standard. We expect this statistical approach to broaden knowledge of the disease and to help clinicians managing TBM in high-burden TB countries. The disease prevalence and the performance of routine diagnostic tests have essential implications for supporting clinical decisions and improving treatment outcomes. By assessing chemical and cytological parameters at different thresholds, our study provides insight into how these parameters contribute to TBM diagnosis. We also hope to encourage the use of Bayesian methods in future research on TBM. Studies using this method in other populations and settings would provide a better understanding of the disease worldwide. We hope that our findings, together with future studies of Xpert Ultra and other new diagnostic tests [35,38], will help complete the diagnostic and treatment algorithm for TBM.
Our study had some limitations, as it was conducted as a retrospective observational study. Because of the study design, data on CSF glucose levels and on the CSF/blood glucose ratio, an important predictor for TBM diagnosis, were limited. In our study, the median CSF glucose level was 3.28 (2.42–4.07) mmol/L, or 59 (43.6–73.3) mg/dL, which was higher than in previous studies. This may reflect problems with the quality of the retrospective data: as a national tertiary hospital, we received many cases referred from secondary hospitals, where patients may already have been treated with intravenous fluids or corticosteroids, which could affect glucose levels. The CSF/blood glucose ratio, rather than CSF glucose alone, would be more accurate for diagnosing TBM. This issue should be addressed carefully in future studies.
Another limitation concerns specimen quality. Although we followed careful guidelines and standard procedures for collecting and examining specimens, quality control of the procedure and of the follow-up process was still limited. Prospective studies with quality control of specimens planned from the outset, and cohort studies that follow the clinical course and patient outcomes, should be considered; such designs would have better validity and would give a more in-depth understanding of disease course and outcome. Nonetheless, this study still provides useful information because of its large sample size and its statistical modeling approach.
Future studies should consider several issues for the optimal use of BLCA. First, appropriate prior distributions are crucial to the reliability of a Bayesian analysis. Better knowledge and experience among experts can improve the elicitation of prior distributions, which can refine and strengthen the final results. Although we used local experts' elicitations, experts' knowledge may be shaped by national prevalence and the epidemiological setting, so pooling elicitations from experts in different countries would be preferable. Second, in this study we assumed that all diagnostic tests were independent because of their different laboratory principles. Future models could examine correlation between tests, for example through covariance terms or additional random effects, to achieve a better fit.

Conclusion
In summary, our findings indicate that the Xpert test has a sensitivity similar to that of the MGIT culture test and a high specificity. The Xpert test can be used as the primary test, combined with clinical information, to achieve fast diagnosis and adequate treatment, and the culture test can be used to confirm the Xpert result. The leukocyte count had a higher AUC than the protein level. Among suspected patients, the prevalence of TBM was moderate.