QuantiFERON-TB Gold In-Tube in Saudi Arabia benchmarked with other sites of the Middle East: A meta-analysis review

Introduction: Screening for latent tuberculosis infection (LTBI) is a key step in health surveillance programs, especially among adults of high-risk groups. To our knowledge, this is the first systematic review and meta-analysis to critically assess the agreement between QuantiFERON-TB Gold In-Tube (QFT-GIT) and the Tuberculin Skin Test (TST) among adults of high-risk groups in Saudi Arabia and to compare the results with other sites of the Middle East. Methodology: Kappa estimates were meta-analyzed using a random-effects model, and several subgroup analyses were performed to explain overall heterogeneity. A funnel plot and Begg's and Egger's tests were used to assess overall publication bias. Results: Eighteen studies comprising 5070 adults of high-risk groups were meta-analyzed. The pooled kappa estimate from Saudi Arabia (κ = 0.29, 95% CI: 0.16, 0.41) showed lower agreement than that from other sites of the Middle East (κ = 0.33, 95% CI: 0.25, 0.41). However, significant heterogeneity (I² = 96.7%, p < 0.001) was identified across the collected evidence. Begg's and Egger's tests showed no significant publication bias in this review (p = 0.49 and p = 0.16, respectively). Conclusion: This work revealed fair to poor agreement between TST and QFT-GIT, indicating that the two tests are not interchangeable in such settings. Substantial evidence is still needed before considering QFT-GIT as a sole alternative to TST in these populations. Moreover, there is an urgent need for longitudinal studies in Saudi Arabia and the Middle East to accurately assess the precision of LTBI diagnosis.


Introduction
In 2016, the virulent bacillus Mycobacterium tuberculosis (MTB) claimed the lives of approximately 1.8 million people and caused an estimated 10.4 million new tuberculosis (TB) cases worldwide [1]. These numbers show only one side of the TB control challenge. The other side is more worrying: at least one quarter of the world's population is estimated to be infected with dormant MTB, a subclinical condition known as latent TB infection (LTBI) [2,3]. Early detection of LTBI is critical, especially among high-risk groups. However, it is markedly hindered by the absence of a gold standard diagnostic test and by the limited performance of the tools currently available for clinical use, namely the Tuberculin Skin Test (TST) and Interferon-Gamma Release Assays (IGRAs). IGRAs comprise the QuantiFERON-TB Gold In-Tube (QFT-GIT; Cellestis Ltd, Carnegie, Australia) and the T-SPOT.TB test (T-SPOT; Oxford Immunotec Inc, Abingdon, UK) [4].
A growing body of scientific reports and national guidelines supports the use of QFT-GIT over the sole use of TST [5-7]. In theory, this is for three main reasons: (I) QFT-GIT is not influenced by Bacillus Calmette-Guérin (BCG) vaccination or by most Nontuberculous Mycobacteria (NTM) infections [8]; (II) it has higher reproducibility, with no concerns about the booster phenomenon or sensitization upon repeated testing [9]; and (III) it requires a single patient visit, with no need for a two-step protocol to increase diagnostic precision [10,11]. Accordingly, several countries have recommended either the complete replacement of TST with QFT-GIT [5] or the combined use of both tests during screening, with at least one type of IGRA serving as a confirmatory test for TST [6,7].
To date, much of the published evidence supporting the use of IGRAs, and of QFT-GIT in particular, stems from original articles and meta-analyses conducted in high-income, low-incidence regions [12-15]. In contrast, little is known about the performance of QFT-GIT in intermediate- and high-TB prevalence settings, such as Saudi Arabia and other sites of the Middle East. To the best of our knowledge, this is the first systematic review and meta-analysis to critically assess the agreement between QFT-GIT and TST among adults of high-risk groups in Saudi Arabia and to compare it with other sites of the Middle East. The objectives of this review are to: (I) systematically collect data from Saudi Arabia and apply similar criteria to identify data from the Middle East for comparison; (II) evaluate the overall agreement between TST and QFT-GIT (using kappa coefficients) among adults of high-risk groups; (III) quantify the heterogeneity across studies; and (IV) explain the sources of heterogeneity by performing multiple subgroup analyses.

Methodology
This systematic review and meta-analysis was conducted in accordance with the PRISMA statement (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) [16]. Before undertaking this review, we performed a preliminary pilot literature search in PubMed to identify relevant citations from Saudi Arabia and other sites of the Middle East. The retrieved citations were carefully analyzed, and we found that the main period of publications meeting our criteria and objectives, in Saudi Arabia and subsequently in other sites of the Middle East, was January 2013 to March 2017 (the last search was performed in March 2017).

Data sources and searches
The search, identification and screening processes were performed independently by two reviewers; in case of disagreement, the whole process was repeated. Articles were primarily identified via the PubMed and Web of Science search engines using the following terms: "QuantiFERON", "Tuberculin Skin Test" OR "TST", "Latent Tuberculosis infection" OR "LTBI", and "Tuberculosis" OR "TB", with a time filter from January 2013 to March 2017 (Figure 1). A total of 265 citations were collected, of which 103 duplicates were identified and excluded. The remaining 162 citations were screened by title and abstract to assess their eligibility. The 21 studies that remained underwent full-text screening, and studies that failed to meet the inclusion criteria were omitted. Ultimately, 18 articles from the Middle East were included in this systematic review and meta-analysis (Figure 1). In addition, the reference lists of retrieved citations were manually examined for additional relevant studies.

Study selection
All articles included in this review met the following criteria: (I) published in English between January 2013 and March 2017 from Saudi Arabia or other sites of the Middle East; (II) classified as an original research article (no case studies, editorials, letters, news or commentaries); (III) fully published (not in press or in preparation); and (IV) conducted among Hemodialysis (HD)/Renal Failure (RF) patients or Health Care Workers (HCWs). In addition, only studies that used both QFT-GIT and TST and clearly reported kappa values, or provided enough information to calculate kappa values independently, were included in the meta-analysis.

Publication bias and quality assessment
The methodological quality of each study was evaluated using the Newcastle-Ottawa Scale for case-control studies and a modified Newcastle-Ottawa Scale for cross-sectional studies. Publication bias across all collected evidence was inspected visually using a funnel plot with the trim-and-fill method (Figure 2). Additionally, Egger's and Begg's tests were both performed to assess overall publication bias in the meta-analyzed kappa estimates [17].
Figure 1. A flow-chart depicting the search strategy employed in this review during the identification, screening, eligibility and inclusion phases, in accordance with the PRISMA statement (2009) [16].
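Egger's regression test, mentioned above, can be sketched as an ordinary least-squares fit of each study's standardized effect (kappa / SE) on its precision (1 / SE); an intercept far from zero suggests funnel-plot asymmetry. The minimal Python sketch below assumes that a standard error is available for every kappa estimate; the function name `egger_test` is hypothetical and this is not necessarily the exact routine the authors ran in R.

```python
import math


def egger_test(effects, std_errors):
    """Egger's regression asymmetry test (illustrative sketch).

    Regresses the standardized effect (effect / SE) on precision (1 / SE)
    by ordinary least squares. An intercept far from zero suggests
    funnel-plot asymmetry. Returns (intercept, slope, se_intercept, t).
    """
    if len(effects) != len(std_errors) or len(effects) < 3:
        raise ValueError("need at least 3 paired effects and standard errors")
    z = [y / s for y, s in zip(effects, std_errors)]   # standardized effects
    prec = [1.0 / s for s in std_errors]               # precisions
    n = len(z)
    mean_x = sum(prec) / n
    mean_y = sum(z) / n
    sxx = sum((x - mean_x) ** 2 for x in prec)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(prec, z))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance with n - 2 degrees of freedom
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(prec, z))
    sigma2 = rss / (n - 2)
    se_intercept = math.sqrt(sigma2 * (1.0 / n + mean_x ** 2 / sxx))
    t_stat = intercept / se_intercept if se_intercept > 0 else float("nan")
    return intercept, slope, se_intercept, t_stat
```

The two-sided p value would then be read from a t distribution with k - 2 degrees of freedom (for example `2 * scipy.stats.t.sf(abs(t), k - 2)` if SciPy is available). With identical effects the intercept is zero, which is the "no asymmetry" case.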

Data extraction, synthesis and analysis
Data were extracted independently and in duplicate from each study for the following variables: sample size, subject group, gender, mean age, purified protein derivative (PPD) type and dose, TST induration and study location. To assess the overall agreement between TST and QFT-GIT, we also extracted all kappa values from each study. Kappa values were missing in several studies and were therefore calculated independently. Kappa can range from -1 to 1, with values of 0 or below indicating no agreement beyond chance; positive values were interpreted as follows: κ > 0.70, strong agreement; 0.40 < κ < 0.70, good agreement; 0.30 < κ < 0.40, fair agreement; and κ < 0.30, poor agreement [18]. Inconsistency between kappa values was assessed using Cochran's Q test and the I² statistic, with random- or fixed-effect inverse-variance models, as appropriate [19]. The Q test was considered significant at p < 0.10, as reported previously [20]. Pooled kappa values are presented in tables and forest plots. Subgroup analyses were conducted based on one or two of the following parameters: (I) origin of publication (Saudi Arabia versus other sites of the Middle East), (II) subject group (HCWs versus HD patients), (III) rate of BCG vaccination (> 90% versus < 90%) and (IV) type of PPD. All statistical tests and data analyses were conducted using R [21].
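Where a study reported only the cross-tabulation of TST and QFT-GIT results, kappa can be recomputed as (observed agreement - chance agreement) / (1 - chance agreement). The Python sketch below is illustrative: the function names are hypothetical, the standard error uses the common large-sample approximation (the review does not state which SE formula was used), and assigning the exact band boundaries (for example κ = 0.70 to "good") is an assumption because the bands above are written with strict inequalities.

```python
import math


def cohen_kappa_2x2(both_pos, tst_only_pos, qft_only_pos, both_neg):
    """Cohen's kappa for TST vs QFT-GIT from a 2x2 table (illustrative).

    Returns (kappa, approximate SE), where the SE is the simple
    large-sample approximation sqrt(po * (1 - po) / (n * (1 - pe)**2)).
    """
    n = both_pos + tst_only_pos + qft_only_pos + both_neg
    if n == 0:
        raise ValueError("empty table")
    po = (both_pos + both_neg) / n                  # observed agreement
    p_tst = (both_pos + tst_only_pos) / n           # TST positivity
    p_qft = (both_pos + qft_only_pos) / n           # QFT-GIT positivity
    pe = p_tst * p_qft + (1 - p_tst) * (1 - p_qft)  # agreement expected by chance
    if pe == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    kappa = (po - pe) / (1 - pe)
    se = math.sqrt(po * (1 - po) / (n * (1 - pe) ** 2))
    return kappa, se


def interpret_kappa(kappa):
    """Map kappa onto the agreement bands used in this review (boundaries assumed)."""
    if kappa > 0.70:
        return "strong"
    if kappa > 0.40:
        return "good"
    if kappa > 0.30:
        return "fair"
    return "poor"
```

For a hypothetical table of 100 subjects (20 concordant positives, 10 TST-only positives, 5 QFT-GIT-only positives, 65 concordant negatives), observed agreement is 0.85, chance agreement is 0.60 and κ = 0.625, which falls in the "good" band.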

Summary of all selected studies
A total of 265 publications were identified from different electronic databases, of which 18 met the above-mentioned criteria and were included in this review (Figure 1): 6 from Saudi Arabia [22-27], 5 from Turkey [28-32], 3 from Iran [33-35], 2 from Egypt [36,37], 1 from Oman [38] and 1 from Qatar [39]. A summary of these studies, including objectives, sample size, subject characteristics, gender, mean age, PPD type and dose, TST induration and study location, is presented in Table 1. Based on subject characteristics, we identified 11 studies among HCWs [23,25,26,28-30,35,37-39] and 7 studies among RF patients [22,24,27,31-33,36]. A total of 5070 subjects (4188 HCWs and 882 HD patients) were included in this meta-analysis. As shown in Table 1, all studies enrolled subjects of both genders, with mean ages ranging from 26 to 60 years. Studies conducted among HCWs reported working experience ranging from 4 to 17 years. The cut-off value for TST induration was mostly > 10 mm. The type and dose of PPD varied across studies, although 2 Tuberculin Units (TU) of PPD RT 23 and 2 TU of PPD Razi are both accepted as bio-equivalent to 5 TU of PPD-Sanofi Pasteur (PPD-S) [35,40]. Few studies specified the timing of each test, but those that did explicitly reported that QFT-GIT was performed before dialysis sessions and prior to PPD injection. The funnel plot showed a symmetrical distribution of kappa estimates (Figure 2), and Begg's and Egger's tests similarly showed no significant publication bias among the included studies (Begg's test, p = 0.49; Egger's test, p = 0.16).
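Begg's test, reported above alongside Egger's, is a rank correlation (Kendall's tau) between each study's standardized deviation from the fixed-effect pooled estimate and its variance. The following Python sketch follows the Begg and Mazumdar construction under stated assumptions: the function name is hypothetical, kappa variances are assumed to be available per study, and ties are not corrected for, so results may differ slightly from R implementations.

```python
import math
from statistics import NormalDist


def begg_test(effects, variances):
    """Begg and Mazumdar rank correlation test (illustrative sketch).

    Returns (tau, z, two_sided_p) using Kendall's tau-a and the normal
    approximation without tie correction.
    """
    k = len(effects)
    if k != len(variances) or k < 3:
        raise ValueError("need at least 3 paired effects and variances")
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Variance of (effect_i - pooled) under the fixed-effect model
    adj_var = [v - 1.0 / sum(weights) for v in variances]
    deviates = [(y - pooled) / math.sqrt(v) for y, v in zip(effects, adj_var)]
    concordant = discordant = 0
    for i in range(k):
        for j in range(i + 1, k):
            s = (deviates[i] - deviates[j]) * (variances[i] - variances[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    tau = (concordant - discordant) / (k * (k - 1) / 2)
    z = (concordant - discordant) / math.sqrt(k * (k - 1) * (2 * k + 5) / 18)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return tau, z, p
```

When larger (noisier) studies systematically report larger deviations, tau approaches 1 and the test flags asymmetry; a large p value, such as the p = 0.49 reported here, is consistent with no detectable asymmetry.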

Assessing the overall agreements between TST and QFT-GIT and the impact of other variables
A total of 21 studies were retrieved from the Middle East, 18 of which were eligible for meta-analysis (Figure 1). Kappa values and other information were extracted, subcategorized and presented in two tables (Tables 2 and 3). Overall, the pooled kappa coefficient showed fair agreement between QFT-GIT and TST among adults of high-risk groups in the Middle East (κ = 0.33, 95% CI: 0.25, 0.41) (Table 4; Figure 3). To evaluate the inconsistency of the evidence, the Q test p value and the I² statistic were both calculated and presented in a forest plot (Figure 3). The overall heterogeneity of the reported kappa values was very high (p < 0.001, I² = 96.7%) (Table 4). In an attempt to explain this variability, we performed subgroup analyses based on subject group (HCWs versus HD patients) or origin of publication (Saudi Arabia versus other sites of the Middle East). Neither analysis alone explained the high variability in the reported evidence (Table 4). However, stratifying studies by both parameters (origin of publication and subject group) significantly reduced heterogeneity among HD patients in the Middle East excluding Saudi Arabia (p = 0.21, I² = 34.9%) (Table 4). Further subgroup analyses investigated the impact of BCG vaccination rate and PPD type on the variability of kappa estimates. Interestingly, a reduction in heterogeneity, though not significant, was observed among studies that used PPD-S (Table 4).
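The pooling and heterogeneity statistics described above (inverse-variance weights, Cochran's Q, I² and a random-effects pooled kappa) can be sketched as follows. This Python example assumes the DerSimonian-Laird estimator of between-study variance, which is a common default but is not named in the review; the function name is hypothetical. In R, packages such as metafor (`rma(..., method = "DL")`) provide equivalent routines.

```python
import math


def pool_random_effects(estimates, std_errors, z_crit=1.96):
    """DerSimonian-Laird random-effects pooling with Cochran's Q and I^2.

    Illustrative sketch; returns the pooled estimate, its 95% CI,
    Q, I^2 (in %) and the between-study variance tau^2.
    """
    k = len(estimates)
    if k != len(std_errors) or k < 2:
        raise ValueError("need at least 2 paired estimates and standard errors")
    w = [1.0 / se ** 2 for se in std_errors]      # fixed-effect (inverse-variance) weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = k - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)                 # between-study variance
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    w_re = [1.0 / (se ** 2 + tau2) for se in std_errors]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))
    return {
        "pooled": pooled,
        "ci": (pooled - z_crit * se_pooled, pooled + z_crit * se_pooled),
        "Q": q,
        "I2": i2,
        "tau2": tau2,
    }
```

The Q test p value would come from a chi-square distribution with k - 1 degrees of freedom (for example `scipy.stats.chi2.sf(Q, k - 1)`), judged against the 0.10 threshold stated in the Methodology. As a worked case, two studies with kappa 0.1 and 0.5 and SE 0.1 each give Q = 8, I² = 87.5% and τ² = 0.07.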

Comparing the overall positivity and diagnostic precision of QFT-GIT and TST among RF patients and HCWs.
Three studies from the Middle East reported sensitivity and specificity values for TST and QFT-GIT among RF patients, two of which were from Saudi Arabia (Table 2). To minimize the number of covariates, we restricted the accuracy summary to studies from Saudi Arabia. Notably, TST showed higher pooled specificity but lower pooled sensitivity among RF patients, and its diagnostic odds ratio was slightly higher than that of QFT-GIT in the same group (Table 2). On the other hand, the overall positivity of each test among HCWs in Saudi Arabia and in other sites of the Middle East indicates fair agreement in both groups (Table 3).
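The diagnostic odds ratio (DOR) combines sensitivity and specificity into one number: the ratio of the positive to the negative likelihood ratio. The Python sketch below (hypothetical function name) applies it to the pooled Saudi RF values reported in the Discussion (TST: sensitivity 37.8%, specificity 90.2%; QFT-GIT: sensitivity 62.4%, specificity 70.5%). Computing the DOR from pooled sensitivity and specificity is an illustration and is not necessarily how the DOR values in Table 2 were derived.

```python
def diagnostic_odds_ratio(sensitivity, specificity):
    """DOR = LR+ / LR-, with sensitivity and specificity as proportions in (0, 1)."""
    if not (0 < sensitivity < 1 and 0 < specificity < 1):
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    positive_lr = sensitivity / (1 - specificity)   # LR+
    negative_lr = (1 - sensitivity) / specificity   # LR-
    return positive_lr / negative_lr


tst_dor = diagnostic_odds_ratio(0.378, 0.902)   # approximately 5.59
qft_dor = diagnostic_odds_ratio(0.624, 0.705)   # approximately 3.97
```

With these inputs the TST DOR (about 5.6) exceeds that of QFT-GIT (about 4.0), in line with the direction described above. As the Discussion notes, without a gold standard these accuracy figures are relative to whatever reference each study used.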

Discussion
Screening for LTBI is a key step in health surveillance programs, although strategies differ between countries [6,7]. In Saudi Arabia, primary screening for LTBI relies on TST [41], and QFT-GIT is recommended only in cases where the use of TST is limited. Such heavy reliance on TST is concerning, since a large proportion (98%) of the population in Saudi Arabia is BCG-vaccinated [42,43]. BCG vaccination has been reported to have a strong confounding effect on TST [44], although this effect usually diminishes within a year when the vaccine is given in infancy [44]. Other studies reported an increased yield of positive TST results in all BCG recipients regardless of age, with an estimated 20% of cases attributable to the booster phenomenon from serial skin testing [45,46]. All of these studies emphasize the misleading role of BCG vaccination in TST interpretation. Thus, in highly BCG-vaccinated populations, such as those of Saudi Arabia and other sites of the Middle East, an alternative to TST might be preferred. To the best of our knowledge, this is the first systematic review and meta-analysis to assess the agreement between QFT-GIT and TST for LTBI detection among adults of high-risk groups in Saudi Arabia and to compare it with other sites of the Middle East.
As expected, the evidence analyzed in this review supports fair to poor agreement between TST and QFT-GIT in both Saudi Arabia and other sites of the Middle East (Figure 3), with no significant difference between kappa values in the two regions (Table 4). This might be attributed to comparable socioeconomic, cultural and health-related factors in these areas. Most of the considered regions have high rates of migration and population movement, a high prevalence of consanguinity and high neonatal BCG vaccination rates. In addition, these countries may have high rates of undiagnosed NTM infections (as colonizers or disease-causing agents), which could increase discordance between the two diagnostic tests [47]. Similarly, poor agreement was noted among RF patients and HCWs in several settings outside the Middle East, such as India (κ = 0.04) [48], Korea (κ = 0.08) [49] and Greece (κ = 0.02) [50], but not in Taiwan (κ = 0.53) [51], South Africa (κ = 0.63-0.70) [52] or Switzerland (κ = 0.60) [53]. Given the differing agreement of the two tests across geographical and economic TB-burden settings, substantial evidence is still needed before considering QFT-GIT as a sole alternative to TST for LTBI detection among high-risk groups in the Middle East.
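One way to check the statement that the Saudi and other Middle East kappa estimates do not differ significantly is a two-sided z-test on the two pooled values, with standard errors back-calculated from the reported 95% CIs. This is a swapped-in technique for illustration (Table 4 reports t-tests), and it assumes the two subgroups are independent and the rounded CIs are approximately symmetric; the function names are hypothetical.

```python
from statistics import NormalDist


def se_from_ci(lower, upper, level=0.95):
    """Back-calculate a standard error from a (roughly symmetric) confidence interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return (upper - lower) / (2 * z)


def compare_pooled(est_a, se_a, est_b, se_b):
    """Two-sided z-test for the difference between two independent pooled estimates."""
    z = (est_a - est_b) / (se_a ** 2 + se_b ** 2) ** 0.5
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Pooled kappa values and 95% CIs as reported in this review
se_saudi = se_from_ci(0.16, 0.41)
se_other = se_from_ci(0.25, 0.41)
z_diff, p_diff = compare_pooled(0.29, se_saudi, 0.33, se_other)
```

This gives z of about -0.53 and p of about 0.6, consistent with no significant difference between the two regions.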
The overall agreement between QFT-GIT and TST among HCWs (Table 3) is higher than among HD groups (Table 2), in both Saudi Arabia and other sites of the Middle East (Table 4). This is not surprising, as certain immunological dysfunctions may reduce the agreement between TST and QFT-GIT. A stringent, unified protocol is therefore needed for LTBI screening in these regions. This is especially important among immunocompromised adults, to minimize false-negative results due to a weaker immune response. For instance, performing IGRAs after a dialysis session could significantly compromise their accuracy owing to lower production of IFN-gamma [54]. Likewise, some studies have proposed that the diagnostic precision of IGRAs can be hampered by long durations of HD treatment among RF patients. In this review, only one study from Saudi Arabia reported performing QFT-GIT before the dialysis session, and only a few specified the duration of HD among tested RF subjects. In addition, performing IGRAs after PPD injection can undermine their accuracy; only a few studies stated the timing of each test in their methodology, and in these QFT-GIT was performed before TST. Such uncertainties may seem trivial, but they make conclusive findings in this regard hard to obtain.
In the present review, TST showed higher pooled specificity (90.2%) but lower sensitivity (37.8%) among RF patients in Saudi Arabia than QFT-GIT, which had a pooled specificity of 70.5% and sensitivity of 62.4% (Table 2). Superior specificity of QFT-GIT has been reported elsewhere, consistent with its narrower immunological target compared with TST [31,55]. It is worth remembering, however, that despite the variability in the sensitivity and specificity of both tests across settings, these values cannot establish overall accuracy because there is no gold standard test to compare them against. In addition, the prevalence of TB among dialysis and RF patients in the Kingdom is already high, largely owing to their weakened immune status. Hence, early detection of LTBI is highly desirable to prevent further clinical complications: patients at high risk can be offered TB preventive therapy and monitored closely and periodically. This is especially important in Saudi Arabia, which has high immigration and mobility rates as an Islamic hub for Muslims from all over the world. Moreover, the majority of RF and dialysis patients are in particular need of an organ transplant, and transplant recipients receive immunosuppressive therapy to prevent organ rejection. Dormant MTB infections, if present, can therefore reactivate at any time into fully active disease. Early detection of LTBI, together with TB preventive therapy, in this group can improve the level of care and treatment outcomes, and may ultimately increase patients' chances of survival and reduce TB infection and transmission rates.
Given the shortcomings of TST and QFT-GIT, much hope is now placed in the fourth version of QuantiFERON, QuantiFERON-TB Gold Plus, as a diagnostic tool for LTBI. Although a few studies have begun testing QuantiFERON-TB Gold Plus in several geographical locations, none has been performed in the Middle East. This, together with our analyzed data, calls for deeper investigation of the performance of such tools among high-risk groups that could benefit greatly from early LTBI screening, such as kidney failure and organ transplant patients.
In this review, we identified very high heterogeneity across studies from the Middle East and from Saudi Arabia (I² = 96.7%, p < 0.001) (Figure 3). To explain this heterogeneity, studies were subcategorized into two main groups based on (I) subject characteristics (HCWs and RF patients) and (II) origin of publication. Neither parameter alone accounted for the heterogeneity (Table 4). However, analyzing studies on the basis of both parameters (origin of publication as well as subject characteristics) significantly reduced heterogeneity among HD patients in the Middle East excluding Saudi Arabia (p = 0.21, I² = 34.9%) (Table 2).
We also tested for significant differences between subgroups stratified by several other factors. For instance, it has been suggested that different sources of PPD (such as RT 23 and Sanofi Pasteur) induce variable skin-test responses: some reports suggest that a positive result is more likely with PPD RT 23 than with PPD-S [56], whereas others suggest that PPD-S causes a larger TST induration [57]. Nonetheless, we found no significant differences between kappa estimates from studies that used PPD RT 23 and those that used PPD-S [40] (Table 4). Moreover, according to a group of researchers in the Middle East, the agreement between TST and QFT-GIT can be markedly influenced by BCG vaccination rates, with higher agreement among RF patients with low vaccination rates than among those with high vaccination rates [54]. Contrary to this finding, we found no significant difference between kappa estimates from subjects with high versus low vaccination rates (Table 4). Similarly, subgroup analysis based on BCG vaccination rate could not explain the observed heterogeneity.
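The Table 4 footnote indicates that subgroup kappa means (for example PPD RT 23 versus PPD-S studies) were compared with a t-test. The review does not state which variant was used; the Python sketch below assumes Welch's unequal-variance form, and the function name is hypothetical. Note that an unweighted t-test on study-level kappa values ignores each study's precision, unlike the inverse-variance pooling used elsewhere.

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom (illustrative)."""
    if len(group_a) < 2 or len(group_b) < 2:
        raise ValueError("each group needs at least 2 values")
    va = variance(group_a) / len(group_a)   # squared standard error of mean A
    vb = variance(group_b) / len(group_b)   # squared standard error of mean B
    t = (mean(group_a) - mean(group_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(group_a) - 1) + vb ** 2 / (len(group_b) - 1))
    return t, df
```

A two-sided p value would then be taken from a t distribution with `df` degrees of freedom (for example `2 * scipy.stats.t.sf(abs(t), df)` if SciPy is available), with p < 0.05 read as a significant difference between the two subgroup means.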
Comparing studies, whether from the Middle East or elsewhere, remains complicated. This is largely due to the absence of a gold standard for LTBI detection and the many variations between published studies, including, but not limited to, population heterogeneity, interpretation criteria, study design and objectives. Our systematic review and meta-analysis applied stringent inclusion criteria to minimize the impact of several covariates across studies, such as publication date, location, study type and publication status. We also examined the impact of several other factors, such as subject characteristics, BCG rate, PPD type, risk of bias and location of retrieved citations, in an attempt to explain the heterogeneity. On the other hand, several variables of interest, such as work seniority, TB burden, economic status, duration of HD treatment and subjects' age, might have contributed to the overall heterogeneity but could not be tested owing to differences in study design, sampling power, study outcomes and data presentation. In addition, the chosen time period might be short; however, it represents the main period of publication on LTBI screening in Saudi Arabia and other sites of the Middle East (as concluded from the initial literature search). For comparison, we applied the same criteria to identify citations from other sites of the Middle East.

Conclusion
This meta-analysis aimed to assess the agreement of both tests among adults of high-risk groups in Saudi Arabia and to compare it with other sites of the Middle East. The evidence collected in this review supports fair to poor agreement between TST and QFT-GIT, indicating that the two tests are not interchangeable in such settings. Significant heterogeneity was also detected across all collected evidence, which might, to some extent, be due to variations in PPD type and neonatal vaccination rates. There is an urgent need for longitudinal studies in Saudi Arabia and the Middle East to accurately assess the precision of LTBI diagnosis.

Figure 1. Flow-chart of the study selection strategy employed in this review.

Figure 3. Meta-analysis and forest plot of kappa values in the Middle East.

Table 1. Details of included studies.
For each retrieved citation, the following variables are summarized: study ID and objectives, subjects' gender and age, PPD dose and type, BCG rate and study location. The cut-off value for QFT-GIT is > 0.35 International Units/mL. Abbreviations: BCG: Bacillus Calmette-Guerin, CRF: Chronic Renal Failure patients, HCWs: Health Care Workers, HD: Hemodialysis, IP-10: Inducible Protein 10, LTBI: Latent Tuberculosis Infection, NA: Not Available, PD: Peritoneal Dialysis, PPD: Purified Protein Derivative, QFT-GIT: QuantiFERON-TB Gold In-Tube, RF: Renal Failure, TB: Tuberculosis, TST: Tuberculin Skin Testing and TU: Tuberculin Units.

Table 2. Performance of QFT-GIT versus TST among HD patients in the Middle East, 2013-2017.

Table 3. QFT-GIT versus TST among HCWs in the Middle East, 2013-2017.

Table 4. Summary of all subgroup analyses and their statistical significance.

Columns (partially recovered): … I² (%), p value; HCWs in Saudi Arabia: I² (%), p value; HCWs in other sites of the Middle East: I² (%), p value; t-test p value; 95% CI.
p values < 0.05 correspond to significant differences between mean kappa estimates in groups A and B; I² values were calculated using a random-effects model, apart from those marked with an asterisk (*). Abbreviations: BCG: Bacillus Calmette-Guérin, HCWs: Health Care Workers, RF: Renal Failure patients, TU: Tuberculin Units.