Microarray analysis of virulence gene profiles in Salmonella serovars from food / food animal environment

Introduction: Rapid, accurate and inexpensive analysis of the disease-causing potential of foodborne pathogens is an important consideration in food safety and biodefense, particularly in developing countries. The objective of this study is to demonstrate the use of a robust and inexpensive microarray platform to assay the virulence gene profiles in Salmonella from food and/or the food animal environment, and then use ArrayTrack for data analysis. Methodology: The spotted array consisted of 69 selected Salmonella-specific virulence gene probes (65 bp each). These probes were printed on poly-L-lysine-coated slides. Genomic DNA was digested with Sau3AI, labeled with Cy3 dye, and hybridized to the gene probes; the images were captured with a GenePix 4000B scanner and analyzed with ArrayTrack, free software developed by Food and Drug Administration (FDA) researchers. Results: Nearly 58% of the virulence-associated genes tested were present in all Salmonella strains tested. In general, genes belonging to the inv, pip, prg, sic, sip, spa or ttr families were detected in more than 90% of the isolates, while the iacP, avrA, invH, rhuM, sirA, sopB, sopE or sugR genes were detected in 40 to 80% of the isolates. The gene variability was independent of the Salmonella serotype. Conclusions: This hybridization array presents an accurate and cost-effective method for evaluating the disease-causing potential of Salmonella in outbreak investigations by targeting a selective set of Salmonella-associated virulence genes.


Introduction
It has been estimated that approximately 1.3 billion cases of salmonellosis occur worldwide each year [1]. Salmonella infections can largely depend on the immune status of the host and virulence factors such as Salmonella pathogenicity islands (SPIs), plasmids, toxins, fimbriae and flagella [2,3]. The nature of infections caused by Salmonella in humans and animals could depend on a variety of bacterial and host factors, and their complex interactive environment [4]. Multiple SPIs encode type III secretion systems that transport bacterial proteins (SptP and SopE) into the cytosol of the target host cells, facilitating the uptake of the bacterium by the host cells [5-7]. The SPI genes are also essential for Salmonella to proliferate within host cells and cause systemic infections [8,9]. They also encode the mgtBC operon and are required for intra-macrophage survival. Salmonella serovars also harbor plasmids that contain virulence-associated genes, such as spvRABCD [10]. In addition, the toxin-encoding genes, fimbria-encoding genes, and the flagella-encoding gene system play diverse roles in Salmonella pathogenesis [11]. The genomic reservoir of Salmonella species contains horizontally transferred genetic elements, including some virulence genes that may play roles in pathogenicity and disease development [12]. The characterization of virulence-associated genes is important in identifying Salmonella pathogenicity, understanding the potential transfer mechanisms, and developing an efficient detection method in epidemic disease control.
DNA microarrays have demonstrated great potential for analysis of gene expression, genotyping, pathway analysis, monitoring changes in genomic DNA, and host-pathogen interaction [7,12,13]. Microarray techniques have been useful in high-throughput genetic profiling of pathogenic microorganisms [14,15]. This technology can also detect the presence or absence of thousands of genes simultaneously by a single genomic hybridization step [16,17]. Spotted DNA microarray platforms can be cost-effective and easy to reproduce in a laboratory setting with basic infrastructure [18]. Furthermore, interpretation of microarray data is easier to automate and standardize than that of gel-based technologies [19]. The objective of this study was to demonstrate the usefulness of a robust microarray platform to detect the virulence-associated gene profiles in different Salmonella serovars from food and/or the food animal environment, and the use of ArrayTrack™, free software developed by researchers at the Food and Drug Administration (FDA), for data analysis.

Methodology
Bacterial strains and DNA preparation
A total of 24 Salmonella enterica isolates were analyzed in this study (Table 1). S. enterica subspecies enterica serovar Enteritidis (S. Enteritidis) strain 1 and S. Heidelberg strain 12 were obtained from the FDA's Office of Regulatory Affairs culture collection and the outbreak isolates were obtained from the FDA's Center for Food Safety and Applied Nutrition's investigation of the Roma tomato outbreak. These isolates were obtained as pure cultures for this study. Salmonella isolates from turkey farms were part of our previous study [20]. Microarray validation was conducted using S. Typhimurium ATCC 14028 as the positive control strain for the probes used in this study and S. Typhimurium VV302 (∆hilA-523) and S. Typhimurium SVM725 (∆invF) as the negative controls. The bacterial genomic DNA was isolated from a freshly grown (18 to 24 hours) bacterial culture using a Wizard Genomic DNA Purification kit (Promega Corporation, Madison, WI, USA) and quantified by measuring the absorbance at 260 nm (NanoDrop ND-1000 V3.3 spectrophotometer, NanoDrop Technologies, Wilmington, DE, USA) for microarray and PCR experiments.

Selection of target genes and probe design
Sixty-nine gene-specific oligonucleotide probes (Operon Technologies, Alameda, CA, USA) were designed based on the open reading frame sequences of Salmonella pathogenicity islands (SPI-1 to SPI-5) virulence-associated genes [9] from the National Center for Biotechnology Information (NCBI) GenBank (Bethesda, MD, USA) and the Array Designer software (Premier Biosoft International, Palo Alto, CA, USA), using Salmonella enterica Typhimurium LT2 genome NC_003197 as the reference genome. To increase the efficacy of our microarray chip, two additional probes, celB (5'-GGGGATCCAGCTGAATGGACAGGTGGTGATCAAGAAGGTTGGAATTTCGTCAATGAAATG-3') and celF (5'-TGGTGCCGGTAGTCTTAACACTTACAAGGGTTATGTTGACAACATTTCTAGAACTATTCG-3') (MWG Biotech AG, Ebersberg, Germany), were added as the negative gene controls. The celB and celF probes were designed from the cellulase-encoding gene (Accession number: U57818) and the 1,4-beta-D-glucan-cellobiohydrolase-encoding gene (Accession number: U97154) from the fungus Orpinomyces sp. PC-2, respectively [21]. These probes exhibited no homology to the Salmonella genome sequences in the GenBank database.
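A basic sanity check on the two negative-control probes can be sketched in a few lines of Python. This is an illustration only, not the Array Designer workflow used in the study: the helper names `gc_fraction` and `probe_summary` are hypothetical, and the sequences are the ones quoted above, split at the points where the extracted text shows whitespace.

```python
# Negative-control probe sequences as quoted in the text. Adjacent string
# literals are concatenated by Python; the split points are arbitrary.
CONTROL_PROBES = {
    "celB": "GGGGATCCAGCTGAAT" "GGACAGGTGGTGATCAAGAAGGTTGGAATTT" "CGTCAATGAAATG",
    "celF": "TGGTGCCGGTAGTCTTAACACTTACAAGGGT" "TATGTTGACAACATTTCTAGAACTATTCG",
}


def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def probe_summary(probes: dict) -> dict:
    """Hypothetical helper: length and GC fraction for each probe."""
    return {
        name: {"length": len(seq), "gc": round(gc_fraction(seq), 3)}
        for name, seq in probes.items()
    }


if __name__ == "__main__":
    for name, stats in probe_summary(CONTROL_PROBES).items():
        print(name, stats)
```

A homology search against the Salmonella genome, as described in the text, would still require an alignment tool such as BLAST; this sketch checks only length and base composition.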

Microarray printing and processing
Salmonella oligonucleotide probes (50 µM each) were printed on poly-L-lysine-coated slides (Erie Scientific, Portsmouth, NH, USA) using SMP3 printing pins (TeleChem International, Sunnyvale, CA, USA) on an OmniGrid™ 100 Microarrayer (GeneMachines, San Carlos, CA, USA). Printed slides were baked at 80°C for 1 hour and UV crosslinked (UV Stratalinker 2400, Stratagene, La Jolla, CA). The slides were then treated with a blocking solution of 3X SSC (1X solution is 150 mM sodium chloride and 15 mM sodium citrate, pH 7.0), 0.1% SDS (sodium dodecyl sulfate) and 1% BSA (bovine serum albumin) using gentle agitation for 5 minutes at 50°C and washed with Milli-Q water four times for 1 minute each at room temperature. The slides were placed in boiling Milli-Q water for 2 minutes, followed by a 1 minute wash in ethanol at room temperature. The slides were dried in a microarray high speed centrifuge (TeleChem International, Sunnyvale, CA, USA) and stored at room temperature with low humidity until used.
Eight identical arrays were printed on each microarray chip for simultaneous duplicate analysis of four different bacterial isolates. Each virulence-associated gene was represented by two identical probes in different locations within the array to minimize systematic errors.

Fluorescence labeling of genomic DNA, microarray hybridization and slide scanning
Genomic DNA from each test isolate was digested with Sau3AI (Promega, Madison, WI, USA) at 37°C for 2 hours. The digested DNA was precipitated and chemically labeled using the Micromax Cy3 labeling dye (Perkin-Elmer Life Science, Inc., Boston, MA, USA) according to the manufacturer's protocol. The labeled DNA was purified using a PCR purification kit (Qiagen Sciences, MA, USA) and resuspended in 1X hybridization buffer (5X Denhardt's solution, 6X SSC, and 0.1% Tween 20) at 90°C for 2 minutes, followed by incubation with the same buffer for 60 to 90 minutes at 60°C for hybridization to occur. After hybridization, the arrays were washed and scanned with a GenePix 4000B scanner (Molecular Devices, Sunnyvale, CA). Fluorescent images were captured and analyzed using the GenePix Pro 6.0 software. Each template hybridization experiment was conducted at least three times with the genomic DNA isolated at different times, and each of the hybridizations was evaluated on at least three microarray slides. The genomic DNA of S. Typhimurium ATCC 14028 was used as a positive control on every chip, hybridized to a randomly selected array along with the test strains. Genomic DNA from S. Typhimurium VV302 (∆hilA-523) and S. Typhimurium SVM725 (∆invF) was added to assess the performance of the microarray chip.

Data normalization and analysis
The fluorescence intensity for each probe was calculated by subtracting the median value of the local background intensities. For normalization of the Cy3 probe signals, the local background-subtracted intensity of each probe was divided by the median value of the total signals from all probes in the same array. Any spot that showed a ratio greater than 1.0 was counted as positive. Ratios less than 1.0 were then standardized by comparison with those of the corresponding probes of the positive control strain, S. Typhimurium ATCC 14028, in the same chip. The standardized ratios of the two duplicate probes on each of the two chips (4 spots total) for each gene were averaged as the measurement of the signal strength for the gene. A ratio of < 0.8 indicated absence and a ratio > 0.9 indicated presence of a probe sequence. Values between 0.8 and 0.9 were classified as uncertain. The two threshold cutoffs, 0.8 and 0.9, were determined by evaluating the data from all 69 virulence-associated gene probes for the Salmonella isolates.
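The normalization and calling steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: "standardized by comparison" is read here as division by the control-strain ratio for the same probe, spots with a ratio above 1.0 are capped at 1.0 before averaging, and all function names are hypothetical.

```python
from statistics import mean, median

ABSENT_BELOW = 0.8   # averaged ratio below this: gene called absent
PRESENT_ABOVE = 0.9  # averaged ratio above this: gene called present


def normalize_array(foreground: dict, background: dict) -> dict:
    """Background-subtract each probe, then divide by the array median signal."""
    signal = {probe: foreground[probe] - background[probe] for probe in foreground}
    array_median = median(signal.values())
    if array_median <= 0:
        raise ValueError("array median signal must be positive")
    return {probe: value / array_median for probe, value in signal.items()}


def standardize(test_ratio: float, control_ratio: float) -> float:
    """Assumed rule: ratios above 1.0 count as positive (capped at 1.0);
    lower ratios are divided by the positive-control ratio for that probe."""
    if test_ratio > 1.0:
        return 1.0
    if control_ratio <= 0:
        raise ValueError("control ratio must be positive")
    return test_ratio / control_ratio


def call_gene(standardized_ratios: list) -> str:
    """Average the replicate spots (4 in the study) and apply the two cutoffs."""
    average = mean(standardized_ratios)
    if average < ABSENT_BELOW:
        return "absent"
    if average > PRESENT_ABOVE:
        return "present"
    return "uncertain"
```

For example, four replicate spots with standardized ratios of 0.85 each would be classified as uncertain under these cutoffs.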
Hierarchical clustering analysis (HCA) was performed with ArrayTrack™, freely shared software developed by researchers at the US Food and Drug Administration. ArrayTrack™ is a bioinformatics tool that provides an integrated environment for genomic data management, analysis, and interpretation with a focus on microarray data. This software is freely available to the scientific community through the FDA website (http://www.fda.gov/ArrayTrack). The user manual and tutorials are available from the website.
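To illustrate the idea of clustering isolates on presence/absence flags, the sketch below runs a small agglomerative clustering in pure Python. It is not ArrayTrack's implementation: the distance metric (Hamming distance over flags) and the average-linkage rule are assumptions chosen for simplicity, and the function names and isolate labels are hypothetical.

```python
from itertools import combinations


def hamming(a, b) -> int:
    """Number of genes whose presence/absence flag differs between two profiles."""
    return sum(x != y for x, y in zip(a, b))


def average_linkage(profiles: dict) -> list:
    """Agglomerative clustering on flag profiles.

    profiles maps an isolate name to a tuple of 0/1 flags (one per gene).
    Returns the merge history as (cluster_a, cluster_b, distance) tuples.
    """
    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        best = None
        for ca, cb in combinations(clusters, 2):
            # Average pairwise Hamming distance between the two clusters.
            total = sum(hamming(profiles[i], profiles[j]) for i in ca for j in cb)
            distance = total / (len(ca) * len(cb))
            if best is None or distance < best[0]:
                best = (distance, ca, cb)
        distance, ca, cb = best
        clusters.remove(ca)
        clusters.remove(cb)
        clusters.append(ca + cb)
        merges.append((ca, cb, distance))
    return merges


if __name__ == "__main__":
    # Hypothetical flags for three isolates over four genes.
    demo = {"isoA": (1, 1, 0, 0), "isoB": (1, 1, 0, 1), "isoC": (0, 0, 1, 1)}
    for step in average_linkage(demo):
        print(step)
```

A two-way analysis would apply the same procedure a second time to the transposed matrix, so that genes are clustered as well as isolates.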

PCR verification of microarray results
PCR was used to screen all the bacterial isolates for the presence of the following genes: purR, rmbA, rhuM, sugR, spi4H, ttrB, iacP, avrA, prgK, invH and sopE (Table 2). Most genes selected for PCR screening were either absent or elicited weak hybridization signals. The 16S rRNA and purR genes were used as positive controls and celB as a negative control for PCR analysis. A typical PCR (25 µl) contained 0.6 pmol/l of each primer, 12.5 µl 2X PCR MasterMix (Core System Kit, Promega) and 40 ng of purified genomic DNA. PCR amplification was conducted by incubating the samples at 94°C for 5 minutes, followed by 30 cycles of 94°C for 30 seconds, a primer-specific annealing temperature (Table 2) for 30 seconds and 72°C for 30 seconds. A 10 µl aliquot of PCR products was loaded on a 2% agarose E-gel (Invitrogen, Carlsbad, CA, USA) and separated according to the manufacturer's instructions. Whenever there was a discrepancy between the PCR and microarray results for a particular gene target, representative PCR-positive products were sequenced to verify the PCR results. The PCR amplicons were labeled using the BigDye Terminator Cycle Sequencing Kit (version 3.1, Applied Biosystems, Foster City, CA, USA) and separated using an ABI Prism 310 Automatic Sequencer (Applied Biosystems) to confirm the identity of the PCR product.

Results
Distribution of virulence-associated genes in Salmonella serovars
The distribution of 69 virulence-associated genes in Salmonella is summarized in Table 3. The positive control genes, 16S rRNA and purine nucleotide synthesis repressor (purR), were detected in all Salmonella strains, while only background hybridization signals were observed on spots of the negative controls, celB, celF, 2X SSC, printing buffer and blanks (data not shown). Overall, 58% of the virulence-associated genes (40/69) were present in all 24 Salmonella isolates tested while the remaining genes were variable in their distribution. In this study, regardless of the serotype, Salmonella isolates exhibited most variability among the iacP, avrA, sopE, sirA and invH genes belonging to the SPI-1 class (Table 3). The Salmonella acyl carrier protein-encoding gene, iacP, was present in 71% (17/24) of the strains. The avrA gene, which encodes the secreted effector protein in Salmonella, was detected in 50% of the strains. The sirA gene was detected in 63% of Salmonella isolates. This gene encodes a two-component response regulator of the FixJ family that has a positive regulatory influence on the expression of Type III secretion genes involved with epithelial cell invasion and the elicitation of bovine gastroenteritis. Thirteen Salmonella isolates (54%) were positive for invH, which encodes the protein required for entry of the bacteria into cultured epithelial cells. The SPI-2 associated virulence genes were well conserved in most Salmonella serotypes; ORF242 was found to be variable in S. Heidelberg and absent from all untypeable Salmonella rough isolates (Table 3).
The SPI-3 genes were conserved among all serotypes tested, with the exception of rhuM and sugR. These two genes were found to be absent from most serotypes, with the exception of S. Heidelberg, S. Enteritidis and untypeable Salmonella rough isolates (Table 3). The sugR and rhuM genes, located in S. enterica SPI-3 and encoding the putative ATP binding protein and a cytoplasmic protein, respectively, were detected in 42% of the isolates. None of the Salmonella outbreak strains showed positive signals with the sugR and rhuM probes on the microarray chip (Table 3). The SPI-3-associated mgtB and mgtC genes, which are important for intracellular Salmonella replication, were present in all serotypes. The SPI-4 associated genes were present in all Salmonella serotypes; these genes have sequence similarity to genes required for survival in macrophages. Interestingly, the spi4C, spi4H to spi4N and spi4Q genes were absent from S. Senftenberg strain 606 isolated from the poultry farm. Lastly, with the exception of sopB and pipA genes, the SPI-5 associated genes were conserved in all Salmonella serotypes. The virulence-associated effector protein-encoding gene, sopB, was detected in 79% of the isolates.
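The prevalence figures reported here follow directly from a presence/absence matrix. The sketch below shows one way to compute them; the function names and the toy matrix are hypothetical, and only the counts quoted in the text (40/69, 17/24 and 13/24) are taken from the study.

```python
def percent(count: int, total: int) -> int:
    """Percentage rounded half-up to the nearest whole number."""
    if total <= 0:
        raise ValueError("total must be positive")
    return int(100 * count / total + 0.5)


def gene_prevalence(calls: dict) -> dict:
    """calls maps gene -> list of booleans (one presence call per isolate)."""
    return {gene: percent(sum(flags), len(flags)) for gene, flags in calls.items()}


def core_genes(calls: dict) -> list:
    """Genes called present in every isolate."""
    return [gene for gene, flags in calls.items() if all(flags)]


if __name__ == "__main__":
    # Counts quoted in the text.
    print(percent(40, 69))  # share of genes present in all 24 isolates
    print(percent(17, 24))  # iacP
    print(percent(13, 24))  # invH

    # Hypothetical three-isolate matrix, for illustration only.
    toy = {"geneX": [True, True, True], "geneY": [True, False, False]}
    print(gene_prevalence(toy), core_genes(toy))
```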
Overall, there appeared to be no major differences in the virulence-associated gene profiles of the poultry (egg houses and farm) and outbreak strains, nor any differences among the multiple isolates of serovars Heidelberg, Muenster, Anatum or untypeable isolates (Table 3).However, the two isolates of S. Senftenberg (601 and 606) displayed differences for 13 virulence genes (iacP, ORF242, ORF408, sopB, spi4H/I/J/K/L/M/N/Q and ttrC); these genes were absent from isolate 606.
Figure 1 illustrates a flag-based two-way HCA based on 29 virulence genes that showed variability among the 24 Salmonella isolates. In general, the outbreak and farm isolates appear to group in two distinct clusters. The HCA showed that the five untypeable Salmonella isolates were subgrouped with one S. Heidelberg (strain 585); all six strains were farm isolates, while S. Enteritidis strain 1 and S. Heidelberg strain 12 isolates from egg houses were subgrouped with the two S. Heidelberg farm isolates (580 and 589). The outbreak strains were grouped in a distinct cluster and shared profiles with six isolates from the farm (strains 606, 524, 807, 528, 601 and 614).

Validation of the microarray signals by PCR analysis and sequencing
Of the 275 PCR reactions (11 gene amplifications × 25 isolates), there was 88% (242/275) agreement between the PCR and hybridization results (Table 4). Genes in the remaining 12% of discordant results tested positive with PCR, but were found to be negative (signal) with the hybridization experiments; all PCR reactions amplified a product of the predicted size for the corresponding genes (Table 2). To further verify the discordant results, PCR amplicons of rhuM, spi4H and prgK genes for S. Miami strain 180, S. Senftenberg strain 606, and Salmonella strain 801, respectively, and representative amplicons of sugR, iacP, avrA, invH and sopE genes were sequenced and found to match the predicted genes based on NCBI's BLAST search results.
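The agreement figures above can be reproduced from paired calls. The following sketch, with hypothetical function names and toy data, compares microarray and PCR calls keyed by (isolate, gene) and reports the discordant pairs; the totals checked in the demo (275 reactions, 242 concordant) are the ones quoted in the text.

```python
def compare_calls(microarray: dict, pcr: dict):
    """Compare presence calls keyed by (isolate, gene).

    Returns (number in agreement, number compared, sorted discordant keys).
    Only keys present in both dictionaries are compared.
    """
    shared = microarray.keys() & pcr.keys()
    discordant = sorted(key for key in shared if microarray[key] != pcr[key])
    return len(shared) - len(discordant), len(shared), discordant


def percent_agreement(agree: int, total: int) -> float:
    """Agreement expressed as a percentage of all compared reactions."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100 * agree / total


if __name__ == "__main__":
    # Totals quoted in the text.
    total_reactions = 11 * 25
    print(total_reactions, percent_agreement(242, total_reactions))

    # Hypothetical calls for two isolates and two genes.
    array_calls = {("iso1", "iacP"): False, ("iso1", "invH"): True,
                   ("iso2", "iacP"): True, ("iso2", "invH"): False}
    pcr_calls = {("iso1", "iacP"): True, ("iso1", "invH"): True,
                 ("iso2", "iacP"): True, ("iso2", "invH"): False}
    print(compare_calls(array_calls, pcr_calls))
```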

Discussion
Although 97% of the genome sequence is identical among different Salmonella serovars [22], comparative genomics using microarrays has revealed conserved and variable gene components associated with fimbriae, pathogenicity and phage elements [23,24]. Virulence genes and plasmids can be used as biomarkers for detection of Salmonella serotypes, such as Typhimurium and Newport [25]. This study examined the profiles of 69 virulence-associated genes within Salmonella serovars from different environmental sources. Most genes in the spotted array were located in SPI-1 to SPI-3, and encode proteins responsible for secretion and translocation of Salmonella proteins in the host cells and intracellular survival and replication [7,9], while the other SPI (SPI-2, SPI-4 and SPI-5)-associated genes encode effector proteins that facilitate intracellular survival of Salmonella in the host cells, T1SS toxins and survival of these bacteria in macrophages [8,26,27]. This platform adds to the list of studies in which microarray technology has been used for diagnostic characterization of the disease-causing potential of bacterial pathogens [7,13,16,18].
The virulence genes associated with SPI-1 encode effector proteins that disrupt cytoskeletal and bacterial cell barriers, resulting in Salmonella invasion of the host gastrointestinal epithelium [27]. The sirA and invH genes were absent from S. Heidelberg isolates (with the exception of S. Heidelberg strain 580 for the invH gene), while avrA and invH were absent from the untypeable Salmonella isolates. However, a recent study has shown that SPI-1 deficient S. Senftenberg can cause human enteropathogenic infections [26], indicating that SPI-1 associated genes are not essential to cause human gastroenteritis in this Salmonella serotype. In our study, all SPI-1 related genes in S. Senftenberg strain 601 were present; however, the avrA, iacP, invE and invH genes were absent from S. Senftenberg strain 606 (Table 3), suggesting that other virulence factors may contribute to the pathogenicity of this isolate. Several other virulence genes from SPI-2 (ORF242, ORF408 and ttrC), SPI-3 (rhuM and sugR), SPI-4 (spi-class) and SPI-5 (sopB) were concurrently found to be absent from S. Senftenberg strain 606 as opposed to S. Senftenberg strain 601, suggesting the uniqueness of this strain.
Of the genes analyzed, 58% were found in all the isolates from the various serovars and sources, which suggests that they may serve as a core set of virulence-associated genes in Salmonella enterica. Outside this core group, there was diversity in the virulence factor profiles among the isolates: only two isolates (S. Senftenberg strain 601 and S. Worthington strain 614) shared a common profile, and both strains were isolated from the same sampled flock. On the other hand, the two isolates of S. Senftenberg (601 and 606) displayed variability for 13 virulence genes (Table 3), indicating that when multiple isolates of Salmonella serotypes were analyzed, differences in gene content could be detected in all of them. Additional strains of S. Senftenberg should be tested to evaluate whether these gene differences were unique to these isolates. There were no unique pattern differences in the virulence gene profiles between farm/egg house and the outbreak strains. S. Enteritidis strain 1 and S. Heidelberg strain 12, isolated from the same egg house, exhibited nearly identical virulence-associated gene profiles, despite belonging to different serovars (Table 3).
The avrA, iacP, invH, ORF242, rhuM, sirA or sugR genes were absent from several Salmonella isolates. S. Heidelberg strains 585, 589, 606, S. Manhattan strain 193, and unidentifiable Salmonella strains 801, 802, 804, 805 and 806 were found not to possess the ORF242 and invH genes. ORF242, located in SPI-2, encodes a protein similar to the AraC-like family of transcriptional regulators [28]. There was no evidence of a relationship between ORF242 and invH in SPI-1, a lipoprotein required for invG localization to the outer membrane [29].
A subset of the target genes was selected for PCR analysis to specifically validate the microarray data demonstrating weaker signals. Nearly 88% accordance was observed between the microarray and PCR data. The discrepancies resulted from a target being detected by PCR and not meeting the positive detection threshold using the microarray probes. Over half (19/33) of the discrepancies were observed for iacP or invH genes and were probably caused by the stringency of hybridization conditions or sequence variability in the probe binding site. Future refinement of the array will need to address the problems with these two probes.
Although whole genome arrays can generate a lot of information, they are not always cost effective and require expensive software for data analysis. To support such analysis, ArrayTrack™ has recently been upgraded to manage and analyze the genetic profiling data related to bacterial foodborne pathogens [30]. ArrayTrack™ libraries have been populated with bioinformatics data from the public domains related to bacterial pathogen species. Data processing and visualization tools have been enhanced with customized options to facilitate analysis of genetic profiling microarray data. Specifically, three new functions have been developed and are particularly effective for analysis of the microarray data: flag-based HCA, a flag concordance (FC) heat map, and flag indicators in the mixed scatter plot. These functions can be relevant and effective for the identification and characterization of bacterial pathogens through microarray genetic profiling data [30].
One of the benefits of this spotted array technology is the ability to rapidly and simultaneously analyze multiple genes on a single platform. Since much of the Salmonella enterica genome is invariant, we have chosen a unique set of virulence genes that would represent Salmonella enterica pathogenicity. The chip utilized in this study has the added benefit that it contains eight identical arrays, enabling simultaneous analysis of four samples in duplicate. This design results in a per sample cost reduction and an increase in the sample throughput over single arrays. Such a platform, along with the free analysis software, could be particularly useful for laboratories with limited resources. When this improved microarray chip design was compared to a microarray containing a similar but single set of probes [9], the multi-sample array functioned identically.
Other microarray platforms previously reported have evaluated only a select list of virulence genes [16,25].
In summary, our study highlights a simple and effective method for single-step screening of multiple Salmonella enterica isolates for multiple virulence-associated genes that may render these bacteria pathogenic in a food and/or food animal environment. The ArrayTrack™ free software for microarray data analysis could be used by researchers in developing countries. Such a detection system can be easily modified and adapted to include additional probes for newly described virulence-associated genes such as those encoding regulation-effector (phoP/phoQ), fimbriae (safC, sefA, stbD and stcC), or phage-associated genes (sseI, gtgA and STM4210) [12,31]. High density screening of Salmonella isolates for genetic elements, such as pathogenicity islands, plasmids, and phages, can provide better insights into the mechanisms of acquisition of these virulence factors by a particular strain.

Figure 1. A flag-based two-way hierarchical clustering analysis of the 24 Salmonella isolates based on all 29 virulence genes that varied between isolates.

Table 1. Salmonella isolates used in this study

Table 2. Primers used for PCR amplification of virulence-associated genes in Salmonella

Table 3. Prevalence of virulence-associated genes in 24 Salmonella isolates from food/food animal environment. Gray box indicates gene presence; white box indicates gene absence; and black indicates uncertain existence of the gene based on threshold cutoff used for data normalization.

Table 4. Comparison of microarray hybridization and PCR amplification data. Highlighted cells represent the discordant results between the microarray analysis and PCR assay. These were positive by PCR and negative by microarray.