The regulatory network constructed for S. mutans UA159 is a “static” connection network built by assigning putative, predicted relationships between transcription factors (TFs) and target genes (TGs). A TF is said to putatively regulate a TG if its binding site (a short, characteristic stretch of DNA sequence) is present in the upstream regulatory region of the TG or of the operon that contains the TG. Binding sites are therefore the principal basis for identifying putative regulatory relationships.
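As a minimal sketch of the assignment rule above, the snippet below turns binding-site hits into TF-TG edges, propagating a site found upstream of an operon's first gene to every gene in that operon. The function name, gene names and TF names are hypothetical placeholders, and the operon list is assumed to be available from a prior operon prediction.

```python
# Hypothetical sketch: converting binding-site hits into TF-TG edges.
# All identifiers are illustrative placeholders, not real predictions.

def assign_targets(site_hits, operons):
    """Map TF binding-site hits to target genes.

    site_hits: (tf, gene) pairs, where `gene` has the TF binding site
        in its upstream region.
    operons: ordered gene lists; a site upstream of an operon's first
        gene is assumed to regulate every gene in that operon.
    Returns a sorted list of (tf, target_gene) edges.
    """
    # Index each operon by its first (promoter-proximal) gene.
    operon_by_head = {op[0]: op for op in operons}
    edges = set()
    for tf, gene in site_hits:
        # Genes not heading a known operon are treated as single-gene units.
        for target in operon_by_head.get(gene, [gene]):
            edges.add((tf, target))
    return sorted(edges)


operons = [["geneA", "geneB", "geneC"], ["geneD"]]
hits = [("TF1", "geneA"), ("TF2", "geneD"), ("TF2", "geneE")]
print(assign_targets(hits, operons))
```

Using a set before sorting keeps the edge list free of duplicates when several hits point into the same operon.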

In S. mutans, which has an estimated 120 to 140 verified and predicted TFs, the binding sites of only a handful have been characterized. A static regulatory network therefore cannot be constructed from this small set of TFs alone. Instead, a comparative genomics approach was used to identify novel putative TF-TG relationships by extrapolating experimentally identified connections from model organisms that are evolutionarily close to S. mutans. Here, we extrapolated from well-studied Gram-positive organisms such as B. subtilis and S. pneumoniae, which serve as the model organisms. This approach relies primarily on the best-hit trilogy concept, which requires that orthologs of both the model-organism TF and TG (the pair forming the experimentally validated interaction in the model organism) be present in S. mutans. In addition, the binding site must be conserved (allowing for minor substitutions), taking into account the operon structure of the TG in both the model organism and S. mutans. The best-hit trilogy principle thus ensures that the regulatory relationship at work in the model organism is conserved in S. mutans. It also suggests that the TF may act in a similar process, function, or biological mechanism in S. mutans as in the model organism. A further way to substantiate a particular TF-TG relationship is to test its conservation among other members of the same genus; we therefore checked whether the predicted TF-TG relationships are conserved in other organisms of the genus Streptococcus.
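The three conditions of the best-hit trilogy can be sketched as follows, under stated assumptions: orthology is approximated here by reciprocal best hits over precomputed similarity scores, and "minor substitutions" is approximated by a Hamming-distance cutoff (`max_mismatches=2`, a hypothetical value). The site upstream of the S. mutans TG ortholog (or of its operon head) is assumed to have been located beforehand. All gene names and scores are toy values.

```python
# Hypothetical sketch of a best-hit trilogy check. Scores, gene names and
# the mismatch cutoff are illustrative assumptions, not real data.

def best_hit(scores, query):
    """Return the highest-scoring subject for `query`, or None if it has no hits."""
    hits = scores.get(query, {})
    return max(hits, key=hits.get) if hits else None


def reciprocal_best_hit(fwd, rev, query):
    """Return the S. mutans ortholog of a model-organism gene if the two genes
    are each other's best hit (fwd: model -> S. mutans, rev: the reverse)."""
    target = best_hit(fwd, query)
    if target is not None and best_hit(rev, target) == query:
        return target
    return None


def mismatches(a, b):
    """Number of differing positions between two equal-length sites."""
    return sum(x != y for x, y in zip(a, b))


def trilogy_holds(tf, tg, model_site, smu_site, fwd, rev, max_mismatches=2):
    """True if TF and TG both have reciprocal-best-hit orthologs in S. mutans
    and the S. mutans site differs from the model site by few substitutions."""
    if reciprocal_best_hit(fwd, rev, tf) is None:
        return False
    if reciprocal_best_hit(fwd, rev, tg) is None:
        return False
    return (smu_site is not None
            and len(smu_site) == len(model_site)
            and mismatches(model_site, smu_site) <= max_mismatches)


# Toy bit scores: model-organism genes -> S. mutans genes, and the reverse.
fwd = {"tfX": {"smuTF1": 310.0, "smuTF2": 95.0}, "tgY": {"smuTG1": 220.0}}
rev = {"smuTF1": {"tfX": 305.0}, "smuTG1": {"tgY": 210.0, "otherY": 50.0}}

print(trilogy_holds("tfX", "tgY", "ATGACAATTGTCAT", "ATGACAATTGTCTT", fwd, rev))  # True
print(trilogy_holds("tfX", "tgY", "ATGACAATTGTCAT", "ATGCCAATTGACTT", fwd, rev))  # False
```

The first call passes because the sites differ at one position; the second fails because they differ at three. Conservation across other Streptococcus genomes could be checked by repeating the same test against each genome's score tables.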


TFs satisfying the best-hit trilogy principle were then assumed to act genome-wide, that is, to regulate further genes beyond those identified by extrapolation. Whole-genome scans were performed to identify these additional targets using the collection of binding sites obtained from the extrapolations. Specifically, the binding sites of each TF in S. mutans (after extrapolation) were used to construct Position Specific Scoring Matrices (PSSMs), which were then used to search the whole genome for further targets. Even in model organisms, which are the primary source of binding-site information and of the regulatory relationships used for extrapolation, a TF may have as few as one known TG. A PSSM built from so few sites has low information content and poorly estimated position-specific base frequencies, which imposes a statistical bottleneck. To reduce this limitation, TF-specific PSSMs were constructed from extrapolated binding sites obtained not only from S. mutans but also from the TG orthologs in other members of the genus Streptococcus. For every PSSM, an optimal P-value threshold was determined to screen out false-positive hits.
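A minimal sketch of the PSSM step, assuming log2-odds scoring with pseudocounts against a uniform background, a scan of both strands, and P-values estimated empirically from the scores of random words. The empirical null is a swapped-in technique chosen for brevity; an exact score distribution computed column by column would avoid sampling noise. The pooled sites, the toy genome and the `1e-3` cutoff are placeholders; the per-PSSM optimal threshold described above would replace that cutoff.

```python
import math
import random
from bisect import bisect_left

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def build_pssm(sites, pseudocount=0.5, background=None):
    """Log2-odds PSSM from equal-length, aligned binding sites.
    Pseudocounts keep scores finite when only a few sites are available."""
    background = background or {b: 0.25 for b in BASES}
    total = len(sites) + 4 * pseudocount
    pssm = []
    for i in range(len(sites[0])):
        column = [site[i] for site in sites]
        pssm.append({b: math.log2((column.count(b) + pseudocount) / total / background[b])
                     for b in BASES})
    return pssm


def score(pssm, word):
    """Sum of per-position log-odds scores for one word of PSSM length."""
    return sum(col[base] for col, base in zip(pssm, word))


def scan(pssm, genome):
    """Yield (position, strand, score) for every window on both strands."""
    width = len(pssm)
    for i in range(len(genome) - width + 1):
        window = genome[i:i + width]
        yield i, "+", score(pssm, window)
        yield i, "-", score(pssm, window.translate(COMPLEMENT)[::-1])


def background_scores(pssm, n=20000, seed=0):
    """Sorted scores of n random words drawn from a uniform background."""
    rng = random.Random(seed)
    width = len(pssm)
    return sorted(score(pssm, "".join(rng.choice(BASES) for _ in range(width)))
                  for _ in range(n))


def empirical_pvalue(null_scores, s):
    """Fraction of null scores >= s, with a +1 correction to avoid zero."""
    exceed = len(null_scores) - bisect_left(null_scores, s)
    return (exceed + 1) / (len(null_scores) + 1)


# Pooled toy sites: S. mutans sites plus sites upstream of TG orthologs
# in other streptococci (placeholder sequences).
sites = ["TTGACA", "TTGACT", "TTGATA", "TCGACA"]
pssm = build_pssm(sites)
null = background_scores(pssm)
genome = "GGGGGTTGACAGGGGG"

hits = [(pos, strand) for pos, strand, s in scan(pssm, genome)
        if empirical_pvalue(null, s) <= 1e-3]  # placeholder cutoff
print(hits)
```

Pooling sites from several Streptococcus genomes only changes the `sites` list; the pseudocount matters most when that list is short.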


The resulting reconstructed primary static regulatory network of S. mutans consisted of 1785 TF-TG relationships across 32 regulons. Further regulons will be reconstructed in future by extrapolating from other experimentally known TF-TG relationships.
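For bookkeeping, a network of this kind can be held as an edge list and grouped into regulons. The sketch below assumes that a regulon is the set of all TGs of one TF and counts distinct relationships after de-duplication; the function names and toy edges are hypothetical, and the published counts are not reproduced here.

```python
from collections import defaultdict


def group_regulons(edges):
    """Group (tf, target) edges into regulons: TF -> sorted, de-duplicated targets."""
    regulons = defaultdict(set)
    for tf, tg in edges:
        regulons[tf].add(tg)
    return {tf: sorted(targets) for tf, targets in regulons.items()}


def summarize(edges):
    """Return (number of regulons, number of distinct TF-TG relationships)."""
    regulons = group_regulons(edges)
    return len(regulons), sum(len(t) for t in regulons.values())


# Toy edge list; the duplicate ("TF1", "geneA") is counted once.
edges = [("TF1", "geneA"), ("TF1", "geneB"), ("TF2", "geneA"), ("TF1", "geneA")]
print(summarize(edges))  # (2, 3)
```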


STRING is a database of known and predicted protein-protein interactions, compiled from both computational predictions and experimental data. The STRING search function lets the user view the subset of interactions involving a set of query genes. Interaction evidence includes comparative genomics, gene fusion, text mining, experimental studies, phylogenetic profiles, gene neighbourhood, and homology. For more information on the STRING database, see Szklarczyk et al. (2011).
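Besides the web search, the same subset of interactions can be requested programmatically. The sketch below only builds a query URL, assuming STRING's public REST `network` endpoint with its `identifiers` (separated by a carriage return, `%0d`), `species` and `required_score` parameters. The locus tags are hypothetical placeholders, and the taxonomy ID 210007 for S. mutans UA159 is an assumption worth verifying before use.

```python
from urllib.parse import urlencode

STRING_API = "https://string-db.org/api"


def string_network_url(identifiers, species, output_format="tsv", required_score=None):
    """Build a STRING REST 'network' query URL for a set of query genes.

    identifiers: gene/protein names or locus tags understood by STRING.
    species: NCBI taxonomy identifier of the organism.
    required_score: optional combined-score cutoff on STRING's 0-1000 scale.
    """
    params = {"identifiers": "\r".join(identifiers), "species": species}
    if required_score is not None:
        params["required_score"] = required_score
    return f"{STRING_API}/{output_format}/network?{urlencode(params)}"


# Placeholder locus tags; 210007 is assumed to be the taxon ID of S. mutans UA159.
url = string_network_url(["SMU_0001", "SMU_0002"], 210007, required_score=400)
print(url)
```

The resulting URL can be fetched with `urllib.request.urlopen(url)`, which should return a tab-separated table with one row per interaction among the query genes.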