SynoR :: Instructions

https://synor.dcode.org/

Instructions

BRIEFLY

SynoR performs genome scans for clusters of conserved transcription factor binding sites (cTFBS) in user-specified spatial configurations. The current version of the program scans the human and mouse genomes for TFBS conserved in comparisons with other mammals, chicken, frog, or fish. The identified cTFBS modules and the corresponding genes go through several steps of functional annotation. (1) cTFBS modules are classified as promoters, UTRs, introns, intergenic, or coding exons depending on their relationship to "UCSC known genes". (2) Interspecies conservation analysis is performed for all the identified modules to describe their evolutionary history. (3) Gene Ontology (GO) characterization is performed for genes bracketing the identified noncoding modules. (4) GNF Expression Atlas 2 analysis is performed for these genes, allowing the tissue specificity of the identified modules to be predicted.

INPUT

Designing SynoR searches. A SynoR search requires the content, spatial, and order parameters of a TFBS module. SynoR's first input form collects the data on TFBS content and order: the number of different TFBS, their directionality, and their order may be specified at this point. While the TFBS content is required, the order and the directionality are optional. The following examples illustrate possible input parameters for the cluster specification:

GATA1 2

- search for a pair of GATA1 sites

GATA1 2 +

- search for a pair of GATA1 sites on the same strand

GATA1 +
HNF4 2 -

- search for a GATA1 site on the positive strand followed by 2 HNF4 sites on the negative strand
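
The input lines above follow a simple "name, optional count, optional strand" pattern. As a minimal sketch of how such lines could be read (this is not SynoR's own parser; the names `SiteSpec` and `parse_cluster_spec` are hypothetical, and the default count of 1 is an assumption inferred from the `GATA1 +` example):

```python
from typing import List, NamedTuple, Optional


class SiteSpec(NamedTuple):
    name: str               # TF or matrix name, e.g. "GATA1"
    count: int              # number of sites required (assumed to default to 1)
    strand: Optional[str]   # "+", "-", or None when no strand is given


def parse_cluster_spec(text: str) -> List[SiteSpec]:
    """Parse one TFBS per line: NAME [COUNT] [STRAND]."""
    specs = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        name, count, strand = fields[0], 1, None
        for field in fields[1:]:
            if field in ("+", "-"):
                strand = field
            elif field.isdigit():
                count = int(field)
            else:
                raise ValueError(f"unrecognized field {field!r} in line {line!r}")
        specs.append(SiteSpec(name, count, strand))
    return specs


if __name__ == "__main__":
    for spec in parse_cluster_spec("GATA1 +\nHNF4 2 -"):
        print(spec)
```

Keeping the strand as `None` when it is omitted preserves the distinction between "any strand" and an explicit `+` or `-`.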

Sometimes there is more than one TFBS matrix per TF in the TRANSFAC database utilized by SynoR. For example, there are 5 TRANSFAC matrices for the SP1 TFBS (SP1_01, SP1_Q6, SP1_Q6_01, SP1_Q4_01, SP1_Q2_01). In such cases, the SP1 2 input specification will search for a pair of ANY SP1 TFBS in the module, while the SP1_01 specification will select the SP1_01 matrix specifically. All the matrices were optimized independently, so there is no straightforward rule of thumb for choosing the best module definition when a TFBS has multiple matrices. Experimenting with the different options may be the best approach.
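
The difference between a TF name and a specific matrix name can be illustrated with a short sketch. It assumes that the TF name is the part of a matrix name before the first underscore, which holds for the SP1 matrices listed above but is not confirmed as SynoR's actual matching rule; `matching_matrices` is a hypothetical helper.

```python
# The five SP1 matrices named in the text above.
SP1_MATRICES = ["SP1_01", "SP1_Q6", "SP1_Q6_01", "SP1_Q4_01", "SP1_Q2_01"]


def matching_matrices(spec_name, matrices):
    """Return the matrices that a specification name would select.

    An exact matrix name selects only that matrix. Otherwise the name is
    treated as a TF name and selects every matrix whose prefix before the
    first underscore equals it (an assumption made for illustration).
    """
    if spec_name in matrices:
        return [spec_name]
    return [m for m in matrices if m.split("_", 1)[0] == spec_name]


if __name__ == "__main__":
    print(matching_matrices("SP1", SP1_MATRICES))     # any SP1 matrix
    print(matching_matrices("SP1_01", SP1_MATRICES))  # one specific matrix
```
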
Select the "Fix the order of TFBS" option to require a specific order of TFBS in the cluster. For example, the following cluster definition will search for the GATA4 .. HNF4 .. SP1 configuration (or the reverse SP1 .. HNF4 .. GATA4 configuration), but not for HNF4 .. GATA4 .. SP1:

GATA4
HNF4
SP1

Also, if you select the directionality of a TFBS, it will be reversed when the cluster is detected on the reverse strand. So, with the "Fix the order of TFBS" option selected, GATA4 + .. HNF4 - .. SP1 - is equivalent to SP1 + .. HNF4 + .. GATA4 -, but is not the same as SP1 - .. HNF4 - .. GATA4 +.
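
The strand equivalence described above amounts to reversing the order of the sites and flipping each strand sign. A minimal sketch, using hypothetical helper names and representing each site as a (name, strand) pair:

```python
FLIP = {"+": "-", "-": "+"}


def reverse_strand_view(sites):
    """Return the same cluster as read from the opposite strand.

    `sites` is a list of (name, strand) pairs in left-to-right order;
    a strand of None (unspecified) is left unchanged.
    """
    return [(name, FLIP.get(strand, strand)) for name, strand in reversed(sites)]


def same_ordered_cluster(a, b):
    """True if two ordered configurations describe the same cluster."""
    return a == b or a == reverse_strand_view(b)


if __name__ == "__main__":
    query = [("GATA4", "+"), ("HNF4", "-"), ("SP1", "-")]
    print(reverse_strand_view(query))
    print(same_ordered_cluster(query, [("SP1", "+"), ("HNF4", "+"), ("GATA4", "-")]))
    print(same_ordered_cluster(query, [("SP1", "-"), ("HNF4", "-"), ("GATA4", "+")]))
```

Applying `reverse_strand_view` twice returns the original configuration, which is why only two orientations need to be compared.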

OUTPUT

      SynoR genome scans may take from a few seconds to several minutes, depending on the selection of the comparison genomes, the queue size, and the number of clusters matching the input module specification. The processing report page is updated continuously and forwards automatically to the results page upon completion of the scan. Several data analysis options are available from the results web page:
- full list of the identified modules including:
    a) genome position, which is linked to the ECR Browser. Following the ECR Browser link, it is possible to study interspecies conservation of the corresponding genomic locus, extract DNA sequences, list all the neighboring evolutionary conserved regions (ECRs), visualize conserved TFBS in alignments of different genomes, identify cross-species synteny, detect conserved SNPs in ECRs, obtain detailed information on genes, etc.
    b) annotation based on overlapping or bracketing gene features (promoter, UTR, etc.)
    c) corresponding gene name(s) (annotated using "UCSC known gene annotation") [The corresponding gene name is the name of an overlapping gene (in case of coding, intronic, and UTR elements). It is the name of the nearest gene in case of promoters. The names of two bracketing genes are reported in case of an intergenic element.]
    d) multi-species conservation profile
    e) FASTA module sequence
    f) position and strand of each TFBS in the module (available in the "text" output only)
- summary statistics on different types of modules
- functional annotation of genes corresponding to (bracketing) noncoding modules:
    a) enrichment in GO categories
    b) tissue specificity of the genes as calculated using the GNF Expression Atlas 2 (Su AI et al., PNAS (2002) 99, 4465-4470)

      Follow this link for an example output corresponding to a SRF/SP1 SynoR scan through human/mouse cTFBS.

      1. Details on the Gene Ontology analysis:
  -   Enrichment in GO categories is calculated for genes bracketing noncoding elements using the binomial approximation to the hypergeometric distribution.
  -   Holm's sequential Bonferroni correction is applied to account for multiple testing.
  -   GO analysis is performed for all the GO categories that include at least 10 genes from over 18,000 total "UCSC known genes".
  -   Significantly enriched GO categories (those with a p-value of less than 0.05) are reported.
  -   Category name provides a dynamic link to the list of identified genes that fall into that particular category.
  -   The enrichment column gives a direct ratio of observed vs expected genes.
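
The GO statistics listed above can be sketched in a few lines. This is an illustration under stated assumptions, not SynoR's implementation: it assumes a one-sided (upper-tail) binomial test in which each selected gene falls into a category with probability category_size / total_genes, and it uses the standard Holm step-down adjustment. All function names are hypothetical.

```python
from math import exp, lgamma, log


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), with 0 < p < 1.

    Terms are computed in log space to avoid overflow for large n.
    """
    total = 0.0
    for i in range(k, n + 1):
        log_term = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                    + i * log(p) + (n - i) * log(1 - p))
        total += exp(log_term)
    return min(1.0, total)


def holm_adjust(pvalues):
    """Holm's sequential Bonferroni adjustment, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        running_max = max(running_max, min(1.0, (m - rank) * pvalues[idx]))
        adjusted[idx] = running_max
    return adjusted


def enrichment(observed, n_selected, category_size, total_genes):
    """Ratio of observed to expected genes in a category."""
    expected = n_selected * category_size / total_genes
    return observed / expected


if __name__ == "__main__":
    # Hypothetical numbers: 6 of 100 selected genes fall into a category
    # that contains 360 of 18,000 annotated genes.
    p = binomial_upper_tail(6, 100, 360 / 18000)
    print(p, enrichment(6, 100, 360, 18000))
```

A category would then be reported when its Holm-adjusted p-value is below 0.05.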

      2. Details on tissue specificity analysis: SynoR collects expression data from the GNF Atlas 2 for the identified genes corresponding to noncoding modules and presents it in a microarray-style table consisting of colored rectangles. The intensity of the red and green colors reflects the relative tissue expression level of a particular gene: brighter colors correspond to stronger expression changes. SynoR tissue expression analysis normalizes expression across tissues for each gene separately. This way, the maximum (positive or negative) expression is the same for all genes, and the differences between genes reflect only differences in their expression across tissues.
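
The per-gene normalization described above can be sketched as follows. The exact procedure used by SynoR is not given here, so this example makes two assumptions: each gene is centered on its median expression across tissues, and the centered values are then divided by the largest absolute deviation so that every gene peaks at +1 or -1. The function name `normalize_gene` is hypothetical.

```python
from statistics import median


def normalize_gene(values):
    """Scale one gene's expression across tissues to the range [-1, 1].

    Values are centered on the gene's median (an assumption), then divided
    by the largest absolute deviation so that the extreme value is +1 or -1.
    """
    center = median(values)
    centered = [v - center for v in values]
    peak = max(abs(c) for c in centered)
    if peak == 0:
        return [0.0 for _ in centered]  # flat profile: no tissue stands out
    return [c / peak for c in centered]


if __name__ == "__main__":
    # Hypothetical expression values for one gene in three tissues.
    print(normalize_gene([1.0, 2.0, 5.0]))
```

Because each gene is scaled independently, a weakly expressed gene and a strongly expressed gene with the same tissue pattern produce identical rows in the table.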

      In the first data analysis step, gene expression is clustered by genes and by tissues using the Cluster 3.0 software with the default settings. This allows direct visual identification of clusters of co-expressed genes. Subsequently, the list of tissues with an unexpectedly large number of overexpressed or suppressed genes is extracted, providing tissue specificity estimates for the identified genes. This list is further broken into four categories: (1) significantly overexpressed, (2) some overexpressed, (3) some suppressed, and (4) significantly suppressed. These categories are listed at the top of the clustering figure in solid red, light red, light green, and solid green, respectively. The same colors are used to highlight these tissues in the clustering figure.

RETURNING TO PREVIOUSLY COMPLETED REQUEST

Every SynoR request is assigned an ID number (s02170925227888, for example). Request ID numbers are shown on almost every SynoR web page. An ID can be used to return to a previously submitted request by pasting it into the corresponding form at the bottom of the SynoR front page. Please note that SynoR data are stored in a temporary directory and might be deleted about a month after a request was processed, so request ID numbers may expire after roughly a month.

QUESTIONS?

Please email dcode@ncbi.nlm.nih.gov if you have any concerns or questions on how to use SynoR.