Quantifying the relationship between co-expression, co-regulation and gene function
The activation or activity of a transcription factor (TF) will be reflected in altered gene expression and in the relationship between putative enhancer activity and its associated gene activity. In order for two genes to have a greater than 50% chance of sharing a common transcription factor binder, the correlation between their expression profiles must be very high. Gene co-expression networks can be used to associate genes of unknown function with biological processes. In the first step, individual relationships between genes are defined based on a measure of co-expression, computed from data such as microarrays, a platform for quantifying gene expression that assays mRNA.
This question has been addressed in Saccharomyces cerevisiae, for which both extensive microarray expression data and experimentally verified TFBS exist (Yu et al.). In this single-celled organism, only gene pairs with a very high degree of co-expression share a significantly larger number of TFBS (Allocco et al.).
While these results point to limitations of the reverse engineering approach, they do encourage the use of shared sequence motifs as putative TFBS for some pairs of genes (Allocco et al.). The regulation of gene expression in animals is far more complex than that in unicellular organisms such as yeast or prokaryotes.
At the whole-organism level, the existence of compartmentalization and multiple cell types leads to enhanced complexity in animal regulatory networks. Thus, the experimental characterization of regulatory sequences in multicellular organisms is more difficult, and the use of computational tools to predict putative TFBS is more valuable. However, the prediction of regulatory sequences is expected to be generally less successful in animals than in unicellular organisms (see also Tompa et al.). The extent of such limitations in the context of high-throughput microarray expression data remains unknown for animals.
Here, we use a compilation of experimentally verified TFBS and extensive microarray experiment results available for D. melanogaster. We retrieved the expression patterns for the target Drosophila genes in the dataset from microarrays compiled by Spellman and Rubin. Expression levels in these experiments were measured for whole organisms.
For these data, we calculated the average expression value for the three experimental replicates in each time window.
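As an illustration of the replicate averaging described above, the following is a minimal sketch. The data layout (replicates of one time window stored next to each other), the function names and the numbers are assumptions made for the example and are not taken from the original analysis.

```python
from math import sqrt


def average_replicates(values, n_replicates=3):
    """Average consecutive groups of replicate measurements.

    Assumes `values` is ordered by time window, with the replicates of
    each window stored next to each other (an illustrative layout).
    """
    if len(values) % n_replicates != 0:
        raise ValueError("number of values must be a multiple of n_replicates")
    return [
        sum(values[i:i + n_replicates]) / n_replicates
        for i in range(0, len(values), n_replicates)
    ]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Made-up values: two genes, three time windows, three replicates each.
gene_a = average_replicates([1, 2, 3, 4, 5, 6, 7, 8, 9])
gene_b = average_replicates([2, 2, 2, 7, 8, 9, 3, 4, 5])

# Expression correlation between the two averaged profiles.
r = pearson(gene_a, gene_b)
```

Running the same correlation on the unaveraged replicate values is the kind of alternative analysis that can be used to check that averaging does not drive the result.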
Alternative analyses using the replicate values individually did not change our results. In further analyses, we also used GO annotation similarity in place of microarray-based expression correlation.
To quantify GO annotation similarity, we calculated the maximum Resnik semantic similarity between genes (Lord et al.).
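The maximum Resnik similarity can be sketched as follows. Resnik similarity between two terms is the information content, -log p(t), of their most informative common ancestor; the gene-level score here is the maximum over all pairs of terms annotating the two genes. The toy ontology, the term names and the term probabilities below are hypothetical and stand in for the real GO graph and annotation frequencies.

```python
from math import log

# Hypothetical toy ontology: each term maps to its set of parent terms.
PARENTS = {
    "root": set(),
    "A": {"root"},
    "B": {"root"},
    "A1": {"A"},
    "A2": {"A"},
}

# Hypothetical probability that a gene is annotated with a term
# (or any of its descendants); the root always has probability 1.
PROB = {"root": 1.0, "A": 0.5, "B": 0.5, "A1": 0.25, "A2": 0.25}


def ancestors(term):
    """Return the term itself plus all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in PARENTS[current]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def resnik(term1, term2):
    """Information content of the most informative common ancestor."""
    common = ancestors(term1) & ancestors(term2)
    return max(-log(PROB[t]) for t in common)


def max_resnik(terms1, terms2):
    """Maximum Resnik similarity over all term pairs of two genes."""
    return max(resnik(a, b) for a in terms1 for b in terms2)


# Two hypothetical genes annotated with different terms.
similarity = max_resnik({"A1"}, {"A2", "B"})
```

In this toy case the best pair is (A1, A2), whose most informative common ancestor is A, so the score equals -log(0.5).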
Analyses using other measures yielded similar results. In all cases we calculated confidence intervals from randomly permuted datasets or from standard errors of the mean, as applicable. The average expression correlation was higher in magnitude for genes sharing at least two TFBS than for genes sharing only one TFBS, although the difference was not statistically significant.

The height of the boxes represents the average Pearson correlation values for gene pairs sharing different numbers of transcription factor binding sites.
Significance of the differences was assessed using Mantel tests (Sokal and Rohlf). Results are shown for absolute, only positive and only negative expression correlation values.

We also observed this trend for negative expression correlation, although the differences between categories were much smaller than in the case of positive correlation (Fig.).
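For readers unfamiliar with the Mantel test mentioned above, a minimal sketch follows. It correlates the off-diagonal entries of two symmetric gene-by-gene matrices and builds a null distribution by jointly permuting the rows and columns of one of them. The one-sided p-value, the number of permutations and the example matrices are assumptions for illustration, not the settings used in the study.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def upper_triangle(matrix):
    """Flatten the entries above the diagonal of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def mantel(a, b, n_perm=999, seed=0):
    """One-sided Mantel test between two symmetric square matrices."""
    rng = random.Random(seed)
    n = len(a)
    flat_a = upper_triangle(a)
    observed = pearson(flat_a, upper_triangle(b))
    order = list(range(n))
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(order)
        # Permute rows and columns of b with the same ordering.
        permuted = [[b[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if pearson(flat_a, upper_triangle(permuted)) >= observed:
            at_least_as_large += 1
    p_value = (at_least_as_large + 1) / (n_perm + 1)
    return observed, p_value


# Made-up symmetric matrix for five genes, e.g. pairwise expression
# correlation ranks; compared with itself the statistic is 1.
m = [
    [0, 1, 2, 3, 4],
    [1, 0, 5, 6, 7],
    [2, 5, 0, 8, 9],
    [3, 6, 8, 0, 10],
    [4, 7, 9, 10, 0],
]
statistic, p_value = mantel(m, m)
```

Permuting whole rows and columns, rather than individual entries, keeps the dependence among pairs that share a gene, which is why this test is preferred over an ordinary correlation test for pairwise matrices.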
In these negatively correlated gene pairs, the observation that genes sharing TFBS have stronger negative co-expression may be taken as an indication of the presence of dual regulators.
A dual regulator would activate one gene of the pair while repressing the other at the same time. This would produce a negative complementary expression correlation between the two genes.
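The effect of a dual regulator can be illustrated with a small numerical sketch. The regulator activity profile and the linear response of the two target genes are invented for the example; real regulatory responses need not be linear.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical activity of one dual regulator over six time points.
regulator = [0, 1, 2, 3, 2, 1]

# The regulator activates the first gene and represses the second.
activated_gene = [1 + 2 * level for level in regulator]
repressed_gene = [10 - 3 * level for level in regulator]

# The two targets share a regulator yet are negatively correlated.
r = pearson(activated_gene, repressed_gene)
```

Under this idealized linear response the two profiles are perfectly anticorrelated, which is the complementary pattern described above.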
Indeed, dual regulators seem to play an important role during Drosophila development (Papatsenko and Levine). The list of potential dual regulators contained segmentation genes (bcd, kni, hb, Kr) and homeotic genes (abd-A, Antp, Ubx), which are known to act simultaneously as repressors and activators during fly development (Lawrence). Trends for absolute, positive and negative correlations between D.
However, the fraction of negatively co-expressed genes in yeast is much lower than the fraction in Drosophila (Fig.). Confidence intervals in yeast are negligible. Dashed line indicates the 1: (A) absolute; (B) only positive; and (C) only negative co-expression correlation values.
Next, we examined the reverse relationship between gene expression correlation and the number of shared TFBS. This is important for evaluating the predictive use of co-expression in detecting TFBS.
In particular, we tested whether gene pairs with high correlation, either positive or negative, show an increased probability of sharing TFBS. For positively correlated genes, the number of shared TFBS is 0. The plot of the proportion of gene pairs sharing TFBS as a function of the expression correlation revealed a mild positive trend (Fig.).
Gene pairs with higher expression correlation shared more TFBS, but the relationship is not statistically significant.
Therefore, the use of expression correlation to find genes via comparative genomics of shared motifs may not be very productive.

Relationship between degree of co-expression or semantic similarity, and fraction of common TFBS. The fraction of gene pairs sharing at least one common TFBS (y-axis) is plotted for intervals of co-expression levels in (A) and (B), and of semantic similarity in (C) (x-axis).
Positive and negative correlation values are separated by a dashed vertical line. In contrast to D.
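The quantity plotted in such a figure, the fraction of gene pairs sharing at least one TFBS within each interval of expression correlation, could be computed along the following lines. The bin edges and the gene-pair records are made up for illustration; the function name is hypothetical.

```python
def fraction_sharing_by_bin(pairs, edges):
    """Fraction of gene pairs sharing at least one TFBS per correlation bin.

    `pairs` holds (correlation, number_of_shared_tfbs) tuples and `edges`
    holds sorted bin edges. Bins are half-open, except that the last bin
    also includes its upper edge. Empty bins yield None.
    """
    n_bins = len(edges) - 1
    totals = [0] * n_bins
    sharing = [0] * n_bins
    for correlation, n_shared in pairs:
        for k in range(n_bins):
            is_last = k == n_bins - 1
            in_bin = edges[k] <= correlation < edges[k + 1]
            if in_bin or (is_last and correlation == edges[k + 1]):
                totals[k] += 1
                if n_shared >= 1:
                    sharing[k] += 1
                break
    return [s / t if t else None for s, t in zip(sharing, totals)]


# Made-up gene pairs: (expression correlation, shared TFBS count).
example_pairs = [(-0.8, 1), (-0.2, 0), (0.1, 0), (0.3, 1), (0.9, 1), (1.0, 0)]
example_edges = [-1.0, -0.5, 0.0, 0.5, 1.0]
fractions = fraction_sharing_by_bin(example_pairs, example_edges)
```

Keeping negative and positive bins separate, as in the edges above, mirrors the separation of positive and negative correlation values in the figure.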
In order to test this possibility, we analyzed randomly selected subsets of the yeast TFBS data, such that the random yeast subsets had the same number of transcription factors as those present in our Drosophila dataset.

Euclidean distance measures the geometric distance between two vectors, and so considers both the direction and the magnitude of the vectors of gene expression values. Mutual information measures how much knowing the expression levels of one gene reduces the uncertainty about the expression levels of another.
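The two measures just defined can be sketched in a few lines. The mutual information estimate below uses simple equal-width binning, which is only one of several possible estimators and is an assumption of this example; the bin count and the toy profiles are likewise illustrative.

```python
from collections import Counter
from math import log2, sqrt


def euclidean(x, y):
    """Euclidean distance between two expression profiles."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def discretize(values, n_bins):
    """Assign each value to one of n_bins equal-width bins."""
    low, high = min(values), max(values)
    width = (high - low) / n_bins or 1.0  # avoid zero width for flat profiles
    return [min(int((v - low) / width), n_bins - 1) for v in values]


def mutual_information(x, y, n_bins=2):
    """Histogram-based mutual information estimate, in bits."""
    bins_x, bins_y = discretize(x, n_bins), discretize(y, n_bins)
    n = len(x)
    joint = Counter(zip(bins_x, bins_y))
    marginal_x, marginal_y = Counter(bins_x), Counter(bins_y)
    return sum(
        (count / n)
        * log2((count / n) / ((marginal_x[i] / n) * (marginal_y[j] / n)))
        for (i, j), count in joint.items()
    )


# Toy profiles: y_dependent is fully determined by x, y_independent is not.
x = [0, 0, 1, 1]
y_dependent = [1, 1, 0, 0]
y_independent = [0, 1, 0, 1]

mi_dependent = mutual_information(x, y_dependent)
mi_independent = mutual_information(x, y_independent)
distance = euclidean([0, 0], [3, 4])
```

Note that the dependent pair here is anticorrelated, yet its mutual information is maximal for two bins: mutual information captures dependence regardless of its direction.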
Each of these measures has its own advantages and disadvantages. The Euclidean distance is not appropriate when the absolute levels of functionally related genes are highly different. Furthermore, if two genes have consistently low expression levels but are otherwise randomly correlated, they might still appear close in Euclidean space.
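Both caveats about Euclidean distance can be seen in a small numerical sketch. All expression values below are invented for illustration.

```python
from math import sqrt


def euclidean(x, y):
    """Euclidean distance between two expression profiles."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Same expression pattern at very different absolute levels.
gene_low_level = [1, 2, 3, 4]
gene_high_level = [100, 200, 300, 400]

# Two weakly expressed genes whose profiles do not track each other.
weak_gene_1 = [0.1, 0.3, 0.2, 0.1]
weak_gene_2 = [0.2, 0.1, 0.3, 0.2]

distance_same_pattern = euclidean(gene_low_level, gene_high_level)
correlation_same_pattern = pearson(gene_low_level, gene_high_level)
distance_weak_pair = euclidean(weak_gene_1, weak_gene_2)
correlation_weak_pair = pearson(weak_gene_1, weak_gene_2)
```

The pair with an identical pattern ends up far apart in Euclidean space despite a correlation of 1, while the weakly expressed pair looks close even though its correlation is negative.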
In addition, calculating mutual information requires estimating the distribution of the data, which needs a large number of samples for a good estimate. Pearson's correlation coefficient takes a value between -1 and 1, where absolute values close to 1 indicate strong correlation. Positive values correspond to an activation mechanism, where the expression of one gene increases with the increase in the expression of its co-expressed gene and vice versa.
When the expression value of one gene decreases with the increase in the expression of its co-expressed gene, it corresponds to an underlying suppression mechanism and would have a negative correlation. The Pearson correlation measure has two disadvantages: it can only detect linear relationships, and it is sensitive to outliers. Moreover, Pearson correlation assumes that the gene expression data follow a normal distribution.
Furthermore, it has been shown that "most gene pairs satisfy linear or monotonic relationships", which indicates that "mutual information networks can safely be replaced by correlation networks when it comes to measuring co-expression relationships in stationary data".
Threshold selection
Several methods have been used for selecting a threshold in constructing gene co-expression networks. A simple thresholding method is to choose a co-expression cutoff and select the relationships whose co-expression exceeds this cutoff. Alternatively, a z-score computed for each correlation can be converted into a p-value, and a cutoff is then set on the p-value.
Some methods permute the data and calculate a z-score using the distribution of correlations found between genes in the permuted dataset. WGCNA is a framework for constructing and analyzing weighted gene co-expression networks.
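A permutation-based threshold of the kind described above might look like the following sketch. It shuffles one gene's profile to build a null distribution of correlations, converts the observed correlation into a z-score against that distribution, and derives a two-sided p-value from a normal approximation. The number of permutations, the normal approximation, the 0.01 cutoff and the toy data are assumptions of this example and are not taken from any specific published method or from WGCNA.

```python
import random
from math import erfc, sqrt
from statistics import mean, stdev


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def permutation_z_score(x, y, n_perm=1000, seed=0):
    """Z-score of the observed correlation against a permutation null."""
    rng = random.Random(seed)
    observed = pearson(x, y)
    shuffled = list(y)
    null_correlations = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any association between x and y
        null_correlations.append(pearson(x, shuffled))
    z = (observed - mean(null_correlations)) / stdev(null_correlations)
    p_value = erfc(abs(z) / sqrt(2))  # two-sided, normal approximation
    return z, p_value


# Made-up profiles over 20 samples: y follows x plus a small periodic term.
x = list(range(20))
y = [2 * v + (v * 7) % 5 for v in x]

z, p_value = permutation_z_score(x, y)
keep_edge = p_value < 0.01  # illustrative p-value cutoff
```

The same function can be applied to every gene pair, after which only pairs passing the cutoff are kept as edges; in practice a correction for multiple testing would usually be considered as well.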