Abstract:
|
The computational identification of transcription factor binding sites is difficult because the sites are short, which leads to large numbers of false positives and false negatives in current approaches. Two computational methods for reducing false positives are to look for over-representation of transcription factor binding sites in a set of similarly regulated promoters and to look for conservation in alignments of orthologous promoters. We have developed a tool named CORE_TF (Conserved and Over-REpresented Transcription Factors) that identifies common transcription factor binding sites in the promoters of co-regulated genes. To improve upon existing binding site predictions, the tool searches for TRANSFAC® matrices that are over-represented compared to a random set of promoters and identifies cross-species conservation of the predicted transcription factor binding sites. The algorithm has been evaluated using expression array data from several published and in-house studies on myogenic differentiation. CORE_TF is accessible through a web interface at www.LGTC.nl/CORE_TF. It provides a table of over-represented transcription factor binding sites in a user-defined set of promoters and a graphical view of evolutionarily conserved transcription factor binding sites. In our myogenic test data sets it successfully predicts target transcription factors and their binding sites. Binding sites for the transcription factors MAF, NF-1, and Runx2 were significantly over-represented in the upregulated genes from all microarray studies analyzed. In addition to other known muscle-related transcription factors, we have predicted the involvement of transcription factors not previously known to function in myogenesis. We are in the process of verifying these results with high-throughput sequencing of chromatin-immunoprecipitated samples.
The combination of in silico and empirical approaches will assist in the identification of transcription factors that play a role in the regulation of myogenic differentiation and are associated with the myogenic defects seen in many neuromuscular disorders.
|
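The over-representation step described in the abstract can be illustrated with a small sketch. The abstract does not say which statistic CORE_TF uses, so the one-sided hypergeometric test and the Bonferroni correction below are assumptions chosen for illustration only. The function names, the matrix names MATRIX_A and MATRIX_B, and all counts are hypothetical and are not results from the study.

```python
from math import comb


def hypergeom_tail(k, population, successes, draws):
    """Return P(X >= k) for a hypergeometric variable X.

    population: total number of promoters (target + background)
    successes:  promoters in the population with at least one matrix hit
    draws:      number of promoters in the target set
    """
    total = comb(population, draws)
    upper = min(successes, draws)
    favourable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return favourable / total


def over_represented(counts, n_target, n_background, alpha=0.05):
    """List matrices whose hits are enriched in the target promoters.

    counts maps a matrix name to (promoters hit in target, promoters hit
    in background). The choice of test and of Bonferroni correction is an
    assumption of this sketch, not the documented CORE_TF method.
    """
    if not counts:
        return []
    threshold = alpha / len(counts)  # Bonferroni correction over all matrices
    results = []
    for name, (hit_target, hit_background) in counts.items():
        p = hypergeom_tail(
            hit_target,
            n_target + n_background,
            hit_target + hit_background,
            n_target,
        )
        if p <= threshold:
            results.append((name, p))
    return sorted(results, key=lambda item: item[1])


# Hypothetical toy counts: 20 co-regulated promoters vs. 200 random promoters.
toy_counts = {"MATRIX_A": (15, 20), "MATRIX_B": (2, 20)}
significant = over_represented(toy_counts, n_target=20, n_background=200)
```

In this toy input, MATRIX_A hits most target promoters but few background promoters, while MATRIX_B hits both sets at the same rate, so only the first would be reported. A conservation filter, as mentioned in the abstract, would be applied separately to the predicted sites.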