Our understanding of vertebrate development is limited by our ability to identify the precise mechanisms that lead a progenitor cell to commit to a specific cell fate. The players are complex networks of trans-acting factors that act on cis-regulatory sequences to activate or repress given sets of genes. Mechanistically, cis-regulatory sequences integrate the proper combination of trans-acting signals through sets of motifs arranged in a precise orientation and spacing that allows these factors to bind. Consequently, the information for such integration is hard-coded in the DNA sequence and defines the ‘grammar’ of cis-regulation in a given cell type. Elucidating this grammar is therefore key to understanding regulatory networks during development.
The lab therefore addresses the underlying logic of the cis-regulatory grammars governing precise spatio-temporal transcription during the establishment of neuronal diversity in vertebrates. Our strategy is to systematically test sequences that are likely to be involved in gene regulation, using the in vivo enhancer assay in fish that we developed and successfully tested. The result of this analysis is a precise description of the outcomes (read out as reporter gene expression in the medaka embryo) of the many combinations of regulatory elements hard-coded in these sequences. Focusing on a limited number of cell types, we further analyse sequences with enhancer activity in these cells using new and established algorithms. Commonalities between these sequences are indicative of the specific regulatory grammar of these cells.
We are developing a pipeline to rapidly screen sequences for in vivo enhancer activity during development, resolved in both space and time. The pipeline automatically searches genome-wide for the regions most likely to have enhancer activity, based on their predicted transcription factor binding site composition and conservation. Next, the regions are tested for enhancer activity in medaka using a newly developed enhancer assay that is convenient, highly reproducible and fast. The output of the pipeline is integrated into the 4DXpress database, so that enhancer activity can be conveniently compared with the spatio-temporal expression of the gene surrounding the enhancer in its original genomic context. We applied the pipeline and experimentally analysed a subset of the regions, with a 90% success rate in terms of enhancer activity. All the experiments resulted in stable integration of the enhancer construct in the genome of transgenic fish. This pipeline represents a significant improvement for fast and efficient enhancer analysis in fish and nicely complements the established mouse enhancer pipeline.
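The candidate-selection step can be illustrated with a minimal Python sketch. It is not the lab's pipeline: the scoring rule (binding-site density multiplied by a conservation score), the exact-match motif search, and all names (Region, score_region, rank_regions) are assumptions made for illustration only.

```python
from dataclasses import dataclass

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_sites(seq: str, motif: str) -> int:
    """Count windows that exactly match the motif on either strand.

    A palindromic motif is counted once per window, not twice.
    """
    seq = seq.upper()
    motif = motif.upper()
    targets = {motif, reverse_complement(motif)}
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] in targets)


@dataclass
class Region:
    name: str
    sequence: str
    conservation: float  # hypothetical score in [0, 1], e.g. from an alignment


def score_region(region: Region, motifs: list[str]) -> float:
    """Illustrative score: predicted-site density weighted by conservation."""
    if not region.sequence:
        return 0.0
    n_sites = sum(count_sites(region.sequence, m) for m in motifs)
    return (n_sites / len(region.sequence)) * region.conservation


def rank_regions(regions: list[Region], motifs: list[str]) -> list[Region]:
    """Return regions sorted from the highest to the lowest score."""
    return sorted(regions, key=lambda r: score_region(r, motifs), reverse=True)
```

In this sketch the top-ranked regions would be the ones carried forward to the in vivo assay. A real implementation would more plausibly use position weight matrices than exact consensus matches.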
We are also developing Trawler, an algorithm to efficiently discover over-represented motifs in chromatin immunoprecipitation (ChIP) experiments and to predict their functional instances. When we applied Trawler to data from yeast and mammals, 83% of the known binding sites were accurately called, often together with additional binding sites, providing hints of combinatorial input. Newly discovered motifs and their features (identity, conservation, position in sequence) are displayed on a web interface. We are also developing a stand-alone version with additional features (see software).
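To make the idea of motif over-representation concrete, the following Python sketch compares k-mer counts in ChIP-derived sequences against a background set. It is a simplified illustration, not Trawler's implementation: the exact k-mer matching, the z-score from a normal approximation to the binomial, the pseudocount, and the function names are all assumptions.

```python
import math
from collections import Counter

_DNA = frozenset("ACGT")


def kmer_counts(sequences, k):
    """Count every k-mer made only of A, C, G, T across all sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if _DNA.issuperset(kmer):
                counts[kmer] += 1
    return counts


def enriched_kmers(foreground, background, k=6, min_z=3.0, pseudocount=1.0):
    """Return (kmer, z) pairs over-represented in the foreground set.

    The expected frequency of each k-mer is estimated from the background,
    smoothed with a pseudocount so that unseen k-mers keep a non-zero
    probability. Results are sorted by decreasing z-score.
    """
    fg = kmer_counts(foreground, k)
    bg = kmer_counts(background, k)
    n_fg = sum(fg.values())
    n_bg = sum(bg.values())
    if n_fg == 0:
        return []
    n_possible = 4 ** k
    results = []
    for kmer, observed in fg.items():
        p = (bg[kmer] + pseudocount) / (n_bg + pseudocount * n_possible)
        expected = n_fg * p
        sd = math.sqrt(n_fg * p * (1.0 - p))
        z = (observed - expected) / sd
        if z >= min_z:
            results.append((kmer, z))
    return sorted(results, key=lambda item: item[1], reverse=True)
```

For the z-scores to be meaningful, the background set should contain many more k-mer windows than the 4^k possible k-mers. With a small background the pseudocount dominates and common k-mers appear enriched.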