
On Genome Topology

The study of higher order genomic structure using novel chromosome conformation capture techniques is an important growth area of biological research. These methods are being used to study long-range interactions between or within chromosomes, and promise to elucidate the spatial organisation of the genome and its functional significance. One such technique, Hi-C, which identifies chromatin interactions across the entire genome, was used in a recent paper (Dixon et al.) to show that mammalian chromosomes are divided into highly self-interacting ‘topological domains’.

Hi-C works by capturing chromosomal interactions and then sequencing the products. Briefly, chromatin is first cross-linked with formaldehyde, which fixes together DNA segments that are close to each other in the nucleus. The DNA is then cut with a restriction enzyme, the ends of the fragments are marked with biotin, and the fragments are ligated under dilute conditions that favour ligation between cross-linked fragments. Each ligation product therefore joins two pieces of DNA that were originally in close spatial proximity. After shearing, the biotin-marked junction fragments are purified, and the resulting library of interacting fragments is subjected to massively parallel (paired-end) sequencing. Aligning each read pair to a reference genome sequence identifies the two loci that were in contact, and counting these pairs over fixed-size genomic bins yields a genome-wide contact matrix.
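To make the last step concrete, here is a minimal Python sketch of how aligned read pairs might be binned into a contact matrix. It assumes the pairs have already been aligned, filtered and restricted to a single chromosome, and it applies no normalisation. The function name `contact_matrix` and the example coordinates are hypothetical.

```python
def contact_matrix(read_pairs, chrom_length, bin_size):
    """Bin Hi-C read pairs from one chromosome into a symmetric contact matrix.

    read_pairs: iterable of (pos1, pos2) alignment positions in bp.
    Returns a list of lists where entry [i][j] counts pairs linking bin i and bin j.
    """
    n_bins = (chrom_length + bin_size - 1) // bin_size  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos1, pos2 in read_pairs:
        i, j = pos1 // bin_size, pos2 // bin_size
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1  # mirror off-diagonal counts to keep the matrix symmetric
    return matrix


if __name__ == "__main__":
    # Three hypothetical read pairs on a 100 kb chromosome, binned at 40 kb
    pairs = [(5_000, 45_000), (12_000, 18_000), (95_000, 61_000)]
    print(contact_matrix(pairs, chrom_length=100_000, bin_size=40_000))
    # [[1, 1, 0], [1, 0, 1], [0, 1, 0]]
```

Real pipelines also remove duplicates and self-ligation artefacts and correct for biases before the matrix is analysed; those steps are left out here.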

Dixon et al. applied Hi-C to mouse ES cells, human ES cells and human fibroblasts, and also used data from mouse cortex. When they analysed their data at resolutions finer than 100kb, highly self-interacting regions emerged. In mouse ES cells, for example, 2,200 of these ‘topological domains’, with a median size of 880kb, occupied ~91% of the genome. The topological domains were separated by short segments, termed ‘topological boundary regions’, at which chromatin interactions ended abruptly. In general, the boundary regions were the same in embryonic stem cells and differentiated cells, in both mouse and human, so the overall domain architecture is largely unchanged between cell types. Surprisingly, boundary regions were also quite highly conserved between human and mouse.
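Dixon et al. located domains and their boundaries with a ‘directionality index’ (DI), which measures whether each bin interacts preferentially with the region upstream or downstream of it; the DI was then segmented with a hidden Markov model. The sketch below computes only the DI, following the published formula as I understand it: DI = ((B - A)/|B - A|) × ((A - E)²/E + (B - E)²/E), where A and B are the contacts a bin makes with upstream and downstream bins within a fixed window, and E = (A + B)/2. The HMM step is omitted, and the toy matrix is hypothetical.

```python
def directionality_index(matrix, window):
    """Directionality index (DI) for every bin of a symmetric contact matrix.

    A = contacts with up to `window` bins upstream,
    B = contacts with up to `window` bins downstream, E = (A + B) / 2.
    DI = sign(B - A) * ((A - E)**2 / E + (B - E)**2 / E)
    """
    n = len(matrix)
    di = []
    for i in range(n):
        a = sum(matrix[i][j] for j in range(max(0, i - window), i))
        b = sum(matrix[i][j] for j in range(i + 1, min(n, i + window + 1)))
        if a == b:
            # No directional bias (this also avoids dividing by E = 0)
            di.append(0.0)
            continue
        e = (a + b) / 2
        sign = 1.0 if b > a else -1.0
        di.append(sign * ((a - e) ** 2 / e + (b - e) ** 2 / e))
    return di


if __name__ == "__main__":
    # Toy matrix: bins 0-2 and bins 3-5 form two domains
    # (10 contacts between bins of the same domain, 1 between domains)
    toy = [[0 if i == j else (10 if i // 3 == j // 3 else 1) for j in range(6)]
           for i in range(6)]
    print([round(x, 2) for x in directionality_index(toy, window=2)])
    # [20.0, 0.05, -14.73, 14.73, -0.05, -20.0]
```

A bin at the end of a domain talks mostly upstream (strongly negative DI) and the first bin of the next domain talks mostly downstream (strongly positive DI), so the boundary sits where the sign flips, here between bins 2 and 3. The large values at the two ends are edge effects of the tiny toy matrix.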

These boundary zones seem to correspond to insulator or barrier elements, which are known to divide different chromatin domains and to prevent heterochromatin from spreading. For instance, the HoxA locus is divided into two compartments by a known insulator element, which was found to be a topological boundary region in both human and mouse. Dixon et al. also found that, in differentiated cells, the distribution of the heterochromatin-associated histone modification H3K9me3 was segregated by boundary regions. As the topological domains generally remain constant between stem and differentiated cell types, the boundaries seem to pre-mark the end points for heterochromatin spreading during cellular differentiation. For the same reason, the topological domains cannot simply be a consequence of heterochromatin formation: they are already in place in ES cells, ahead of the heterochromatin spreading that accompanies differentiation.

Consistent with the link between boundary zones and insulator elements, Dixon et al. found that boundary zones were enriched for binding sites of the insulator protein CTCF. However, only 15% of all CTCF binding sites lay within boundary zones, suggesting that boundary zones have a more complex composition and function than CTCF binding alone. Looking at the distributions of other cellular factors, the researchers showed that boundary zones are associated with high levels of transcription: they are enriched for transcription start sites, housekeeping genes and promoter-associated histone marks. Interestingly, they also observed an enrichment for SINE retrotransposons. This agrees with a recent paper (which I have written about) linking SINEs to the spread of CTCF binding sites through the genome during evolution.
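Being ‘enriched’ at boundaries while accounting for only 15% of all CTCF sites is not a contradiction, because enrichment is judged against how much of the genome the boundaries cover. Here is a minimal sketch of a fold-enrichment calculation; the site counts and the 5% genome coverage below are illustrative assumptions, not figures from Dixon et al.

```python
def fold_enrichment(sites_in_regions, total_sites, region_bp, genome_bp):
    """Observed fraction of sites inside the regions divided by the fraction
    expected if sites were scattered uniformly across the genome."""
    observed = sites_in_regions / total_sites
    expected = region_bp / genome_bp
    return observed / expected


if __name__ == "__main__":
    # Hypothetical: 6,000 of 40,000 sites (15%) fall in boundary regions that
    # together cover 135 Mb of a 2.7 Gb genome (5%).
    print(round(fold_enrichment(6_000, 40_000, 135_000_000, 2_700_000_000), 2))
    # 3.0
```

Under these assumed numbers, boundaries hold three times as many sites as expected by chance, yet 85% of sites still lie elsewhere, which is the pattern the paragraph describes.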

The discovery that the genome is partitioned into these topological domains is part of a growing literature dissecting genomic macro-structure. Dixon et al. compared topological domains with various other recently defined levels of higher order genomic organisation: A and B compartments (Lieberman-Aiden et al.), lamina-associated domains, replication timing zones, and large organised chromatin K9 modifications (LOCKs). They concluded that topological domains are related to, but independent of, each of these previously characterised architectures. This list gives some idea of the complexity of higher order genomic structure, and of how shallow our understanding of it still is. However, this tranche of new chromosome conformation capture techniques, combined with methods for high-throughput analysis of chromatin composition, is yielding a wealth of data. In the next few years we should gain a far more nuanced and complete appreciation of the interplay between chromosomal architecture, chromatin state and gene regulation. A mouth-watering prospect.
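One simple way to ask whether two annotations share boundaries (for instance domain boundaries in two cell types, or topological domain boundaries versus the edges of lamina-associated domains) is to count how many boundaries in one set lie within some tolerance of a boundary in the other. The function below is a hypothetical sketch of that idea, not the procedure Dixon et al. used; the tolerance and positions are illustrative.

```python
from bisect import bisect_left


def shared_boundary_fraction(query, reference, tolerance):
    """Fraction of boundary positions in `query` (bp) lying within `tolerance`
    bp of at least one boundary position in `reference`."""
    if not query:
        return 0.0
    ref = sorted(reference)
    shared = 0
    for pos in query:
        k = bisect_left(ref, pos)
        # The nearest reference boundary is at index k - 1 or k
        candidates = ref[max(0, k - 1):k + 1]
        if any(abs(pos - r) <= tolerance for r in candidates):
            shared += 1
    return shared / len(query)


if __name__ == "__main__":
    cell_type_1 = [100_000, 500_000, 900_000]  # hypothetical boundary positions
    cell_type_2 = [120_000, 880_000]
    print(round(shared_boundary_fraction(cell_type_1, cell_type_2, tolerance=40_000), 3))
    # 0.667
```

The measure is asymmetric, so in practice one would compute it in both directions and compare it against boundary positions shuffled at random, to see how much sharing is expected by chance alone.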

Dixon, J., Selvaraj, S., Yue, F., Kim, A., Li, Y., Shen, Y., Hu, M., Liu, J., & Ren, B. (2012). Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature, 485 (7398), 376-380. DOI: 10.1038/nature11082

Lieberman-Aiden, E., van Berkum, N., Williams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Lajoie, B., Sabo, P., Dorschner, M., Sandstrom, R., Bernstein, B., Bender, M., Groudine, M., Gnirke, A., Stamatoyannopoulos, J., Mirny, L., Lander, E., & Dekker, J. (2009). Comprehensive Mapping of Long-Range Interactions Reveals Folding Principles of the Human Genome. Science, 326 (5950), 289-293. DOI: 10.1126/science.1181369

On Transposable Elements and Regulatory Evolution

Transposable elements (TEs), generally considered molecular parasites of the genome, are increasingly being linked to the evolution of new biological functions. TEs have been shown to be a source of novel genes and exons; the ‘arms race’ between them and their hosts has been a driving force in the evolution of epigenetic silencing mechanisms; and they have been shown to serve as cis-acting regulatory elements for host genes. This last role has potentially wide ramifications: because copies of a TE carry the same regulatory sequences, TE mobilisation could change the expression of whole suites of co-regulated genes. Recently, the emergence and mobilisation of novel TEs has been argued to be a causative factor underlying ‘punctuated equilibria’ evolutionary phenomena such as the Cambrian explosion and the rapid speciation of cichlid fishes (Zeh et al.). Two new papers analysing mammalian genomic evolution further link transposable elements with the spread of regulatory elements through the genome, and with the evolution of novel characters.

CTCF binding sites

CTCF (CCCTC-binding factor) is a DNA-binding protein with such a diverse and exciting array of potential roles attributed to it that it has been called a ‘master weaver of the genome’ (Phillips & Corces). It acts as an insulator, dividing different chromatin domains, and is therefore important for both transcriptional activation and repression. This role appears to be linked to the formation of long-distance chromosomal loops, and hence to the global organisation of the chromosomes within the nucleus. Schmidt et al. used ChIP-seq to map all CTCF binding events in liver cells from five eutherian mammals (human, macaque, mouse, rat and dog) and a marsupial (opossum). From these data they defined a core DNA sequence motif that CTCF commonly binds, as well as sets of CTCF binding events that are conserved between species. In some lineages, particular CTCF-bound sequence variants (‘motif-words’) were overrepresented, and these were often embedded within lineage-specific SINEs (short interspersed nuclear elements, which are non-autonomous non-LTR retrotransposons). For instance, mice and rats share about 2,000 CTCF binding events associated with B2 SINEs; mice have a further 5,300 B2-associated binding events, and rats a further 1,200. Enrichments of CTCF binding events associated with lineage-specific SINEs also occurred, on a smaller scale, in the dog and opossum genomes. Surprisingly, however, no similar TE-associated enrichment occurred in the primate lineage. Among CTCF binding events conserved between multiple mammals, Schmidt et al. also found over 100 that were associated with fossilised ancestral transposable element sequences.
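Binding motifs such as the one Schmidt et al. derived for CTCF are commonly represented as position weight matrices, and candidate sites are found by scoring every window of a sequence against the matrix. The sketch below shows this generic log-odds scan with a made-up four-position motif; the counts, threshold and function names are hypothetical and bear no relation to the real CTCF motif. It scans only the forward strand and assumes uppercase A/C/G/T sequence.

```python
import math

BASES = "ACGT"


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts (a list of dicts) into log2-odds scores
    against a uniform background."""
    pwm = []
    for col in counts:
        total = sum(col[b] for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2(((col[b] + pseudocount) / total) / background)
                    for b in BASES})
    return pwm


def scan(sequence, pwm, threshold):
    """Return (start, score) for every window scoring at or above threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits


if __name__ == "__main__":
    # Hypothetical counts for a 4-position motif with consensus CCGC
    counts = [
        {"A": 0, "C": 10, "G": 0, "T": 0},
        {"A": 0, "C": 10, "G": 0, "T": 0},
        {"A": 0, "C": 0, "G": 10, "T": 0},
        {"A": 0, "C": 10, "G": 0, "T": 0},
    ]
    print(scan("ATCCGCTA", log_odds_matrix(counts), threshold=6.0))
```

Scanning repeat-annotated and unannotated sequence separately in this way is one route to asking whether particular motif variants are concentrated in a given TE family.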

Overall, these data show that CTCF binding has expanded via retrotransposition in multiple mammalian lineages, and that this is an ancient mechanism of regulatory evolution. CTCF binds a long DNA sequence motif (33/34bp), which is less likely to be generated by random point mutations than the shorter motifs more commonly bound by transcription factors. This is one reason why the expansion of CTCF binding sites should be more closely associated with TEs than that of other regulatory sequence motifs: copying an existing site is far easier than creating a new one from scratch. Another explanation the authors suggest for this association is that CTCF binding may protect TEs from repressive DNA or chromatin modifications.
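The argument about motif length can be made roughly quantitative. Under a deliberately simplistic null model (uniformly random sequence with each base at probability 1/4, exact matches only, ignoring both the degeneracy of real binding motifs and the biased composition of real genomes), the expected number of chance matches to one specific k-bp word on both strands of a genome of G bp is 2 × (G - k + 1) × 4^(-k), so it falls fourfold with every extra base. The 3 Gb genome size below is an approximation chosen for illustration.

```python
def expected_random_matches(motif_length, genome_bp, both_strands=True):
    """Expected exact matches to one specific k-bp word in uniformly random
    sequence of length genome_bp (each base has probability 1/4)."""
    positions = genome_bp - motif_length + 1
    strands = 2 if both_strands else 1
    return strands * positions * 0.25 ** motif_length


if __name__ == "__main__":
    genome = 3_000_000_000  # roughly the size of a mammalian genome
    for k in (8, 33):
        print(k, f"{expected_random_matches(k, genome):.3g}")
    # 8 9.16e+04
    # 33 8.13e-11
```

Under this model a specific 8-bp word is expected tens of thousands of times by chance, whereas a specific 33-bp word essentially never arises de novo. Real CTCF sites tolerate mismatches, so the true gap is smaller than this, but the comparison illustrates why spreading pre-formed sites by retrotransposition is such an effective route to new long binding sites.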

Transposons and the evolution of pregnancy

During mammalian pregnancy, endometrial stromal cells (ESCs) differentiate in response to progesterone and to signalling via the cAMP second messenger pathway. This process, termed decidualisation, transforms the uterine lining into the decidua, the vascularised maternal tissue that accommodates implantation and the placenta. The enhancer that drives expression of prolactin in ESCs in response to progesterone/cAMP signalling is derived from a MER20 transposon (a hAT-Charlie family DNA transposon). Lynch et al. have found a strong association between MER20 elements and both genes that are differentially expressed in mammalian ESCs and genes that respond to progesterone/cAMP signalling.
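An association of this kind is typically assessed by asking whether genes near MER20 elements are differentially expressed more often than chance would predict. The post does not say which statistics Lynch et al. used, so the following is only a generic sketch: a one-sided Fisher's exact test on a 2×2 table, computed from the hypergeometric distribution with the Python standard library. The table layout and counts are hypothetical.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a): the probability of seeing at least `a` items in the
    top-left cell if row and column totals are fixed and the two
    classifications are independent (hypergeometric null).
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)
    numer = sum(comb(row1, x) * comb(n - row1, col1 - x)
                for x in range(a, min(row1, col1) + 1))
    return numer / denom


if __name__ == "__main__":
    # Hypothetical table:
    # rows    = genes near a MER20 / genes not near a MER20
    # columns = differentially expressed in ESCs / not differentially expressed
    print(fisher_one_sided(40, 160, 300, 4500))
```

For real analyses, `scipy.stats.fisher_exact(table, alternative="greater")` computes the same one-sided p-value; the hand-rolled version is shown only to make the underlying hypergeometric calculation explicit.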

Analysing MER20s located close to stromally regulated genes, they found that many had regulatory potential, judging by their association with CpG islands and with various histone modifications. They then tested whether 21 randomly chosen MER20s bound various transcription factors and insulator proteins. Fourteen MER20s bound a suite of five different insulator proteins (including CTCF), whilst in four cases five different transcription factors important for ESC development bound together. This suggested that MER20s could be classified into ‘insulator’ and ‘enhancer-repressor’ types. Using reporter gene assays in various cell types, they then showed that the majority of these MER20s acted as regulatory elements in response to progesterone/cAMP signalling specifically in ESCs.

These data indicate that the rewiring of the ESC gene regulatory network during the evolution of pregnancy was partly mediated by MER20 transposition events. In this case, MER20s carry binding sites for assemblies of transcription factors that respond to specific signalling pathways, and have therefore acted as cell-type-specific regulatory elements.

These two papers, along with a growing number of other studies, show that TEs are important agents of gene regulatory network evolution. The findings of Lynch et al. in particular confirm the perspicacity of Barbara McClintock, the discoverer of transposable elements, in terming them ‘controlling elements’.

See also: Retrotransposons as regulatory elements

Lynch, V., Leclerc, R., May, G., & Wagner, G. (2011). Transposon-mediated rewiring of gene regulatory networks contributed to the evolution of pregnancy in mammals. Nature Genetics, 43 (11), 1154-1159. DOI: 10.1038/ng.917

Schmidt, D., Schwalie, P., Wilson, M., Ballester, B., Gonçalves, A., Kutter, C., Brown, G., Marshall, A., Flicek, P., & Odom, D. (2012). Waves of Retrotransposon Expansion Remodel Genome Organization and CTCF Binding in Multiple Mammalian Lineages. Cell, 148 (1-2), 335-348. DOI: 10.1016/j.cell.2011.11.058

Zeh, D., Zeh, J., & Ishida, Y. (2009). Transposable elements and an epigenetic basis for punctuated equilibria. BioEssays, 31 (7), 715-726. DOI: 10.1002/bies.200900026

Phillips, J., & Corces, V. (2009). CTCF: Master Weaver of the Genome. Cell, 137 (7), 1194-1211. DOI: 10.1016/j.cell.2009.06.001