Monthly Archives: June 2012

Patterns of RNA methylation 2

In a recent post I discussed the extent of adenosine methylation in RNAs. Meyer et al. found m6A in many mRNAs, with a bias in its distribution towards the end of the coding sequence, near stop codons, and in the proximal section of 3’ UTRs. The main chemically modified base of DNA is 5-methylcytosine (m5C). Squires et al. have now surveyed the presence of m5C in human RNAs, and find that this modification is also common in tRNAs, rRNAs, mRNAs and ncRNAs.

The principal method for detecting methylated cytosines in nucleic acids is bisulphite sequencing. Bisulphite treatment deaminates unmodified cytosine residues to uracil, whereas 5-methylcytosine resists conversion and is left unchanged. Hence, when sequenced, an unmodified C reads as T, while m5C still reads as C; by comparing reads against a reference sequence, the methylation status of each cytosine can be deduced. Squires et al. applied bisulphite conversion to RNA, followed by reverse transcription and high-throughput sequencing. Several other modified forms of cytosine known to be present in some rRNAs, such as N4-methylcytidine (m4C) and N4,2’-O-dimethylcytidine (m4Cm), may also be resistant to bisulphite treatment. With this in mind, Squires et al. termed the modified cytosines they detected ‘m5C candidate sites’.
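The read-out logic can be illustrated with a toy simulation. This is a minimal sketch, assuming complete conversion of unmodified cytosines and error-free reads; the sequence and the function names (`bisulphite_read`, `candidate_sites`) are hypothetical, and real pipelines must also cope with incomplete conversion, sequencing errors and the alignment of converted reads.

```python
# Toy model of bisulphite sequencing (hypothetical example, not Squires et al.'s pipeline).
# Assumptions: conversion of unmodified C is complete and reads are error-free.
# Sequences use the DNA alphabet, as they appear after reverse transcription.


def bisulphite_read(reference: str, methylated: set[int]) -> str:
    """Sequence as it would read after bisulphite treatment.

    Unmodified C is deaminated to U and therefore reads as T;
    cytosines at positions listed in `methylated` resist conversion and read as C.
    """
    return "".join(
        "T" if base == "C" and pos not in methylated else base
        for pos, base in enumerate(reference)
    )


def candidate_sites(reference: str, read: str) -> list[int]:
    """Positions where the reference has C and the read still shows C.

    Other bisulphite-resistant modifications (e.g. m4C) would also be
    reported here, hence 'candidate' sites.
    """
    return [
        pos
        for pos, (ref_base, read_base) in enumerate(zip(reference, read))
        if ref_base == "C" and read_base == "C"
    ]


reference = "ACGTCCAGCTA"
read = bisulphite_read(reference, methylated={5})
print(read)                              # ATGTTCAGTTA
print(candidate_sites(reference, read))  # [5]
```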

Surveying RNAs from HeLa cells, Squires et al. discovered 255 modified Cs in tRNAs. This confirmed a number of known sites and identified many new candidate sites, which nonetheless generally fitted a known pattern: modification of residues in specific secondary structural regions, namely the variable region and the anticodon loop. Modifications in these regions help stabilise secondary structure and affect aminoacylation and codon recognition.

Most interestingly, the researchers discovered 10,275 m5C candidate sites in mRNAs and ncRNAs. Their data covered 10.6% of the total cytosine residues in the transcriptome. m5C seems to be enriched in some classes of ncRNA but relatively depleted in mRNAs; nonetheless, the majority (83%) of their candidate sites were found in mRNAs. Within these transcripts, m5C appears to be depleted in protein-coding sequences but enriched in 5’ and 3’ UTRs. Further computational analysis showed an association between mRNA m5C sites and binding regions for Argonaute proteins (the proteins with which small regulatory RNAs complex to effect post-transcriptional regulation).
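The apparent tension between ‘depleted in mRNAs’ and ‘83% of sites in mRNAs’ disappears once enrichment is measured against how many cytosines of each RNA class were actually covered. The sketch below assumes exactly that definition (observed share of sites divided by share of covered cytosines); the function name and all counts are hypothetical, chosen only to show the arithmetic, and are not Squires et al.’s data.

```python
def enrichment(sites_in_class: int, total_sites: int,
               covered_c_in_class: int, total_covered_c: int) -> float:
    """Observed share of m5C sites divided by the share expected from coverage.

    > 1 means enriched, < 1 means depleted, relative to the cytosines
    of that RNA class that were covered by sequencing.
    """
    observed_share = sites_in_class / total_sites
    expected_share = covered_c_in_class / total_covered_c
    return observed_share / expected_share


# Hypothetical numbers: 1,000 candidate sites, 1,000,000 covered cytosines.
mrna = enrichment(830, 1_000, 900_000, 1_000_000)    # 83% of sites, 90% of covered Cs
ncrna = enrichment(170, 1_000, 100_000, 1_000_000)   # 17% of sites, 10% of covered Cs
print(f"mRNA: {mrna:.2f}, ncRNA: {ncrna:.2f}")       # mRNA: 0.92, ncRNA: 1.70
```

A class can therefore hold most of the sites in absolute terms while still being depleted relative to its coverage.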

Two different methyltransferases are known to catalyse the m5C modification in eukaryotic RNAs: NSUN2 and TRDMT1. Previously these two enzymes had only been shown to methylate a few specific positions in various tRNAs. Squires et al. used RNAi to knock down NSUN2 and TRDMT1 in HeLa cells and assayed the methylation status of a selected subset of cytosine residues. This showed that a number of m5C sites in mRNAs and ncRNAs are dependent on NSUN2, suggesting that it could be the primary enzyme responsible for cytosine methylation in these classes of RNA. NSUN2 has been shown to be cell-cycle regulated and a target of the oncogene MYC. Mouse knockouts are small, and have revealed a role in balancing stem cell renewal and differentiation. A recent paper (Khan et al. 2012) has linked mutations in NSUN2 to an autosomal-recessive intellectual disability syndrome in humans. It will be interesting to investigate the extent of this enzyme’s role in RNA methylation, and to dissect which component of its function is responsible for the mouse and human phenotypes.
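One way to read such a knockdown experiment is to compare, site by site, the fraction of bisulphite reads that still show C in control and knockdown cells. The sketch below assumes that definition of methylation level and an arbitrary 50% relative-drop threshold; the function names, the threshold and the read counts are all hypothetical, not the criteria used by Squires et al.

```python
def methylation_level(c_reads: int, t_reads: int) -> float:
    """Fraction of reads at a site that resisted bisulphite conversion (read as C)."""
    total = c_reads + t_reads
    if total == 0:
        raise ValueError("no read coverage at this site")
    return c_reads / total


def depends_on_enzyme(control: tuple[int, int], knockdown: tuple[int, int],
                      min_relative_drop: float = 0.5) -> bool:
    """True if knockdown lowers the methylation level by at least `min_relative_drop`.

    `control` and `knockdown` are (C reads, T reads) at the same site.
    The 0.5 default is an arbitrary illustrative threshold.
    """
    before = methylation_level(*control)
    after = methylation_level(*knockdown)
    if before == 0:
        return False  # site not methylated in control cells; nothing to lose
    return (before - after) / before >= min_relative_drop


# Hypothetical read counts (C, T) for two sites in control vs NSUN2-knockdown cells.
print(depends_on_enzyme((90, 10), (20, 80)))  # True: 0.90 -> 0.20
print(depends_on_enzyme((80, 20), (75, 25)))  # False: 0.80 -> 0.75
```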

As with the investigation into m6A, m5C is commonly found in RNAs of many categories, and, as with the previous study, it is not yet obvious just how important RNA methylation truly is. The phenotypes associated with loss of methyltransferases or demethylases are not that extensive, but neither are they negligible. Some observations are shared between Meyer et al. and Squires et al.: the enrichment in 3’ UTRs, and the correlation between RNA methylation and microRNA/Argonaute binding sites (although there were differences in the details of these associations). This investigation by Squires et al. into m5C is not on the same level as Meyer et al.’s study, in that it lacked the developmental component and wasn’t on the same global scale. On the other hand, bisulphite sequencing pinpoints the exact modified residues, whereas m6A cannot as yet be mapped to the same level of accuracy. The methodology used by Squires et al. can be scaled up, so more global studies of m5C will no doubt appear in the near future. I also look forward to a more detailed understanding of the enzymatic pathways involved, and a dissection of their roles in development.

Squires JE, Patel HR, Nousch M, Sibbritt T, Humphreys DT, Parker BJ, Suter CM, & Preiss T (2012). Widespread occurrence of 5-methylcytosine in human coding and non-coding RNA. Nucleic acids research, 40 (11), 5023-33 PMID: 22344696

Khan MA, Rafiq MA, Noor A, Hussain S, Flores JV, Rupp V, Vincent AK, Malli R, Ali G, Khan FS, Ishak GE, Doherty D, Weksberg R, Ayub M, Windpassinger C, Ibrahim S, Frye M, Ansar M, & Vincent JB (2012). Mutation in NSUN2, which encodes an RNA methyltransferase, causes autosomal-recessive intellectual disability. American journal of human genetics, 90 (5), 856-63 PMID: 22541562

Meyer KD, Saletore Y, Zumbo P, Elemento O, Mason CE, & Jaffrey SR (2012). Comprehensive Analysis of mRNA Methylation Reveals Enrichment in 3′ UTRs and near Stop Codons. Cell, 149 (7), 1635-46 PMID: 22608085

Genomic Rearrangement in Lampreys

Generally, all cells in an organism are considered to have the same genome, with the differences between cells determined by differential gene expression. However, a growing list of species, including sciarid flies and various copepods and nematodes, are known to undergo genomic rearrangements during the differentiation of cell lineages. Recent findings have shown that two species of jawless vertebrates (cyclostomes), the hagfish and the sea lamprey, undergo extensive genomic remodelling during development.

Hagfish and lampreys are the closest extant relatives of the jawed vertebrates (gnathostomes). Controversy has reigned over whether lampreys are more closely related to gnathostomes than they are to hagfish. This debate appears to have been resolved by molecular phylogenetics: the two branches of jawless vertebrates are united as a monophyletic clade, the cyclostomes. However, hagfish and lampreys diverged shortly after the split between gnathostomes and cyclostomes, in the early stages of vertebrate evolution (~500 mya). Two whole genome duplications occurred during the early evolution of the vertebrates (referred to as 1R and 2R). It is not yet resolved whether these duplications occurred before the divergence of cyclostomes and gnathostomes; opinion appears to be split as to whether this divergence occurred after 1R or after 2R. The most direct way to resolve this question would be whole genome sequencing. It has been known for some time that hagfish undergo genomic rearrangements during early development. Smith et al. (2009) have now shown a similar phenomenon occurring in the sea lamprey (Petromyzon marinus), explaining some of the difficulty in constructing a finished version of the sea lamprey genome.

Smith et al. found that the total DNA content of nuclei from the germline (sperm) and from various somatic cell types (blood, liver, kidney) differed by >20%, equating to ≈500 Mb. They then performed Southern blots in which restriction-digested genomic DNA from blood or sperm cells was probed with a repetitive sequence element. A number of bands differed in size or intensity between germline and somatic cells, showing that genome rearrangement had occurred. One specific band was present in the germline samples and virtually absent from a range of somatic tissues. This band, termed Germ1, consisted of sequences from the 18S ribosomal DNA, a retrotransposon, and a section of the 28S rDNA. When aligned to lamprey genomic sequence (derived from somatic cells), these sequences were all commonly found, but one section of Germ1, running from one end of the fragment to the 28S rDNA section, was dramatically underrepresented: a germline-specific sequence. Smith et al. then performed FISH (fluorescent in situ hybridisation) using the Germ1 clone against metaphase germline and somatic cells. In the germline they found many Germ1-like sequences distributed across several chromosomes, often arrayed in tandem repeats. In contrast, in somatic cell nuclei, Germ1 hybridised to only one chromosome pair, most likely corresponding to the functional rDNAs. Using real-time PCR over the period of early embryogenesis, the authors found that Germ1 abundance was drastically reduced 2 or 3 days after fertilisation. Smith et al. estimate that Germ1-like sequences make up ~7% of the germline genome, suggesting that a further ~13% of the germline genome is also lost. By comparing germline BAC clones with somatic genome sequence, they managed to identify more lost sequences, including a gene known to be expressed during germline development, SPOPL.
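The ~13% figure follows from simple subtraction. The working below assumes that both percentages are fractions of the germline genome and that the ≈500 Mb difference corresponds to the 20% figure; since the reported difference is >20%, these are approximate lower-bound numbers, and the conversion to megabases is my own back-of-envelope extrapolation rather than a figure from the paper.

```latex
\begin{aligned}
f_{\text{lost}} &\approx 0.20 \quad (\approx 500\ \text{Mb}) \\
f_{\text{Germ1}} &\approx 0.07 \\
f_{\text{other}} &= f_{\text{lost}} - f_{\text{Germ1}} \approx 0.20 - 0.07 = 0.13 \\
\text{DNA}_{\text{other}} &\approx \frac{f_{\text{other}}}{f_{\text{lost}}} \times 500\ \text{Mb} = \frac{0.13}{0.20} \times 500\ \text{Mb} \approx 325\ \text{Mb}
\end{aligned}
```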

Smith et al. have therefore shown that the sea lamprey genome undergoes a dramatic rearrangement during the early development of the somatic tissues. A large proportion of the excised content is accounted for by the elimination of Germ1-like sequences; however, it appears likely that a number of specific genes are also lost. In most cases of large-scale somatic genomic rearrangement, the main excised component is made up of transposable elements and other repetitive sequences. This may well be the main basis of the loss seen in lampreys, as Germ1-like sequences appear to be a strange fusion of transposon and duplicated ribosomal DNA genes. However, the finding that an individual gene, SPOPL, is also selectively deleted in somatic lineages suggests that genomic deletions could be linked to the genetic regulation of development.

Interestingly, when the common apoptosis assay TUNEL (which detects DNA breaks) is used during the first few weeks of lamprey embryonic development, nearly every nucleus is labelled. It seems likely that this effect, previously considered an artifact, is explained by developmentally regulated deletions. The deletion of SPOPL appeared to occur more gradually than that of Germ1-like sequences. Together with the observation that total nuclear DNA content differed slightly between various somatic tissues, these findings suggest that this program of somatic genomic deletions could proceed in an intricate, tissue-specific progression during early development.

All jawed vertebrates undergo programmed genomic rearrangements during the diversification of the immune system. VDJ recombination, mediated by the transposase-derived RAG recombinase, generates antigenic diversity, allowing adaptive immune responses. Cyclostomes have an alternative, RAG-independent adaptive immune system based on variable lymphocyte receptors (VLRs). The recombination mechanism used to assemble VLRs is not yet clear. Could it be linked to the mechanism behind the developmentally regulated deletions seen in early embryos?

Smith et al. (2010) link the presence of chromatin diminution (i.e. the somatic genomic rearrangements) in this basal vertebrate taxon to the whole genome duplications that occurred in the vertebrate stem group. If the genomic rearrangement mechanisms seen in cyclostomes were present in the last common ancestor of jawed and jawless vertebrates, could this system have predisposed stem-group vertebrates to whole genome duplications by creating a permissive environment for polyploidisation and rediploidisation? Although this is a fascinating idea, it is currently perhaps slightly idle speculation. It is of pressing importance to understand the mechanisms underlying cyclostome genomic rearrangements. Are they the same in hagfish and lampreys? Are similar systems present in gnathostomes? Finished genome sequences for both the germline and various somatic cell lineages would answer many questions about the effects and purposes of chromatin diminution; however, this is easier said than done, given the potential complexity of the rearrangements. Lampreys and hagfish are also so far unculturable in the laboratory, adding to the difficulty of expanding their experimental use. Perhaps the biggest question left hanging is whether jawed vertebrates employ programmed genomic rearrangements for purposes other than antigenic diversification. This remains a possibility, as the constancy of the genome has generally been assumed rather than tested.

See also a follow up on a new paper from the same group: Genomic Rearrangement in Lampreys 2

Smith JJ, Antonacci F, Eichler EE, & Amemiya CT (2009). Programmed loss of millions of base pairs from a vertebrate genome. Proceedings of the National Academy of Sciences of the United States of America, 106 (27), 11212-7 PMID: 19561299

Smith JJ, Saha NR, & Amemiya CT (2010). Genome biology of the cyclostomes and insights into the evolutionary biology of vertebrate genomes. Integrative and comparative biology, 50 (1), 130-7 PMID: 21558194

Shimeld SM, & Donoghue PC (2012). Evolutionary crossroads in developmental biology: cyclostomes (lamprey and hagfish). Development (Cambridge, England), 139 (12), 2091-9 PMID: 22619386
This article is good for vertebrate phylogeny and cyclostome development.

On Genome Topology 2: The Fractal Globule

As a follow-up to my last post on the use of Hi-C to discover highly self-interacting genomic ‘topological domains’, I wanted to discuss a very interesting aspect of the original paper describing Hi-C. As well as finding a division of the genome into two chromatin compartments, Lieberman-Aiden et al. used their Hi-C data to compare and contrast two models of the topology of chromatin folding within the nucleus.

In this first description of Hi-C, Lieberman-Aiden et al. divided their genome-wide contact matrix into 1 Mb regions (i.e. roughly ten times coarser resolution than the Dixon et al. study). They found that, at this resolution, the genome can be partitioned into two kinds of spatial compartment, termed A and B, with more interaction occurring within each compartment than between them. Compartment A displays a more open form of chromatin, with high gene density and high levels of gene expression. Compartment B shows a more densely packed, closed chromatin state. Although the authors do not equate these compartments with euchromatin and heterochromatin, they sound distinctly similar to this old cytogenetic division.

In the later section of the paper, Lieberman-Aiden et al. discuss how their Hi-C data can be used to test models of the three-dimensional folding of chromatin. The ‘equilibrium globule’ model has been used to describe polymers in a poor solvent at equilibrium; in it, chromatin is pictured as being in a densely knotted configuration. The ‘fractal globule’ model describes polymers self-organising into long-lived, non-equilibrium conformations:

“This highly compact state is formed by an unentangled polymer when it crumples into a series of small globules in a “beads-on-a-string” configuration. These beads serve as monomers in subsequent rounds of spontaneous crumpling until only a single globule-of-globules-of-globules remains. The resulting structure resembles a Peano curve, a continuous fractal trajectory that densely fills 3D space without crossing itself”

(C) Top: An unfolded polymer chain, 4000 monomers (4.8 Mb) long. Coloration corresponds to distance from one endpoint, ranging from blue to cyan, green, yellow, orange, and red. Middle: An equilibrium globule. The structure is highly entangled; loci that are nearby along the contour (similar color) need not be nearby in 3D. Bottom: A fractal globule. Nearby loci along the contour tend to be nearby in 3D, leading to monochromatic blocks both on the surface and in cross-section. The structure lacks knots. (D) Genome architecture at three scales. Top: Two compartments, corresponding to open and closed chromatin, spatially partition the genome. Chromosomes (blue, cyan, green) occupy distinct territories. Middle: Individual chromosomes weave back-and-forth between the open and closed chromatin compartments. Bottom: At the scale of single megabases, the chromosome consists of a series of fractal globules.
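The key property of a Peano-type curve, that points close together along the chain stay close together in space while the chain fills the space without crossing itself, can be illustrated with its 2D cousin, the Hilbert curve. This is only an analogy: the fractal globule is a 3D polymer conformation, not literally a Hilbert curve, and `hilbert_point` below is a standard index-to-coordinate routine for a Hilbert curve on an n × n grid (n a power of two), not anything from the paper.

```python
def hilbert_point(n: int, d: int) -> tuple[int, int]:
    """Map position d (0 <= d < n*n) along a Hilbert curve to (x, y) on an n x n grid.

    n must be a power of two. Built bottom-up: at each scale s the point
    placed so far is rotated/reflected into the correct quadrant, then offset.
    """
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            # Rotate/reflect the sub-curve drawn so far (it lies in an s x s box).
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


n = 8
path = [hilbert_point(n, d) for d in range(n * n)]

# Every grid cell is visited exactly once: the curve fills the square densely.
assert len(set(path)) == n * n
# Consecutive positions along the chain are always neighbouring cells, so loci
# that are nearby along the contour are nearby in space and the path never jumps.
assert all(abs(x1 - x2) + abs(y1 - y2) == 1
           for (x1, y1), (x2, y2) in zip(path, path[1:]))
print(path[:8])
```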

When intrachromosomal contact probability $P(s)$ is plotted against genomic distance $s$, power-law scaling is observed between ~500 kb and ~7 Mb. The fitted scaling, $P(s) \propto s^{-1.08}$, is much closer to that predicted for the fractal globule model ($P(s) \propto s^{-1}$) than to that for the equilibrium globule ($P(s) \propto s^{-3/2}$). Likewise, data on the 3D distance between pairs of loci from 3D-FISH are in agreement with a fractal globule topology.
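Estimating the scaling exponent amounts to fitting a straight line to log P(s) against log s. The sketch below does an ordinary least-squares fit on synthetic, noise-free data generated with an exponent of −1.08 over the ~500 kb to ~7 Mb range, then asks which model’s predicted exponent is closer. The data and the function name are hypothetical; real Hi-C analyses bin contacts by distance and must also deal with noise and normalisation.

```python
import math


def power_law_exponent(distances: list[float], probabilities: list[float]) -> float:
    """Least-squares slope of log P(s) against log s, i.e. the exponent a in P(s) ~ s^a."""
    xs = [math.log(s) for s in distances]
    ys = [math.log(p) for p in probabilities]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance


# Synthetic contact probabilities following P(s) ~ s^-1.08,
# sampled at 10 log-spaced distances from 500 kb to 7 Mb.
distances = [500_000 * 14 ** (k / 9) for k in range(10)]
probabilities = [s ** -1.08 for s in distances]

slope = power_law_exponent(distances, probabilities)

# Exponents predicted by the two polymer models.
models = {"fractal globule": -1.0, "equilibrium globule": -1.5}
closest = min(models, key=lambda name: abs(models[name] - slope))
print(f"fitted exponent {slope:.2f}; closest model: {closest}")
# fitted exponent -1.08; closest model: fractal globule
```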

It therefore seems that, at the scale of several megabases, chromatin is organised in these knot-free conformations of globules within globules, allowing unfolding and refolding, whilst also enabling maximally dense packing. I must admit that I don’t have too much insight into the meaning of this; but frankly fractals are cool, and I love the idea of crumpling into globules of globules!

Lieberman-Aiden, E., van Berkum, N., Williams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Lajoie, B., Sabo, P., Dorschner, M., Sandstrom, R., Bernstein, B., Bender, M., Groudine, M., Gnirke, A., Stamatoyannopoulos, J., Mirny, L., Lander, E., & Dekker, J. (2009). Comprehensive Mapping of Long-Range Interactions Reveals Folding Principles of the Human Genome. Science, 326 (5950), 289-293 DOI: 10.1126/science.1181369

On Genome Topology

The study of higher order genomic structure using novel chromosome conformation capture techniques is an important growth area of biological research. These methods are being used to study long-range interactions between or within chromosomes, and promise to elucidate the spatial organisation of the genome and its functional significance. One such technique, Hi-C, which allows the identification of chromatin interactions across the entire genome, is used in a recent paper to discover that mammalian chromosomes are divided into highly self-interacting ‘topological domains’.

Hi-C works by capturing chromosomal interactions and then sequencing the products. Briefly, chromatin is first cross-linked by treatment with formaldehyde; the DNA is then cut up and the ends of the fragments are chemically marked; the fragments are then ligated together under conditions that favour ligation of cross-linked fragments, so that each ligation product joins fragments that were originally in close spatial proximity. After shearing, the marked fragments are purified, and the resulting library of interacting fragments is ‘massively parallel sequenced’. After aligning the reads to a reference genome sequence, one can construct a genome-wide contact matrix.
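The final step, turning aligned read pairs into a contact matrix, is essentially a binned 2D histogram. The sketch below is a minimal, single-chromosome version that assumes read pairs have already been aligned and filtered down to (position, position) coordinates; the function name, bin size and coordinates are hypothetical, and a genome-wide matrix would index bins across all chromosomes.

```python
def contact_matrix(pairs: list[tuple[int, int]], chrom_length: int,
                   bin_size: int) -> list[list[int]]:
    """Count ligation-junction read pairs per pair of genomic bins.

    `pairs` holds 0-based positions of the two ends of each read pair on
    one chromosome. The matrix is symmetric because an interaction
    between bins i and j is also one between j and i.
    """
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos1, pos2 in pairs:
        i, j = pos1 // bin_size, pos2 // bin_size
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1
    return matrix


# Hypothetical read pairs on a 4 Mb chromosome, binned at 1 Mb.
pairs = [(150_000, 2_300_000), (900_000, 1_100_000), (3_500_000, 3_900_000)]
for row in contact_matrix(pairs, chrom_length=4_000_000, bin_size=1_000_000):
    print(row)
# [0, 1, 1, 0]
# [1, 0, 0, 0]
# [1, 0, 0, 0]
# [0, 0, 0, 1]
```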

Dixon et al. applied Hi-C to mouse ES cells, human ES cells and human fibroblasts, and also used data from mouse cortex. They found that when they analysed their data at resolutions finer than 100 kb, highly self-interacting regions emerged. For example, in mouse ES cells, 2,200 of these ‘topological domains’, with a median size of 880 kb, occupied ~91% of the genome. The topological domains were separated by short segments, termed ‘topological boundary regions’, at which chromatin interactions ended abruptly. Interestingly, the boundary regions were generally the same in embryonic stem cells and differentiated cells, in both mouse and human; the overall domain architecture is therefore largely unchanged between cell types. Surprisingly, there was also quite a high degree of conservation of boundary regions between human and mouse.
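Dixon et al. called domains using a ‘directionality index’ (DI), which asks, for each bin, whether its contacts are biased upstream or downstream; as I understand it, the published method fed the DI into a hidden Markov model. The sketch below is a simplified stand-in: it computes a DI of that chi-square-like form and simply places a boundary wherever the DI flips from negative (upstream-biased, the end of a domain) to positive (downstream-biased, the start of the next). The window size, function names and toy matrix are hypothetical.

```python
def directionality_index(matrix: list[list[float]], window: int) -> list[float]:
    """Signed chi-square-like statistic per bin.

    A = contacts with the `window` bins upstream, B = with the `window` bins
    downstream, E = (A + B) / 2. Positive values mean a downstream bias.
    """
    n = len(matrix)
    di = []
    for i in range(n):
        upstream = sum(matrix[i][j] for j in range(max(0, i - window), i))
        downstream = sum(matrix[i][j] for j in range(i + 1, min(n, i + window + 1)))
        if upstream == downstream:
            di.append(0.0)  # no bias (also avoids dividing by zero when both are 0)
            continue
        expected = (upstream + downstream) / 2
        sign = 1.0 if downstream > upstream else -1.0
        di.append(sign * ((upstream - expected) ** 2 / expected
                          + (downstream - expected) ** 2 / expected))
    return di


def boundaries(di: list[float]) -> list[int]:
    """Indices i where a domain ends at bin i-1 and a new one starts at bin i."""
    return [i for i in range(1, len(di)) if di[i - 1] < 0 < di[i]]


# Toy matrix: two domains (bins 0-3 and 4-7) with dense internal contacts.
domain = [0, 0, 0, 0, 1, 1, 1, 1]
toy = [[0 if i == j else (5 if domain[i] == domain[j] else 1) for j in range(8)]
       for i in range(8)]
di = directionality_index(toy, window=2)
print([round(v, 2) for v in di])  # [10.0, 1.67, -1.0, -5.33, 5.33, 1.0, -1.67, -10.0]
print(boundaries(di))             # [4]
```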

These boundary regions seem to correspond to insulator or barrier elements that are known to divide different chromatin domains and prevent heterochromatin from spreading. For instance, the HoxA locus is divided into two compartments by a known insulator element, which was found to be a topological boundary region in both human and mouse. Dixon et al. also found that the distribution of the heterochromatin-associated histone modification H3K9me3 was segregated at boundary regions in differentiated cells. As the topological domains generally remain constant between stem and differentiated cell types, the boundaries seem to pre-mark the end points of heterochromatic spreading during cellular differentiation. This also shows that the topological domains are not a consequence of heterochromatin formation.

In agreement with the link between boundary regions and insulator elements, Dixon et al. found that boundaries were enriched for binding sites of the insulator protein CTCF. However, only 15% of all CTCF binding sites were in boundary regions, suggesting a more complex composition and function for the boundaries. Looking at the distributions of other cellular factors, the researchers showed that boundary regions are associated with high levels of transcription, being enriched for transcription start sites, housekeeping genes and promoter-associated histone marks. Interestingly, they also observed an enrichment for SINE retrotransposons. This is in agreement with a recent paper (that I wrote about) linking SINEs to the genomic spread of CTCF binding sites during evolution.

The discovery that the genome is partitioned into these topological domains is part of a growing literature dissecting genomic macro-structure. Dixon et al. compared topological domains with various other recently defined levels of higher order genomic organisation: ‘A+B’ compartments (Lieberman-Aiden et al.), lamina-associated domains, replication timing zones, and large organised chromatin K9 modification domains. They concluded that topological domains are related to, but independent of, each of these previously characterised architectures. This list gives some idea of the complexity of higher order genomic structure, and of how shallow our understanding of it remains. However, this tranche of new chromosome conformation capture techniques, combined with methods for high-throughput analysis of chromatin composition, is yielding a wealth of data. In the next few years we should have a far more nuanced and complete appreciation of the interplay between chromosomal architecture, chromatin state and genetic regulation. A mouth-watering prospect.

Dixon, J., Selvaraj, S., Yue, F., Kim, A., Li, Y., Shen, Y., Hu, M., Liu, J., & Ren, B. (2012). Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature, 485 (7398), 376-380 DOI: 10.1038/nature11082

Lieberman-Aiden, E., van Berkum, N., Williams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Lajoie, B., Sabo, P., Dorschner, M., Sandstrom, R., Bernstein, B., Bender, M., Groudine, M., Gnirke, A., Stamatoyannopoulos, J., Mirny, L., Lander, E., & Dekker, J. (2009). Comprehensive Mapping of Long-Range Interactions Reveals Folding Principles of the Human Genome. Science, 326 (5950), 289-293 DOI: 10.1126/science.1181369

The Birth of Introns

Eukaryotic genes are composed of exons and introns. Introns are non-coding sequences that separate the coding exons, and are spliced out of the pre-messenger RNA after transcription. This modular structure of eukaryotic genes allows alternative splicing, by which single genes can encode multiple isoforms of proteins, hence widening the diversity of the proteome. Introns also have important roles in genetic regulation; for instance as sites of enhancers, and by encoding microRNAs.

Intron positions are often conserved between orthologous eukaryotic genes, showing that spliceosomal introns originated early in eukaryotic evolution. However, it has been difficult to explain the mechanisms of intron loss, and especially of intron gain, that have maintained high numbers of introns in present-day eukaryotic genomes. Current models suggest that introns should be lost faster than they are gained. However, studies in organisms such as the urochordate Oikopleura dioica and the green alga Micromonas pusilla have shown extensive recent intron gains. Interestingly, the study of the Micromonas genome discovered a form of intronic repeat sequence that ‘extended nearly to donor and acceptor sites, and lacked known TE (transposable element) characteristics’. These sequences were termed ‘introner elements’. A new study, forthcoming in Current Biology, has discovered and characterised something similar in various fungal clades.

Van der Burgt et al. found numerous introns with near-identical sequences in the Dothideomycete fungus Cladosporium fulvum. They then widened their analysis to search for similar introns in the ‘intronomes’ of 23 other species of fungi, and found large sets of near-identical introns in 6 different species. Phylogenetic analyses of these ‘introner-like elements’ (ILEs) showed that they could be grouped into related clusters, and that the clusters were in turn related to each other, indicating that all the ILE clusters derived from a single ancestral element.
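A first pass at finding ‘near-identical introns’ can be as simple as grouping sequences whose pairwise similarity exceeds a threshold. The sketch below uses Python’s `difflib.SequenceMatcher.ratio()` as a crude similarity score together with single-linkage grouping; this is a hypothetical stand-in chosen for brevity, not the alignment and phylogenetic pipeline used by van der Burgt et al., and the intron sequences and the 0.9 threshold are invented for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Crude similarity in [0, 1] based on matching blocks (not a true alignment score)."""
    return SequenceMatcher(None, a, b).ratio()


def near_identical_groups(seqs: dict[str, str], threshold: float = 0.9) -> list[set[str]]:
    """Single-linkage grouping: two introns share a group if a chain of
    pairwise similarities >= threshold connects them."""
    names = list(seqs)
    parent = {name: name for name in names}  # union-find forest

    def find(name: str) -> str:
        while parent[name] != name:
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if similarity(seqs[a], seqs[b]) >= threshold:
                parent[find(b)] = find(a)

    groups: dict[str, set[str]] = {}
    for name in names:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


# Hypothetical intron sequences (donor GT ... acceptor AG).
introns = {
    "intron_1": "GTAAGTACCTAG",
    "intron_2": "GTAAGTACGTAG",
    "intron_3": "GTCCCCCCCCAG",
}
print(near_identical_groups(introns))
```

Real ILE searches would use proper pairwise alignment and far longer sequences; the point here is only the grouping logic.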

Analysis of the molecular structure of the introner-like elements showed that they contained all the distinguishing features of normal spliceosomal introns, such as splice donor and acceptor sites and branch point sequences. ILEs were longer than normal introns, and were predicted to fold into more stable secondary structures. Van der Burgt et al. suggest that these stable secondary structures are likely to have important functions, as they observed compensatory mutations that conserve secondary structure between related ILEs.
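The compensatory-mutation argument can be made concrete: if two related elements differ at both positions of a base pair but each still forms a valid pair, selection has plausibly acted to preserve the stem. The sketch below assumes the stem’s base-pairing partners are already known (in practice they would come from a secondary-structure prediction); the sequences, the pair list and the function name are hypothetical.

```python
# Watson-Crick pairs plus G-U wobble pairs, which are common in RNA stems.
VALID_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def compensatory_changes(seq_a: str, seq_b: str,
                         stem_pairs: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Return stem pairs (i, j) that both sequences can form but with different bases
    at both positions, i.e. a double substitution that conserves the pairing.

    DNA-style sequences are accepted: T is treated as U.
    """
    a = seq_a.upper().replace("T", "U")
    b = seq_b.upper().replace("T", "U")
    changes = []
    for i, j in stem_pairs:
        both_pair = (a[i], a[j]) in VALID_PAIRS and (b[i], b[j]) in VALID_PAIRS
        both_changed = a[i] != b[i] and a[j] != b[j]
        if both_pair and both_changed:
            changes.append((i, j))
    return changes


# Hypothetical hairpin: stem pairs (0,8), (1,7), (2,6) enclose a UUU loop.
ile_1 = "GCAUUUUGC"
ile_2 = "ACAUUUUGU"   # G-C at (0,8) replaced by A-U: pairing is conserved
print(compensatory_changes(ile_1, ile_2, [(0, 8), (1, 7), (2, 6)]))  # [(0, 8)]
```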

Analysing intron gain in the 6 species of fungi in which they found ILEs, van der Burgt et al. found that ILEs account for the majority of recent gains. In closely related sister species that diverged within the last 22,000 years, ILEs account for 90% of intron gains, but this figure drops off rapidly for older divergences. This leads van der Burgt et al. to suggest that most intron gains are due to ILE multiplication, with rapid degeneration making ILEs progressively more difficult to identify.

Introner-like elements therefore appear to be mobile elements that can in some way transpose to new sites, leading to intron gain. Just what mechanism is employed in this process is far from clear. Many different mechanisms for intron gain have been proposed, but as yet there is little experimental evidence that they occur in vivo. These include intron transposition, in which an intron transposes to a new position in a transcript, which is then reverse transcribed and recombined into the original gene; transposon insertion, in which a transposon becomes a spliceable intron; intronisation, in which exonic sequence is converted into intron by accumulated mutation; and other ideas based on genetic duplications and errors during repair processes. Van der Burgt et al. think that the most likely mechanism for ILEs is a process by which introns are reverse spliced directly into the genome and then reverse transcribed. It will be interesting to see whether ILE transposition can be observed in vivo, and to figure out just what mechanism of intron generation is employed.

Interestingly, introner-like elements differ from the introner elements found in Micromonas in important ways. Introner elements were found within introns rather than constituting the whole intron, and lacked the interesting secondary structures observed in ILEs. Along with the authors’ inability to find ILEs in other clades, this suggests that ILEs may not be a very widespread mechanism of intron multiplication. However, van der Burgt et al. disagree, and reckon that ILEs could potentially be an ancestral mechanism for intron gain.

van der Burgt, A., Severing, E., de Wit, P., & Collemare, J. (2012). Birth of New Spliceosomal Introns in Fungi by Multiplication of Introner-like Elements. Current Biology DOI: 10.1016/j.cub.2012.05.011