Category Archives: New Papers

Genomic Rearrangement in Lampreys 2

As discussed in a recent post, during lamprey embryogenesis programmed genomic rearrangements lead to the deletion of ~20% of the germline genome from the soma. Smith, Amemiya and co-workers have now published a follow-up study in which they further characterise the complement of deleted genes. Their findings have led them to hypothesise that the programmed genomic rearrangements (PGRs) serve to segregate pluripotency functions that are required in the germline but could be deleterious in the soma.

Smith et al. used two different genomic approaches to identify somatically deleted sequences. Using microarrays constructed from available germline sequence, they found that ~13% of the sampled sequence was deleted in the soma (in rough agreement with the ~20% estimated from flow cytometry). Within this dataset, they identified 8 new single-copy or low-copy-number sequences found only in the germline. RT-PCR showed that 5 of these novel sequences were expressed in germline cells, and in situ hybridisation for one of them showed expression in differentiating primordial germ cells in lamprey embryos.

The main obstacle to identifying more genes subject to somatic deletion has been a lack of germline genomic sequence. Smith et al. therefore performed high-throughput shotgun sequencing of lamprey sperm DNA, generating short reads covering ~10% of the germline genome. Comparing this dataset with the whole-genome sequence derived from somatic (liver) cells yielded tens of thousands of putative deletion and recombination sites. A substantial part of the somatically deleted DNA corresponds to single-copy, protein-coding genes: the authors identified 246 instances of homology to individual human genes.
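The post doesn't describe how the germline reads were compared with the somatic assembly. As a toy illustration of the underlying idea only (not the authors' pipeline), one can flag germline reads whose short subsequences (k-mers) are absent from the somatic sequence. The function names, the k-mer length and the `max_shared` threshold are assumptions chosen for illustration; a real analysis would align reads, consider both strands and allow for sequencing errors.

```python
def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def candidate_deleted_reads(germline_reads, somatic_genome, k=8, max_shared=0.0):
    """Return germline reads whose k-mers are (almost) absent from the somatic sequence.

    Hypothetical helper: a read is flagged when the fraction of its k-mers
    found in the somatic genome is <= max_shared. Forward strand only,
    exact matches only.
    """
    somatic_kmers = kmers(somatic_genome, k)
    flagged = []
    for read in germline_reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read shorter than k: nothing to compare
        shared = len(read_kmers & somatic_kmers) / len(read_kmers)
        if shared <= max_shared:
            flagged.append(read)
    return flagged
```

Reads flagged this way are only candidates: as the next paragraph explains, differences between two individuals can mimic programmed deletion.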

The problem with this comparison, however, is that it was necessarily generated from two different individuals (of different sexes), so apparent cases of deletion or recombination could instead reflect insertion/deletion polymorphisms segregating in lamprey populations. The researchers therefore validated a subset of the candidate deletion/recombination sites by PCR, amplifying candidate sequences from testes and blood of 4 different males, from blood of 4 different females, and from an array of somatic tissue types within individuals. Of 48 candidate deletions or recombination events tested, they validated 7 sites of programmed deletion and 3 recombination sites. They also identified 3 insertion/deletion polymorphisms and 5 gaps in the somatic whole-genome sequence. The remaining 30 candidates were uninformative, because of PCR failures or repetitive target sequences. The validated gene deletions included the gene encoding APOBEC-1 complementation factor, a protein involved in RNA editing, and the gene encoding the secreted developmental signalling molecule WNT7A/B.
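The validation results quoted above can be tallied to check that the categories account for all 48 candidates, and to see what fraction of the informative candidates turned out to be genuine. A quick sketch using only the numbers in the post:

```python
# Outcomes of the 48 candidate sites tested by PCR, as reported above
outcomes = {
    "programmed deletion": 7,
    "programmed recombination": 3,
    "insertion/deletion polymorphism": 3,
    "gap in somatic assembly": 5,
    "uninformative (PCR failure or repeats)": 30,
}

total = sum(outcomes.values())
validated = outcomes["programmed deletion"] + outcomes["programmed recombination"]
informative = total - outcomes["uninformative (PCR failure or repeats)"]

print(f"candidates tested: {total}")
print(f"validated / tested: {validated / total:.0%}")
print(f"validated / informative: {validated / informative:.0%}")
```

About a fifth of all tested candidates, and a little over half of the informative ones, were confirmed. Since the tested sites were only a subset, these proportions need not hold for the full set of putative sites.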

During these validation experiments, Smith et al. discovered short palindromic sequences at the deletion breakpoints. There was no specific consensus sequence at these positions, but the palindromes may indicate that chromatin diminution involves some form of site-specific recombination.
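In a DNA context, a 'palindrome' usually means a sequence that reads the same as its reverse complement (e.g. GAATTC). Below is a minimal sketch of scanning a sequence near a breakpoint for such palindromes. The window lengths are arbitrary assumptions, since the post doesn't give the lengths of the palindromes found, and the sketch assumes upper-case A/C/G/T input.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_palindromes(seq, min_len=6, max_len=12):
    """Return (start, window) for every window equal to its own reverse complement.

    Only even lengths are tried, because an odd-length window would need a
    middle base that is its own complement, which never happens for A/C/G/T.
    """
    hits = []
    for length in range(min_len, max_len + 1, 2):
        for start in range(len(seq) - length + 1):
            window = seq[start:start + length]
            if window == reverse_complement(window):
                hits.append((start, window))
    return hits
```

For example, `find_palindromes("TTGAATTCTT", 6, 6)` reports the GAATTC site starting at position 2.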

Another interesting finding is that the programmed deletions appear to be inherited uniformly by all the various somatic lineages. The earlier paper (discussed in the previous post) had suggested that different somatic tissues might have subtly different deleted portions. Microarray experiments, and comparison of the validated gene deletions between tissues, found no evidence of this, although the question may not yet be settled definitively.

The crux of the paper rests on a computational comparison of gene ontology (GO) terms, in which homology is used to predict cellular functions that are then sorted into broad categories. Certain GO terms were overrepresented among the predicted gene deletions relative to the rest of the germline sequence; these included ‘regulation of gene expression’, ‘chromatin organisation’, and ‘development of germ/stem cells’.
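The post doesn't say which statistical test underlies the overrepresentation of GO terms. A common choice for this kind of analysis is the one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test), sketched here under that assumption. The function name and arguments are illustrative, and a real analysis testing many terms would also correct for multiple testing.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the background (e.g. all germline genes considered)
    K: background genes annotated with the GO term
    n: genes in the test set (e.g. predicted somatic deletions)
    k: test-set genes annotated with the term
    """
    total = comb(N, n)
    upper = min(n, K)
    # math.comb returns 0 when asked for more items than exist,
    # so impossible terms contribute nothing to the sum
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total
```

A small p-value means the term appears among the deleted genes more often than expected if they were a random draw from the background.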

In short, the paper shows that a substantial number of protein-coding genes, as well as repetitive sequences, are deleted from the genomes of lamprey somatic cells. Many of the deleted genes are expressed in the germline, and many appear to have important regulatory functions. The crucial characteristics of the germline are totipotency and the ability to undergo meiotic recombination. Misexpression in the soma of factors involved in these processes could be seriously detrimental, potentially resulting in aberrant cell fate specification, genome disruption, and hence cancer. The authors postulate that this conflict of interest between the germline and the soma underlies their genomic differentiation.

I find this an attractive and interesting hypothesis. As yet, though, I don’t think the data are strong enough to prove it. Gene ontology terms are relatively crude categorisations, and together with the uncertainty over what proportion of candidate deletions are bona fide, this leads me to withhold judgement on the evolutionary rationale behind the deletions for the moment. Jeramiah Smith and colleagues are currently assembling the entire lamprey germline genome; complete annotation of the deleted portion should reveal the function of these fascinating genome rearrangements more clearly. I look forward to new studies investigating the mechanisms underlying the rearrangements and their developmental progression. The extensive genome remodelling that occurs in ciliates uses a combination of a small RNA/Argonaute system and domesticated transposase enzymes, so analysis of any transposases encoded in the lamprey genome may be a good place to start unravelling the mechanisms of chromatin diminution.

Smith JJ, Baker C, Eichler EE, & Amemiya CT (2012). Genetic consequences of programmed genome rearrangement. Current biology : CB, 22 (16), 1524-9 PMID: 22818913

Gene Birth, de novo

What are the genomic mechanisms responsible for the creation of evolutionary novelty? The view that has emerged from molecular and developmental biology in recent decades is that the primary substrate for evolutionary change is the regulatory network between genes, rather than the emergence of new genes. For instance, animal development repeatedly uses a relatively limited ‘toolkit’ of highly conserved families of transcription factors, signalling molecules and signal transduction components. Most morphological novelty seems to emerge via new deployments and combinations of this repertoire of pre-existing genes. No one doubts that new genes arise during evolution but, in line with the ‘predominantly regulatory’ model, most gene birth has been thought to involve gene duplication followed by reorganisation and diversification. However, whole-genome sequencing has found that every evolutionary lineage contains protein-coding genes that lack homologues in other lineages – orphan genes. In fact, as much as one third of all annotated genes in a genome may be orphans. Although orphan genes can arise by duplication and (rapid) diversification, it appears likely that the primary mechanism for their emergence is de novo evolution from previously non-coding sequence.

De novo gene birth requires non-genic sequence first to be transcribed and to acquire open reading frames (ORFs), and these ORFs then to be translated. There are some obvious conceptual hurdles for any model explaining how this sequence of events could occur to an important extent: how does non-genic sequence become translated? And wouldn’t any polypeptide translation products be insignificant? Carvunis et al. have tried to surmount these problems by postulating a model of de novo gene birth that proceeds through intermediate, reversible ‘proto-gene’ stages between the emergence of an ORF and a bona fide functional protein-coding gene.

A major problem in understanding the genome is that ORFs in genomic sequence are classified using a minimum length threshold. Although ORFs encoding functional polypeptides as short as 9 amino acids have been discovered, the standard length threshold used to delineate genic ORFs is 300 nt (equating to a 100 aa protein). In the budding yeast Saccharomyces cerevisiae, ~6,000 ORFs are annotated as genes, whilst ~261,000 unannotated ORFs longer than 3 codons are considered non-genic. The majority of the S. cerevisiae genome is transcribed, and a number of putatively non-coding transcripts have been shown to associate with ribosomes. Carvunis et al. postulated that some translation of non-genic ORFs may take place and that, although the resulting polypeptides may not be functional, provided they were not toxic (and hence selected against), these proto-genes could be maintained in the genome. Proto-genes would provide adaptive potential, and a subset could be retained over time if they conferred a selective advantage. New genes originating de novo would therefore be expected to be initially shorter, less highly expressed and more rapidly evolving than established genes.
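To see how a length threshold sorts ORFs into 'genic' and 'non-genic', here is a minimal ORF scanner. It assumes the simplest possible definition: an ATG start, the first in-frame stop codon, the forward strand only and the standard genetic code. Real annotation also scans the reverse strand and uses additional evidence. The `min_codons` parameter plays the role of the 100-codon cut-off.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=100):
    """Find ATG-initiated ORFs on the forward strand of seq.

    Returns sorted (start, end) pairs; end is exclusive and includes the stop.
    Length is counted in codons from the ATG up to (not including) the stop,
    so min_codons=100 corresponds to the usual 300 nt cut-off.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens the ORF; later in-frame ATGs are ignored
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ATG in this frame
    return sorted(orfs)
```

Running the same sequence with a lower `min_codons` reports many more ORFs, which is exactly why the choice of cut-off shapes what counts as a 'gene'.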

Carvunis et al.'s model of de novo gene birth leads to a number of predictions. First, there should be an evolutionary continuum between non-genic ORFs and bona fide genes in characteristics such as length, expression level and sequence composition. Second, many non-genic ORFs should be translated. Third, some recently emerged ORFs should be adaptive and hence retained by natural selection.

To test these predictions, Carvunis et al. estimated the order of emergence of S. cerevisiae ORFs from their level of conservation across ascomycete fungi. Annotated ORFs were classified into 10 conservation levels. For instance, ORFs found only in S. cerevisiae constituted ORFs1, which accounted for ~2% of the total. Some 12% were conserved only within the four closely related Saccharomyces species (ORFs1-4). The weak conservation and poor characterisation of ORFs1-4 mean that their annotation as genes is debatable, whereas the ~88% of annotated S. cerevisiae ORFs with homologues in more distant species (ORFs5-10) can be more confidently considered genes. The authors also assigned ~108,000 unannotated ORFs longer than 30 nt a conservation level of 0 (ORFs0). Hence, ORFs0 and ORFs1-4 were considered candidate proto-genes, whilst ORFs5-10 were classed as bona fide genes.
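The conservation levels can be thought of as a ladder of increasingly distant relatives. Below is a minimal sketch of the assignment. It assumes that an annotated ORF's level is set by the most distant rung on which a homologue is detected (level 1 meaning S. cerevisiae only), and that unannotated ORFs are level 0. The species names are hypothetical placeholders, not the groups actually used by Carvunis et al., and homology detection itself is taken as given.

```python
# Hypothetical ladder: placeholder names for the species defining levels 2..10,
# ordered from the closest relative of S. cerevisiae to the most distant.
LADDER = [f"level{n}_species" for n in range(2, 11)]


def conservation_level(annotated, homologue_species, ladder=LADDER):
    """Return 0 for unannotated ORFs, otherwise the rank of the most distant hit."""
    if not annotated:
        return 0
    level = 1  # annotated, but no homologue outside S. cerevisiae
    for rank, species in enumerate(ladder, start=2):
        if species in homologue_species:
            level = rank  # later (more distant) hits overwrite earlier ones
    return level


def orf_group(level):
    """Map a conservation level onto the groups discussed above."""
    if level == 0:
        return "ORFs0"
    if level <= 4:
        return "ORFs1-4"
    return "ORFs5-10"
```

Taking the most distant hit means that a homologue lost from intermediate lineages does not lower an ORF's estimated age.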

In agreement with the postulated continuum between non-genic ORFs and genes, Carvunis et al. found that both ORF length and expression level increased with conservation level. A similar gradient was seen in sequence composition: the relative abundances of amino acids encoded by ORFs1-4 were intermediate between those encoded by ORFs5-10 and those (hypothetically) encoded by ORFs0.

To test the second prediction, Carvunis et al. used ribosome occupancy (ribosome profiling) data to search for signatures of translation among ORFs0. Of these ~108,000 short, unannotated ORFs, 1,139 showed evidence of translation (termed ORFs0+).

The authors went on to measure the extent of selection acting on each class of ORF by comparing the genome sequences of 8 different S. cerevisiae strains. Around 3% of ORFs0+ and 9-25% of ORFs1-4 were found to be under purifying selection.

Carvunis et al. therefore classify the ORFs0+ (which show translational activity) and the ORFs1-4 (which need not show evidence of purifying selection) as proto-genes. This set amounts to 1,891 ORFs displaying characteristics intermediate between non-genic ORFs and genes.

Although Carvunis et al. found evidence to support all three of their predictions, I found the most persuasive evidence for the importance of de novo gene birth to be this: since the divergence of S. cerevisiae and S. paradoxus, between 1 and 5 new genes have arisen by gene duplication, whereas 19 of the 143 ORFs1 (which arose de novo over the same period) were found to be under purifying selection.

Perhaps the main take-home message from these analyses is that imposing arbitrary annotation boundaries (e.g. the 100-codon cut-off) can lead to artefactual conclusions. Widespread non-coding transcription, and the potential for marginal, non-functional translation, mean that genes exist on a continuum, and their RNA and protein products on a spectrum of functionality. These ‘shades of grey’ may actually be an important source of evolutionary potential.

Carvunis AR, Rolland T, Wapinski I, Calderwood MA, Yildirim MA, Simonis N, Charloteaux B, Hidalgo CA, Barbette J, Santhanam B, Brar GA, Weissman JS, Regev A, Thierry-Mieg N, Cusick ME, & Vidal M (2012). Proto-genes and de novo gene birth. Nature PMID: 22722833

Tautz D, & Domazet-Lošo T (2011). The evolutionary origin of orphan genes. Nature reviews. Genetics, 12 (10), 692-702 PMID: 21878963

Patterns of RNA methylation 2

In a recent post I discussed the extent of adenosine methylation in RNAs. Meyer et al. found that m6A is present in many mRNAs, with a bias in its distribution towards the ends of coding sequences, stop codons, and the proximal section of 3’ UTRs. The main chemically modified base in DNA is 5-methylcytosine (m5C). Squires et al. have now surveyed m5C in human RNAs, and find that this modification is also common in tRNAs, rRNAs, mRNAs and ncRNAs.

The principal method for detecting methylated cytosines in nucleic acids is bisulphite sequencing. Bisulphite treatment converts cytosine residues to uracil, but modified cytosines such as m5C are left unchanged. Hence, when sequenced, unmodified C reads as T and m5C reads as C, so comparison with a reference sequence reveals the methylation status of each cytosine. Squires et al. used bisulphite conversion of RNAs, followed by reverse transcription and high-throughput sequencing. A number of other modified forms of cytosine known to be present in some rRNAs, such as N4-methylcytidine (m4C) and N4,2’-O-dimethylcytidine (m4Cm), may also be resistant to bisulphite treatment. With this in mind, Squires et al. termed the modified cytosines they detected ‘m5C candidate sites’.
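The logic of bisulphite sequencing can be shown with a small simulation. The sketch assumes complete conversion, a single read already aligned base-for-base with its reference, no sequencing errors, and sequences written in DNA letters (as after reverse transcription, where U reads as T). Real analyses pool many reads per site and must allow for incomplete conversion. As noted above, other bisulphite-resistant modifications would also read as C, which is why the calls are only candidates.

```python
def bisulphite_convert(reference, methylated_positions):
    """Simulate the read from a fully converted molecule: unmodified C -> T."""
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(reference)
    )


def call_m5c_candidates(reference, read):
    """Positions where the reference has C and the aligned read still shows C."""
    if len(reference) != len(read):
        raise ValueError("reference and read must be aligned and of equal length")
    return [
        i
        for i, (ref_base, read_base) in enumerate(zip(reference, read))
        if ref_base == "C" and read_base == "C"
    ]
```

For example, converting "ACGCTCA" with only position 3 methylated gives "ATGCTTA", and comparing that read with the reference recovers position 3 as the single candidate site.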

Surveying RNAs from HeLa cells, Squires et al. discovered 255 modified Cs in tRNAs. This confirmed a number of known sites and identified many new candidate sites, which nevertheless generally fitted a known pattern of modification of residues in specific secondary structural regions: the variable region and the anticodon loop. Modifications in these areas are important in stabilising secondary structure and affect aminoacylation and codon recognition.

Most interestingly, the researchers discovered 10,275 m5C candidate sites in mRNAs and ncRNAs. Their data covered 10.6% of all cytosine residues in the transcriptome. m5C seems to be enriched in some classes of ncRNA, but relatively depleted in mRNAs. Nevertheless, the majority (83%) of candidate sites were found in mRNAs. Within these transcripts, m5C appears to be depleted within protein-coding sequences but enriched in 5’ and 3’ UTRs. Further computational analysis showed an association between mRNA m5C sites and binding regions for Argonaute proteins (the proteins with which small regulatory RNAs form complexes to effect post-transcriptional regulation).

Two different methyltransferases are known to catalyse the m5C modification in eukaryotic RNAs: NSUN2 and TRDMT1. Previously, these two enzymes had only been shown to methylate a few specific positions in various tRNAs. Squires et al. used RNAi to knock down NSUN2 and TRDMT1 in HeLa cells and assayed the methylation status of a selected subset of cytosine residues. This showed that a number of m5C sites in mRNAs and ncRNAs depend on NSUN2, suggesting that it could be the primary enzyme responsible for cytosine methylation in these classes of RNA. NSUN2 has been shown to be cell-cycle regulated and to be a target of the oncogene MYC. Nsun2 knockout mice are small, and have revealed a role for the enzyme in balancing stem cell renewal and differentiation. A recent paper (Khan et al. 2012) has linked mutations in NSUN2 to an autosomal-recessive intellectual disability syndrome in humans. It will be interesting to investigate the extent of this enzyme’s role in RNA methylation, and to dissect which aspect of its function is responsible for the mouse and human phenotypes.

As with m6A, m5C is commonly found in RNAs of many categories and, as with the previous study, it is not yet obvious just how important RNA methylation truly is. The phenotypes associated with loss of methyltransferases or demethylases are not that extensive, but neither are they negligible. Some observations are shared between Meyer et al. and Squires et al.: the enrichment in 3’ UTRs and the correlation between RNA methylation and microRNA/Argonaute binding sites (although the details of these associations differed). This investigation by Squires et al. into m5C is not on the same level as Meyer et al.’s study, in that it lacked a developmental component and was not on the same global scale. On the other hand, bisulphite sequencing pinpoints the exact modified residues, whereas m6A cannot yet be mapped with the same precision. The methodology used by Squires et al. can be scaled up, so more global studies of m5C will no doubt appear in the near future. I also look forward to a more detailed understanding of the enzymatic pathways involved, and a dissection of their roles in development.

Squires JE, Patel HR, Nousch M, Sibbritt T, Humphreys DT, Parker BJ, Suter CM, & Preiss T (2012). Widespread occurrence of 5-methylcytosine in human coding and non-coding RNA. Nucleic acids research, 40 (11), 5023-33 PMID: 22344696

Khan MA, Rafiq MA, Noor A, Hussain S, Flores JV, Rupp V, Vincent AK, Malli R, Ali G, Khan FS, Ishak GE, Doherty D, Weksberg R, Ayub M, Windpassinger C, Ibrahim S, Frye M, Ansar M, & Vincent JB (2012). Mutation in NSUN2, which encodes an RNA methyltransferase, causes autosomal-recessive intellectual disability. American journal of human genetics, 90 (5), 856-63 PMID: 22541562

Meyer KD, Saletore Y, Zumbo P, Elemento O, Mason CE, & Jaffrey SR (2012). Comprehensive Analysis of mRNA Methylation Reveals Enrichment in 3′ UTRs and near Stop Codons. Cell, 149 (7), 1635-46 PMID: 22608085

Genomic Rearrangement in Lampreys

Generally, all cells in an organism are considered to have the same genome, with the differences between cells determined by differential gene expression. However, a growing list of species, including sciarid flies and various copepods and nematodes, are known to undergo genomic rearrangements during the differentiation of cell lineages. Recent findings have shown that two jawless vertebrates (cyclostomes), the hagfish and the sea lamprey, undergo extensive genomic remodelling during development.

Hagfish and lampreys are the closest extant relatives of the jawed vertebrates (gnathostomes). Controversy has long surrounded whether lampreys are more closely related to gnathostomes than to hagfish. This debate appears to have been resolved by molecular phylogenetics, which unites the two branches of jawless vertebrates as a monophyletic clade, the cyclostomes. However, hagfish and lampreys split shortly after the divergence of gnathostomes and cyclostomes, early in vertebrate evolution (~500 mya). Two whole-genome duplications (referred to as 1R and 2R) occurred during early vertebrate evolution, and it is not yet resolved whether these duplications preceded the divergence of cyclostomes and gnathostomes; opinion appears to be split as to whether that divergence occurred after 1R or after 2R. The most direct way to resolve this question is whole-genome sequencing. It has been known for some time that hagfish undergo genomic rearrangements during early development. Smith et al. (2009) have now shown a similar phenomenon in the sea lamprey (Petromyzon marinus), which helps explain the difficulty in constructing a finished version of the sea lamprey genome.

Smith et al. found that the total DNA content of nuclei from the germline (sperm) and from various somatic cell types (blood, liver, kidney) differed by >20%, equating to ≈500 Mb. They then performed Southern blots in which restriction-digested genomic DNA from blood or sperm cells was probed with a repetitive sequence element. A number of bands differed in size or intensity between germline and somatic cells, showing that genome rearrangement had occurred. One specific band was present in the germline samples and virtually absent from a range of somatic tissues. This band, termed Germ1, consisted of sequences from the 18S ribosomal DNA, a retrotransposon, and a section of the 28S rDNA. When aligned to lamprey genomic sequence (derived from somatic cells), these sequences were all commonly found, except that one section of Germ1, running from one end of the fragment to the 28S rDNA section, was dramatically underrepresented: a germline-specific sequence. Smith et al. then performed FISH (fluorescent in situ hybridisation) with the Germ1 clone on metaphase chromosomes from germline and somatic cells. In the germline they found many Germ1-like sequences distributed across several chromosomes, often arrayed in tandem repeats. In somatic cell nuclei, by contrast, Germ1 hybridised to only one chromosome pair, most likely corresponding to the functional rDNA. Using real-time PCR over the course of early embryogenesis, the authors found that Germ1 abundance was drastically reduced 2 or 3 days after fertilisation. Smith et al. estimate that Germ1-like sequences make up ~7% of the germline genome, suggesting that a further ~13% of the genome, made up of other sequences, is also lost. By comparing germline BAC clones with somatic genome sequence, they identified more lost sequences, including SPOPL, a gene known to be expressed during germline development.
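The genome-size figures fit together with some simple arithmetic. This back-of-envelope sketch uses only the approximate numbers quoted above. The implied germline genome size is derived here rather than stated in the post, and because the loss is '>20%', it is best read as an upper-end estimate.

```python
# Approximate figures quoted in the post
lost_fraction = 0.20    # somatic nuclei contain >20% less DNA than sperm
lost_mb = 500.0         # ...equating to roughly 500 Mb
germ1_fraction = 0.07   # Germ1-like sequences as a share of the germline genome

# Derived estimates (not stated in the post)
implied_germline_mb = lost_mb / lost_fraction         # about 2,500 Mb
germ1_mb = germ1_fraction * implied_germline_mb       # about 175 Mb
other_fraction = lost_fraction - germ1_fraction       # about 0.13, the "further ~13%"
other_mb = other_fraction * implied_germline_mb       # about 325 Mb

print(f"implied germline genome: ~{implied_germline_mb:,.0f} Mb")
print(f"Germ1-like sequences lost: ~{germ1_mb:,.0f} Mb")
print(f"other sequences lost: ~{other_fraction:.0%} (~{other_mb:,.0f} Mb)")
```

On these numbers, roughly two thirds of the lost DNA is something other than Germ1-like sequence, which is why the search for other deleted sequences (such as SPOPL) matters.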

Smith et al. have therefore shown that the sea lamprey genome undergoes dramatic rearrangement during the early development of the somatic tissues. A large proportion of the excised content is accounted for by the elimination of Germ1-like sequences, but it appears likely that a number of specific genes are also lost. In most cases of large-scale somatic genome rearrangement, the main excised component is made up of transposable elements and other repetitive sequences. This may also be the main basis of the loss seen in lampreys, as Germ1-like sequences appear to be a strange fusion of transposon and duplicated ribosomal DNA genes. However, the finding that an individual gene, SPOPL, is also selectively deleted in somatic lineages suggests that genomic deletions could be linked to the genetic regulation of development.

Interestingly, when the common apoptosis assay TUNEL – which detects DNA breaks – is applied during the first few weeks of lamprey embryonic development, nearly every nucleus is labelled. This effect, previously considered an artefact, is probably explained by developmentally regulated deletions. The deletion of SPOPL appeared to occur more gradually than that of Germ1-like sequences. Together with the observation that total nuclear DNA content differed slightly between somatic tissues, these findings suggest that the program of somatic genomic deletions could unfold in an intricate, tissue-specific progression during early development.

All jawed vertebrates undergo programmed genomic rearrangements during the diversification of the immune system. V(D)J recombination, mediated by the transposase-derived RAG recombinase, generates antigenic diversity, allowing adaptive immune responses. Cyclostomes have an alternative, RAG-independent adaptive immune system based on variable lymphocyte receptors (VLRs). The recombinational mechanism that assembles VLR genes is not yet clear. Could it be linked to the mechanism employed in the early, developmentally regulated deletion process?

Smith et al. (2010) link the presence of chromatin diminution (i.e. the somatic genomic rearrangements) in this basal vertebrate taxon to the whole-genome duplications that occurred in the vertebrate stem group. If the genomic rearrangement mechanisms seen in cyclostomes were present in the last common ancestor of jawed and jawless vertebrates, perhaps this system predisposed stem-group vertebrates to whole-genome duplication by creating a permissive environment for polyploidisation and rediploidisation? Although this is a fascinating idea, it is currently perhaps a slightly idle speculation. It is of pressing importance to understand the mechanisms underlying cyclostome genomic rearrangements. Are they the same in hagfish and lampreys? Are similar systems present in gnathostomes? Finished genome sequences for both the germline and various somatic cell lineages will answer many questions regarding the effects and purposes of chromatin diminution, although this is easier said than done given the potential complexity of the rearrangements. Lampreys and hagfish also cannot yet be cultured in the laboratory, which limits their experimental use. Perhaps the biggest question left hanging is whether jawed vertebrates employ programmed genomic rearrangements for purposes other than antigenic diversification. This remains a possibility, as the constancy of the genome across cell types has generally been assumed rather than tested.

See also a follow-up post on a new paper from the same group: Genomic Rearrangement in Lampreys 2

Smith JJ, Antonacci F, Eichler EE, & Amemiya CT (2009). Programmed loss of millions of base pairs from a vertebrate genome. Proceedings of the National Academy of Sciences of the United States of America, 106 (27), 11212-7 PMID: 19561299

Smith JJ, Saha NR, & Amemiya CT (2010). Genome biology of the cyclostomes and insights into the evolutionary biology of vertebrate genomes. Integrative and comparative biology, 50 (1), 130-7 PMID: 21558194

Shimeld SM, & Donoghue PC (2012). Evolutionary crossroads in developmental biology: cyclostomes (lamprey and hagfish). Development (Cambridge, England), 139 (12), 2091-9 PMID: 22619386
This article is a good overview of vertebrate phylogeny and cyclostome development; a free version is available.

On Genome Topology 2: The Fractal Globule

As a follow-up to my last post on the use of Hi-C to discover highly self-interacting genomic ‘topological domains’, I wanted to discuss a very interesting aspect of the original paper describing Hi-C. As well as finding a division of the genome into two chromatin compartments, Lieberman-Aiden et al. used their Hi-C data to compare and contrast two models of the topology of chromatin folding within the nucleus.

In this first description of Hi-C, Lieberman-Aiden et al. divided their genome-wide contact matrix into 1 Mb bins (i.e. a resolution ten times coarser than in the Dixon et al. study). They found that, at this resolution, the genome can be partitioned into two kinds of spatial compartment, termed A and B. More interaction occurs within each compartment than between compartments. Compartment A displays a more open form of chromatin, with high gene density and high levels of gene expression. Compartment B shows a more densely packed, closed chromatin state. Although the authors do not equate these compartments with euchromatin and heterochromatin, they sound distinctly similar to this old cytogenetic division.
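As I understand their method, Lieberman-Aiden et al. identified the A and B compartments from the leading principal component of the correlation matrix of a distance-normalised contact map. Below is a heavily simplified numpy sketch of that idea. It assumes a single chromosome, raw counts, and normalisation by the mean contact frequency at each genomic distance; real pipelines also correct for experimental biases and handle empty bins. The function name is illustrative. The sign of the vector is arbitrary, so which sign corresponds to compartment A has to be fixed separately, for instance using gene density.

```python
import numpy as np


def compartment_eigenvector(contacts):
    """Leading eigenvector of the correlation matrix of an observed/expected contact map.

    contacts: symmetric (n x n) array of contact counts for one chromosome.
    Bins whose entries share a sign fall in the same compartment.
    """
    contacts = np.asarray(contacts, dtype=float)
    n = contacts.shape[0]
    observed_over_expected = np.zeros_like(contacts)
    for d in range(n):
        diag = np.diagonal(contacts, offset=d)
        expected = diag.mean()  # mean contact frequency at genomic distance d
        if expected > 0:
            rows = np.arange(n - d)
            observed_over_expected[rows, rows + d] = diag / expected
            observed_over_expected[rows + d, rows] = diag / expected
    correlation = np.corrcoef(observed_over_expected)  # bin-by-bin Pearson correlation
    eigenvalues, eigenvectors = np.linalg.eigh(correlation)
    return eigenvectors[:, np.argmax(eigenvalues)]
```

Dividing by the expected contact frequency at each distance removes the overwhelming effect of linear proximity, so the correlation step can pick up which bins share an interaction profile.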

In the later section of the paper, Lieberman-Aiden et al. discuss how their Hi-C data can be used to test models of the three-dimensional folding of chromatin. The ‘equilibrium globule’ model has been used to describe polymers in a poor solvent at equilibrium; in it, chromatin is pictured as being in a densely knotted configuration. The ‘fractal globule’ model describes polymers self-organising into long-lived, non-equilibrium conformations:

“This highly compact state is formed by an unentangled polymer when it crumples into a series of small globules in a “beads-on-a-string” configuration. These beads serve as monomers in subsequent rounds of spontaneous crumpling until only a single globule-of-globules-of-globules remains. The resulting structure resembles a Peano curve, a continuous fractal trajectory that densely fills 3D space without crossing itself”

Figure caption: (C) Top: An unfolded polymer chain, 4000 monomers (4.8 Mb) long. Coloration corresponds to distance from one endpoint, ranging from blue to cyan, green, yellow, orange, and red. Middle: An equilibrium globule. The structure is highly entangled; loci that are nearby along the contour (similar color) need not be nearby in 3D. Bottom: A fractal globule. Nearby loci along the contour tend to be nearby in 3D, leading to monochromatic blocks both on the surface and in cross-section. The structure lacks knots. (D) Genome architecture at three scales. Top: Two compartments, corresponding to open and closed chromatin, spatially partition the genome. Chromosomes (blue, cyan, green) occupy distinct territories. Middle: Individual chromosomes weave back-and-forth between the open and closed chromatin compartments. Bottom: At the scale of single megabases, the chromosome consists of a series of fractal globules.

When intrachromosomal contact probability is plotted against genomic distance $s$, power-law scaling is observed between ~500 kb and ~7 Mb. The observed scaling, $P(s) \propto s^{-1.08}$, is much closer to that predicted for the fractal globule ($s^{-1}$) than to that for the equilibrium globule ($s^{-3/2}$). Likewise, 3D-FISH measurements of the spatial distance between pairs of loci agree with a fractal globule topology.
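The scaling exponent is simply the slope of contact probability against genomic distance on log-log axes. Below is a minimal sketch of estimating it by least squares and comparing it with the two models' predictions. The function names are illustrative, and the input should be restricted to the range where power-law scaling holds (~500 kb to ~7 Mb in the paper).

```python
import numpy as np

# Exponents predicted for P(s) proportional to s**exponent by the two polymer models
MODEL_EXPONENTS = {"fractal globule": -1.0, "equilibrium globule": -1.5}


def power_law_exponent(distances, contact_probabilities):
    """Least-squares slope of log P(s) against log s."""
    slope, _intercept = np.polyfit(np.log(distances), np.log(contact_probabilities), 1)
    return slope


def closest_model(exponent):
    """Name of the model whose predicted exponent is nearest to the fitted one."""
    return min(MODEL_EXPONENTS, key=lambda name: abs(MODEL_EXPONENTS[name] - exponent))
```

For example, `closest_model(-1.08)` returns "fractal globule", since -1.08 lies 0.08 from -1 but 0.42 from -1.5.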

It therefore seems that, at the scale of several megabases, chromatin is organised in these knot-free conformations of globules within globules, allowing unfolding and refolding, whilst also enabling maximally dense packing. I must admit that I don’t have too much insight into the meaning of this; but frankly fractals are cool, and I love the idea of crumpling into globules of globules!

Lieberman-Aiden, E., van Berkum, N., Williams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Lajoie, B., Sabo, P., Dorschner, M., Sandstrom, R., Bernstein, B., Bender, M., Groudine, M., Gnirke, A., Stamatoyannopoulos, J., Mirny, L., Lander, E., & Dekker, J. (2009). Comprehensive Mapping of Long-Range Interactions Reveals Folding Principles of the Human Genome Science, 326 (5950), 289-293 DOI: 10.1126/science.1181369

On Genome Topology

The study of higher-order genomic structure using novel chromosome conformation capture techniques is an important growth area of biological research. These methods are being used to study long-range interactions between and within chromosomes, and promise to elucidate the spatial organisation of the genome and its functional significance. One such technique, Hi-C, which allows the identification of chromatin interactions across the entire genome, is used in a recent paper to show that mammalian chromosomes are divided into highly self-interacting ‘topological domains’.

Hi-C works by purifying the products of chromosomal interactions and then sequencing them. Briefly, chromatin is first cross-linked by treatment with formaldehyde; the DNA is then digested and the ends of the fragments are marked with biotin; the fragments are then ligated under conditions that favour ligation between cross-linked fragments, so that each ligation product joins two pieces of DNA that were originally in close spatial proximity. After shearing, the marked fragments are purified, and the resulting library of interacting fragments is subjected to massively parallel sequencing. Once the read pairs are aligned to a reference genome sequence, a genome-wide contact matrix can be constructed.
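The final step, turning aligned read pairs into a contact matrix, is essentially binning. Below is a minimal sketch assuming read pairs from a single chromosome, already filtered and given as 0-based positions; the function name is illustrative. Real pipelines also remove duplicates and artefactual ligation products, and normalise for experimental biases.

```python
import numpy as np


def contact_matrix(read_pairs, chrom_length, bin_size):
    """Count read pairs in a symmetric bin-by-bin matrix for one chromosome.

    read_pairs: iterable of (pos1, pos2), 0-based positions of the two ends.
    """
    n_bins = (chrom_length + bin_size - 1) // bin_size  # ceiling division
    matrix = np.zeros((n_bins, n_bins), dtype=int)
    for pos1, pos2 in read_pairs:
        i, j = pos1 // bin_size, pos2 // bin_size
        matrix[i, j] += 1
        if i != j:
            matrix[j, i] += 1  # mirror so the matrix stays symmetric
    return matrix
```

The bin size sets the resolution: the same reads binned at 1 Mb or at 40 kb give the coarse and fine views contrasted in these posts.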

Dixon et al. applied Hi-C to mouse ES cells, human ES cells and human fibroblasts, and also used Hi-C data from mouse cortex. When they analysed their data at resolutions finer than 100 kb, highly self-interacting regions emerged. For example, in mouse ES cells, 2,200 of these ‘topological domains’, with a median size of 880 kb, occupied ~91% of the genome. The topological domains were separated by short segments, termed ‘topological boundary regions’, at which chromatin interactions ended abruptly. Interestingly, the boundary regions were generally the same in embryonic stem cells and differentiated cells, in both mouse and human, so the overall domain architecture is largely unchanged between cell types. Surprisingly, there was also a high degree of conservation of boundary zones between human and mouse.
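The post doesn't describe how the domains were called. As I understand it, Dixon et al. used a 'directionality index' (DI), which measures whether a bin interacts preferentially with upstream or downstream sequence, and then segmented the genome with a hidden Markov model. Below is a sketch of the DI calculation alone (no HMM) on a binned contact matrix. Here `window` is the number of bins examined on each side and is a free parameter, and the function names are illustrative. Boundaries show up where the DI flips from negative (upstream bias, the end of one domain) to positive (downstream bias, the start of the next).

```python
import numpy as np


def directionality_index(contacts, window):
    """Directionality index for each bin of a symmetric contact matrix.

    A = contacts with the `window` bins upstream, B = with the `window` bins
    downstream, E = (A + B) / 2, and
    DI = sign(B - A) * ((A - E)**2 / E + (B - E)**2 / E).
    """
    contacts = np.asarray(contacts, dtype=float)
    n = contacts.shape[0]
    di = np.zeros(n)
    for i in range(n):
        upstream = contacts[i, max(0, i - window):i].sum()
        downstream = contacts[i, i + 1:i + 1 + window].sum()
        expected = (upstream + downstream) / 2
        if expected > 0:
            di[i] = np.sign(downstream - upstream) * (
                (upstream - expected) ** 2 / expected
                + (downstream - expected) ** 2 / expected
            )
    return di


def boundary_candidates(di):
    """Bins where DI switches from negative (previous bin) to positive (this bin)."""
    return [i for i in range(1, len(di)) if di[i - 1] < 0 and di[i] > 0]
```

On a toy matrix made of two self-interacting blocks, the DI runs from positive to negative across each block and flips sign exactly at the junction between them.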

These boundary zones seem to correspond to the insulator or barrier elements known to divide different chromatin domains and prevent heterochromatin from spreading. For instance, the HoxA locus is divided into two compartments by a known insulator element, which was found to be a topological boundary region in both human and mouse. Dixon et al. also found that, in differentiated cells, the distribution of the heterochromatin-associated histone modification H3K9me3 was segregated at boundary regions. As the topological domains generally remain constant between stem and differentiated cell types, the boundaries seem to pre-mark the end points for heterochromatin spreading during cellular differentiation. This also shows that the topological domains are not a consequence of heterochromatin formation.

In agreement with the link between boundary zones and insulator elements, Dixon et al. found that boundary zones were enriched for binding sites of the insulator protein CTCF. However, only 15% of all CTCF binding sites lay in boundary zones, suggesting a more complex composition and function for the boundary zones. Looking at the distributions of other cellular factors, the researchers showed that boundary zones are associated with high levels of transcription, being enriched for transcription start sites, housekeeping genes, and promoter-associated histone marks. Interestingly, they also observed an enrichment of SINE retrotransposons. This agrees with a recent paper (that I wrote about) linking SINEs to the genomic spread of CTCF binding sites during evolution.
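Boundaries being enriched for CTCF sites while holding only 15% of them is not a contradiction, because enrichment is measured relative to how much of the genome the boundaries cover. A toy Python calculation with invented numbers (not figures from the paper) shows how the two fit together; the function names are hypothetical.

```python
def fraction_in_intervals(positions, intervals):
    """Fraction of point positions (e.g. binding-site centres) inside any interval."""
    inside = sum(any(start <= p < end for start, end in intervals) for p in positions)
    return inside / len(positions)


def fold_enrichment(positions, intervals, genome_length):
    """Observed fraction of sites in the intervals / fraction of the genome they cover.

    Assumes the intervals do not overlap one another.
    """
    covered = sum(end - start for start, end in intervals)
    return fraction_in_intervals(positions, intervals) / (covered / genome_length)


genome_length = 1_000_000
boundaries = [(100_000, 110_000), (500_000, 510_000)]  # 2% of the toy genome
sites = [105_000, 502_000, 509_999] + [i * 50_000 + 20_000 for i in range(17)]
print(fraction_in_intervals(sites, boundaries))  # 0.15
print(round(fold_enrichment(sites, boundaries, genome_length), 2))  # 7.5
```

Only 15% of the toy sites sit in boundaries, yet that is 7.5 times more than expected from the 2% of the genome the boundaries occupy.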

The discovery that the genome is partitioned into these topological domains is part of a growing literature dissecting genomic macro-structure. Dixon et al. compared topological domains with various other recently defined higher order levels of genomic organisation: the ‘A’ and ‘B’ compartments (Lieberman-Aiden et al.), lamina-associated domains, replication timing zones, and large organised chromatin K9 modification (LOCK) domains. They concluded that topological domains are related to, but independent of, each of these previously characterised architectures. This list gives some idea of the complexity of higher order genomic structure, and of how shallow our understanding of it still is. However, this tranche of new chromosome conformation capture techniques, combined with methods for high-throughput analysis of chromatin composition, is yielding a wealth of data. In the next few years we should have a far more nuanced and complete appreciation of the interplay between chromosomal architecture, chromatin state and gene regulation. A mouth-watering prospect.

Dixon, J., Selvaraj, S., Yue, F., Kim, A., Li, Y., Shen, Y., Hu, M., Liu, J., & Ren, B. (2012). Topological domains in mammalian genomes identified by analysis of chromatin interactions Nature, 485 (7398), 376-380 DOI: 10.1038/nature11082

Lieberman-Aiden, E., van Berkum, N., Williams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Lajoie, B., Sabo, P., Dorschner, M., Sandstrom, R., Bernstein, B., Bender, M., Groudine, M., Gnirke, A., Stamatoyannopoulos, J., Mirny, L., Lander, E., & Dekker, J. (2009). Comprehensive Mapping of Long-Range Interactions Reveals Folding Principles of the Human Genome Science, 326 (5950), 289-293 DOI: 10.1126/science.1181369

The Birth of Introns

Eukaryotic genes are composed of exons and introns. Introns are non-coding sequences that separate the coding exons, and are spliced out of the pre-messenger RNA after transcription. This modular structure of eukaryotic genes allows alternative splicing, by which single genes can encode multiple isoforms of proteins, hence widening the diversity of the proteome. Introns also have important roles in genetic regulation; for instance as sites of enhancers, and by encoding microRNAs.

Intron positions are often conserved between orthologous eukaryotic genes, showing that spliceosomal introns originated early in eukaryotic evolution. However, it has been difficult to explain the mechanisms of intron loss, and especially of intron gain, that have maintained high numbers of introns in present-day eukaryotic genomes. Current models suggest that introns should be lost faster than they are gained. However, studies in organisms such as the urochordate Oikopleura dioica and the green alga Micromonas pusilla have shown extensive recent intron gain. Interestingly, the study of the Micromonas genome discovered a form of intronic repeat sequence that ‘extended nearly to donor and acceptor sites, and lacked known TE (transposable element) characteristics’. These sequences were termed ‘Introner elements’. A new study, forthcoming in Current Biology, has discovered and characterised something similar in various fungal clades.

Van der Burgt et al. found numerous introns with near-identical sequences in the Dothideomycete fungus Cladosporium fulvum. They then widened their analysis to search for similar introns in the ‘intronomes’ of 23 other fungal species, and found large sets of near-identical introns in 6 different species. Phylogenetic analyses of these ‘introner-like elements’ (ILEs) showed that they could be grouped into related clusters, and that the clusters were in turn related to each other, indicating that all the ILE clusters were derived from a single ancestral element.
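Spotting sets of ‘near-identical’ introns is essentially a clustering problem. Van der Burgt et al. used phylogenetic analyses; as a much cruder stand-in, the Python sketch below groups sequences by single linkage whenever difflib's similarity ratio passes a threshold. The 0.9 cutoff, the function name and the toy sequences are all arbitrary assumptions, and a real analysis would use proper sequence alignment.

```python
from difflib import SequenceMatcher
from itertools import combinations


def cluster_similar(seqs, threshold=0.9):
    """Group sequences into single-linkage clusters of pairwise similarity >= threshold."""
    parent = list(range(len(seqs)))  # union-find forest over sequence indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i, j in combinations(range(len(seqs)), 2):
        if SequenceMatcher(None, seqs[i], seqs[j]).ratio() >= threshold:
            parent[find(i)] = find(j)  # merge the two clusters

    clusters = {}
    for i in range(len(seqs)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Invented intron-like sequences: the first two differ by a single base.
introns = [
    "GTAAGTATTTAATTAATAAG",
    "GTAAGTATTGAATTAATAAG",
    "GTCCCCCCCCCCCCCCCCAG",
]
print(cluster_similar(introns))  # [[0, 1], [2]]
```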

Analysis of the molecular structure of the introner-like elements showed that they contained all the distinguishing features of normal spliceosomal introns, such as splice donor and acceptor sites and branch point sequences. ILEs were longer than normal introns, and were predicted to fold into more stable secondary structures. Van der Burgt et al. suggest that these predicted stable secondary structures are likely to have important functions, as they observed compensatory mutations that conserve secondary structure between related ILEs.
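Compensatory mutations are the classic evidence for a conserved RNA structure: both partners of a base pair change, but pairing is preserved. The Python sketch below finds such pairs between two aligned sequences, given a shared structure in dot-bracket notation. The sequences and function names are invented for illustration, and real studies would first predict or align the structures with dedicated RNA-folding tools.

```python
# Watson-Crick and G-U wobble pairs allowed in RNA helices.
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def base_pairs(dot_bracket):
    """Return (i, j) index pairs from a dot-bracket secondary structure string."""
    stack, pairs = [], []
    for i, char in enumerate(dot_bracket):
        if char == "(":
            stack.append(i)
        elif char == ")":
            pairs.append((stack.pop(), i))
    return pairs


def compensatory_pairs(seq_a, seq_b, dot_bracket):
    """Base pairs where both positions differ between the sequences, yet both still pair."""
    hits = []
    for i, j in base_pairs(dot_bracket):
        both_changed = seq_a[i] != seq_b[i] and seq_a[j] != seq_b[j]
        both_pair = (seq_a[i], seq_a[j]) in PAIRS and (seq_b[i], seq_b[j]) in PAIRS
        if both_changed and both_pair:
            hits.append((i, j))
    return hits


# Invented hairpin: the outer G-C pair in one element is an A-U pair in the other.
print(compensatory_pairs("GCAAAAGC", "ACAAAAGU", "((....))"))  # [(0, 7)]
```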

Analysing intron gain in the 6 fungal species in which they found ILEs, van der Burgt et al. find that ILEs account for the majority of recent gains. In closely related sister species that diverged within the last 22,000 years, ILEs account for 90% of intron gains, but this figure drops off rapidly for older divergences. This leads van der Burgt et al. to suggest that most intron gains are due to ILE multiplication, and that rapid degeneration of the elements makes ILEs progressively more difficult to identify.

Introner-like elements therefore appear to be mobile elements that can somehow transpose to new sites, leading to intron gain. Just what mechanism is employed in this process is far from clear. Many different mechanisms for intron gain have been proposed, but as yet there is little experimental evidence that they occur in vivo. These include intron transposition, in which an intron transposes to a new position in a transcript, which is then reverse transcribed and recombined into the original gene; transposon insertion, in which a transposon becomes a spliceable intron; intronisation, in which exonic sequence is converted into intron by accumulated mutation; and other ideas based on genetic duplications and errors during repair processes. Van der Burgt et al. think that the most likely mechanism for ILEs is one in which introns are reverse spliced directly into the genome and then reverse transcribed. It will be interesting to see whether ILE transposition can be observed in vivo, and to work out just which mechanism of intron generation is employed.

Interestingly, introner-like elements differ from the introner elements found in Micromonas in important ways. Introner elements were found within introns rather than making up the whole intron, and lacked the interesting secondary structures observed in ILEs. Together with the authors’ inability to find ILEs in other clades, this suggests that ILEs may not be a very widespread mechanism of intron multiplication. However, van der Burgt et al. disagree, and argue that ILEs could potentially be an ancestral mechanism for intron gain.

van der Burgt, A., Severing, E., de Wit, P., & Collemare, J. (2012). Birth of New Spliceosomal Introns in Fungi by Multiplication of Introner-like Elements Current Biology DOI: 10.1016/j.cub.2012.05.011

A dual purpose RNA and Hox regulation

A new paper in PLoS Genetics shows that a long non-coding RNA regulates the expression of a Hox gene in cis in Drosophila. This finding suggests an explanation for the co-linearity between the genomic arrangement of Hox genes and their expression patterns.

The Ultrabithorax mutant.

Hox genes are master regulators of positional identity along the anterior-posterior (A-P) axis throughout bilaterian animals. They are found in genomic clusters in which their 3′-5′ organisation mirrors their expression pattern along the A-P axis. This correspondence between body axis and genomic organisation is termed co-linearity. An important feature of Hox gene genetics is the phenomenon of ‘posterior prevalence’: in any given segment, the gene whose most anterior boundary of expression lies in that segment defines segmental identity. Hence, if that gene is not expressed, the segment takes on a more anterior identity. Perhaps the clearest example of this phenomenon is the Ultrabithorax mutant in Drosophila, in which the third thoracic segment, which normally carries the halteres, takes on the identity of the second thoracic segment, producing flies with two pairs of wings.
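Posterior prevalence can be stated as a simple rule: a segment's identity is set by the most posterior Hox gene expressed in it. Here is that rule as a toy Python function. It ignores everything else about Hox regulation, and the gene list is limited to four trunk Hox genes purely for illustration.

```python
# Four Drosophila trunk Hox genes, ordered from anterior to posterior expression domains.
HOX_ORDER = ["Antp", "Ubx", "abd-A", "Abd-B"]


def segment_identity(expressed):
    """Toy posterior prevalence: the most posterior Hox gene expressed wins."""
    active = [gene for gene in HOX_ORDER if gene in expressed]
    return active[-1] if active else None


print(segment_identity({"Antp", "Ubx"}))  # Ubx
print(segment_identity({"Antp"}))         # Antp: losing Ubx gives a more anterior identity
```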

In Drosophila, the Hox gene cluster is actually divided into two partial clusters: the Antennapedia complex (ANT-C) and the Bithorax complex (BX-C). BX-C consists of the three Hox genes responsible for posterior patterning, Ultrabithorax (Ubx), abdominal-A (abd-A) and Abdominal-B (Abd-B), spread over ~300kb, and has become a paradigm for understanding gene regulation. Many transcriptional enhancers, maintenance elements (binding sites for Polycomb-group and Trithorax-group chromatin-modulating complexes), and encoded microRNAs that regulate the expression of the BX-C genes have been discovered. However, a complete picture of BX-C regulation is still far away. It has been known since the 1980s that much of BX-C is transcribed, but the significance of this finding is only now emerging. Gummalla et al. have used classical genetics to characterise the role of one such non-coding RNA in regulating abd-A expression in the embryonic CNS.

Figure showing the expression of ABD-A (red) and ABD-B (green) in the embryonic CNS. Note the gap in PS13 that is not filled by derepressed ABD-A in this mutant.

abd-A is expressed in the embryonic epidermis and CNS in parasegments (PS) 7-12 but is excluded from PS13. In line with ‘posterior prevalence’, this was thought to be due to Abd-B repressing abd-A expression. In embryos carrying a mutation that removes Abd-B, abd-A expression does extend into PS13. However, this mutation also removed some of the sequence downstream of the Abd-B transcription unit. In flies homozygous for more subtle mutations affecting Abd-B, abd-A expression spreads only into the PS13 epidermis and not the CNS. Therefore, some function located in the genomic region downstream of Abd-B (termed iab-8) is necessary for abd-A repression in the PS13 CNS. Gummalla et al. knew that a long non-coding RNA (the iab-8 ncRNA) was predicted to initiate in this area, and therefore set out to characterise its function.

A map of the abdominal half of the bithorax complex. The iab-8 ncRNA is shown in blue (note its exon structure). Abd-B and abd-A are in black, and the position of miR-iab-8 is shown.

The iab-8 ncRNA transcription unit spans 92kb, covering virtually the entire region between Abd-B and abd-A. Mutations that truncate the iab-8 ncRNA near the Abd-B end cause derepression of abd-A in the PS13 CNS, but mutations affecting the end nearest abd-A cause only subtle derepression. The difference between these two classes of mutants appears to be the position of the truncation relative to miR-iab-8, a microRNA encoded within the iab-8 ncRNA. This suggested that miR-iab-8 was responsible for repressing abd-A in the PS13 CNS. However, mutants lacking this miRNA did not display the complete derepression phenotype, only a very weak derepression of abd-A. There must therefore be a second, partially redundant function of the iab-8 ncRNA, besides producing miR-iab-8.

To test whether a second miRNA or a small polypeptide encoded by the iab-8 ncRNA was responsible for this second function, Gummalla et al. misexpressed the iab-8 ncRNA from another locus in PS8-13. This had no effect on ABD-A expression, suggesting that the ncRNA encodes no other trans-acting factor. They then performed some complicated genetic experiments showing that the iab-8 ncRNA represses abd-A in cis. They generated flies carrying a deletion of miR-iab-8 on one chromosome and a truncated copy of the iab-8 ncRNA on the other. These flies produce none of the miRNA but still transcribe the ncRNA from one chromosome, and yet abd-A is derepressed in the PS13 CNS: the ncRNA transcribed from one chromosome cannot repress the abd-A copy on the other. In contrast, in flies carrying a deletion of the whole BX-C on one chromosome and the miR-iab-8 deletion on the other, abd-A is not derepressed, because the only copy of abd-A lies in cis to an intact ncRNA transcription unit.
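The logic of these crosses is easier to follow as a toy model. The Python sketch below encodes the interpretation given here: miR-iab-8 acts in trans (from either chromosome), ncRNA transcription acts in cis (only on its own chromosome), and the two mechanisms are partially redundant, so losing one leaves abd-A mostly repressed while losing both derepresses it. The class, field names and labels are hypothetical and say nothing about the underlying molecular mechanism.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Allele:
    """One chromosome's bithorax complex in a toy model (hypothetical, illustration only)."""
    has_abd_a: bool = True      # False if the whole BX-C is deleted
    reads_through: bool = True  # iab-8 ncRNA transcribed towards abd-A (acts in cis)
    makes_mirna: bool = True    # miR-iab-8 produced (acts in trans)


def abd_a_status(allele_1, allele_2):
    """PS13 CNS status of each abd-A copy present in a diploid genotype."""
    mirna_present = allele_1.makes_mirna or allele_2.makes_mirna
    labels = {2: "repressed", 1: "mostly repressed", 0: "derepressed"}
    status = []
    for allele in (allele_1, allele_2):
        if allele.has_abd_a:
            mechanisms = int(allele.reads_through) + int(mirna_present)
            status.append(labels[mechanisms])
    return status


mir_deletion = Allele(makes_mirna=False)
truncated = Allele(reads_through=False, makes_mirna=False)
bxc_deletion = Allele(has_abd_a=False, reads_through=False, makes_mirna=False)

print(abd_a_status(mir_deletion, truncated))     # ['mostly repressed', 'derepressed']
print(abd_a_status(bxc_deletion, mir_deletion))  # ['mostly repressed']
```

In the trans-heterozygote, the abd-A copy next to the truncated ncRNA escapes repression even though the other chromosome still transcribes the ncRNA; that asymmetry is what points to a cis-acting mechanism.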

The iab-8 ncRNA therefore represses abd-A expression in the PS13 CNS through two different mechanisms: a trans-acting miRNA and a cis-acting process of transcriptional interference. Although this cis-repression could work by the iab-8 ncRNA recruiting gene-silencing machinery that acts via heterochromatin formation or DNA methylation, the authors suggest it is more likely that the iab-8 ncRNA somehow interferes with the abd-A promoter. This leads them to suggest that, if this mode of gene regulation were widely used within Hox clusters, it could explain the link between posterior prevalence and co-linearity: expression of a more anterior gene would be blocked in posterior segments by a more ‘posterior’ transcript. Similarly, an upstream ncRNA acts to repress Ubx (Petruk et al. 2006). Transcriptional interference, by readthrough from more posterior genes or by upstream ncRNAs, would fix the arrangement of Hox genes in an ancestral cluster, and hence the co-linearity that is observed today.

Gummalla, M., Maeda, R., Castro Alvarez, J., Gyurkovics, H., Singari, S., Edwards, K., Karch, F., & Bender, W. (2012). abd-A Regulation by the iab-8 Noncoding RNA PLoS Genetics, 8 (5) DOI: 10.1371/journal.pgen.1002720

Patterns of RNA methylation

A new paper in Cell provides a transcriptome-wide survey of the methylation of adenosine residues in RNAs. Meyer et al. find that this epitranscriptomic modification is widespread and dynamically regulated, and likely to play important roles in cellular regulation.

Methylation of the N6 position of adenosine residues (m6A) has been known as a post-transcriptional modification of RNAs for many years. Research in the 1960s and 70s demonstrated that m6A is present in tRNAs, rRNAs and viral RNAs, and makes up between 0.1% and 0.4% of total adenosines in cellular RNA. However, as m6A was not easily detectable by commonly available methods, research on this modified base foundered. A recent spur to experimentation on m6A has come from the analysis of a gene linked to obesity. FTO (fat mass and obesity associated) is a major regulator of metabolism and energy utilisation, and its major catalytic function appears to be the demethylation of N6-methyladenosine (m6A), suggesting that m6A has important physiological roles in humans and other mammals.

As m6A is not detectable by sequencing- or hybridisation-based techniques, nor susceptible to chemical modification, Meyer et al. based their experiments on an anti-m6A antibody (α-m6A). They first showed that m6A was present in RNA from a wide selection of mouse tissues and cell lines. It was especially enriched in liver, kidney and brain, and increased dramatically in adult compared with embryonic neural tissue. m6A was present in RNAs of all sizes, and was enriched in the polyadenylated fraction (i.e. mRNAs), but was not present in the poly(A) tails themselves.

To look in more detail at the distribution of m6A throughout the transcriptome, Meyer et al. developed a high-throughput technique called MeRIP-Seq. Cellular RNA is fragmented into ~100nt pieces, and m6A-containing fragments are immunoprecipitated using α-m6A and then deep sequenced. Because each m6A residue is covered by several overlapping fragments, the sequence reads pile up into m6A peaks that can be assigned to approximate positions on RNA molecules. Using adult mouse brain RNA in multiple MeRIP-Seq experiments, Meyer et al. identified 41,072 distinct peaks in the RNAs of 8,843 genes. For their further analyses, however, they used a smaller, highly reproducible subset of 13,471 peaks in 4,654 genes.
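The core idea of MeRIP-Seq peak detection is to look for regions where reads from the immunoprecipitate are over-represented relative to input RNA. The Python sketch below is a deliberately naive version, assuming per-nucleotide coverage already normalised for sequencing depth and using a fixed fold cutoff on non-overlapping windows. The function and its parameters are hypothetical; it is not Meyer et al.'s peak caller, and a real analysis would use a statistical test with multiple-testing correction.

```python
def enriched_windows(ip_cov, input_cov, window=100, min_fold=2.0, pseudocount=1.0):
    """Non-overlapping windows where IP coverage exceeds input coverage by min_fold.

    ip_cov, input_cov: per-nucleotide read coverage along one transcript, from
    the anti-m6A immunoprecipitate and from input RNA, assumed already
    normalised for sequencing depth.
    """
    peaks = []
    for start in range(0, len(ip_cov) - window + 1, window):
        ip = sum(ip_cov[start:start + window])
        background = sum(input_cov[start:start + window])
        fold = (ip + pseudocount) / (background + pseudocount)  # pseudocount avoids /0
        if fold >= min_fold:
            peaks.append((start, start + window, round(fold, 2)))
    return peaks


# Invented 300nt transcript with an IP pile-up in its last 100nt.
ip_coverage = [1] * 200 + [5] * 100
input_coverage = [1] * 300
print(enriched_windows(ip_coverage, input_coverage))  # [(200, 300, 4.96)]
```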

Of the m6A peaks, 94.5% occurred in mRNAs, but more than 3% were found within long non-coding RNAs, showing that ncRNAs are also targets for adenosine methylation. mRNAs from a wide variety of genes were found to contain methylated adenosines, including many involved in cellular regulation and genes linked to neurodevelopmental and neurological disorders.

The most common pattern among m6A-containing mRNAs was a single m6A peak (46%), equating to either a single m6A residue or a cluster of adjacent m6As, whilst 48.5% contained two or three peaks. However, some mRNAs contain more than 15 peaks along their length. Although MeRIP-Seq doesn’t reveal exactly which adenosines are methylated, it does give a good idea of their positions on RNAs. m6A levels are low at the 5′ ends of mRNAs; they increase steadily through the coding sequence, peak in the vicinity of the stop codon, remain high in the first portion of the 3′ UTR, and then rapidly decline. This link between the region of the translational stop codon and m6A is the most important finding of the paper.
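Profiles like this are usually drawn as ‘metagenes’, in which every transcript is rescaled so that its 5′ UTR, coding sequence and 3′ UTR each occupy a fixed span, and peaks from thousands of transcripts are then pooled. A minimal Python sketch, assuming each peak is summarised by a single position on a transcript with annotated CDS boundaries (the tuple layout and function names are my own hypothetical choices):

```python
def metagene_position(pos, cds_start, cds_end, tx_length):
    """Rescale a transcript coordinate: 5' UTR -> [0, 1), CDS -> [1, 2), 3' UTR -> [2, 3).

    Assumes 0 <= pos < tx_length and that cds_end is the exclusive end of the CDS.
    """
    if pos < cds_start:
        return pos / cds_start
    if pos < cds_end:
        return 1 + (pos - cds_start) / (cds_end - cds_start)
    return 2 + (pos - cds_end) / (tx_length - cds_end)


def metagene_histogram(peaks, n_bins=30):
    """Count peaks per metagene bin; each peak is (pos, cds_start, cds_end, tx_length)."""
    counts = [0] * n_bins
    for peak in peaks:
        x = metagene_position(*peak)
        counts[min(int(x / 3 * n_bins), n_bins - 1)] += 1
    return counts


# Invented transcript: 100nt 5' UTR, 1,000nt CDS, 500nt 3' UTR.
print(metagene_position(1_100, 100, 1_100, 1_600))  # 2.0, the CDS/3' UTR junction
peaks = [(50, 100, 1_100, 1_600), (600, 100, 1_100, 1_600), (1_350, 100, 1_100, 1_600)]
print(metagene_histogram(peaks, n_bins=3))  # [1, 1, 1]
```

With real data, the stop codon enrichment described above would appear as a spike in the bins just either side of 2.0.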

Meyer et al. went on to show that regions of m6A occurrence are more likely to be conserved in vertebrates. They also found a correlation between m6A in 3’UTRs and the presence of microRNA binding sites.

Adenosine methylation has therefore been shown to be a widespread and dynamically regulated post-transcriptional modification of mRNAs and lncRNAs in mammals. Its functional significance, however, is still difficult to gauge. So far, the pathways responsible for adenosine methylation of RNAs have not been characterised, and it is unclear whether FTO is the primary enzyme responsible for adenosine demethylation. FTO knockout mice survive, but display postnatal growth retardation and decreased locomotor activity. The links between m6A, stop codons and miRNA binding await mechanistic study, but are suggestive of important regulatory roles for RNA methylation. With MeRIP-Seq, Meyer et al. have invented a useful technique for the analysis of this important modification.

Meyer, K., Saletore, Y., Zumbo, P., Elemento, O., Mason, C., & Jaffrey, S. (2012). Comprehensive Analysis of mRNA Methylation Reveals Enrichment in 3′ UTRs and near Stop Codons Cell DOI: 10.1016/j.cell.2012.05.003

A follow-up to this post on 5-methylcytosine in RNAs: Patterns of RNA methylation 2

A Ribosome Code?

The ribosome, a universally conserved molecular machine that catalyses protein synthesis, has generally been considered to act constitutively: that is, ribosomes translate mRNAs in the same way across all cells and developmental stages. Regulatory control of translation is predominantly exerted by translation initiation factors, which guide the association of the ribosome with target mRNAs. The eukaryotic ribosome is composed of 4 RNA molecules and 79 different ribosomal proteins (RPs). A paper published last year by Kondrashov et al. has shown that one RP, RPL38, specifically regulates the expression of a subset of mRNAs during embryonic development in the mouse. Together with findings from human genetic diseases and from other organisms, these data suggest a ‘ribosome code’ regulating translation.

Kondrashov et al. set out to discover which gene was responsible for the morphological defects found in a spontaneous mouse mutant, tail-short (Ts). These mice display skeletal patterning defects, including homeotic transformations (i.e. the conversion of a tissue’s identity to that of a different tissue; in this case changes between the segmental identities of vertebrae and ribs). They also display eye and craniofacial defects, short and kinky tails, and wavy neural tubes. These phenotypes are only found in heterozygous mice (Ts/+), because homozygotes die around the time of implantation. By positional cloning, Kondrashov et al. found that the gene responsible for Ts was Rpl38.

Ts/+ mice display skeletal defects and transformations along the entire length of the anterior-posterior body axis. The key regulators of morphological identity along the A-P axis are Hox genes, which encode homeodomain-containing transcription factors and are found in four genomic clusters in vertebrates. Loss-of-function mutations in, or misexpression of, Hox genes generally lead to homeotic transformations (most strikingly seen in the Drosophila mutants Antennapedia and Ultrabithorax). Kondrashov et al. therefore examined the expression of Hox gene transcripts in Ts/+ mouse embryos. Surprisingly, they found no changes in the levels or expression domains of the Hox genes.

Schematic representation of the axial skeleton of WT and Ts/+ mice. Defects are explained by the effects of corresponding Hox gene mutants.

The researchers then asked whether changes in the translational control of Hox genes were responsible for the Ts/+ phenotypes. Using various techniques, they showed that global protein synthesis was unchanged. However, by using quantitative PCR on mRNAs that co-purified with actively translating ribosomes, they identified a subset of Hox genes that were translationally deregulated in Ts/+ embryos (Hoxa4, a5, a9, a11, b3, b13, c8 and d11). These findings were confirmed by examining protein levels of HOXA5, A11 and B13 in Ts/+ embryos. Most of the Ts/+ axial skeleton phenotypes could be accounted for by the known effects of loss-of-function mutations in the translationally deregulated Hox genes.
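One common way to summarise this kind of experiment is as the share of a transcript recovered in the polysome fractions of a sucrose gradient. Kondrashov et al.'s exact quantification isn't covered here, so the Python sketch below is only illustrative: it assumes perfect doubling per qPCR cycle and uses invented Ct values for one transcript in four gradient fractions. The function names are hypothetical.

```python
def relative_quantity(ct, efficiency=2.0):
    """Relative template amount from a qPCR Ct value, assuming fixed amplification efficiency."""
    return efficiency ** (-ct)


def polysome_fraction(cts, polysome_indices):
    """Share of a transcript found in the polysome fractions, from per-fraction Ct values."""
    amounts = [relative_quantity(ct) for ct in cts]
    return sum(amounts[i] for i in polysome_indices) / sum(amounts)


# Invented Ct values for fractions: free mRNP, 80S, light polysomes, heavy polysomes.
wild_type = [26, 26, 24, 24]
mutant = [24, 24, 26, 26]
print(polysome_fraction(wild_type, [2, 3]))  # 0.8
print(polysome_fraction(mutant, [2, 3]))     # 0.2
```

A shift of a transcript out of the polysomes, without any change in its total level, is the kind of signal that points to altered translation rather than altered transcription.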

It therefore appears that RPL38 exerts specialised control over the translation of specific Hox genes. In further experiments, Kondrashov et al. found that RPL38 probably facilitates formation of the 80S (complete) ribosomal complex on specific mRNAs. (The ribosome is made up of two subunits: the 40S subunit associates with the 5′ UTR of the target mRNA first and is then joined by the 60S subunit to make a translationally competent ribosome.) An important question is whether RPL38 exerts its function as part of the ribosome, or whether it also has extra-ribosomal roles. By separating ribosomal from ribosome-free cytosolic fractions, Kondrashov et al. detected RPL38 only in the ribosome.

Ribosomal proteins have generally been considered ubiquitously expressed cellular ‘housekeeping’ proteins. However, when the researchers examined Rpl38 expression, they found that its transcripts were enriched in specific tissues. For instance, embryonic tissues that give rise to facial structures, as well as the neural retina, showed high levels of Rpl38 expression, correlating with the craniofacial and eye defects in Ts/+ mice. Likewise, Rpl38 was strongly expressed in the somites and the neural tube, the embryonic tissues that give rise to the vertebrae and the spinal cord respectively. Kondrashov et al. went on to examine the expression of 72 different ribosomal proteins in 14 different tissue and cell types. They found a large amount of heterogeneity in RP expression, suggesting that many RPs have specialised, tissue-specific roles.
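Heterogeneity across tissues can be summarised per gene by the coefficient of variation (standard deviation divided by mean), which lets genes with very different overall expression levels be compared. This is my illustration rather than the statistic Kondrashov et al. used, and the gene names and expression values below are hypothetical.

```python
from statistics import mean, pstdev


def coefficient_of_variation(levels):
    """Population standard deviation divided by the mean of per-tissue expression levels."""
    return pstdev(levels) / mean(levels)


def rank_by_heterogeneity(expression):
    """Gene names sorted from most to least variable across tissues."""
    return sorted(expression, key=lambda g: coefficient_of_variation(expression[g]), reverse=True)


# Hypothetical per-tissue levels for three ribosomal protein genes.
expression = {
    "rp_gene_1": [10, 10, 10, 10],  # uniform, housekeeping-like
    "rp_gene_2": [2, 20, 5, 13],    # strongly tissue-biased
    "rp_gene_3": [8, 12, 9, 11],    # mildly variable
}
print(rank_by_heterogeneity(expression))  # ['rp_gene_2', 'rp_gene_3', 'rp_gene_1']
```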

A few outstanding questions for future studies should be noted. Does RPL38 bind cis-regulatory sequence or structural elements within target mRNAs, and if so, what are they? Do trans-acting factors also play a role? Developmental questions also stand out. Hox genes are not involved in eye development, and it also seems unlikely that the Hox genes implicated in the trunk segmental effects are responsible for the craniofacial defects. What other RPL38 mRNA targets are responsible for these phenotypes?

These experiments have therefore shown that RPL38 has transcript-specific roles in the control of translation, and that many RPs display heterogeneous expression patterns rather than the previously assumed ubiquity. Together, these findings suggest that RPs impart a new level of specificity to the control of gene expression. They fit into a broader array of observations hinting at the existence of a ‘ribosome code’, in which alterations in ribosome composition lead to translational specialisation towards subsets of mRNAs. Diamond-Blackfan anaemia is a human genetic disease caused by mutations in a number of ribosomal proteins. Patients display limb defects, cleft palate, growth failure and cancer predisposition. Likewise, disruption of multiple distinct RPs in zebrafish leads to a wide range of developmental defects and a high incidence of cancer. A possible explanation for these types of finding is that highly proliferating tissues may be more sensitive to differences in the rate of protein synthesis, so that indirect effects on cell proliferation and apoptosis lead to the morphological abnormalities. However, in Ts/+ mice, Kondrashov et al. have shown that overall protein synthesis is not affected, and that effects on a subset of developmental patterning genes are responsible for the bulk of the phenotypes.

Ribosomal RNAs and proteins are also targets for extensive chemical modifications such as pseudouridylation, methylation and phosphorylation, most of which are as yet uncharacterised. Interestingly, another human genetic disease, X-linked dyskeratosis congenita, is probably caused by failures of rRNA modification. By analogy with the levels of complexity seen in the modifications and combinations of chromatin-associated histones, a ‘ribosome code’ that imparts translational specificity through heterogeneity of RPs and their modifications has the potential to be a hugely important level of regulatory control.

Kondrashov, N., Pusic, A., Stumpf, C., Shimizu, K., Hsieh, A., Xue, S., Ishijima, J., Shiroishi, T., & Barna, M. (2011). Ribosome-Mediated Specificity in Hox mRNA Translation and Vertebrate Tissue Patterning Cell, 145 (3), 383-397 DOI: 10.1016/j.cell.2011.03.028

Topisirovic, I., & Sonenberg, N. (2011). Translational Control by the Eukaryotic Ribosome Cell, 145 (3), 333-334 DOI: 10.1016/j.cell.2011.04.006