Monthly Archives: July 2012

Interacting small RNA pathways in worms 3: CSR-1 associated 22G-RNAs

Anaphase during the first mitosis of a C. elegans embryo. Microtubules (green) are pulling the two sets of daughter chromosomes (blue) towards the centrosomes (yellow).

During mitosis, duplicated chromosomes are separated and segregated into two daughter cells. This is achieved by the action of spindle microtubules. In metaphase, microtubules radiating from centrosomes at the two poles of the cell attach to the condensed chromosomes at proteinaceous structures called kinetochores. The daughter chromosomes are then pulled slowly towards opposite poles during anaphase. Most eukaryotes have monocentric chromosomes – meaning that each chromosome contains one centromere. Kinetochores are associated with domains of heterochromatin close to the centromeres. This pericentric heterochromatin is composed of repetitive sequences, and is generally transcriptionally silenced.

Small RNAs are an important feature of the mechanisms stabilising the heterochromatic state in some organisms. In the fission yeast Schizosaccharomyces pombe, an RdRP-dependent population of small RNAs, directed against repetitive sequence, has been shown to target pericentromeric heterochromatin, stabilising the centromeres. Perturbations to this pathway therefore lead to mitotic chromosome segregation defects.

Not all organisms’ chromosomes contain single centromeres. Nematodes have holocentric chromosomes in which multiple spindle attachments are made into continuous kinetochores spanning the length of the chromosomes. The assembly of centromeres and kinetochores in both monocentric and holocentric chromosomes has many conserved features. For instance, the histone variant HCP-3/CENP-A is found in centromeric nucleosomes. Unlike in monocentric chromosomes though, in nematodes it is incorporated into the whole poleward face of condensed chromosomes.

In C. elegans, a number of mutations affecting components of RNAi pathways were found to cause mitotic and meiotic chromosome segregation defects. As discussed in the previous post, these included drh-3 (encoding a helicase necessary for the biosynthesis of 22G-RNAs), and the Argonaute-encoding csr-1. Claycomb et al. found that mutations in two additional genes, the RdRP-encoding ego-1 and the tudor domain protein-encoding ekl-1, caused similar defects. These included oocytes with abnormal complements of chromosomes, underproliferated germlines, and high incidence of males (him) phenotypes. Partial mutant alleles of csr-1 or drh-3 had low proportions of viable progeny, with many worms dying at various points in embryogenesis.

In embryos depleted of any of these factors, chromosomes initially condense normally in prophase. However, during metaphase the chromosomes fail to align into ‘plates’ perpendicular to the long axis of the spindles, and at anaphase the researchers observed chromosomal bridging across the midzone of the spindle. This led to the accumulation of cells with abnormal chromosomal complements, and the death of embryos.

Claycomb et al. then looked at the distribution of the centromeric histone variant HCP-3/CENP-A in RNAi depleted embryos. They found that although it was loaded onto chromosomes, instead of the normal poleward localisation, it was distributed in a disorganised pattern over the metaphase chromosomes. Similarly disorganised patterns were seen with an array of centromere and kinetochore proteins, as well as proteins necessary for chromosomal condensation and cohesion.

Deep sequencing of the small RNAs associated with CSR-1 showed that this Argonaute binds 22G-RNAs targeted against protein-coding genes. This population of 22G-RNAs is dependent upon drh-3, ego-1, and ekl-1, and was antisense to at least 4191 genes. Interestingly, in glp-4 mutants that lack a germline, 22G-RNAs corresponding to ~80% of the CSR-1 target mRNAs were strongly depleted. This suggests that the CSR-1 associated 22G-RNAs originate in the germline and target genes expressed in the germline. No differences were found in the transcriptional profiles of csr-1 mutants and wild type worms, indicating that the CSR-1 22G-RNA system does not work by silencing its targets.

During the development of the germline, DRH-3, EGO-1 and CSR-1 all localise to P-granules – perinuclear nuage structures important for germline specification and small RNA mediated activities. In later stages of oocyte maturation CSR-1 becomes enriched in nuclei. In mitotic cells all four factors were enriched in prophase nuclei, and could be observed to localise to metaphase chromosomes. Using chromatin immunoprecipitation (ChIP), Claycomb et al. demonstrated that CSR-1 directly bound genomic loci targeted by CSR-1 associated 22G-RNAs.

These slightly disparate lines of evidence can be condensed into a basic model by which the CSR-1 22G-RNA pathway contributes to the regulation of chromosome segregation. CSR-1 associated 22G-RNAs are generated in the germline. Their biogenesis occurs in P-granules and depends on the RdRP complex (comprised of the helicase DRH-3, the RdRP EGO-1, and the tudor domain protein EKL-1) acting upon transcripts of germline expressed genes. Guided by these 22G-RNAs, CSR-1 translocates to the nucleus, and via chromatin modification defines chromosomal domains. It appears that there is an inverse relationship between domains targeted by CSR-1 22G-RNAs and those enriched for the centromeric histone variant HCP-3/CENP-A. Hence, CSR-1 marks domains adjacent to centromeric domains. As CSR-1 22G-RNA targets are distributed relatively uniformly across the genome, the defined domains can serve to position kinetochores along the length of the chromosomes. CSR-1 22G-RNAs are maternally inherited, and the CSR-1 defined chromatin domains are epigenetically stable, enabling the correct assembly of kinetochores during mitoses throughout development.

It is interesting to note that two related 22G-RNA pathways, those associated with WAGO or CSR-1 Argonautes, with a similar biogenesis pathway, can have such diverse cellular roles. As we have seen, the WAGO 22G-RNA pathway acts to silence deleterious transcription. The CSR-1 22G-RNA pathway doesn’t silence its targets but instead intricately organises chromosome structure to ensure proper segregation. Does this pathway have roles beyond this function? In later posts in this series I’ll cover papers that suggest it may also participate in a global genome surveillance network.

See also a post on a paper that gives a radically different take on the CSR-1 pathway: The CSR-1 siRNA pathway gets more enigmatic

Important ideas regarding the role of the CSR-1 pathway being involved in epigenetic licensing are discussed in these two posts:
Interacting small RNA pathways in worms 5: Global Genome Surveillance
Epigenetic Licensing of a Sex Determination Gene

Claycomb JM, Batista PJ, Pang KM, Gu W, Vasale JJ, van Wolfswinkel JC, Chaves DA, Shirayama M, Mitani S, Ketting RF, Conte D Jr, & Mello CC (2009). The Argonaute CSR-1 and its 22G-RNA cofactors are required for holocentric chromosome segregation. Cell, 139 (1), 123-34 PMID: 19804758

Interacting small RNA pathways in worms 2: 22G-RNAs

In two papers published in 2009 (Gu et al. and Claycomb et al.), Craig Mello’s group characterised 22G-RNAs. They found that these could be divided into two functional classes based on the Argonaute proteins with which they are complexed – CSR-1 or WAGOs. The biogenesis of both groups utilises common factors, but each targets different classes of loci and fulfils different roles.

The starting point for the discovery of 22G-RNAs was the analysis of various mutant alleles of the gene drh-3 (which encodes a dicer-related helicase). Homozygous drh-3 alleles result in infertile worms, whilst RNAi targeting leads to worms dying in embryogenesis with defects in chromosome segregation and in the production of small RNA populations. Gu et al. identified three partial loss of function drh-3 alleles in which the homozygous worms were viable at 20°C but infertile at 25°C. When analysing the small RNA populations present in these worms, they found that a prominent 22nt species of RNA was absent, whilst others were unaffected. Chemical analysis showed that these RNAs had 5’ guanosine residues, and were 5’ triphosphorylated (most small RNAs are 5’ monophosphorylated – as produced by the activity of the Dicer endonuclease). The 22nt RNAs were not sensitive to periodate – suggesting that, unlike piRNAs, they are not 3’ modified.

Gu et al. then made libraries of all small RNAs from wild type and drh-3 mutant worms, and deep sequenced them. 21nt and 22nt RNAs accounted for 25% and 36% of the wild type reads, respectively. The 21nt RNAs were divided equally between those with 5’U and those with 5’G. ~60% of the 22nt reads had a 5’G. Both 21nt and 22nt 5’G containing RNAs were strongly depleted in the drh-3 mutants. Of all the endogenous siRNA reads in the wild type library (64% of the total), ~53% were antisense to protein-coding genes, whilst ~16% were derived from transposons and repetitive sequences and ~31% were from non-annotated loci. All of these endo-siRNAs were depleted in drh-3 mutants. To try and clarify matters, Gu et al. termed this population of 22nt drh-3-dependent endo-siRNAs with a bias towards 5’G, 22G-RNAs. (Note: I’m not quite clear as to whether this included 21G and/or 22U populations as well.)
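The kind of bookkeeping behind these percentages can be sketched in a few lines of Python. This is only an illustration of tabulating reads by length and 5’ nucleotide; it is not the pipeline used by Gu et al., and the function names and toy reads are invented for the example.

```python
from collections import Counter


def tabulate_reads(reads):
    """Count small RNA reads by (length, 5' nucleotide).

    Reads are treated as RNA, so any T is converted to U first.
    """
    counts = Counter()
    for seq in reads:
        seq = seq.upper().replace("T", "U")
        if seq:
            counts[(len(seq), seq[0])] += 1
    return counts


def fraction_with_5prime(counts, length, nucleotide):
    """Fraction of reads of a given length that start with `nucleotide`."""
    total = sum(n for (read_len, _), n in counts.items() if read_len == length)
    if total == 0:
        return 0.0
    return counts[(length, nucleotide)] / total


# Hypothetical toy library (not real data): three 22nt and two 21nt reads.
toy_reads = [
    "G" + "A" * 21,  # 22nt, 5'G
    "G" + "C" * 21,  # 22nt, 5'G
    "U" + "A" * 21,  # 22nt, 5'U
    "U" + "A" * 20,  # 21nt, 5'U
    "G" + "A" * 20,  # 21nt, 5'G
]

counts = tabulate_reads(toy_reads)
frac_22g = fraction_with_5prime(counts, 22, "G")
frac_21u = fraction_with_5prime(counts, 21, "U")
print(f"22nt reads with 5'G: {frac_22g:.2f}")
print(f"21nt reads with 5'U: {frac_21u:.2f}")
```

Running the same tabulation on wild type and mutant libraries and comparing the fractions is the sort of comparison that reveals a class of reads (here, those with 5’G) being depleted.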

By analysing various mutant lines, Gu et al. found that 22G-RNAs are present in the soma and the germline. However, they were especially enriched in the germline, and in oocytes (i.e. maternally derived). In the soma, 22G-RNAs appear to act downstream of the exogenous-RNAi pathway (which won’t be discussed further – I’ll concentrate on their roles in the germline).

Germline 22G-RNAs are independent of the exo-RNAi pathway. For instance, they were not depleted in dcr-1 (dicer) mutants, suggesting that their biogenesis is not triggered by dsRNA. The triphosphorylated 5′ end of 22G-RNAs, and the independence from dcr-1, suggested that their biosynthesis was dependent on RNA-dependent RNA polymerases (RdRPs). Single mutants for the known RdRPs still expressed 22G-RNAs. However, in worms mutant for both rrf-1 and ego-1, the researchers found that 22G-RNAs failed to accumulate. Immunoprecipitation experiments showed that DRH-3 interacted biochemically with both RRF-1 and EGO-1, as well as a tudor-domain containing protein, EKL-1. These four proteins make up the core RdRP complex responsible for the biosynthesis of 22G-RNAs in the germline.

Depletion of Argonaute proteins leads to a reduction of the small RNAs with which they complex. Gu et al. used worm lines mutant for multiple WAGO genes to get a picture of which AGOs mediated 22G-RNA function. They found that worms deficient in wago-1 showed a major reduction in germline 22G-RNAs, whilst a worm strain lacking all 12 wago genes (MAGO12) showed an even greater deficit.

Deep sequencing showed that worms mutant for drh-3 or ekl-1, and the rrf-1 ego-1 double mutants, had near complete germline 22G-RNA deficits. However, in the MAGO12 worms, only a subset of 22G-RNAs matching repeat elements, as well as some coding and non-annotated loci, were absent. Gene-targeted 22G-RNAs were far less likely to be affected. Immunoprecipitation of WAGO-1 complexes revealed an enrichment for the repeat element biased subset, whilst the AGO CSR-1 was found to interact with a subset of 22G-RNAs that are antisense to germline-expressed protein coding genes (Claycomb et al., discussed in the next post). This bimodal distribution of 22G-RNA targets revealed that two distinct 22G-RNA pathways function in the germline. They share a common biosynthesis pathway but differ in the AGOs with which they complex.

The WAGO-associated 22G-RNA pathway appears to act by silencing its targets. Those loci targeted by the most highly expressed 22G-RNAs were derepressed in drh-3 mutants. Transposons are a major target for the WAGO mediated system. 22G-RNAs matching repetitive elements were depleted in MAGO12 worms. By assaying the reversion rate of mutations caused by the insertion of the transposon Tc5, and by monitoring the transcription of the Tc1 and Tc3 transposons, Gu et al. showed that transposons are derepressed in drh-3 mutants (I’d have preferred to see these effects in the MAGO12 mutants to definitively show that it’s only the WAGO associated subset that is required).

The WAGOs are a worm specific clade of AGOs which don’t seem to act by ‘Slicer’ endonuclease activity. This study showed a lot of redundancy amongst WAGOs with regard to their 22G-RNA associated roles. However, the authors expect there to be a number of distinct roles within this family of factors. WAGO-1 appears to be a crucial factor in these systems. Importantly, this paper showed the existence of a major Dicer-independent RNA based genome surveillance system. This system has the ability to silence transposons and other repetitive sequences. It also appears to act upon pseudogenes and ‘cryptic loci’, preventing detrimental transcription/translation. However, the details of this system’s targets were beyond the scope of this first paper.

The next post will discuss CSR-1 associated 22G-RNAs, before we come to 21U-RNAs and the links between the three systems.

Gu W, Shirayama M, Conte D Jr, Vasale J, Batista PJ, Claycomb JM, Moresco JJ, Youngman EM, Keys J, Stoltz MJ, Chen CC, Chaves DA, Duan S, Kasschau KD, Fahlgren N, Yates JR 3rd, Mitani S, Carrington JC, & Mello CC (2009). Distinct argonaute-mediated 22G-RNA pathways direct genome surveillance in the C. elegans germline. Molecular cell, 36 (2), 231-44 PMID: 19800275

Claycomb JM, Batista PJ, Pang KM, Gu W, Vasale JJ, van Wolfswinkel JC, Chaves DA, Shirayama M, Mitani S, Ketting RF, Conte D Jr, & Mello CC (2009). The Argonaute CSR-1 and its 22G-RNA cofactors are required for holocentric chromosome segregation. Cell, 139 (1), 123-34 PMID: 19804758

Interacting small RNA pathways in worms 1: Introduction

A cluster of new papers, in the journals Cell and Science, discuss the links between piRNAs and various endogenous siRNA pathways in the nematode worm C. elegans. Emerging from these experiments is a picture of a genome-wide surveillance system capable of differentiating between self and non-self nucleic acids. The commonalities and differences between these papers require rather detailed analyses. I’m therefore intending to write a series of posts; first covering some of the background information on these small RNA systems and then getting onto the new findings.

A panoply of small RNA molecules involved in diverse cellular functions has been discovered in the wake of the initial observation of RNA interference (RNAi). Originally, RNAi described the mechanism by which genes could be specifically silenced by the exogenous application of cognate double-stranded RNAs. Nowadays, the term RNAi is more generally applied to gene silencing pathways involving the three major classes of small RNAs: microRNAs (miRNAs), small-interfering RNAs (siRNAs), and piwi-interacting RNAs (piRNAs). A common feature of all these small RNAs is that they complex with members of the Argonaute (AGO) family of proteins. Embedded within AGOs, the small RNAs act as guides, base-pairing with specific target RNAs that can then be cleaved by the RNase H-like endoribonuclease activity of the AGO protein. However, not all Argonautes act by this ‘slicing’ activity; gene silencing can also be achieved by interactions with pathways involved in chromatin modification, or the inhibition of transcriptional elongation. Meanwhile the list of non-silencing roles of AGOs and their complexed small RNAs continues to grow: chromosome segregation, double-strand break repair, programmed genomic rearrangement etc.

The synthesis of both miRNAs and siRNAs generally involves the recognition of dsRNA and its cleavage by Dicer enzymes. miRNAs are derived from short stem-loop structures found in transcripts. siRNAs are the main effectors of the ‘classical’ RNAi. Exogenous dsRNA molecules are cleaved by Dicer into 20-30nt siRNAs that are loaded onto AGOs. However, this pathway also targets endogenously formed dsRNA, which can derive from hairpin structures in transcripts, or by base-pairing between transcripts produced either from separate loci, or by bidirectional transcription at individual loci. Hence, endogenous siRNAs generally target transposons or other repetitive sequences, but can target genes as well.

This basic siRNA pathway is present in most animals, but more complicated systems exist in plants, fungi and nematode worms. These ‘secondary siRNA’ systems rest on the use of RNA dependent RNA polymerases (RdRPs) to amplify siRNAs against targets recognised by primary siRNAs. In the cases of plants and fungi the dsRNAs produced by RdRPs are again processed by Dicer enzymes and then loaded onto AGOs. However, various populations of RdRP-produced siRNAs in C. elegans do not require Dicer cleavage.

As noted above, the key effectors of these small RNA pathways are Argonaute proteins. The number of AGOs present in organisms varies widely. The Drosophila genome encodes 5 AGOs; 3 of these are involved in the piRNA system, whilst the other two specialise in either miRNAs or siRNAs. In contrast, C. elegans has 27 AGOs. This reflects the presence of various additional networks of endogenous siRNAs. Deep sequencing in C. elegans has revealed a large diversity of small RNAs, with major peaks at 21, 22, and 26 nt. Different types of sRNAs have different biases in relation to their predominant 5′ residue, 3′ modifications, and extent of 5′ phosphorylation.

This series of posts will ignore many classes of C. elegans siRNAs and instead concentrate on two varieties of 22nt endo-siRNAs with 5′ guanosine residues (22G-RNAs). 22G-RNAs associated with the AGO CSR-1 have been shown to play critical roles in chromosome segregation during meiosis and mitosis. Another population of 22G-RNAs that associate with various worm-specific AGOs (WAGOs) has been implicated in the long-term silencing of transposons and other genomic loci. The piRNAs of worms – 21U-RNAs – display some critical differences from those found in Drosophila and mammals. However, understanding their role in C. elegans may help to explain some of the outstanding questions about their functions throughout animals.

Ketting RF (2011). The many faces of RNAi. Developmental cell, 20 (2), 148-61 PMID: 21316584

See also these related posts:
small silencing RNAs. I: Piwi-interacting RNAs.
RNAi and Chromatin Modification
Lamarckian inheritance of antiviral response in Nematodes.
Double-strand break interacting RNAs (diRNAs)

Gene Birth, de novo

What are the genomic mechanisms responsible for the creation of evolutionary novelty? The view that has emerged from the findings of molecular and developmental biology in recent decades is that the primary substrate for evolutionary change is the regulatory network between genes, rather than emergence of new genes. For instance, animal development repeatedly utilises a relatively limited ‘toolkit’ of highly conserved families of transcription factors, signalling molecules and signal transduction components. Most morphological novelty seems to emerge via new deployments and combinations of this repertoire of pre-existing genes. No one doubts that new genes arise during evolution, but in line with the ‘predominantly regulatory’ model most gene birth has been considered to involve duplication of genes, followed by their reorganisation and diversification. However, whole genome sequencing has found that every evolutionary lineage contains protein-coding genes that lack homologues in other lineages – orphan genes. In fact, as much as one third of all annotated genes are orphans. Although orphan genes can arise by processes of duplication and (rapid) diversification, it appears likely that the primary mechanism for their emergence is de novo evolution from previously non-coding sequence.

De novo gene birth requires non-genic sequences first to be transcribed, then to acquire open reading frames (ORFs), and finally for these ORFs to be translated. There are some obvious conceptual hurdles to be overcome in making a model to explain how this sequence of events may occur to an important extent: how does non-genic sequence become translated, and wouldn’t any polypeptide translation products be insignificant? Carvunis et al. have tried to surmount these problems by postulating a model of de novo gene birth that proceeds through intermediate, reversible, ‘proto-gene’ stages between the emergence of an ORF and bona fide functional protein-coding genes.

A major problem in understanding the genome is that ORFs in genomic sequence are classified using a minimal length threshold. Although ORFs encoding functional polypeptides as small as 9 amino acids long have been discovered, the standard length threshold used to delineate genic ORFs is 300nt (equating to a 100aa protein). In the budding yeast Saccharomyces cerevisiae, ~6000 ORFs are annotated as genes, whilst ~261,000 unannotated ORFs longer than 3 codons are considered non-genic ORFs. The majority of the S. cerevisiae genome is transcribed, and a number of putatively non-coding transcripts have been shown to associate with ribosomes. Carvunis et al. postulated that a certain amount of translation of non-genic ORFs may go on, and although the polypeptides produced may not be functional, provided they were not toxic (and hence selected against), these proto-genes could be maintained in the genome. Proto-genes would provide adaptive potential, and a subset could be retained over time if they provided selective advantage. New genes originating de novo would be expected to be initially shorter, less expressed and more rapidly evolving than established genes.
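To make the effect of the length threshold concrete, here is a minimal Python sketch of an ORF scan. It rests on several assumptions of mine rather than on the paper's annotation procedure: only the forward strand is scanned, an ORF is defined as an ATG followed by the first in-frame stop codon, and the length counts both the start and stop codons. The function name and the toy sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_nt=12):
    """Return sorted (start, end) pairs for forward-strand ORFs.

    An ORF here runs from an ATG to the first in-frame stop codon
    (inclusive) and is kept only if it is at least `min_nt` long.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if end - start >= min_nt:
                    orfs.append((start, end))
                start = None
    return sorted(orfs)


# Toy sequence (not real yeast DNA): one 21nt ORF and one 336nt ORF.
short_orf = "ATG" + "GCT" * 5 + "TAA"
long_orf = "ATG" + "GCT" * 110 + "TAG"
seq = "CC" + short_orf + "C" + long_orf

permissive = find_orfs(seq, min_nt=12)   # roughly "longer than 3 codons"
genic_only = find_orfs(seq, min_nt=300)  # the conventional 300nt cut-off
print(len(permissive), "ORFs with the permissive threshold")
print(len(genic_only), "ORF with the 300nt threshold")
```

The short ORF is visible only under the permissive threshold, which is the point of the paragraph above: what counts as a candidate coding sequence depends heavily on where the cut-off is drawn.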

Carvunis et al.’s model for de novo gene birth leads to a number of predictions. Firstly, there should be an evolutionary continuum between non-genic ORFs and bona fide genes with respect to such characteristics as length, expression level and sequence composition. Secondly, many non-genic ORFs should be translated, and thirdly, some recently emerged ORFs should be adaptively advantageous and hence retained by natural selection.

To test these predictions, Carvunis et al. estimated the order of emergence of S. cerevisiae ORFs, based on their level of conservation amongst ascomycete fungi. Annotated ORFs were classified into 10 groups. For instance, those ORFs found only in S. cerevisiae constituted ORFs1, which accounted for ~2% of the total. ~12% were only conserved within the four closely related Saccharomyces species (ORFs1-4). The weak conservation, and poor characterisation of ORFs1-4 means that their annotation as genes is debatable, whereas the ~88% of annotated S. cerevisiae ORFs that had homologues in more distant species (ORFs5-10), can be more confidently considered genes. The authors also classified ~108,000 unannotated ORFs longer than 30nt as having a conservation level of 0 (ORFs0). Hence, ORFs0 and ORFs1-4 were considered as candidate proto-genes, whilst ORFs5-10 were classed as bona fide genes.
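The grouping described above boils down to a simple rule on the conservation level. A small Python sketch of that rule follows; the function name and the label strings are mine, chosen for illustration, and the sketch covers only the initial candidate assignment, not the later filtering by translation or selection.

```python
def classify_orf(conservation_level):
    """Map a conservation level (0-10) to the category used in the post.

    Level 0: unannotated ORFs; levels 1-4: weakly conserved annotated ORFs;
    levels 5-10: ORFs with homologues in more distant species.
    """
    if not 0 <= conservation_level <= 10:
        raise ValueError("conservation level must be between 0 and 10")
    if conservation_level == 0:
        return "candidate proto-gene (unannotated ORF)"
    if conservation_level <= 4:
        return "candidate proto-gene (annotated ORF)"
    return "gene"


for level in (0, 1, 4, 5, 10):
    print(level, classify_orf(level))
```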

In agreement with the postulated continuum between non-genic ORFs and genes, Carvunis et al. found a positive correlation between the level of conservation and both gene length and expression level. A spectrum of codon usage was also observed; the relative abundances of amino acids encoded by ORFs1-4 being intermediate between those of the (hypothetical) translation products of ORFs5-10 and ORFs0.

To test the second prediction, Carvunis et al. used data on ribosomal occupancy to search for signatures of translation amongst ORFs0. Of these ~108,000 short, unannotated ORFs, 1,139 showed evidence of translation (termed ORFs0+).

The authors went on to measure the extent of selection operating on their classes of ORFs by comparing the genome sequences of 8 different S. cerevisiae strains. ~3% of ORFs0+ and ~9-25% of ORFs1-4 were found to be under purifying selection.

Carvunis et al. therefore classify the set of ORFs0+ (that showed translational activity), and those ORFs1-4 that don’t necessarily show evidence of being under purifying selection, as proto-genes. This set amounted to 1,891 ORFs displaying characteristics intermediate between non-genic ORFs and genes.

Although Carvunis et al. found evidence to support all three of their predictions, I found the most persuasive evidence in favour of the importance of de novo gene birth to be the following comparison: since the divergence of S. cerevisiae and S. paradoxus, between 1 and 5 novel genes have arisen by gene duplication mechanisms, whilst 19 of the 143 ORFs1 (arising de novo in the same period) were found to be under purifying selection.

Perhaps the main take-home message from these analyses is that the imposition of arbitrary annotation boundaries (e.g. the 100 codon cut-off) can lead to artifactual understandings. The findings of widespread non-coding transcription, and the potential for marginal, non-functional translation, mean that genes exist on a continuum, and their RNA and protein products exist on spectra of functionality. These ‘shades of grey’ may actually be an important source of evolutionary potential.

Carvunis AR, Rolland T, Wapinski I, Calderwood MA, Yildirim MA, Simonis N, Charloteaux B, Hidalgo CA, Barbette J, Santhanam B, Brar GA, Weissman JS, Regev A, Thierry-Mieg N, Cusick ME, & Vidal M (2012). Proto-genes and de novo gene birth. Nature PMID: 22722833

Tautz D, & Domazet-Lošo T (2011). The evolutionary origin of orphan genes. Nature reviews. Genetics, 12 (10), 692-702 PMID: 21878963