Category Archives: Evolution

Supergenes, Sociality, and Sex Chromosomes

A new paper in Nature shows that the social structure of fire ant colonies is determined by a ‘supergene’: a single non-recombining cluster of hundreds of genes. The supergene makes up more than 50% of a pair of divergent chromosomes. These social chromosomes appear to have emerged in a manner similar to the way sex chromosomes evolve.

Colonies of the fire ant Solenopsis invicta can be organised in two different ways, containing either a single queen (monogyne) or multiple queens (polygyne). This social polymorphism has been shown to be associated with a single Mendelian genetic factor. Two alleles, B and b, of the Gp-9 gene, which encodes an odorant-binding protein, predict the two colony types. This effect is mediated by worker behaviour. Colonies composed solely of Gp-9BB workers live under a single queen, whereas mixed colonies of Gp-9BB and Bb workers accept many queens. These queens are invariably Bb, as the Bb workers kill any BB queens they encounter. In this way the b allele acts as a ‘green beard’ gene, promoting its own propagation by behavioural self-recognition. However, it does not spread unchecked through the population, because it is recessive lethal: Gp-9bb individuals die early.
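
The transmission logic can be made concrete with a short Python sketch. It is an illustration under stated assumptions rather than anything taken from Wang et al.: females are diploid, males are haploid and pass their single Gp-9 allele to every daughter, and bb daughters are dropped as early deaths. The function name `daughters` is hypothetical.

```python
from collections import Counter

def daughters(queen: str, male: str) -> Counter:
    """Count viable daughter genotypes from a diploid queen and a haploid male.

    queen: two-letter genotype such as "Bb"; male: a single allele, "B" or "b".
    Each queen allele reaches half the daughters; the male's one allele
    reaches all of them. bb daughters are removed (recessive lethal).
    """
    counts = Counter()
    for allele in queen:
        # Sorting makes "bB" and "Bb" the same genotype ("B" sorts before "b").
        genotype = "".join(sorted(allele + male))
        if genotype != "bb":
            counts[genotype] += 1
    return counts

for queen, male in [("BB", "B"), ("Bb", "B"), ("Bb", "b")]:
    print(queen, "x", male, "->", dict(daughters(queen, male)))
```

For example, `daughters("Bb", "b")` returns only Bb daughters, since the would-be bb half of the brood is inviable.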

The two social forms differ in many aspects of their biology. Monogyne queens tend to found new colonies after long nuptial flights. They establish their nests without the help of workers and without foraging, and therefore require extensive fat reserves and a longer period of maturation. Polygyne queens often stay within their natal nest or undertake only limited nuptial flights, and so need smaller fat reserves and less maturation. Monogyne colonies are therefore simple families and highly dispersed, whereas polygyne colonies contain several families, tend to bud from one another and are frequently clustered. These population structures are associated with different behaviours: monogyne colonies, whose members are more closely related, are more aggressive and territorial than polygyne colonies.

Although the Gp-9-encoded odorant-binding protein may well regulate the two forms of social organisation by chemical communication, it was scarcely believable that this panoply of behavioural, morphological and life-history differences could be regulated by a single protein. Researchers therefore speculated that, rather than being the sole determining gene, Gp-9 may instead be a marker whose presence correlates with a number of other genetic factors determining the alternative forms of colony organisation.

Clusters of co-segregating genetic loci – ‘supergenes’ – have been found to underlie patterns of adaptive variation regulating different floral types and butterfly mimicry. These types of loci facilitate the co-transmission of many different traits in parallel. They effectively behave like one classically defined gene but are in fact composed of many molecularly defined genes.

Wang et al. have now found that the fire ant colony organisation polymorphism is determined by a supergene. Comparing the haploid sons of BB and Bb queens, they discovered a 13 Mb region of a 23 Mb chromosome in which no recombination occurred between the B and b forms. This region included Gp-9 and at least 615 other genes (6.7% of known S. invicta genes). The vast majority of these genes are present on both chromosome types, but recombination fails to occur because of chromosomal rearrangements. The authors identified one major inversion that reverses the orientation of 9.3 Mb of the non-recombining section.
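
A quick back-of-envelope check in Python shows how these figures fit together: the 13 Mb region is indeed more than half of the 23 Mb chromosome, and the inversion spans most of the region. The implied total gene count is only a rough back-calculation from the rounded 6.7% figure, not a number reported by the authors.

```python
# Figures quoted in the text (Wang et al. 2013).
region_mb = 13.0        # non-recombining region
chromosome_mb = 23.0    # whole social chromosome
inversion_mb = 9.3      # major inversion within the region
genes_in_region = 616   # Gp-9 plus at least 615 other genes
share_of_genes = 0.067  # 6.7% of known S. invicta genes

print(f"region / chromosome: {region_mb / chromosome_mb:.1%}")
print(f"inversion / region:  {inversion_mb / region_mb:.1%}")
# Rough, back-calculated total implied by "616 genes = 6.7%".
print(f"implied gene total:  ~{genes_in_region / share_of_genes:,.0f}")
```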

This heteromorphic pair of ‘social chromosomes’, which seemingly regulates many aspects of the monogyny/polygyny division, resembles a pair of sex chromosomes. Recombination continues when two B chromosomes are paired (as between the two X chromosomes of female mammals), whilst it fails to occur between B and b (as between X and Y). bb, like YY, is non-viable. The most widely accepted theory of how recombination becomes suppressed as sex chromosomes evolve from autosomes posits that inversions are selected for when genes with male-specific benefits are linked to a male sex-determining gene. This theory has been hard to test, however, as it is difficult to identify alleles with sex-specific benefits. The demonstration that the same mechanism (i.e. an inversion preventing recombination and permanently linking alleles) underlies the social supergene and chromosomes in fire ants lends support to this model of sex chromosome evolution. One can envisage how a series of inversions could gradually and permanently link loci responsible for complex adaptive traits into supergenes, and potentially into specialised chromosomes.

As on Y chromosomes, the lack of recombination allows degenerative mutations to build up on the social b chromosome. Wang et al. found increased numbers of repetitive elements and longer introns in the non-recombining region. It is because of this accumulation of deleterious mutations that the b chromosome, like the Y, carries recessive lethal alleles. Importantly, however, haploid males carrying a single b chromosome must be viable for the life cycle to continue. This constraint limits the degeneration of the b chromosome compared with Y chromosomes.

As yet there is relatively little information on the roles of the 616 genes in the non-recombining supergene. However, 70% of the genes known to be differentially expressed between BB and Bb workers map to the non-recombining region. Further studies will no doubt tackle the molecular mechanisms by which the two social systems differentiate. It will be interesting to ask what proportion of genes in the supergene are directly involved in determining the social system. The authors estimate that suppression of recombination on the social chromosomes started relatively recently in fire ants, around 390,000 years ago. Monogyny and polygyny exist in many other ant species; do similar supergene systems underlie these other instances? Further, how widespread are supergenes? Potentially they could be a common mechanism underlying the evolution of complex adaptations such as social behaviours.

Wang, J., Wurm, Y., Nipitwattanaphon, M., Riba-Grognuz, O., Huang, Y., Shoemaker, D., & Keller, L. (2013). A Y-like social chromosome causes alternative colony organization in fire ants Nature DOI: 10.1038/nature11832

Bourke, A., & Mank, J. (2013). Genetics: A social rearrangement Nature DOI: 10.1038/nature11854

The Transposon/piRNA/Chromatin Nexus

Close observation of chromatin states at piRNA-silenced genomic loci demonstrates the power of transposons to change native gene expression.

As reviewed in an earlier post, the Drosophila Piwi/piRNA transposon silencing pathway can be divided into two branches: a complex pathway operating in the germline, centred on the Piwi-family Argonautes Aubergine (Aub) and AGO3 localised in perinuclear nuage, and a linear pathway operating in the somatic follicle cells. In this linear pathway, piRNAs derived from uni-directional piRNA clusters such as flamenco guide Piwi to silence a limited subset of retrotransposons. Unlike Aub and AGO3, Piwi is localised to the nucleus, which has led to speculation that, rather than silencing transposons post-transcriptionally by ‘slicing’ their transcripts, it may act at the transcriptional level. There are many precedents in other organisms for Argonautes mediating transcriptional silencing via interactions with chromatin modification and DNA methylation pathways, but whether Drosophila Piwi employs either of these silencing modes was unresolved. A new paper from the lab of Julius Brennecke analyses the linear piRNA pathway in a cell line derived from the somatic follicle cells surrounding the oocyte (OSC cells). It reports important findings on several aspects of Piwi-mediated transposon silencing, with insights into the wider genomic ecology of transposon insertions.

In the first section of the paper, Sienski et al. demonstrate that Maelstrom (Mael), a protein with putative RNA- and DNA-binding domains that is found in both cytoplasm and nucleus and was previously implicated in several Piwi-pathway effects, acts downstream of Piwi to silence transposons. Silencing requires the nuclear localisation of both Piwi and Mael. Moreover, mutating the residues required for Piwi's ‘slicer’ activity did not derepress transposons, pointing to a different mechanism for Piwi-mediated silencing.

Sienski et al. go on to marshal three different high-throughput techniques to show that Piwi mediates gene silencing at the transcriptional level. By knocking down (KD) Piwi pathway factors (piwi, mael) in OSC cells, they identified the set of repressed transposable elements (TEs) from changes in RNA levels (RNA-seq). These changes in steady-state RNA levels correlated closely with transcription rate, as monitored by RNA polymerase II occupancy (ChIP-seq) and by levels of nascent RNA (GRO-seq). Given how closely TE derepression tracked transcription rate, it seems unlikely that the linear piRNA pathway active in follicle cells acts post-transcriptionally at all.

Reasoning that Piwi-mediated transcriptional gene silencing may involve chromatin modification, Sienski et al. profiled the distribution of the repressive histone mark H3K9me3 in OSCs after piwi or mael knockdown. H3K9me3 levels at transposable elements known to be repressed by the piRNA pathway were significantly reduced in the absence of Piwi (and, to a lesser extent, Mael). This reduction was seen across the genome, irrespective of whether the TE was inserted in heterochromatic or euchromatic regions. To rule out general effects associated with heterochromatin, the authors looked more closely at TE insertions within euchromatic regions.

Approximate sketch of the patterns of RNA Pol II occupancy (i.e. transcription) and H3K9me3 at the mdg1 locus in control cells and after piwi or mael knockdown.

At a specific euchromatic insertion of the retrotransposon mdg1, they observed that upon either piwi KD or mael KD, transcription downstream of the insertion strongly increased. However, although this transcriptional bleeding into the surrounding region was similar in both knockdowns, the pattern of H3K9me3 was very different. Normally this mdg1 insertion is flanked by H3K9me3 over the surrounding 12 kb, peaking at the insertion site. This mark was strongly reduced in piwi KD cells, whereas in mael KD cells H3K9me3 was only moderately reduced at the insertion site and had actually spread further downstream (see figure). Similar patterns were observed at nearly all euchromatic mdg1 insertions, as well as at other TEs known to be targeted by the linear piRNA pathway in OSC cells.

Strikingly, most euchromatic H3K9me3 peaks were sensitive to piwi knockdown, and 88% of H3K9me3 peaks lay within 5 kb of TE insertions. Piwi-mediated transposon silencing therefore seems to be the main trigger for H3K9 trimethylation in euchromatin.
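
A statistic like "88% of peaks within 5 kb of a TE insertion" boils down to a nearest-neighbour search on sorted coordinates. The Python sketch below is one simple way to compute it, not the authors' pipeline: each peak is reduced to a single coordinate (for instance its summit), the function name `fraction_near` is hypothetical, and the coordinates are invented.

```python
from bisect import bisect_left

def fraction_near(peaks, insertions, max_dist=5_000):
    """Fraction of peaks whose position lies within max_dist bp of a TE insertion.

    peaks, insertions: lists of (chromosome, position) tuples, with each peak
    reduced to a single coordinate (e.g. its summit) for simplicity.
    """
    sites_by_chrom = {}
    for chrom, pos in insertions:
        sites_by_chrom.setdefault(chrom, []).append(pos)
    for sites in sites_by_chrom.values():
        sites.sort()

    near = 0
    for chrom, pos in peaks:
        sites = sites_by_chrom.get(chrom, [])
        i = bisect_left(sites, pos)
        # In a sorted list the nearest site is at index i or i - 1.
        neighbours = sites[max(i - 1, 0):i + 1]
        if any(abs(pos - s) <= max_dist for s in neighbours):
            near += 1
    return near / len(peaks) if peaks else 0.0

# Toy coordinates (invented, not from the paper).
insertions = [("2L", 10_000), ("2L", 50_000), ("3R", 1_000)]
peaks = [("2L", 12_000), ("2L", 30_000), ("3R", 5_500), ("X", 100)]
print(fraction_near(peaks, insertions))  # 0.5
```

Sorting once and using binary search keeps the cost at roughly O((P + T) log T) for P peaks and T insertions, rather than comparing every peak with every insertion.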

This transposon silencing mechanism appears to have a major impact on native genes when a TE inserts in their vicinity. An insertion of the retrotransposon gypsy into the first intron of the expanded (ex) gene serves as a paradigm for these effects. In OSC cells, the gypsy insertion triggered H3K9me3 spreading into the surrounding 10-12 kb. In control cells, RNA polymerase II occupancy at the ex transcription start site (TSS) was detectable but weak. Upon piwi or mael knockdown, transcription from the ex TSS increased massively. As in the mdg1 example, H3K9me3 levels were greatly reduced in piwi KD cells but not in mael KD cells. Sienski et al. observed similar effects on the transcription of 28 further genes with nearby TE insertions in OSC cells.

These data have a number of ramifications, pointing to a complex interplay between transcription, the establishment and maintenance of repressive chromatin states, and the Piwi pathway. First, H3K9me3, usually considered a transcriptionally repressive histone mark, is compatible with transcription. In fact, based on its pattern in mael KD cells, the authors propose that downstream transcriptional bleeding leads to the spread of H3K9me3. Second, although H3K9me3 has an integral role in Piwi-mediated silencing, it is not the final silencing mark. H3K9 trimethylation is downstream of Piwi action but either upstream of, or parallel to, Mael, which mediates an unknown silencing step crucial to Piwi-mediated transcriptional gene silencing.

Importantly, this paper demonstrates the impact that TE insertion and subsequent transcriptional repression by the piRNA pathway can have on native gene expression. There are two ways in which inactivating Piwi-mediated TE silencing can lead to transcriptional activation of these loci. First, repressive chromatin marks spreading from transposons can block RNA polymerase II access to a gene's promoter, so relieving TE repression (re-)activates gene expression. Second, as TEs (especially the long terminal repeats of some retrotransposons) can serve as promoters, the loss of their repressed chromatin state upon piRNA pathway loss can activate transcription of downstream regions. Although both modes lead to transcriptional activation after Piwi pathway loss, together they show that a transposon insertion can either repress or activate transcription across relatively extensive genomic surroundings. This underscores the scope for transposons to act as regulatory elements, or to produce new chimeric transcripts and hence potential new genes.

These experiments were mainly performed in one cell type, which reflects only part of what is already a subset of Piwi/piRNA activity during Drosophila oogenesis. Piwi and Mael are also active in the nurse cells and oocyte, and this paper suggests that they have similar roles within the expanded piRNA pathways of the germline. It will be interesting to integrate this nuclear, transcriptional aspect of piRNA silencing with ping-pong amplification and bi-directional piRNA cluster transcripts. Further, do these Piwi-mediated chromatin effects in the germline affect the transcriptional status of TEs and genes later in somatic development? And if not, do other systems have equivalent activity?

This paper again underlines the impact of the arms race between mobile genetic elements and genomic immune systems, such as the piRNA pathway, on the wider genomic regulatory context. This contest has been found to shape so many aspects of genome organisation throughout evolution that it sometimes becomes hard to distinguish parasitism from regulation. It is clear, however, that to understand the evolutionary impact of mobile elements we must also understand the import of the various epigenetic mechanisms controlling their spread. The minutiae of these mechanisms (their targets, plasticity, adaptability and heritability, often different from organism to organism) have major evolutionary significance. Evolution works differently depending on these mechanisms.

Sienski, G., Dönertas, D., & Brennecke, J. (2012). Transcriptional Silencing of Transposons by Piwi and Maelstrom and Its Impact on Chromatin State and Gene Expression Cell, 151 (5), 964-980 DOI: 10.1016/j.cell.2012.10.040

The Heterodox Dinokaryon

The nuclei of dinoflagellates display a highly derived organisation; chromosomes are permanently condensed and seem to lack histone proteins. A new study in Current Biology links the emergence of these characters to the importation of a novel family of nuclear proteins originating in giant viruses.

A Haeckel print of various Dinoflagellates

Dinoflagellates are a diverse and successful phylum of protists. Many are photosynthetic, with a major role in the oceans’ primary production, whilst others have symbiotic, parasitic or predatory lifestyles. Their nuclei are highly unusual. Whereas in all other eukaryotes chromosomes condense only during mitosis, dinoflagellate chromosomes display a permanently condensed, liquid crystalline form. This ‘cholesteric’ structure produces a banded appearance in electron micrographs. Another key dinoflagellate heterodoxy is the absence (or at least undetectability) of histone proteins and of the nucleosomal organisation of chromatin. These differences are so radical that dinoflagellates were suggested to represent an intermediate ‘mesokaryotic’ stage between prokaryotes and eukaryotes. Molecular phylogenetics has since clarified that they are in fact a sister clade to the apicomplexan protists, leaving no doubt that the dinoflagellate nuclear organisation, the dinokaryon, is derived from standard eukaryotic ancestors. Other atypical features of the dinokaryon include very high DNA content and the replacement of as much as 70% of the base thymine with the rare base 5-hydroxymethyluracil. However, these features vary in occurrence. For instance, the chromosome banding patterns are not always evident, and the chromosomes of some dinoflagellate species can be decondensed at certain stages of their life cycles.

A dinoflagellate nucleus. Note the condensed chromosomes with characteristic banding pattern (not Blastodinium sp.).

To investigate the emergence of these dinokaryotic characteristics during early dinoflagellate evolution, Gornik et al. examined the nuclei of two early-branching members of the lineage. Perkinsus marinus represents the closest known lineage outside the dinoflagellates proper, whilst Hematodinium sp. branches basally within the clade. In line with the authors' expectations, the genome of P. marinus is organised into nucleosomal units, whilst that of Hematodinium sp. is not and appears to be 80 times larger. The P. marinus genome contains sequences for the four core histones as well as the linker histone H1, all of which were prominently detectable as proteins in nuclear extracts. No genome sequence is available for Hematodinium sp., but transcriptome sequencing revealed the four core histones as well as a number of variants. Unlike the histone genes of P. marinus, these sequences were quite divergent from the highly conserved eukaryotic norm; however, the core ‘histone-fold’ regions were relatively well preserved, as were key residues that serve as sites of post-translational modification. Histone genes have recently been found in other dinoflagellate genomes, but histone protein expression had not previously been detected. Gornik et al. could identify histone H2A protein in nuclear extracts from Hematodinium sp. However, whereas in P. marinus and other eukaryotes histone proteins dominate such extracts, in Hematodinium sp. a single 30 kDa species dominated.

When this band was excised and the protein identified by mass spectrometry, it was found to correspond to a novel family of proteins, at least 4 of which were expressed in Hematodinium sp., whilst 13 were found in the transcriptome. This family appears to be present only in dinoflagellates; no homologues were found in other eukaryotic groups or in prokaryotes. However, database searches did reveal homology with a protein of unknown function encoded in the genomes of many phycodnaviruses, a family of giant viruses infecting algae. Gornik et al. therefore named these proteins Dinoflagellate/Viral NucleoProteins (DVNPs).

Like histones and many other DNA-binding proteins, DVNPs are highly basic. They are relatively variable in their N-terminal regions, with higher conservation in a core region that may include a DNA-binding helix-turn-helix motif. Biochemical experiments demonstrated that DVNPs bind DNA with high affinity and are phosphorylated at various residues.
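
"Highly basic" can be quantified crudely by counting charged residues. The Python sketch below, with a hypothetical function name, counts Lys and Arg as +1 and Asp and Glu as -1, ignoring His, the termini and pKa effects; a proper isoelectric-point calculation would be needed for real analysis. The example peptide is invented and is not a DVNP sequence.

```python
def charge_summary(seq: str) -> dict:
    """Crude charge summary of a protein sequence in one-letter code.

    Lys (K) and Arg (R) count as +1, Asp (D) and Glu (E) as -1; His, the
    termini and pKa shifts are ignored, so this is only a rough proxy for
    net charge near neutral pH.
    """
    seq = seq.upper()
    basic = seq.count("K") + seq.count("R")
    acidic = seq.count("D") + seq.count("E")
    return {"basic_fraction": basic / len(seq), "net_charge": basic - acidic}

# Invented lysine/arginine-rich toy peptide (NOT a real DVNP sequence).
print(charge_summary("MSKRKAKPTRKSGEKRKAAS"))
```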

The Phycodnaviridae are members of the nucleocytoplasmic large DNA viruses (NCLDVs), a monophyletic clade of giant viruses that encode much more of their replication apparatus than is typical of viruses. They are predicted to have emerged more than 2 billion years ago, predating the first dinoflagellates by more than a billion years. As most phycodnaviruses encode DVNP orthologues, dinoflagellates presumably acquired DVNPs from phycodnaviruses early in their evolution. As yet there is no information on the roles of DVNPs in the Phycodnaviridae, but the fact that both taxa have expanded genomes suggests a possible shared function. Do DVNPs allow such efficient DNA packing that the costs of genome expansion are somehow minimised?

The DVNPs are not the first family of putative histone-replacement proteins discovered in dinoflagellates. Later-branching taxa express ‘histone-like proteins’ (HLPs), which are probably related to the bacterial DNA-binding protein HU and have been shown to bend DNA in vitro. HLPs are not found in Hematodinium sp. or other early-branching dinoflagellates, whereas DVNPs occur together with HLPs in later-branching taxa. DVNPs therefore seem to be associated with the core dinokaryotic characteristics of permanently condensed chromosomes and expanded genome size, whilst the presence of HLPs correlates with other characters, such as the chromosome banding patterns observed in later-branching taxa.

The observation that dinoflagellates do in fact encode and express divergent histones at low levels raises a question: what are their roles, if they are not primarily responsible for the bulk packaging of DNA? Linked to this is the broader question of how DVNPs and HLPs act to condense dinoflagellate chromosomes. Considering the vast quantity of research aimed at understanding the biology of eukaryotic chromosomes, it is rather daunting to find a whole new way of doing things. How do transcription and replication work in the context of permanently condensed chromosomes? How does this link with genome expansion? I don’t know how much dinoflagellate genomic data is available, but I imagine that a finished genome sequence would be of great use. Perhaps, though, I’d prefer to prioritise biochemical and structural studies of how these various proteins act on DNA.

Gornik, S., Ford, K., Mulhern, T., Bacic, A., McFadden, G., & Waller, R. (2012). Loss of Nucleosomal DNA Condensation Coincides with Appearance of a Novel Nuclear Protein in Dinoflagellates Current Biology DOI: 10.1016/j.cub.2012.10.036

Genomic Rearrangement in Lampreys 2

As discussed in a recent post, during lamprey embryogenesis programmed genomic rearrangements lead to deletion of ~20% of the germline genome in the soma. Smith, Amemiya and co-workers have now published a follow-up study in which they further characterise the complement of deleted genes. Their findings have led them to hypothesise that the programmed genomic rearrangements (PGRs) serve to segregate pluripotency functions that are required in the germline but could be deleterious in the soma.

Smith et al. used two genomic approaches to identify somatically deleted sequences. Using microarrays constructed from available germline sequence, they found that ~13% of the sampled sequence was deleted in the soma (in rough agreement with the ~20% derived from flow cytometry). Within this dataset, they identified 8 new single-copy or low-copy-number sequences found only in the germline. RT-PCR showed that 5 of these novel sequences were expressed in germline cells, and in situ hybridisation showed that one of them was expressed in differentiating primordial germ cells in lamprey embryos.

The main limitation for identifying more genes subject to somatic deletion has been a lack of germline genomic sequence. Smith et al. performed high-throughput shotgun sequencing on lamprey sperm cells, generating short sequence reads covering ~10% of the germline genome. They then compared this dataset with the whole-genome sequence derived from somatic (liver) cells, yielding tens of thousands of putative deletion and recombination sites. A substantial part of the somatically deleted DNA corresponds to single-copy, protein-coding genes; the authors identified 246 instances of homology to individual human genes.
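
One simple way to flag candidate germline-only sequence is a k-mer comparison: keep the germline reads that share no k-mer with the somatic assembly. The Python sketch below illustrates that idea only; it is not necessarily the method Smith et al. used, the function names are hypothetical, reverse complements are ignored for brevity, and flagged reads would still need the kind of validation described in the next paragraph.

```python
def kmers(seq: str, k: int) -> set:
    """All length-k substrings of seq (forward strand only)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def germline_only_reads(reads, somatic_assembly, k=21):
    """Return reads that share no k-mer with any somatic contig.

    Such reads are candidate somatically deleted sequence; they may also
    reflect polymorphisms, assembly gaps or sequencing errors.
    """
    somatic = set()
    for contig in somatic_assembly:
        somatic |= kmers(contig, k)
    return [read for read in reads if not (kmers(read, k) & somatic)]

# Toy example with a small k (invented sequences).
print(germline_only_reads(["ACGTACG", "GGGCCCAAATT"], ["ACGTACGTTTGCA"], k=5))
```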

The problem with this comparison, however, is that it was necessarily generated from 2 different individuals (of different sexes). Apparent cases of deletion or recombination might therefore reflect insertion/deletion polymorphisms present in lamprey populations. The researchers undertook validation experiments on a subset of the candidate deletion/recombination dataset, using PCR to amplify candidate sequences from testes and blood of 4 different males and from blood of 4 different females, as well as from an array of somatic tissues within individuals. Of 48 tested candidate gene deletions or recombination events, they validated 7 sites of programmed deletion and 3 recombination sites. They also identified 3 insertion/deletion polymorphisms and 5 gaps in the somatic whole-genome sequence. Because of PCR failures or repetitive target sequences, 30 of the candidates were not informative. The validated gene deletions included APOBEC-1 complementation factor, encoding a protein involved in RNA editing, and WNT7A/B, encoding a secreted developmental signalling molecule.
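
Tallying the validation outcomes in Python confirms that the categories account for all 48 tested candidates, and gives a rough sense of how many informative candidates held up. The proportion computed here is a back-of-envelope reading of the numbers above, not a figure reported by the authors.

```python
# Outcomes of the 48 tested candidates, as reported in the text.
outcomes = {
    "programmed deletion": 7,
    "recombination site": 3,
    "insertion/deletion polymorphism": 3,
    "gap in somatic assembly": 5,
    "uninformative (PCR failure/repeats)": 30,
}
total = sum(outcomes.values())
assert total == 48  # the categories cover every tested candidate

informative = total - outcomes["uninformative (PCR failure/repeats)"]
validated = outcomes["programmed deletion"] + outcomes["recombination site"]
print(f"validated {validated}/{informative} informative candidates ({validated / informative:.0%})")
```

On this reading, roughly half of the informative candidates were confirmed as programmed rearrangements, which is one reason to treat the full candidate list with some caution.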

In the course of these validation experiments, Smith et al. discovered short palindromic sequences at the deletion breakpoints. There was no specific consensus sequence at these positions, but the palindromes may indicate that the mechanism of chromatin diminution involves site-specific recombination.

Another interesting finding of these experiments is that the programmed deletions appear to be shared uniformly by all the various somatic lineages. The earlier paper (discussed in the previous post) had suggested that different somatic tissues might have subtly different deleted portions. Microarray experiments, and comparisons of the validated gene deletions between different tissues, found no evidence of this, although the question may not yet be settled definitively.

The crux of the paper rests on a computational comparison of ontology terms (in which homology is used to make predictions of cellular function, which are further sorted into broad categories). In the dataset of predicted gene deletions, certain ontologies were overrepresented with respect to the rest of the germline sequence; these included ‘regulation of gene expression’, ‘chromatin organisation’, and ‘development of germ/stem cells’.
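
Over-representation of a term is commonly assessed with a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test). The post does not say which test the authors used, so the Python sketch below, with a hypothetical function name and invented toy counts, only illustrates the general idea.

```python
from math import comb

def enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric test: P(X >= k).

    N: genes in the background set
    K: background genes annotated with the term
    n: genes in the (deleted) test set, drawn from the background
    k: test-set genes annotated with the term
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)

# Toy numbers (invented): 20 of 200 deleted genes carry a term that only
# 300 of 10,000 background genes carry (about 6 would be expected by chance).
print(f"{enrichment_p(20, 200, 300, 10_000):.2e}")
```

With many ontology terms tested at once, the resulting p-values would also need a multiple-testing correction.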

Simply put, the paper has shown that a substantial number of protein-coding genes, as well as repetitive sequences, are deleted from the genomes of lamprey somatic cells. Many of the deleted genes are expressed in the germline and often appear to have important regulatory functions. The crucial characteristics of the germline are totipotency and the ability to undergo meiotic recombination. Misexpression of factors involved in these processes in the soma would be seriously detrimental, potentially resulting in aberrant cell fate specification, genome disruption and hence cancer. The authors postulate that this conflict of interests between the germline and the soma underlies their genomic differentiation.

I find this an attractive and interesting hypothesis. As yet though, I don’t think the data is strong enough to have proved it. Gene ontology terms are relatively crude categorisations, and compounded with the question of what proportion of candidate deletions are bona fide, I’ll withhold judgement on the evolutionary rationale behind the deletions for the moment. Jeramiah Smith and colleagues are currently assembling the entire lamprey germline genome. Complete annotation of the deleted portion of the genome will certainly reveal the function of these fascinating genome rearrangements more clearly. I look forward to new studies investigating the mechanisms underlying the rearrangements, and their developmental progression. The extensive genome remodelling that occurs in ciliates utilises a combination of a small RNA/Argonaute system and domesticated transposase enzymes. I guess that analysis of any transposases encoded in the lamprey genome may be the place to start to unravel the mechanisms of chromatin diminution.

Smith, J., Baker, C., Eichler, E., & Amemiya, C. (2012). Genetic consequences of programmed genome rearrangement Current Biology, 22 (16), 1524-1529 PMID: 22818913

Gene Birth, de novo

What are the genomic mechanisms responsible for the creation of evolutionary novelty? The view that has emerged from the findings of molecular and developmental biology in recent decades is that the primary substrate for evolutionary change is the regulatory network between genes, rather than emergence of new genes. For instance, animal development repeatedly utilises a relatively limited ‘toolkit’ of highly conserved families of transcription factors, signalling molecules and signal transduction components. Most morphological novelty seems to emerge via new deployments and combinations of this repertoire of pre-existing genes. No one doubts that new genes arise during evolution, but in line with the ‘predominantly regulatory’ model most gene birth has been considered to involve duplication of genes, followed by their reorganisation and diversification. However, whole genome sequencing has found that every evolutionary lineage contains protein-coding genes that lack homologues in other lineages – orphan genes. In fact, as much as one third of all annotated genes are orphans. Although orphan genes can arise by processes of duplication and (rapid) diversification, it appears likely that the primary mechanism for their emergence is de novo evolution from previously non-coding sequence.

De novo gene birth requires non-genic sequences first to be transcribed and to acquire open reading frames (ORFs), and these ORFs then to be translated. Any model in which this sequence of events occurs to a significant extent faces some obvious conceptual hurdles: how does non-genic sequence become translated, and wouldn’t any polypeptide products be insignificant? Carvunis et al. have tried to surmount these problems by postulating a model of de novo gene birth that proceeds through intermediate, reversible ‘proto-gene’ stages between the emergence of an ORF and a bona fide functional protein-coding gene.

A major problem in interpreting genome sequence is that ORFs are classified as genes using a minimal length threshold. Although ORFs encoding functional polypeptides as short as 9 amino acids have been discovered, the standard length threshold used to delineate genic ORFs is 300 nt (100 codons, or roughly a 100 amino acid protein). In the budding yeast Saccharomyces cerevisiae, ~6,000 ORFs are annotated as genes, whilst ~261,000 unannotated ORFs longer than 3 codons are considered non-genic. The majority of the S. cerevisiae genome is transcribed, and a number of putatively non-coding transcripts have been shown to associate with ribosomes. Carvunis et al. postulated that some translation of non-genic ORFs may occur and that, although the resulting polypeptides may not be functional, such proto-genes could be maintained in the genome provided they were not toxic and hence selected against. Proto-genes would provide adaptive potential, and a subset could be retained over time if they conferred a selective advantage. New genes originating de novo would be expected to be initially shorter, less highly expressed and more rapidly evolving than established genes.
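
How much the length threshold matters is easiest to see with a toy ORF scanner. The Python sketch below is a minimal illustration, not the annotation procedure used for the yeast genome: it scans only the forward strand, runs each ORF from the first ATG to the next in-frame stop codon, and counts codons including the stop. The function name `find_orfs` is hypothetical; the default of 100 codons mirrors the 300 nt threshold.

```python
STOPS = {"TAA", "TAG", "TGA"}

def find_orfs(seq: str, min_codons: int = 100):
    """Yield (start, end, frame) for ATG-initiated ORFs on the forward strand.

    An ORF runs from an ATG to the next in-frame stop codon (end is
    exclusive); its length in codons, counting the stop, must be at least
    min_codons. Nested ATGs inside an open ORF are not reported separately.
    """
    seq = seq.upper()
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                end = i + 3
                if (end - start) // 3 >= min_codons:
                    yield start, end, frame
                start = None

toy = "CCATGAAATTTTAGGG"  # invented sequence with one 4-codon ORF
print(list(find_orfs(toy, min_codons=4)))  # [(2, 14, 2)]
print(list(find_orfs(toy, min_codons=5)))  # []
```

Lowering `min_codons` makes many more short ORFs qualify, which is why the count of non-genic ORFs depends so strongly on where the threshold is set.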

Carvunis et al.’s model for de novo gene birth leads to a number of predictions. First, there should be an evolutionary continuum between non-genic ORFs and bona fide genes in characteristics such as length, expression level and sequence composition. Second, many non-genic ORFs should be translated; and third, some recently emerged ORFs should be adaptively advantageous and hence retained by natural selection.

To test these predictions, Carvunis et al. estimated the order of emergence of S. cerevisiae ORFs based on their level of conservation amongst ascomycete fungi. Annotated ORFs were classified into 10 groups. For instance, ORFs found only in S. cerevisiae constituted ORFs1, which accounted for ~2% of the total, while ~12% were conserved only within the four closely related Saccharomyces species (ORFs1-4). The weak conservation and poor characterisation of ORFs1-4 mean that their annotation as genes is debatable, whereas the ~88% of annotated S. cerevisiae ORFs that had homologues in more distant species (ORFs5-10) can more confidently be considered genes. The authors also assigned ~108,000 unannotated ORFs longer than 30 nt a conservation level of 0 (ORFs0). Hence, ORFs0 and ORFs1-4 were considered candidate proto-genes, whilst ORFs5-10 were classed as bona fide genes.
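The binning scheme can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: homologue searches are taken as already done, the hypothetical input homologue_in_level lists presence/absence for levels 1 to 10 (level 1 being S. cerevisiae itself, level 10 the most distant ascomycetes), and an ORF's level is taken to be the most distant level at which a homologue is found.

```python
# Minimal sketch of assigning conservation levels and proto-gene status,
# loosely following the scheme described above. Assumptions (not from the
# paper's code): homologue detection has already been done, and
# `homologue_in_level` lists presence/absence for levels 1..10, ordered from
# S. cerevisiae itself (level 1) to the most distant ascomycetes (level 10).
from typing import Sequence

N_LEVELS = 10


def conservation_level(annotated: bool, homologue_in_level: Sequence[bool]) -> int:
    """Return 0 for unannotated ORFs, else the most distant level with a homologue."""
    if len(homologue_in_level) != N_LEVELS:
        raise ValueError(f"expected {N_LEVELS} levels")
    if not annotated:
        return 0  # unannotated ORFs form ORFs0
    present = [level for level, hit in enumerate(homologue_in_level, start=1) if hit]
    return max(present, default=1)  # an annotated ORF is at least level 1


def classify(level: int) -> str:
    """Map a conservation level to the categories used in the text."""
    if 0 <= level <= 4:
        return "candidate proto-gene"  # ORFs0 and ORFs1-4
    if 5 <= level <= N_LEVELS:
        return "gene"  # ORFs5-10
    raise ValueError(f"invalid level: {level}")


# Usage with made-up presence/absence patterns.
only_cerevisiae = [True] + [False] * 9
saccharomyces_only = [True, True, False, True] + [False] * 6
widespread = [True] * 10
print(classify(conservation_level(False, only_cerevisiae)))  # candidate proto-gene
print(conservation_level(True, saccharomyces_only))          # 4
print(classify(conservation_level(True, widespread)))        # gene
```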

In agreement with the postulated continuum between non-genic ORFs and genes, Carvunis et al. found a positive correlation between conservation level and both ORF length and expression level. A similar spectrum was seen in sequence composition: the relative amino acid abundances of ORFs1-4 were intermediate between those of the (hypothetical) translation products of ORFs0 and ORFs5-10.

To test the second prediction, Carvunis et al. used ribosome-occupancy data to search for signatures of translation amongst ORFs0. Of these ~108,000 short, unannotated ORFs, 1,139 showed evidence of translation (termed ORFs0+).

The authors then measured the extent of selection acting on each class of ORF by comparing the genome sequences of 8 S. cerevisiae strains. Around 3% of ORFs0+ and ~9-25% of ORFs1-4 were found to be under purifying selection.

Carvunis et al. therefore classify as proto-genes the ORFs0+ (those showing translational activity) together with those ORFs1-4 that do not necessarily show evidence of purifying selection. This set amounted to 1,891 ORFs displaying characteristics intermediate between non-genic ORFs and genes.

Although Carvunis et al. found evidence to support all three of their predictions, I found the most persuasive evidence for the importance of de novo gene birth to be this: since the divergence of S. cerevisiae and S. paradoxus, only between 1 and 5 novel genes have arisen by gene duplication, whereas 19 of the 143 ORFs1 (which arose de novo in the same period) were found to be under purifying selection.

Perhaps the main take-home message from these analyses is that imposing arbitrary annotation boundaries (e.g. the 100-codon cut-off) can lead to artifactual conclusions. The findings of widespread non-coding transcription, and the potential for marginal, non-functional translation, mean that genes exist on a continuum and their RNA and protein products on a spectrum of functionality. These ‘shades of grey’ may actually be an important source of evolutionary potential.

Carvunis AR, Rolland T, Wapinski I, Calderwood MA, Yildirim MA, Simonis N, Charloteaux B, Hidalgo CA, Barbette J, Santhanam B, Brar GA, Weissman JS, Regev A, Thierry-Mieg N, Cusick ME, & Vidal M (2012). Proto-genes and de novo gene birth. Nature PMID: 22722833

Tautz D, & Domazet-Lošo T (2011). The evolutionary origin of orphan genes. Nature reviews. Genetics, 12 (10), 692-702 PMID: 21878963

Genomic Rearrangement in Lampreys

Generally, all cells in an organism are considered to have the same genome, with differences between cells determined by differential gene expression. However, a growing number of species, including sciarid flies and various copepods and nematodes, are known to undergo genomic rearrangements during the differentiation of cell lineages. Recent findings have shown that two species of jawless vertebrates (cyclostomes), the hagfish and the sea lamprey, undergo extensive genomic remodelling during development.

Hagfish and lampreys are the closest extant relatives of the jawed vertebrates (gnathostomes). Controversy has reigned over whether lampreys are more closely related to gnathostomes than they are to hagfish. This debate appears to have been resolved by molecular phylogenetics, which unites the two branches of jawless vertebrates as a monophyletic clade, the cyclostomes. However, the split between hagfish and lampreys occurred shortly after the divergence of gnathostomes and cyclostomes, early in vertebrate evolution (~500 Mya). Two whole-genome duplications (referred to as 1R and 2R) occurred during early vertebrate evolution, and it is not yet resolved whether they preceded the divergence of cyclostomes and gnathostomes; opinion appears to be split as to whether this divergence occurred after 1R or after 2R. The most direct way to resolve this question is whole-genome sequencing. It has been known for some time that hagfish undergo genomic rearrangements during early development. Smith et al. (2009) have now shown that a similar phenomenon occurs in the sea lamprey (Petromyzon marinus), explaining some of the difficulty in constructing a finished version of the sea lamprey genome.

Smith et al. found that the total DNA content of nuclei from the germline (sperm) and from various somatic cell types (blood, liver, kidney) differed by >20%, equating to ≈500 Mb. They then performed Southern blots in which restriction-digested genomic DNA from blood or sperm cells was probed with a repetitive sequence element. A number of bands differed in size or intensity between germline and somatic cells, showing that genome rearrangement had occurred. One band was present in the germline samples but virtually absent from a range of somatic tissues. This band, termed Germ1, consisted of sequence from the 18S ribosomal DNA, a retrotransposon and a section of the 28S rDNA. When aligned to lamprey genomic sequence (derived from somatic cells), most of these sequences were well represented, but one section of Germ1, running from one end of the fragment to the 28S rDNA section, was dramatically underrepresented: a germline-specific sequence. Smith et al. then performed FISH (fluorescent in situ hybridisation) with the Germ1 clone on metaphase chromosomes from germline and somatic cells. In the germline they found many Germ1-like sequences distributed across several chromosomes, often arrayed in tandem repeats. In somatic nuclei, by contrast, Germ1 hybridised to only one chromosome pair, most likely corresponding to the functional rDNA. Using real-time PCR across early embryogenesis, the authors found that Germ1 abundance was drastically reduced 2 or 3 days after fertilisation. Smith et al. estimate that Germ1-like sequences make up ~7% of the germline genome; since >20% is lost from somatic cells, a further ~13% of the germline genome is presumably made up of other eliminated sequence. By comparing germline BAC clones with somatic genome sequence, they identified further lost sequences, including a gene known to be expressed during germline development, SPOPL.
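The arithmetic behind the "~13% more" figure can be made explicit with a short Python sketch. It takes the rounded figures quoted above at face value and assumes that the ~20% and ≈500 Mb refer to the same germline total and that nearly all Germ1-like sequence is eliminated, as the FISH results suggest. The implied genome size it prints is derived from these round numbers, not a measurement reported by Smith et al., and the function name partition_lost_dna is hypothetical.

```python
# Back-of-envelope partition of the DNA lost from lamprey somatic cells.
# Assumptions: the ~20% difference and ~500 Mb refer to the same germline
# total, and nearly all Germ1-like sequence is eliminated. All inputs are
# rounded, so the outputs are rough estimates only.
def partition_lost_dna(lost_mb: float, lost_pct: float, germ1_pct: float) -> dict[str, float]:
    """Split eliminated DNA into Germ1-like and other sequence (in Mb)."""
    germline_mb = lost_mb * 100 / lost_pct     # implied germline genome size
    other_pct = float(lost_pct - germ1_pct)    # lost sequence that is not Germ1-like
    return {
        "germline_mb": germline_mb,
        "germ1_mb": germline_mb * germ1_pct / 100,
        "other_pct": other_pct,
        "other_mb": germline_mb * other_pct / 100,
    }


print(partition_lost_dna(lost_mb=500, lost_pct=20, germ1_pct=7))
# {'germline_mb': 2500.0, 'germ1_mb': 175.0, 'other_pct': 13.0, 'other_mb': 325.0}
```

On these assumptions, roughly a third of the eliminated DNA would be Germ1-like and the remaining ~325 Mb would be other sequence, which is where genes such as SPOPL come in.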

Smith et al. have therefore shown that the sea lamprey genome undergoes a dramatic rearrangement during early development of the somatic tissues. A large proportion of the excised content is accounted for by the elimination of Germ1-like sequences, but it appears likely that a number of specific genes are also lost. In most cases of large-scale somatic genomic rearrangement, the main excised component is made up of transposable elements and other repetitive sequences. This may also be the main basis of the rearrangement seen in lampreys, as Germ1-like sequences appear to be a strange fusion of a transposon and duplicated ribosomal DNA genes. However, the finding that an individual gene, SPOPL, is also selectively deleted in somatic lineages suggests that genomic deletions could be linked to the genetic regulation of development.

Interestingly, when the common apoptosis assay TUNEL, which detects DNA breaks, is used during the first few weeks of lamprey embryonic development, nearly every nucleus is labelled. This effect, previously considered an artifact, is probably explained by developmentally regulated deletions. The deletion of SPOPL appeared to occur more gradually than that of Germ1-like sequences. Together with the observation that total nuclear DNA content differed slightly between somatic tissues, these findings suggest that the program of somatic genomic deletions could unfold in an intricate, tissue-specific progression during early development.

All jawed vertebrates undergo programmed genomic rearrangements during the diversification of the immune system. V(D)J recombination, mediated by the transposase-derived RAG recombinase, generates antigen-receptor diversity, allowing adaptive immune responses. Cyclostomes have an alternative, RAG-independent adaptive immune system based on variable lymphocyte receptors (VLRs). The recombination mechanism used to assemble VLRs is not yet clear. Could it be linked to the one employed in the early, developmentally regulated deletions?

Smith et al. (2010) link the presence of chromatin diminution (i.e. the somatic genomic rearrangements) in this basal vertebrate taxon to the whole-genome duplications that occurred in the vertebrate stem group. If the genomic rearrangement mechanisms seen in cyclostomes were present in the last common ancestor of jawed and jawless vertebrates, perhaps this system predisposed stem-group vertebrates to whole-genome duplication by creating a permissive environment for polyploidisation and rediploidisation. Although this is a fascinating idea, it is currently somewhat speculative. It is of pressing importance to understand the mechanisms underlying cyclostome genomic rearrangements. Are they the same in hagfish and lampreys? Are similar systems present in gnathostomes? Finished genome sequences for both the germline and various somatic cell lineages will answer many questions about the effects and purposes of chromatin diminution, although this is easier said than done given the potential complexity of the rearrangements. Lampreys and hagfish also cannot yet be cultured in the laboratory, adding to the difficulty of expanding their experimental use. Perhaps the biggest question left hanging is whether jawed vertebrates employ programmed genomic rearrangements for purposes other than antigen-receptor diversification. This remains a possibility, as the constancy of the genome has generally been assumed rather than tested.

See also a follow-up on a new paper from the same group: Genomic Rearrangement in Lampreys 2

Smith JJ, Antonacci F, Eichler EE, & Amemiya CT (2009). Programmed loss of millions of base pairs from a vertebrate genome. Proceedings of the National Academy of Sciences of the United States of America, 106 (27), 11212-7 PMID: 19561299

Smith JJ, Saha NR, & Amemiya CT (2010). Genome biology of the cyclostomes and insights into the evolutionary biology of vertebrate genomes. Integrative and comparative biology, 50 (1), 130-7 PMID: 21558194

Shimeld SM, & Donoghue PC (2012). Evolutionary crossroads in developmental biology: cyclostomes (lamprey and hagfish). Development (Cambridge, England), 139 (12), 2091-9 PMID: 22619386
This article is a good overview of vertebrate phylogeny and cyclostome development; a free version is available.