Eran Elhaik Research

Genomics

Research topics:

Inferring genome compositional organization

Analysing genome compositional organization

Inferring the compositional organization of genomes is one of the most important questions in genomics. It has been claimed that animal genomes are composed of a mosaic of sequence stretches of variable lengths that differ widely in their GC compositions. In all animals studied so far, the length distribution of GC-content domains (also known as isochores) was found to follow a heavy-tail distribution with power-law decay exponents ranging from -1.12 to -1.15. One of the most common ways to describe a genome is by means of its nucleotide distribution, particularly the distribution of GC content. If complete genomic data are absent, the genomic composition can be deduced from the GC content distributions along short scaffolds of genes and their flanking regions.
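
As a minimal illustration of what a "GC content distribution along a sequence" means, the sketch below computes GC content in consecutive fixed-size windows. The function names (gc_content, windowed_gc) and the fixed window size are assumptions made for this example only; this is not a segmentation method and not the approach used in the papers cited here.

```python
from typing import List


def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of seq."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    # Sequences with no unambiguous bases have an undefined GC content.
    return gc / total if total else float("nan")


def windowed_gc(seq: str, window: int) -> List[float]:
    """GC content of consecutive non-overlapping windows.

    A trailing partial window (shorter than `window`) is dropped.
    The window size is a user choice and an assumption of this sketch.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        gc_content(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, window)
    ]


if __name__ == "__main__":
    # Three windows of four bases each: all-GC, all-AT, and half-GC.
    print(windowed_gc("GGCCAATTGCAT", 4))  # [1.0, 0.0, 0.5]
```

The result depends on the chosen window size, which is one reason window-based descriptions should be treated with caution.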

Many of the theories concerning the evolution of isochores are also based on studies that used GC3 (GC content at third-codon positions) as a predictor for isochore composition or that simply assumed the existence of isochores. I showed that these theories cannot be discussed without further analysis of genomic data (Elhaik et al. 2009). Moreover, approaches to the issue of compositional and structural organization should not assume the existence of isochores in a genome but should rather be based on a model that has been tested on complete genomic sequences. These and other findings, and the relationship between genomic composition and methylation, appeared in Elhaik and Tatarinova (2012).

Because different segmentation algorithms yield different results (Elhaik et al. 2010a), largely owing to user input that drastically affects the outcome, it is necessary to use a parameter-free algorithm that does not depend on user-supplied settings and considers only the sequence. Such an algorithm was developed by Elhaik, Graur, Josic, and Landan (2010) (see IsoPlotter) and was used to develop a new model for compositional genomics that would explain the findings that Bernardi's isochore model failed to explain.
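
To make the idea of sequence segmentation concrete, the following is a minimal sketch of a single binary split that maximizes the Jensen-Shannon divergence between the GC compositions of the two resulting segments, a criterion used by one common family of segmentation methods. It is not IsoPlotter and does not reproduce its halting criterion; the function names (best_split, _entropy) are hypothetical. A full algorithm would apply such splits recursively and needs a rule for when to stop, which is exactly where user-supplied thresholds usually enter.

```python
import math
from typing import Tuple


def _entropy(gc: int, n: int) -> float:
    """Shannon entropy (bits) of a two-symbol (GC vs. AT) composition."""
    h = 0.0
    for k in (gc, n - gc):
        if k:
            p = k / n
            h -= p * math.log2(p)
    return h


def best_split(seq: str) -> Tuple[int, float]:
    """Return (position, divergence) of the best single split.

    Ambiguous bases are removed first, so the position indexes the
    filtered sequence: bases [0, position) form the left segment.
    """
    s = [1 if b in "GC" else 0 for b in seq.upper() if b in "ACGT"]
    n = len(s)
    if n < 2:
        raise ValueError("need at least two unambiguous bases")
    total_gc = sum(s)
    h_all = _entropy(total_gc, n)
    best_i, best_d = 1, -1.0
    left_gc = 0
    for i in range(1, n):
        left_gc += s[i - 1]
        # Jensen-Shannon divergence with length-proportional weights.
        d = (h_all
             - (i / n) * _entropy(left_gc, i)
             - ((n - i) / n) * _entropy(total_gc - left_gc, n - i))
        if d > best_d:
            best_i, best_d = i, d
    return best_i, best_d


if __name__ == "__main__":
    print(best_split("GGGGAAAA"))  # (4, 1.0)
```

On this toy input the split falls at the boundary between the GC-only and AT-only halves, where the divergence reaches its maximum of 1 bit.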

In Elhaik and Graur (2014) we carried out an extensive analysis of genome composition. In this paper, we: 1. Refuted the isochore theory (hopefully for the last time). 2. Provided a detailed description of the mammalian genome landscape. 3. Invalidated the clade Euarchontoglires (Murids are closer to Primates). Our findings depict the mammalian genome as a tapestry of mostly short homogeneous and nonhomogeneous domains and few long ones, thus providing strong evidence in favor of the compositional domain model.

Analysing genome compositional organization is an essential step before analyzing any new genome.

We modeled the genome organization and developed phylogenetic applications that can help in deciding the position of a species in a tree.

The Compositional Domain Model we developed was found to describe the genome organization of eutherian and other species consistently and was also useful for genomic comparisons. Shortly after its introduction, the Compositional Domain Model became the model of choice for genomic analyses and has been applied in numerous genome sequencing projects, such as those of the honeybee (Sodergren et al. 2006a) and its newest build (Elsik et al. 2014), sea urchin (Sodergren et al. 2006b), red flour beetle (Richards et al. 2008), cow (Elsik et al. 2009), Nasonia (Werren et al. 2010), body louse (Kirkness et al. 2010), and many ant genomes, such as the red harvester ant (Smith et al. 2011b), the invasive Argentine ant (Smith et al. 2011a), the leaf-cutter ant (Suen et al. 2011), and others.

This work was widely covered by the media (see Press).

Analyzing the base composition of DNA is important to the understanding of genome organization. The nucleotide composition of genomes varies dramatically among taxa. The GC content is the primary measure used to characterize genomic regions in terms of homogeneity, compositional bias, and compositional constraints. Zhang and Zhang (1991) proposed the Z-curve, an extension to the GC content measure based on a three-coordinate system of x, y, and z, together with a derived measure, the "genome order index", defined as S = a^2 + c^2 + t^2 + g^2, where a, c, t, and g are the nucleotide frequencies of A, C, T, and G, respectively. The fact that the numerical value of S is smaller than 1/3 for almost all DNA sequences of 809 genomes has been erroneously interpreted as supporting evidence for the existence of genome-specific constraints on the nucleotide composition of naturally occurring DNA, i.e., isochores.
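
The definition of S given above can be evaluated directly from nucleotide counts. The sketch below is a straightforward reading of that formula; the function name genome_order_index and the decision to ignore ambiguous bases are assumptions of this example. Because the four frequencies sum to 1, S is always between 1/4 (all four bases equally frequent) and 1 (a single base only).

```python
from collections import Counter


def genome_order_index(seq: str) -> float:
    """S = a^2 + c^2 + g^2 + t^2 over the frequencies of A, C, G, T.

    Bases other than A, C, G, T are ignored (an assumption of this sketch).
    """
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    n = sum(counts.values())
    if n == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return sum((counts[b] / n) ** 2 for b in "ACGT")


if __name__ == "__main__":
    print(genome_order_index("ACGT"))  # 0.25, the minimum (uniform composition)
    print(genome_order_index("AACG"))  # 0.375
    print(genome_order_index("AAAA"))  # 1.0, the maximum (single base)
```

Since any sequence with a reasonably balanced composition gives a value close to 1/4, a value of S below 1/3 is what a mixed-composition sequence produces by construction.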

We studied the Z-curve method and the "genome order index" purported by their developers to be useful measures.

In two consecutive papers, Elhaik et al. (2008) and Elhaik et al. (2010b) showed that these calculations are in error and that the Z-curve suffers from overdimensionality, as the z dimension, which stands for GC content, suffices to represent any given genome. This work established the importance of utilizing the GC content to study genome composition and organization.
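
The link between the z dimension and GC content can be seen from the standard cumulative definition of the Z-curve: x counts purines minus pyrimidines, y counts amino minus keto bases, and z counts weak (A, T) minus strong (G, C) bases. The sketch below assumes that definition and uses a hypothetical function name (z_curve). It illustrates only that z after n bases equals n(1 - 2GC), so z carries the same information as GC content; it does not reproduce the analyses in the cited papers.

```python
from typing import List, Tuple


def z_curve(seq: str) -> List[Tuple[int, int, int]]:
    """Cumulative Z-curve coordinates (x, y, z) after each unambiguous base.

    x: purines (A, G) minus pyrimidines (C, T)
    y: amino (A, C) minus keto (G, T)
    z: weak (A, T) minus strong (G, C) hydrogen-bond bases
    """
    x = y = z = 0
    points = []
    for b in seq.upper():
        if b not in "ACGT":
            continue  # ambiguous bases are skipped (an assumption of this sketch)
        x += 1 if b in "AG" else -1
        y += 1 if b in "AC" else -1
        z += 1 if b in "AT" else -1
        points.append((x, y, z))
    return points


if __name__ == "__main__":
    seq = "GGCA"
    n = len(seq)
    gc = (seq.count("G") + seq.count("C")) / n  # 0.75
    final_z = z_curve(seq)[-1][2]
    print(final_z, n * (1 - 2 * gc))  # -2 -2.0
```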

References
Elhaik, E., and D. Graur. 2014. A Comparative Study and a Phylogenetic Exploration of the Compositional Architectures of Mammalian Nuclear Genomes. PLoS Computational Biology. DOI: 10.1371/journal.pcbi.1003925.
Elhaik, E., and D. Graur. 2013. IsoPlotter+: A Tool for Studying the Compositional Architecture of Genomes. ISRN Bioinformatics. 2013:6.
Elhaik, E., D. Graur, and K. Josic. 2008. 'Genome order index' should not be used for defining compositional constraints in nucleotide sequences. Computational Biology and Chemistry. 32:147.
Elhaik, E., D. Graur, and K. Josic. 2010a. Comparative testing of DNA segmentation algorithms using benchmark simulations. Molecular Biology and Evolution. 27:1015-1024.
Elhaik, E., D. Graur, and K. Josic. 2010b. 'Genome order index' should not be used for defining compositional constraints in nucleotide sequences - a case study of the Z-curve. Biology Direct. 5:10.
Elhaik, E., D. Graur, K. Josic, and G. Landan. 2010. Identifying compositionally homogeneous and nonhomogeneous domains within the human genome using a novel segmentation algorithm. Nucleic Acids Research. 38:e158.
Elhaik, E., G. Landan, and D. Graur. 2009. Can GC Content at Third-Codon Positions Be Used as a Proxy for Isochore Composition? Molecular Biology and Evolution. 26:1829-1833.
Elhaik, E., and T. V. Tatarinova. 2012. GC3 Biology in Eukaryotes and Prokaryotes. Pp. 55-68 in T. Tatarinova, and O. Kerton, eds. DNA Methylation - From Genomics to Technology. InTech.
Elsik, C. G., R. L. Tellam, K. C. Worley, et al. 2009. The genome sequence of taurine cattle: a window to ruminant biology and evolution. Science. 324:522-528.
Elsik, C. G., K. C. Worley, A. K. Bennett, et al. 2014. Finding the missing honey bee genes: lessons learned from a genome upgrade. BMC Genomics. 15:86.
Kirkness, E. F., B. J. Haas, W. Sun, et al. 2010. Genome sequences of the human body louse and its primary endosymbiont provide insights into the permanent parasitic lifestyle. Proceedings of the National Academy of Sciences of the United States of America. 107:12168-12173.
Richards, S., R. A. Gibbs, G. M. Weinstock, et al. 2008. The genome of the model beetle and pest Tribolium castaneum. Nature. 452:949-955.
Simola, D. F., L. Wissler, G. Donahue, et al. 2013. Social insect genomes exhibit dramatic evolution in gene composition and regulation while preserving regulatory features linked to sociality. Genome Research.
Smith, C. D., A. Zimin, C. Holt, et al. 2011a. Draft genome of the globally widespread and invasive Argentine ant (Linepithema humile). Proceedings of the National Academy of Sciences of the United States of America. 108:5673-5678.
Smith, C. R., C. D. Smith, H. M. Robertson, et al. 2011b. Draft genome of the red harvester ant Pogonomyrmex barbatus. Proceedings of the National Academy of Sciences of the United States of America. 108:5667-5672.
Sodergren, E., G. M. Weinstock, E. H. Davidson, et al. 2006a. Insights into social insects from the genome of the honeybee Apis mellifera. Nature. 443:931-949.
Sodergren, E., G. M. Weinstock, E. H. Davidson, et al. 2006b. The genome of the sea urchin Strongylocentrotus purpuratus. Science. 314:941-952.
Suen, G., C. Teiling, L. Li, et al. 2011. The genome sequence of the leaf-cutter ant Atta cephalotes reveals insights into its obligate symbiotic lifestyle. PLoS Genetics. 7:e1002007.
Werren, J. H., S. Richards, C. A. Desjardins, et al. 2010. Functional and evolutionary insights from the genomes of three parasitoid Nasonia species. Science. 327:343-348.