Eran Elhaik Publications

Publications

Highlighted papers

Zhang et al. (2024). Microbiome Geographic Population Structure (mGPS) Detects Fine-Scale Geography.

Elhaik (2022). Principal Component Analyses (PCA)-based findings in population genetic studies are highly biased and must be reevaluated.

Behnamian et al. (2022). Temporal population structure, a genetic dating method for ancient Eurasian genomes from the past 10,000 years.

Danko et al. (2021). Global Genetic Cartography of Urban Metagenomes and Anti-Microbial Resistance.

Baughn et al. (2020). Targeting TMPRSS2 in SARS-CoV-2 infection.

Elhaik (2019). Neonatal circumcision and prematurity are associated with sudden infant death syndrome (SIDS).

Elhaik (2014). Geographic population structure analysis of worldwide human populations infers their biogeographical origins.

Elhaik (2013). The Missing Link of Jewish European Ancestry: Contrasting the Rhineland and the Khazarian Hypotheses.

Smith et al. (2011). Draft genome of the red harvester ant Pogonomyrmex barbatus.

Smith et al. (2011). Draft genome of the globally widespread and invasive Argentine ant (Linepithema humile).

Werren et al. (2010). Functional and evolutionary insights from the genomes of three parasitoid Nasonia species.

Elsik et al. (2009). Sequence of Taurine Cattle: A Window to Ruminant Biology and Evolution.

Weinstock et al. (2006). Insights into social insects from the genome of the honeybee Apis mellifera.

Sodergren et al. (2006). The genome of the sea urchin Strongylocentrotus purpuratus.

Peer-reviewed publications

Preprints are marked with X

2026

75.

Mak, L., Tierney, B., Wei, W., Ronkowski, C., Brizola Toscan, R., Turhan, B., Toomey, M., Andrade-Martínez, J. S., Fu, C., Lucaci, A. G., Barrios Solano, A. H., Setubal, J. C., Henriksen, J. R., Zimmerman, S., Kopbayeva, M., Noyvert, A., Iwan, Z., Kar, S., Nakazawa, N., Meleshko, D., Horyslavets, D., Kantsypa, V., Frolova, A., Kahles, A., Danko, D., Elhaik, E., Labaj, P., Mangul, S., The International MetaSUB Consortium, Mason, C. E., and Hajirasouliha, I. 2026. CAMP: a modular metagenomics analysis system for integrated multistep data exploration. NAR Genomics and Bioinformatics.
Altmetric score: #2 most-read paper of all outputs by NAR Genomics and Bioinformatics of the same age.

Abstract	Computational analysis of large-scale metagenomics sequencing datasets provides valuable isolate-level taxonomic and functional insights from complex microbial communities. However, the ever-expanding ecosystem of metagenomics-specific methods and file formats makes designing scalable workflows and seamlessly exploring output data increasingly challenging. Although one-click bioinformatics pipelines can help organize these tools into workflows, they face compatibility and maintainability challenges that can prevent replication. To address the gap in easily extensible yet robustly distributable metagenomics workflows, we have developed the Core Analysis Modular Pipeline (CAMP), a module-based metagenomics analysis system written in Snakemake, with a standardized module and directory architecture. Each module can run independently or in sequence to produce target data formats (e.g. short-read preprocessing alone or followed by de novo assembly), and provides output summary statistics reports and Jupyter notebook-based visualizations. We applied CAMP to a set of 10 metagenomics samples, demonstrating how a modular analysis system with built-in data visualization facilitates rich seamless communication between outputs from different analytical purposes. The CAMP ecosystem (module template and analysis modules) can be found at https://github.com/Meta-CAMP.
Authors	Mak, L., Tierney, B., Wei, W., Ronkowski, C., Brizola Toscan, R., Turhan, B., Toomey, M., Andrade-Martínez, J. S., Fu, C., Lucaci, A. G., Barrios Solano, A. H., Setubal, J. C., Henriksen, J. R., Zimmerman, S., Kopbayeva, M., Noyvert, A., Iwan, Z., Kar, S., Nakazawa, N., Meleshko, D., Horyslavets, D., Kantsypa, V., Frolova, A., Kahles, A., and Danko, D.
Keywords	Computational metagenomics, large-scale sequencing, microbial communities, taxonomic profiling, functional annotation, isolate-level analysis, scalable workflows, metagenomics

2025

74.

Weinreich, M., McDonough, H., Heverin, M., Mac Domhnaill, É., Yacovzada, N., Magen, I., Cohen, Y., Harvey, C., Elazzab, A., Gornall, S., Boddy, S., Alix, J. J. P., Kurz, J. M., Kenna, K. P., Zhang, S., Iacoangeli, A., Al-Khleifat, A., Snyder, M. P., Hobson, E., Chio, A., Malaspina, A., Hermann, A., Ingre, C., Vazquez Costa, J., van den Berg, L., Povedano Panadés, M., van Damme, P., Corcia, P., de Carvalho, M., Al-Chalabi, A., Hornstein, E., Elhaik, E., Shaw, P. J., Hardiman, O., McDermott, C., and Cooper-Knock, J. 2025. Optimised machine learning for time-to-event prediction in healthcare applied to timing of gastrostomy in ALS: a multi-centre, retrospective model development and validation study. eBioMedicine.
Altmetric score: #11 most-read paper of all outputs by eBioMedicine of the same age.

Abstract	Background. Amyotrophic lateral sclerosis (ALS) is invariably fatal but there are large variations in the rate of progression. The lack of predictability can make it difficult to plan clinical interventions. This includes the requirement for gastrostomy where early or late placement can adversely impact quality of life and survival. Methods. We designed a model to predict the timing of gastrostomy requirement in ALS as indicated by 5% weight loss from diagnosis. We considered >5000 different prediction model configurations including spline models and a set of deep learning (DL) models designed for time-to-event prediction. The optimal prediction model was chosen via a Bayesian framework to avoid overfitting. Model covariates were measurements routinely collected at diagnosis; a separate longitudinal model also incorporated weight at six months. We employed a training dataset of 3000 patients from Europe, and two external validation cohorts spanning distinct populations and clinical contexts (United States, n = 299; and Sweden, n = 215). Missing data was imputed using a random forest model. Findings. The optimal model configuration was a logistic hazard DL model. The optimal model achieved a median absolute error (MAE) between predicted and measured time of 3.7 months, with AUROC 0.75 for gastrostomy requirement at 12 months. To increase accuracy we updated predictions for those who had not received gastrostomy at six months after diagnosis: here MAE was 2.6 months (AUROC 0.86). Combining both models achieved MAE of 1.2 months for the modal group of patients. Prediction performance is stable across both validation cohorts. Missing data was imputed without degrading model performance. Interpretation. To enter routine clinical practice a prospective study will be required, but we have demonstrated stable performance across multiple populations and clinical contexts suggesting that our prediction model can be used to guide individualised gastrostomy decision making for patients with ALS.
Authors	Weinreich, M., McDonough, H., Heverin, M., Mac Domhnaill, É., Yacovzada, N., Magen, I., Cohen, Y., Harvey, C., Elazzab, A., Gornall, S., Boddy, S., Alix, J. J. P., Kurz, J. M., Kenna, K. P., Zhang, S., Iacoangeli, A., Al-Khleifat, A., Snyder, M. P., Hobson, E., Chio, A., Malaspina, A., Hermann, A., Ingre, C., Vazquez Costa, J., van den Berg, L., Povedano Panadés, M., van Damme, P., Corcia, P., de Carvalho, M., Al-Chalabi, A., Hornstein, E., Elhaik, E., Shaw, P. J., Hardiman, O., McDermott, C., and Cooper-Knock, J.
Keywords	Time-to-event prediction, Machine learning, Personalised medicine, Amyotrophic lateral sclerosis (ALS), Gastrostomy

73.

Jaiswal, R. K., Garibo Domingo, T., Grunchec, H., Singh, K., Pirooznia, M., Elhaik, E., and Cohn, M. 2025. Subtelomeric elements provide stability to short telomeres in telomerase-negative cells of the budding yeast N. castellii. Current Genetics.

72.

Elhaik, E., Behnamian, S., Howe, M., Tang, H., Yan, H., Tian, S., Shivaram, S., Zepeda Mendoza, C., MacLachlan, K., Usmani, S., Pirooznia, M., Morgan, G., Blaney, P., Maura, F., and Baughn, L. B. 2025. AncestryGeni: A novel genetic ancestry classification pipeline for small and noisy sequence data. Bioinformatics.

Abstract	Motivation. Efforts to address health disparities are often limited by the lack of robust computational tools for inferring genetic ancestry by calculating an individual’s genetic similarity to continental groups. We have already shown that a preferred alternative to self-described race is using ancestry informative markers (AIMs) that can be classified into ancestral components and used to estimate their similarity to those of known populations to identify continental groups. However, real-world genomic data can present challenges, including limited availability of germline DNA, a small number of AIMs for each sample, and the use of different variant calling software, limiting the application of existing solutions. Results. Here, we describe a novel supervised machine-learning tool AncestryGeni, which infers genetic ancestry for samples with even a hundred markers and is applicable to any genomic data, including exome sequencing (WES) and RNA sequencing (RNA-Seq) data. Applying AncestryGeni to a real-world genomic dataset obtained from the Multiple Myeloma Research Foundation (MMRF) CoMMpass study, we show that it is more accurate than the commonly used FastNGSadmix when using non-standard genomic material. We also demonstrate that when using AncestryGeni, the tumor-derived sequence obtained from WES and RNA-Seq can be a robust data source to accurately estimate an individual’s genetic similarity to a continental group. Availability and implementation. AncestryGeni pipeline is available at https://github.com/eelhaik/AncestryGeni/tree/main.
Authors	Elhaik, E., Behnamian, S., Howe, M., Tang, H., Yan, H., Tian, S., Shivaram, S., Zepeda Mendoza, C., MacLachlan, K., Usmani, S., Pirooznia, M., Morgan, G., Blaney, P., Maura, F., and Baughn, L. B.
Keywords	health disparities, genetic ancestry inference, ancestry informative markers, continental groups, self-described race, population genetics, genomic diversity, supervised machine learning, genomic ancestry classification, Multiple Myeloma

70.

Tang, H., Yan H., Shivaram S., Lehman S., Sharma N., Smadbeck J., Zepeda-Mendoza C., Tian S., Asmann, Y., Vachon, C., Maia, A.G., Keats, J., Bergsagel, P.L., Fonseca, R., Stewart, A.K., Hsu, J.S., Kandasamy, R.K., Pandey, A., Kaddoura, M.A., Maura, F., Mitra, A., Rajkumar, S.V., Kumar, S.K., Elhaik, E., Braggio, E., and Baughn, L.B. 2024. Functional variant rs9344 at 11q13.3 regulates CCND1 expression in multiple myeloma with t(11;14). Leukemia.

Abstract	Multiple myeloma (MM) is a plasma cell (PC) malignancy characterized by cytogenetic abnormalities, such as t(11;14)(q13;q32), resulting in CCND1 overexpression. The rs9344 G allele within CCND1 is the most significant susceptibility allele for t(11;14). Sequencing data from 2 independent cohorts, CoMMpass (n = 698) and Mayo Clinic (n = 661), confirm the positive association between the G allele and t(11;14). Among 80% of individuals heterozygous for rs9344 with t(11;14), the t(11;14) event occurs on the G allele, demonstrating a biological preference for the G allele in t(11;14). Within t(11;14), the G allele is associated with higher CCND1 expression and elevated H3K27ac and H3K4me3. CRISPR/Cas9 mediated A to G conversion resulted in increased H3K27ac over CCND1 and elevated CCND1 expression. ENCODE ChIP-seq data supported a PAX5 binding site within the enhancer region covering rs9344, showing preferential binding to the G allele. Overexpression of PAX5 resulted in increased CCND1 expression. These results support the importance of rs9344 G enhancer in increasing CCND1 expression in MM.
Authors	Tang, H., Yan H., Shivaram S., Lehman S., Sharma N., Smadbeck J., Zepeda-Mendoza C., Tian S., Asmann, Y., Vachon, C., Maia, A.G., Keats, J., Bergsagel, P.L., Fonseca, R., Stewart, A.K., Hsu, J.S., Kandasamy, R.K., Pandey, A., Kaddoura, M.A., Maura, F., Mitra, A., Rajkumar, S.V., Kumar, S.K., Elhaik, E., Braggio, E., and Baughn, L.B.
Keywords	Multiple Myeloma (MM), CCND1, rs9344, genomic ancestry, t(11;14), PAX5

Amos, W., and Elhaik, E. 2025. Unexpected D-tour Ahead: Why the D-Statistic, applied to Humans, Measures Mutation Rate Variation not Neanderthal Introgression. bioRxiv.

Abstract	It is widely accepted that humans interbred with Neanderthals and other extinct hominins, leaving a lasting genetic legacy. However, much of the supporting evidence was developed using the statistic D, which assumes, without testing, both that mutation rate is constant and that recurrent mutations are vanishingly rare. These assumptions together preclude an alternative explanation based on variation in mutation rates across human populations. Here we critically evaluate the assumptions underlying D and confirm that neither is valid. Over 40% of SNPs in dbSNP carry recurrent mutations. Theory indicates that D does not vary with mutation rate as long as the mutation rate does not vary between populations. In practice, D calculated separately for different sequence motifs varies greatly, implying strongly that mutation rates do vary between populations. We show that most, if not all, D-informative sites result from two mutations rather than the one mutation expected under the introgression hypothesis. Moreover, individual non-Africans carry a signal in more than five times as many genomic windows as can be accounted for by the 2% legacy they are thought to carry, indicating a signal that is radically more diffuse than expected. Remarkably, partitioning the data by whether the chimpanzee or Neanderthal allele is the major allele in humans reveals that the overall reported D-value of ~5% actually comprises two opposing components: one with D ~ 30% and another with D ~ -25%. Tellingly, the positive component is produced by sites where the Neanderthal allele is the major allele, the exact opposite of what should be the case under introgression, where introgressed alleles should be rare. We show further that the entire D signal can be accounted for by sites where the Neanderthal allele is fixed outside Africa and the chimpanzee allele is rare inside Africa. 
Investigating potential mechanisms, we extend the published observation that the mutability of three-base combinations across human populations is influenced by flanking sequence heterozygosity to reveal how genomic regions that lost more heterozygosity out of Africa exhibit higher D-values. This correlation supports a model where loss of heterozygosity slowed the mutation rate, thereby reducing the divergence between Neanderthals and non-Africans. Across independent tests, our findings consistently indicate that the mutation rate variation hypothesis provides a more compelling explanation for the observed patterns in human-Neanderthal genetic relationships than the introgression hypothesis. We argue that the mutation rate variation hypothesis would help settle a number of conflicting patterns in the literature and, hence, that the concept of archaic introgression into humans and its implications for hominin-derived traits warrants reconsideration.
Authors	Amos, W., and Elhaik, E.
Keywords	Mutation Rate Variation Hypothesis (MRVH), Introgression hypothesis, D statistic, Neanderthal, Interbreeding, Human evolution

2024

71.

Zhang, Y., McCarthy, L., Ruff, S.E., and Elhaik, E. 2024. Microbiome Geographic Population Structure (mGPS) Detects Fine-Scale Geography. Genome Biology and Evolution.
Altmetric score: #1 most-read paper in Genome Biology and Evolution compared with all outputs of the same age.

Abstract	Over the past decade, sequencing data generated by large microbiome projects showed that taxa exhibit patchy geographical distribution, raising questions about the geospatial dynamics that shape natural microbiomes and the spread of antimicrobial resistance (AMR) genes. Answering these questions requires distinguishing between local and non-local microorganisms and identifying the source sites for the latter. Predicting the source sites and migration routes of microbiota has been envisioned for decades but was hampered by the lack of data, tools, and understanding of the processes governing biodiversity. State-of-the-art biogeographical tools suffer from low resolution and cannot predict biogeographical patterns at a scale relevant to ecological, medical, or epidemiological applications. Analyzing urban, soil, and marine microorganisms, we found that some taxa exhibit regional-specific composition and abundance, suggesting they can be used as biogeographical biomarkers. We developed the Microbiome Geographic Population Structure (mGPS), a machine-learning-based tool that utilizes microbial relative sequence abundances to yield a fine-scale source site for microorganisms. mGPS predicted the source city for 92% of the samples and the within-city source for 82% of the samples, though they were often only a few hundred meters apart. mGPS also predicted soil and marine sampling sites for 86% and 74% of the samples, respectively. We demonstrated that mGPS differentiated local from non-local microorganisms and used it to trace the global spread of AMR genes. mGPS’s ability to localize samples to their waterbody, country, city, and transit stations opens new possibilities in tracing microbiomes and has applications in forensics, medicine, and epidemiology.
Authors	Zhang, Y., McCarthy, L., Ruff, S.E., and Elhaik, E.
Keywords	Microbiome, Biogeographical predictions, Microbiome Geographic Population Structure (mGPS), antimicrobial resistance (AMR), Forensics, Machine Learning

Weinreich, M., McDonough, H., Yacovzada, N., Magen, I., Cohen, Y., Harvey, C., Gornall, S., Boddy, S., Alix, J., Kurz, J., Kenna, K., Zhang, S., Iacoangeli, A., Al-Khleifat, A., Snyder, M., Hobson, E., Al-Chalabi, A., Hornstein, E., Elhaik, E., Shaw, P., McDermott, C., and Cooper-Knock J. 2024. predicTTE: An accessible and optimal tool for time-to-event prediction in neurological diseases. bioRxiv.

Abstract	Time-to-event prediction is a key task for biological discovery, experimental medicine, and clinical care. This is particularly true for neurological diseases where development of reliable biomarkers is often limited by difficulty visualising and sampling relevant cell and molecular pathobiology. To date, much work has relied on Cox regression because of ease-of-use, despite evidence that this model includes incorrect assumptions. We have implemented a set of deep learning and spline models for time-to-event modelling within a fully customizable app and accompanying online portal, both of which can be used for any time-to-event analysis in any disease by a non-expert user. Our online portal includes capacity for end-users including patients, Neurology clinicians, and researchers, to access and perform predictions using a trained model, and to contribute new data for model improvement, all within a data-secure environment. We demonstrate a pipeline for use of our app with three use-cases including imputation of missing data, hyperparameter tuning, model training and independent validation. We show that predictions are optimal for use in downstream applications such as genetic discovery, biomarker interpretation, and personalised choice of medication. We demonstrate the efficiency of an ensemble configuration, including focused training of a deep learning model. We have optimised a pipeline for imputation of missing data in combination with time-to-event prediction models. Overall, we provide a powerful and accessible tool to develop, access and share time-to-event prediction models; all software and tutorials are available at www …
Authors	Weinreich, M., McDonough, H., Yacovzada, N., Magen, I., Cohen, Y., Harvey, C., Gornall, S., Boddy, S., Alix, J., Mohseni, N., Kurz, J., Kenna, K., Zhang, S., Iacoangeli, A., Al-Khleifat, A., Snyder, M., Hobson, E., Al-Chalabi, A., Hornstein, E., Elhaik, E., Shaw, P., McDermott, C., and Cooper-Knock J.
Keywords	Time-to-event prediction, Biomarkers, Parkinson, Neurology, Cox regression

2023

69.

Zhang, Y., Ruff, S.E., Oskolkov, N., Tierney, B.T., Ryon, K., Danko, D., Mason, C.E., and Elhaik, E. 2023. The microbial biodiversity at the archeological site of Tel Megiddo (Israel). Frontiers in Microbiology.

Abstract	Introduction: The ancient city of Tel Megiddo in the Jezreel Valley (Israel), which lasted from the Neolithic to the Iron Age, has been continuously excavated since 1903 and is now recognized as a World Heritage Site. The site features multiple ruins in various areas, including temples and stables, alongside modern constructions, and public access is allowed in designated areas. The site has been studied extensively since the last century; however, its microbiome has never been studied. We carried out the first survey of the microbiomes in Tel Megiddo. Our objectives were to study (i) the unique microbial community structure of the site, (ii) the variation in the microbial communities across areas, (iii) the similarity of the microbiomes to urban and archeological microbes, (iv) the presence and abundance of potential bio-corroding microbes, and (v) the presence and abundance of potentially pathogenic microbes. Methods: We collected 40 swab samples from ten major areas and identified microbial taxa using next-generation sequencing of microbial genomes. These genomes were annotated and classified taxonomically and pathogenetically. Results: We found that eight phyla, six of which exist in all ten areas, dominated the site (>99%). The relative sequence abundance of taxa varied between the ruins and the sampled materials and was assessed using all metagenomic reads mapping to a respective taxon. The site hosted unique taxa characteristic of the built environment and exhibited high similarity to the microbiome of other monuments. We identified acid-producing bacteria that may pose a risk to the site through biocorrosion and staining and thus pose a danger to the site’s preservation. Differences in the microbiomes of the publicly accessible or inaccessible areas were insignificant; however, pathogens were more abundant in the former. Discussion: We found that Tel Megiddo combines microbiomes of arid regions and monuments with human pathogens. The findings shed light on the microbial community structures and have relevance for bio-conservation efforts and visitor health.
Authors	Zhang, Y., Ruff, S.E., Oskolkov, N., Tierney, B.T., Ryon, K., Danko, D., Mason, C.E., and Elhaik, E.
Keywords	microbiome, pathogens, biocorrosion, monuments, acid-producing bacteria (APB), Urban microbiome, Monumentome, MetaSUB

68.

Eachus, H., Oberski, L., Paveley J., Bacila, I., Ashton, J.P., Esposito, U., Seifuddin, F., Pirooznia, M., Elhaik, E., Placzek, M., Krone, N., and Cunliffe, V.T. 2023. Glucocorticoid Receptor regulates protein chaperone, circadian clock and affective disorder genes in the zebrafish brain. Disease Models and Mechanisms.

Abstract	Glucocorticoid resistance is commonly observed in depression and has been linked to reduced expression and/or function of the Glucocorticoid Receptor (GR). Previous studies have shown that GR mutant zebrafish exhibit behavioural abnormalities that are indicative of an affective disorder, suggesting that GR plays a role in brain function. We compared the brain methylomes and brain transcriptomes of wild-type and GR mutant adult zebrafish. 249 GR-regulated Differentially Methylated Regions (DMRs) were identified, including a cluster of CpGs within the first intron of the glucocorticoid-inducible, heat shock protein co-chaperone gene fkbp5. RNA-seq analysis revealed that genes associated with chaperone-mediated protein folding, regulation of circadian rhythm and regulation of metabolism were particularly sensitive to loss of GR function. In addition, subsets of genes exhibiting GR-regulated transcription were identified that are known to regulate behaviour and are linked to unipolar depression and anxiety. Taken together, our results identify key biological processes and novel molecular mechanisms through which the GR likely mediates responses to stress in the adult zebrafish brain, and they provide further support for the GR mutant as a model for the study of affective disorders.
Authors	Eachus, H., Oberski, L., Paveley J., Bacila, I., Ashton, J.P., Esposito, U., Seifuddin, F., Pirooznia, M., Elhaik, E., Placzek, M., Krone, N., and Cunliffe, V.T.
Keywords	DNA methylation; Glucocorticoid; Nervous System; Transcriptome.

67.

Movert, E., Bolarin, J.S., Valfridsson, C., Velarde, J., Skrede, S., Nekludov, M., Hyldegaard, O., Arnell, P., Svensson, M., Norrby-Teglund, A., Cho, K.H., Elhaik, E., Wessels, M.R., Råberg L., and Carlsson F. 2023. Interplay between human STING genotype and bacterial NADase activity regulates inter-individual disease variability. Nature Communications.
Altmetric score: Top 98% most-read paper compared to all outputs by Nature Communications of the same age.

Abstract	Variability in disease severity caused by a microbial pathogen is impacted by each infection representing a unique combination of host and pathogen genomes. Here, we show that the outcome of invasive Streptococcus pyogenes infection is regulated by an interplay between human STING genotype and bacterial NADase activity. S. pyogenes-derived c-di-AMP diffuses via streptolysin O pores into macrophages where it activates STING and the ensuing type I IFN response. However, the enzymatic activity of the NADase variants expressed by invasive strains suppresses STING-mediated type I IFN production. Analysis of patients with necrotizing S. pyogenes soft tissue infection indicates that a STING genotype associated with reduced c-di-AMP-binding capacity combined with high bacterial NADase activity promotes a ‘perfect storm’ manifested in poor outcome, whereas proficient and uninhibited STING-mediated type I IFN production correlates with protection against host-detrimental inflammation. These results reveal an immune-regulating function for bacterial NADase and provide insight regarding the host-pathogen genotype interplay underlying invasive infection and interindividual disease variability.
Authors	Movert, E., Bolarin, J.S., Valfridsson, C., Velarde, J., Skrede, S., Nekludov, M., Hyldegaard, O., Arnell, P., Svensson, M., Norrby-Teglund, A., Cho, K.H., Elhaik, E., Wessels, M.R., Råberg L., and Carlsson F.
Keywords	STING, NADase, Streptococcus pyogenes, agricultural revolution, paleogenetics

66.

Koleilat, K., Tang, H., Sharma, N., Yan, H., Tian, S., Smadbeck, J., Shivaram, S., Meyer, R., Pearce, K., Baird, M., Mendoza, C.Z., Xu, X., Greipp, P.T., Peterson, J.F., Ketterling, R.P., Bergsagel, P.L., Vachon, C., Rajkumar, S.V., Kumar, S., Asmann, Y.W., Elhaik, E., and Baughn, L.B. 2023. Disparity in the detection of chromosome 15 centromere in patients of African ancestry with a plasma cell neoplasm. Genetics in Medicine Open.

Abstract	Purpose. Fluorescence in situ hybridization (FISH) is the current gold-standard assay providing information related to risk stratification and therapeutic selection for individuals with plasma cell neoplasms (PCNs). The differential hybridization of FISH probe sets in association with individuals’ genetic ancestry has not been previously reported. Methods. This retrospective study included 1,224 bone marrow (BM) samples from individuals who had an abnormal plasma cell proliferative disorder FISH result and a concurrent conventional G-banded chromosome study. DNA from BM samples obtained from the G-banded chromosome study was genotyped and a biogeographical ancestry prediction was carried out. Results. Using a cohort of individuals with a PCN, we identified reduced hybridization of a chromosome 15 centromere FISH probe (D15Z4). Metaphase FISH studies of cells with two copies of chromosome 15 demonstrated a failure of the D15Z4 FISH probe to hybridize to one chromosome 15 centromere revealing a false-positive monosomy 15 FISH result in some individuals. Surprisingly, individuals with a monosomy 15 FISH result had a median African ancestry of 77.2% (95% CI: 74.1%-80.3%), compared with a median African ancestry of 2.2% (95% CI: 2.0%-2.5%) in the non-monosomy 15 cohort (p-value = 9.4 × 10^-10). Thus, individuals with African ancestry had an 8.02-fold (95% CI: 3.73-17.25) increased probability of having a false-positive monosomy 15 result (p-value = 9.92 × 10^-8). Conclusion. This emphasizes a concern regarding the reliability of diagnostic genomic tools and their application in interpreting genetic testing results in diverse patient populations. We discuss alternative methodologies to better represent different ancestry groups in clinical diagnostic testing.
Authors	Koleilat, K., Tang, H., Sharma, N., Yan, H., Tian, S., Smadbeck, J., Shivaram, S., Meyer, R., Pearce, K., Baird, M., Mendoza, C.Z., Xu, X., Greipp, P.T., Peterson, J.F., Ketterling, R.P., Bergsagel, P.L., Vachon, C., Rajkumar, S.V., Kumar, S., Asmann, Y.W., Elhaik, E., and Baughn, L.B.
Keywords	Fluorescence in situ hybridization (FISH), plasma cell neoplasms (PCNs), African Americans, chromosome 15 monosomy

2022

65.

Ryon, K.E., Tierney, B.T., Frolova, A., Kahles, A., Desnues, C., Ouzounis, C., Gibas, C., Bezdan, D., Deng, Y., He, D., Dias-Neto, E., Elhaik, E., Afshin, E., Grills, G., Iraola, G., Suzuki, H., Werner, J., Udekwu, K., and Mason, C.E. 2022. A history of the MetaSUB consortium: Tracking urban microbes around the globe. iScience.
Altmetric score: Top 5% of all outputs by iScience of the same age.

Abstract	The MetaSUB Consortium, founded in 2015, is a global consortium with an interdisciplinary team of clinicians, scientists, bioinformaticians, engineers, and designers, with members from more than 100 countries across the globe. This network has continually collected samples from urban and rural sites including subways and transit systems, sewage systems, hospitals, and other environmental sampling. These collections have been ongoing since 2015 and have continued when possible, even throughout the COVID-19 pandemic. The consortium has optimized their workflow for the collection, isolation, and sequencing of DNA and RNA collected from these various sites and processing them for metagenomics analysis, including the identification of SARS-CoV-2 and its variants. Here, the Consortium describes its foundations, and its ongoing work to expand on this network and to focus its scope on the mapping, annotation, and prediction of emerging pathogens, mapping microbial evolution and antibiotic resistance, and the discovery of novel organisms and biosynthetic gene clusters.
Authors	Ryon, K.E., Tierney, B.T., Frolova, A., Kahles, A., Desnues, C., Ouzounis, C., Gibas, C., Bezdan, D., Deng, Y., He, D., Dias-Neto, E., Elhaik, E., Afshin, E., Grills, G., Iraola, G., Suzuki, H., Werner, J., Udekwu, K., and Mason, C.E.
Keywords	The MetaSUB Consortium, Microbiome, Covid19, SARS-CoV-2.

64.

Elhaik, E. 2022. Principal Component Analyses (PCA)-based findings in population genetic studies are highly biased and must be reevaluated. Scientific Reports.
Altmetric score: #3 of all outputs by Scientific Reports of the same age.
#8 Most-downloaded paper in Scientific Reports in 2022.

Abstract	Principal Component Analysis (PCA) is a multivariate analysis that reduces the complexity of datasets while preserving data covariance. The outcome can be visualized on colorful scatterplots, ideally with only a minimal loss of information. PCA applications, implemented in well-cited packages like EIGENSOFT and PLINK, are extensively used as the foremost analyses in population genetics and related fields (e.g., animal and plant or medical genetics). PCA outcomes are used to shape study design, identify, and characterize individuals and populations, and draw historical and ethnobiological conclusions on origins, evolution, dispersion, and relatedness. The replicability crisis in science has prompted us to evaluate whether PCA results are reliable, robust, and replicable. We analyzed twelve common test cases using an intuitive color-based model alongside human population data. We demonstrate that PCA results can be artifacts of the data and can be easily manipulated to generate desired outcomes. PCA adjustment also yielded unfavorable outcomes in association studies. PCA results may not be reliable, robust, or replicable as the field assumes. Our findings raise concerns about the validity of results reported in the population genetics literature and related fields that place a disproportionate reliance upon PCA outcomes and the insights derived from them. We conclude that PCA may have a biasing role in genetic investigations and that 32,000-216,000 genetic studies should be reevaluated. An alternative mixed-admixture population genetic model is discussed.
Authors	Elhaik, E.
Keywords	Principal Component Analysis (PCA), ancient DNA (aDNA), Ashkenazic Jews, Genetic ancestry, Elizabeth Warren, Indians, problems, criticism, EIGENSOFT, David Reich

63.

Behnamian, S., Esposito, U., Holland, G., Alshehab, G., Dobre, A.M., Pirooznia, M., Brimacombe, C.S., and Elhaik, E. 2022. Temporal population structure, a genetic dating method for ancient Eurasian genomes from the past 10,000 years. Cell Reports Methods.
Altmetric score: #2 of all outputs by Cell Reports Methods.

Abstract	Radiocarbon dating is the gold standard in archeology to estimate the age of skeletons, a key to studying their origins. Many published ancient genomes lack reliable and direct dates, which results in obscure and contradictory reports. We developed the temporal population structure (TPS), a DNA-based dating method for genomes ranging from the Late Mesolithic to today, and applied it to 3,591 ancient and 1,307 modern Eurasians. TPS predictions aligned with the known dates and correctly accounted for kin relationships. TPS dating of poorly dated Eurasian samples resolved conflicting reports in the literature, as illustrated by one test case. We also demonstrated how TPS improved the ability to study phenotypic traits over time. TPS can be used when radiocarbon dating is unfeasible or uncertain or to develop alternative hypotheses for samples younger than 10,000 years ago, a limitation that may be resolved over time as ancient data accumulate.
Authors	Behnamian, S., Esposito, U., Holland, G., Alshehab, G., Dobre, A.M., Pirooznia, M., Brimacombe, C.S., and Elhaik, E.
Keywords	genomics dating, ancient DNA (aDNA), temporal population structure, TPS, genomic dating, paleogenomics, phenotypic traits, DNA-based dating method, radiocarbon dating, supervised learning, random forest regression

62.

Zhang, S., Cooper-Knock, J., Weimer, A.K., Shi, M., Moll, T., Marshall, J.N.G., Harve, C., Ghahremani Nezhad, H., Franklin, J., Souza, C.D.S., Ning, K., Wang, C., Li, J., Dilliott, A.A., Farhan, S., Elhaik, E., Pasniceanu, I., Livesey, M.R., Eitan, C., Hornstein, E., Kenna, K.P., Project MinE ALS Sequencing Consortium, Veldink, J.H., Ferraiuolo, L., Shaw, P.J., and Snyder, M.P. 2022. Genome-wide identification of the genetic basis of amyotrophic lateral sclerosis. Neuron.
Altmetric score: top 1% attention score of all outputs ever tracked.

Abstract	Amyotrophic lateral sclerosis (ALS) is a complex disease that leads to motor neuron death. Despite heritability estimates of 52%, genome-wide association studies (GWASs) have discovered relatively few loci. We developed a machine learning approach called RefMap, which integrates functional genomics with GWAS summary statistics for gene discovery. With transcriptomic and epigenetic profiling of motor neurons derived from induced pluripotent stem cells (iPSCs), RefMap identified 690 ALS-associated genes that represent a 5-fold increase in recovered heritability. Extensive conservation, transcriptome, network, and rare variant analyses demonstrated the functional significance of candidate genes in healthy and diseased motor neurons and brain tissues. Genetic convergence between common and rare variation highlighted KANK1 as a new ALS gene. Reproducing KANK1 patient mutations in human neurons led to neurotoxicity and demonstrated that TDP-43 mislocalization, a hallmark pathology of ALS, is downstream of axonal dysfunction. RefMap can be readily applied to other complex diseases.
Authors	Zhang, S., Cooper-Knock, J., Weimer, A.K., Shi, M., Moll, T., Marshall, J.N.G., Harve, C., Ghahremani Nezhad, H., Franklin, J., Souza, C.D.S., Ning, K., Wang, C., Li, J., Dilliott, A.A., Farhan, S., Elhaik, E., Pasniceanu, I., Livesey, M.R., Eitan, C., Hornstein, E., Kenna, K.P., Project MinE ALS Sequencing Consortium, Veldink, J.H., Ferraiuolo, L., Shaw, P.J., and Snyder, M.P.
Keywords	ALS, motor neurons, machine learning, genetics, gene discovery, iPSC, multiomics, epigenetics, TDP-43 mislocalization, axonal dysfunction

Swans Reflecting Elephants, Salvador Dali (1937) Source: Wikimedia

2021

61.

Aguiar-Pulido, V., Wolujewicz, P., Martinez-Fundichely, A., Elhaik, E., Thareja, G., AbdelAleem, A., Chalhoub, N., Cuykendall, T., Al-Zamer, J., Lei, Y., El-Bashir, H., Musser, J.M., Al-Kaabi, A., Shaw, G.M., Khurana, E., Suhre, K., Mason, C.E., Elemento, O., Finnell, R.H., and Ross, M.E. 2021. Systems biology analysis of human genomes points to key pathways conferring spina bifida risk. PNAS.
Altmetric score: top 5% attention score of all outputs of similar age.

Abstract	Spina bifida (SB) is a debilitating birth defect caused by multiple gene and environment interactions. Though SB shows non-Mendelian inheritance, genetic factors contribute to an estimated 70% of cases. Nevertheless, identifying human mutations conferring SB risk is challenging due to its relative rarity, genetic heterogeneity, incomplete penetrance, and environmental influences that hamper genome-wide association studies approaches to untargeted discovery. Thus, SB genetic studies may suffer from population substructure and/or selection bias introduced by typical candidate gene searches. We report a population based, ancestry-matched whole-genome sequence analysis of SB genetic predisposition using a systems biology strategy to interrogate 298 case-control subject genomes (149 pairs). Genes that were enriched in likely gene disrupting (LGD), rare protein-coding variants were subjected to machine learning analysis to identify genes in which LGD variants occur with a different frequency in cases versus controls and so discriminate between these groups. Those genes with high discriminatory potential for SB significantly enriched pathways pertaining to carbon metabolism, inflammation, innate immunity, cytoskeletal regulation, and essential transcriptional regulation consistent with their having impact on the pathogenesis of human SB. Additionally, an interrogation of conserved noncoding sequences identified robust variant enrichment in regulatory regions of several transcription factors critical to embryonic development. This genome-wide perspective offers an effective approach to the interrogation of coding and noncoding sequence variant contributions to rare complex genetic disorders.
Authors	Aguiar-Pulido, V., Wolujewicz, P., Martinez-Fundichely, A., Elhaik, E., Thareja, G., AbdelAleem, A., Chalhoub, N., Cuykendall, T., Al-Zamer, J., Lei, Y., El-Bashir, H., Musser, J.M., Al-Kaabi, A., Shaw, G.M., Khurana, E., Suhre, K., Mason, C.E., Elemento, O., Finnell, R.H., and Ross, M.E.
Keywords	Spina Bifida, complex disorders, pathway, genetic ancestry, GPS, PaM

60.

Moore, K.J.M., Cahill, J., Aidelberg, G., Aronoff, R., Bektaş, A., Bezdan, D., Butler, D.J., Chittur, S.V., Codyre, M., Federici, F., Tanner, N.A., Tighe, S.W., True, R., Ware, S.B., Wyllie, A.L., Afshin, E.E., Bendesky, A., Chang, C.B., Rosa, R.D., Elhaik, E., Erickson, D., Goldsborough, A.S., Grills, G., Hadasch, K., Hayden, A., Her, S-Y., Karl, J.A., Kim, C.H., Kriegel, A.J., Kunstman, T., Landau, Z., Land, K., Langhorst, B.W., Lindner, A.B., Mayer, B.E., McLaughlin, L.A., McLaughlin, M.T., Molloy, J., Mozsary, C., Nadler, J.L., D’Silva, M., Ng, F., O'Connor, D.H., Ongerth, J.E., Osuolale, O., Pinharanda, A., Plenker, D., Ranjan, R., Rosbash, M., Rotem, A., Segarra, J., Schürer, S., Sherrill-Mix, S., Solo-Gabriele, H., To, S., Vogt, M.C., Yu, A.D., The gLAMP Consortium, and Mason, C.E. 2021. Loop-Mediated Isothermal Amplification (LAMP) Detection of SARS-CoV-2 and Myriad Other Applications. Journal of Biomolecular Techniques (JBT).

Abstract	As the 2nd year of the COVID-19 pandemic begins, it remains clear that a massive increase in the ability to test for SARS-CoV-2 infections in a myriad of settings is critical to control the pandemic and to prepare for future outbreaks. The current gold standard for molecular diagnostics is the polymerase chain reaction (PCR), but the extraordinary and unmet demand for testing in a variety of environments means that both complementary and supplementary testing solutions are still needed. This review highlights the role that loop-mediated isothermal amplification (LAMP) has had in filling this global testing need, providing a faster and easier means of testing, and what it can do for future applications, pathogens, and to prepare for future outbreaks. The review lays out the current state of the art for research of LAMP-based SARS-CoV-2 testing, as well as its implications for other pathogens and testing. The authors represent the global LAMP (gLAMP) Consortium - an international research collective that has regularly met to share their experiences on LAMP deployment and best practices; sections are devoted to all aspects of LAMP testing, including preanalytical sample processing, target amplification and amplicon detection, then the hardware and software required for deployment, and finally a summary of the current regulatory landscape. Included as well are a series of first-person accounts of LAMP method development and deployment. The final conclusions and recommendations section provides the reader with a distillation of the most validated testing methods and their paths to implementation. The review aims to provide practical information and insight for a range of audiences: for a research audience to help accelerate research through sharing of best practices, for an implementation audience to help get testing up and running quickly, and for public health, clinical, and policy audience to help convey the breadth of impact that LAMP methods have to offer.
Authors	Moore, K.J.M., Cahill, J., Aidelberg, G., Aronoff, R., Bektaş, A., Bezdan, D., Butler, D.J., Chittur, S.V., Codyre, M., Federici, F., Tanner, N.A., Tighe, S.W., True, R., Ware, S.B., Wyllie, A.L., Afshin, E.E., Bendesky, A., Chang, C.B., Rosa, R.D., Elhaik, E., Erickson, D., Goldsborough, A.S., Grills, G., Hadasch, K., Hayden, A., Her, S-Y., Karl, J.A., Kim, C.H., Kriegel, A.J., Kunstman, T., Landau, Z., Land, K., Langhorst, B.W., Lindner, A.B., Mayer, B.E., McLaughlin, L.A., McLaughlin, M.T., Molloy, J., Mozsary, C., Nadler, J.L., D’Silva, M., Ng, F., O'Connor, D.H., Ongerth, J.E., Osuolale, O., Pinharanda, A., Plenker, D., Ranjan, R., Rosbash, M., Rotem, A., Segarra, J., Schürer, S., Sherrill-Mix, S., Solo-Gabriele, H., To, S., Vogt, M.C., Yu, A.D., The gLAMP Consortium, and Mason, C.E.
Keywords	Covid19, corona, LAMP, gLAMP, molecular diagnostics

59.

Danko, D.C., Bezdan, D., Afshinnekoo, E., Ahsanuddin, S., Alicea, J., Bhattacharya, C., Bhattacharyya, M., Blekhman, R., Butler, D.J., Castro-Nallar, E., Canas, A.M., Chatziefthimiou, A.D., Chng, K.R., Coil, D.A., Court, D.S., Crawford, R.W., Desnues, D., Dias-Neto, E., Daisy, D., Dybwad, M., Eisen, J.E., Elhaik, E., Ercolini, D., Filippis, F.D., Frolova, A., Graf, A.B., Green, D.C., Lee, P.K.H., Hecht, J., Hernandez, M., Jang, S., Kahles, A., Karasikov, M., Knights, K., Kyrpides, N.C., Ljungdahl, P., Lyons, A., Mason-Buck, G., McGrath, K., Mongodin, E.F., Mustafa, H., Mutai, B., Nagarajan, N., Neches, R.Y., Ng, A., Nieto-Caballero, M., Nikolayeva, O., Nikolayeva, T., Noushmehr, H., Oliveira, M., Ossowski, S., Osuolale, O.O., Paez-Espino, D., Png, E., Rascovan, N., Richard, H., Ratsch, G., Sanchez, J.L., Schriml, L.M., Shaaban, H., Shi, L., Sierra, M.A., Song, L.H., Suzuki, H., Thomas, D., Udekwu, K.I., Ugalde, J.A., Valentine, B., Vassilev, D.I., Vayndorf, E., Leung, M.H.Y., Young, B., Zambrano, M.M., Zhu, J., Zhu, S., Labaj, P.P., and Mason, C.E. 2021. A global metagenomic map of urban microbiomes and antimicrobial resistance. Cell.
Altmetric score: #1 of all Cell outputs of similar age.

Science highlight: Cities have their own distinct microbial fingerprints.
Nature Reviews Microbiology highlight: Surveying what's flushed away
Cell highlight: Best of 2021 papers

Abstract	We present a global atlas of 4,728 metagenomic samples from mass-transit systems in 60 cities over 3 years, representing the first systematic, worldwide catalog of the urban microbial ecosystem. This atlas provides an annotated, geospatial profile of microbial strains, functional characteristics, antimicrobial resistance (AMR) markers, and genetic elements, including 10,928 viruses, 1,302 bacteria, 2 archaea, and 838,532 CRISPR arrays not found in reference databases. We identified 4,246 known species of urban microorganisms and a consistent set of 31 species found in 97% of samples that were distinct from human commensal organisms. Profiles of AMR genes varied widely in type and density across cities. Cities showed distinct microbial taxonomic signatures that were driven by climate and geographic differences. These results constitute a high-resolution global metagenomic atlas that enables discovery of organisms and genes, highlights potential public health and forensic applications, and provides a culture-independent view of AMR burden in cities.
Authors	Danko, D.C., Bezdan, D., Afshinnekoo, E., Ahsanuddin, S., Alicea, J., Bhattacharya, C., Bhattacharyya, M., Blekhman, R., Butler, D.J., Castro-Nallar, E., Canas, A.M., Chatziefthimiou, A.D., Chng, K.R., Coil, D.A., Court, D.S., Crawford, R.W., Desnues, D., Dias-Neto, E., Daisy, D., Dybwad, M., Eisen, J.E., Elhaik, E., Ercolini, D., Filippis, F.D., Frolova, A., Graf, A.B., Green, D.C., Lee, P.K.H., Hecht, J., Hernandez, M., Jang, S., Kahles, A., Karasikov, M., Knights, K., Kyrpides, N.C., Ljungdahl, P., Lyons, A., Mason-Buck, G., McGrath, K., Mongodin, E.F., Mustafa, H., Mutai, B., Nagarajan, N., Neches, R.Y., Ng, A., Nieto-Caballero, M., Nikolayeva, O., Nikolayeva, T., Noushmehr, H., Oliveira, M., Ossowski, S., Osuolale, O.O., Paez-Espino, D., Png, E., Rascovan, N., Richard, H., Ratsch, G., Sanchez, J.L., Schriml, L.M., Shaaban, H., Shi, L., Sierra, M.A., Song, L.H., Suzuki, H., Thomas, D., Udekwu, K.I., Ugalde, J.A., Valentine, B., Vassilev, D.I., Vayndorf, E., Leung, M.H.Y., Young, B., Zambrano, M.M., Zhu, J., Zhu, S., Labaj, P.P., and Mason, C.E.
Keywords	MetaSub, Microbiome, forensics, biodiversity, built environment, New York, London, Hong Kong

58.

Elhaik, E., Ahsanuddin, S., Robinson, J.M., Foster, E.M., and Mason, C.E. 2021. The impact of cross-kingdom molecular forensics on genetic privacy. Microbiome.
YouTube Video
Altmetric score: #1 of all Microbiome outputs of similar age.

Abstract	Recent advances in metagenomic technology and computational prediction may inadvertently weaken an individual’s reasonable expectation of privacy. Through cross-kingdom genetic and metagenomic forensics, we can already predict at least a dozen human phenotypes with varying degrees of accuracy. There is also growing potential to detect a “molecular echo” of an individual’s microbiome from cells deposited on public surfaces. At present, host genetic data from somatic or germ cells provide more reliable information than microbiome samples. However, the emerging ability to infer personal details from different microscopic biological materials left behind on surfaces requires in-depth ethical and legal scrutiny. There is potential to identify and track individuals, along with new, surreptitious means of genetic discrimination. This commentary underscores the need to update legal and policy frameworks for genetic privacy with additional considerations for the information that could be acquired from microbiome-derived data. The article also aims to stimulate ubiquitous discourse to ensure the protection of genetic rights and liberties in the post-genomic era.
Authors	Elhaik, E., Ahsanuddin, S., Robinson, J.M., Foster, E.M., and Mason, C.E.
Keywords	Microbiome, GPS, Biogeography, forensics, phenotypes, metasub, privacy, fourth amendment

57.

Carress, H., Lawson, D.J., and Elhaik, E. 2021. Population genetic considerations for using biobanks as international resources in the pandemic era and beyond. BMC Genomics.

Abstract	The past years have seen the rise of genomic biobanks and mega-scale meta-analysis of genomic data, which promises to reveal the genetic underpinnings of health and disease. However, the over-representation of Europeans in genomic studies not only limits the global understanding of disease risk but also inhibits viable research into the genomic differences between carriers and patients. Whilst the community has agreed that more diverse samples are required, it is not enough to blindly increase diversity; the diversity must be quantified, compared and annotated to lead to insight. Genetic annotations from separate biobanks need to be comparable and computable and to operate without access to raw data due to privacy concerns. Comparability is key both for regular research and to allow international comparison in response to pandemics. Here, we evaluate the appropriateness of the most common genomic tools used to depict population structure in a standardized and comparable manner. The end goal is to reduce the effects of confounding and learn from genuine variation in genetic effects on phenotypes across populations, which will improve the value of biobanks (locally and internationally), increase the accuracy of association analyses and inform developmental efforts.
Authors	Carress, H., Lawson, D.J., and Elhaik, E.
Keywords	Bioinformatics, Principal Component Analyses (PCA), Genomics medicine, Biobanks, Population stratification bias

Elhaik, E. 2021. Why most Principal Component Analyses (PCA) in population genetic studies are wrong. BioRxiv.
Altmetric score: top 3% of all research outputs ever tracked by Altmetric.

Abstract	Principal Component Analysis (PCA) is a multivariate analysis that allows reduction of the complexity of datasets while preserving data's covariance and visualizing the information on colorful scatterplots, ideally with only a minimal loss of information. PCA applications are extensively used as the foremost analyses in population genetics and related fields (e.g., animal and plant or medical genetics), implemented in well-cited packages like EIGENSOFT and PLINK. PCA outcomes are used to shape study design, identify and characterize individuals and populations, and draw historical and ethnobiological conclusions on origins, evolution, whereabouts, and relatedness. The replicability crisis in science has prompted us to evaluate whether PCA results are reliable, robust, and replicable. We employed an intuitive color-based model alongside human population data for eleven common test cases. We demonstrate that PCA results are artifacts of the data and that they can be easily manipulated to generate desired outcomes. PCA results may not be reliable, robust, or replicable as the field assumes. Our findings raise concerns on the validity of results reported in the literature of population genetics and related fields that place a disproportionate reliance upon PCA outcomes and the insights derived from them. We conclude that PCA may have a biasing role in genetic investigations. An alternative mixed-admixture population genetic model is discussed.
Authors	Elhaik, E.
Keywords	Principal Component Analyses (PCA), population genetics, ancient DNA, origins, biogeography, admixture

56.

Elhaik, E. and Graur, D. 2021. On the Unfounded Enthusiasm for Soft Selective Sweeps III: The Supervised Machine Learning Algorithm That Isn't. Genes.
Altmetric score: #1 of all outputs of similar age from Genes.

Abstract	In the last 15 years or so, soft selective sweep mechanisms have been catapulted from a curiosity of little evolutionary importance to a ubiquitous mechanism claimed to explain most adaptive evolution and, in some cases, most evolution. This transformation was aided by a series of articles by Daniel Schrider and Andrew Kern. Within this series, a paper entitled “Soft sweeps are the dominant mode of adaptation in the human genome” (Schrider and Kern, Mol. Biol. Evolut. 2017, 34(8), 1863–1877) attracted a great deal of attention, in particular in conjunction with another paper (Kern and Hahn, Mol. Biol. Evolut. 2018, 35(6), 1366–1371), for purporting to discredit the Neutral Theory of Molecular Evolution (Kimura 1968). Here, we address an alleged novelty in Schrider and Kern’s paper, i.e., the claim that their study involved an artificial intelligence technique called supervised machine learning (SML). SML is predicated upon the existence of a training dataset in which the correspondence between the input and output is known empirically to be true. Curiously, Schrider and Kern did not possess a training dataset of genomic segments known a priori to have evolved either neutrally or through soft or hard selective sweeps. Thus, their claim of using SML is thoroughly and utterly misleading. In the absence of legitimate training datasets, Schrider and Kern used: (1) simulations that employ many manipulatable variables and (2) a system of data cherry-picking rivaling the worst excesses in the literature. These two factors, in addition to the lack of negative controls and the irreproducibility of their results due to incomplete methodological detail, lead us to conclude that all evolutionary inferences derived from so-called SML algorithms (e.g., S/HIC) should be taken with a huge shovel of salt.
Authors	Elhaik, E. and Graur, D.
Keywords	artificial intelligence (AI); supervised machine learning (SML); evolutionary biology; molecular and genome evolution; selective sweeps; population size

55.

Afshinnekoo, E., Bhattacharya, C., Burguete-García, A., Castro-Nallar, E., Deng, Y., Desnues, C., Dias-Neto, E., Elhaik, E., Iraola, G., Jang, S., Labaj, P.P., Mason, C.E., Nagarajan, N., Poulsen, M., Prithiviraj, B., Siam, R., Shi, T., Suzuki, H., Werner, J., Zambrano, M.M., and Bhattacharyya, M. 2021. COVID-19 drug practices risk antimicrobial resistance evolution. The Lancet Microbe.

Abstract	Antimicrobial resistance is one of the biggest challenges facing modern medicine. Because the management of COVID-19 is increasingly becoming dependent on pharmacological interventions, there is greater risk for accelerating the evolution and spread of antimicrobial resistance. A study in a tertiary hospital environment revealed concerning colonisation patterns of microbes during extended periods. It also highlighted the diversity of antimicrobial resistance gene reservoirs in hospitals that could facilitate the emergence and transmission of new modes of antibiotic resistance.
Authors	Afshinnekoo, E., Bhattacharya, C., Burguete-García, A., Castro-Nallar, E., Deng, Y., Desnues, C., Dias-Neto, E., Elhaik, E., Iraola, G., Jang, S., Labaj, P.P., Mason, C.E., Nagarajan, N., Poulsen, M., Prithiviraj, B., Siam, R., Shi, T., Suzuki, H., Werner, J., Zambrano, M.M., and Bhattacharyya, M.
Keywords	Covid-19, HIV, Drugs

54.

Johannsen, B.E., Baughn, L.B., Sharma, N., Zjacic, N., Pirooznia, M., and Elhaik, E. 2021. The Genetics of Sudden Infant Death Syndrome—Towards a Gene Reference Resource. Genes.

Abstract	Sudden infant death syndrome (SIDS) is the unexpected death of an infant under one year of age that remains unexplained after a thorough investigation. Despite SIDS remaining a diagnosis of exclusion with an unexplained etiology, it is widely accepted that SIDS can be caused by environmental and/or biological factors, with multiple underlying candidate genes. However, the lack of biomarkers raises questions as to why genetic studies on SIDS to date are unable to provide a clearer understanding of the disease etiology. We sought to improve the identification of SIDS-associated genes by reviewing the SIDS genetic literature and objectively categorizing and scoring the reported genes based on the strength of evidence (from C1 (high) to C5 (low)). This was followed by analyses of function, associations between genes, the enrichment of gene ontology (GO) terms, and pathways and gender difference in tissue gene expression. We constructed a curated database for SIDS gene candidates consisting of 109 genes, 14 of which received a category 4 (C4) and 95 genes received the lowest category of C5. That none of the genes was classified into the higher categories indicates the low level of supporting evidence. We found that genes of both scoring categories show distinct networks and are highly diverse in function and involved in many GO terms and pathways, in agreement with the perception of SIDS as a heterogeneous syndrome. Genes of both scoring categories are part of the cardiac system, muscle, and ion channels, whereas immune-related functions showed enrichment for C4 genes. A limited association was found with neural development. Overall, inconsistent reports and missing metadata contribute to the ambiguity of genetic studies. Considering those parameters could help improve the identification of at-risk SIDS genes. 
However, the field is still far from offering a full-fledged genetic test to identify at-risk infants and is still hampered by methodological challenges and misunderstandings of the vulnerabilities of vital biological mechanisms.
Authors	Johannsen, B.E., Baughn, L.B., Sharma, N., Zjacic, N., Pirooznia, M., and Elhaik, E.
Keywords	Sudden Infant Death Syndrome (SIDS); cot death; annotation; pathway enrichment; network analysis

53.

Robinson, J., Pasternak, Z., Mason, C.E., and Elhaik, E. 2021. Forensic applications of microbiomics: a review. Frontiers in Microbiology.

Abstract	The rise of microbiomics and metagenomics has been driven by advances in genomic sequencing technology, improved microbial sampling methods, and fast-evolving approaches in bioinformatics. Humans are a host to diverse microbial communities in and on their bodies, which continuously interact with and alter the surrounding environments. Since information relating to these interactions can be extracted by analysing human and environmental microbial profiles, they have potential to be relevant to forensics. In this review, we analysed over 100 papers describing forensic microbiome applications with emphasis on geolocation, personal identification, trace evidence, manner and cause of death, and inference of the postmortem interval (PMI). We found that although the field is in its infancy, utilising microbiome and metagenome signatures has potential to enhance the forensic toolkit. However, many of the studies suffer from limited sample sizes and model accuracies, and unrealistic environmental settings, leaving the full potential of microbiomics to forensics unexplored. It is unlikely that the information that can currently be elucidated from microbiomics can be used by law enforcement. Nonetheless, the research to overcome these challenges is ongoing, and it is foreseeable that microbiome-based evidence could contribute to forensic investigations in the future.
Authors	Robinson, J., Pasternak, Z., Mason, C.E., and Elhaik, E.
Keywords	microbiome, Forensic microbiology, Forensic Science, microbial forensics, Metagenomics

The Merneptah Stele. From Wikipedia

2020

52.

Chng, K.R., Li, C., Bertrand, D., Ng, A.H.Q., Kwah, J.S., Low, H.M., Tong, C., Natrajan, M., Zhang, M.H., Xu, L., Ko, K.K.K., Ho, E.X.P., Av-Shalom, T.V., Teo, J.W.P., Khor, C.C., MetaSUB Consortium, Chen, S.L., Mason, C.E., Ng, O.T., Marimuthu, K., Ang, B., and Nagarajan, N. 2020. Cartography of opportunistic pathogens and antibiotic resistance genes in a tertiary hospital environment. Nature Medicine.

Abstract	Although disinfection is key to infection control, the colonization patterns and resistomes of hospital-environment microbes remain underexplored. We report the first extensive genomic characterization of microbiomes, pathogens and antibiotic resistance cassettes in a tertiary-care hospital, from repeated sampling (up to 1.5 years apart) of 179 sites associated with 45 beds. Deep shotgun metagenomics unveiled distinct ecological niches of microbes and antibiotic resistance genes characterized by biofilm-forming and human-microbiome-influenced environments with corresponding patterns of spatiotemporal divergence. Quasi-metagenomics with nanopore sequencing provided thousands of high-contiguity genomes, phage and plasmid sequences (>60% novel), enabling characterization of resistome and mobilome diversity and dynamic architectures in hospital environments. Phylogenetics identified multidrug-resistant strains as being widely distributed and stably colonizing across sites. Comparisons with clinical isolates indicated that such microbes can persist in hospitals for extended periods (>8 years), to opportunistically infect patients. These findings highlight the importance of characterizing antibiotic resistance reservoirs in hospitals and establish the feasibility of systematic surveys to target resources for preventing infections.
Authors	Chng, K.R., Li, C., Bertrand, D., Ng, A.H.Q., Kwah, J.S., Low, H.M., Tong, C., Natrajan, M., Zhang, M.H., Xu, L., Ko, K.K.K., Ho, E.X.P., Av-Shalom, T.V., Teo, J.W.P., Khor, C.C., MetaSUB Consortium, Chen, S.L., Mason, C.E., Ng, O.T., Marimuthu, K., Ang, B., and Nagarajan, N.
Keywords	Microbiome, hospital, resistome, mobilome, antibiotic resistance

51.

Cooper-Knock, K., Zhang, S., Kenna, KP, Moll, T., Franklin, JP., Allen, S., Nezhad, HG., Iacoangeli, A., Yacovzada, NY., Eitan, C., Hornstein, E., Elhaik, E., [25 co-authors]... and Shaw, PJ. 2020. Rare variant burden analysis within enhancers identifies CAV1 as an ALS risk gene. Cell Reports.

Abstract	Amyotrophic lateral sclerosis (ALS) is an incurable neurodegenerative disease. CAV1 and CAV2 organize membrane lipid rafts (MLRs) important for cell signaling and neuronal survival, and overexpression of CAV1 ameliorates ALS phenotypes in vivo. Genome-wide association studies localize a large proportion of ALS risk variants within the non-coding genome, but further characterization has been limited by lack of appropriate tools. By designing and applying a pipeline to identify pathogenic genetic variation within enhancer elements responsible for regulating gene expression, we identify disease-associated variation within CAV1/CAV2 enhancers, which replicate in an independent cohort. Discovered enhancer mutations reduce CAV1/CAV2 expression and disrupt MLRs in patient-derived cells, and CRISPR-Cas9 perturbation proximate to a patient mutation is sufficient to reduce CAV1/CAV2 expression in neurons. Additional enrichment of ALS-associated mutations within CAV1 exons positions CAV1 as an ALS risk gene. We propose CAV1/CAV2 overexpression as a personalized medicine target for ALS.
Authors	Cooper-Knock, K., Zhang, S., Kenna, KP, Moll, T., Franklin, JP., Allen, S., Nezhad, HG., Iacoangeli, A., Yacovzada, NY., Eitan, C., Hornstein, E., Elhaik, E., [25 co-authors]... and Shaw, PJ.
Keywords	Paleogenomics, Y chromosome, Y haplogroups, Ancient DNA, aYChr-DB

50.

Freeman, L., Brimacombe, C.S., and Elhaik, E. 2020. aYChr-DB: a database of ancient human Y haplogroups. NAR Genomics and Bioinformatics.

Abstract	Ancient Y-Chromosomal DNA is an invaluable tool for dating and discerning the origins of migration routes and demographic processes that occurred thousands of years ago. Driven by the adoption of high-throughput sequencing and capture enrichment methods in paleogenomics, the number of published ancient genomes has nearly quadrupled within the last three years (2018–2020). Whereas ancient mtDNA haplogroup repositories are available, no similar resource exists for ancient Y-Chromosomal haplogroups. Here, we present aYChr-DB—a comprehensive collection of 1797 ancient Eurasian human Y-Chromosome haplogroups ranging from 44 930 BC to 1945 AD. We include descriptors of age, location, genomic coverage and associated archaeological cultures. We also produced a visualization of ancient Y haplogroup distribution over time. The aYChr-DB database is a valuable resource for population genomic and paleogenomic studies.
Authors	Freeman, L., Brimacombe, C.S., and Elhaik, E.
Keywords	Paleogenomics, Y chromosome, Y haplogroups, Ancient DNA, aYChr-DB

49.

Baughn, L.B., Sharma, N., Elhaik, E., Sekulic, A., Bryce, A.H., and Fonseca, R. 2020. Targeting TMPRSS2 in SARS-CoV-2 infection. Mayo Clinic Proceedings.

In the Limelight: September 2020 - A highlight by Karl A. Nath

Abstract	SARS-coronavirus 2 (SARS-CoV-2) has rapidly caused a global pandemic associated with a novel respiratory infection now termed coronavirus disease-19 (COVID-19). ACE2 is necessary to facilitate SARS-CoV-2 infection, but due to its essential metabolic roles, it may be difficult to target it in therapies. TMPRSS2, which interacts with ACE2, may be a better candidate for targeted therapies. Using publicly-available expression data, we show that both ACE2 and TMPRSS2 are expressed in many host tissues, including lung. The highest expression of ACE2 is found in the testes, whereas the prostate displays the highest expression of TMPRSS2. Given the increased severity of disease among older males with SARS-CoV-2 infection, we address the potential roles of ACE2 and TMPRSS2 in their contribution to the sex differences in disease severity. We show that expression levels of ACE2 and TMPRSS2 are overall comparable between males and females in multiple tissues, suggesting that differences in the expression levels of TMPRSS2 and ACE2 in the lung and other non-sex-specific tissues may not explain the gender disparities in SARS-CoV-2 severity. However, given their instrumental roles for SARS-CoV-2 infection and their pleiotropic expression, targeting the activity and expression levels of TMPRSS2 is a rational approach to treat COVID-19.
Authors	Baughn, L.B., Sharma, N., Elhaik, E., Sekulic, A., Bryce, A.H., and Fonseca, R.
Keywords	ACE2, Angiotensin I converting enzyme 2, COVID-19, Coronavirus disease-19, GTEx, Genotype-Tissue Expression, SARS-CoV-2, SARS-coronavirus 2, TMPRSS2, Transmembrane protease serine 2

48.

Baughn, L., Li, Z., Pearce, K., Vachon, C., Polley, M.Y., Keats, J., Elhaik, E., Baird, M., Therneau, T., Cerhan, J., Bergsagel, P., Dispenzieri, A., Rajkumar, S., Asmann, Y., and Kumar, S. 2020. The CCND1 870G risk allele is enriched in individuals of African ancestry with plasma cell dyscrasias. Blood Cancer Journal.

Abstract	Purpose: Multiple myeloma (MM) is a plasma cell (PC) malignancy with an increasing incidence in the US. Epidemiological studies demonstrate a 2-3 fold higher incidence of the pre-malignant monoclonal gammopathy of undetermined significance (MGUS) and MM with a ~4-year younger age of onset among African Americans (AA) compared to European Americans (EAs) (Fonseca, Leukemia, 2017). With equal access to care, AAs have better overall survival compared to EAs (Waxman, Blood, 2010). This disparity may be explained by ancestral-associated genetic predisposition of AAs to development of monoclonal gammopathies and to specific acquired, cytogenetically-defined subtypes. Using calculated genetic ancestry data, we have previously identified a higher prevalence of IgH translocations t(11;14), t(14;16) and t(14;20) in individuals with >80% African ancestry (Baughn, BCJ, 2018). Since SNP rs9344 encoding the CCND1 870G>A polymorphism has been reported in association with increased risk of t(11;14) (Weinhold, Nature Genetics, 2013), we investigated whether rs9344 correlates with African ancestry and with t(11;14) in our cohort of patients with plasma cell dyscrasias. Methods: We studied 898 patients with monoclonal gammopathies who had undergone uniform testing to identify MM-specific cytogenetic abnormalities. DNA from bone marrow samples was genotyped on the Precision Medicine Research Array and biogeographical ancestry was assessed using the Geographic Population Structure Origins tool (Elhaik, Nat Commun, 2014). Plasma cell proliferative disorder FISH of immunoglobulin (cIg)-stained positive plasma cells was performed as described (Baughn, BCJ, 2018). Individuals were divided into three ancestral groups: 1. EAs (<0.1% African ancestry and <30% Asian ancestry); 2. AAs (> 80% African ancestry); and 3. Other. 
Chi-squared test was used to determine the overall comparison between the 3 ancestral groups and also between ancestral groups 1 vs 2, and 2 vs 3 using pairwise comparison. All tests were two-sided with alpha level set at 0.05 for overall statistical significance. Pairwise comparison was considered statistically significant when p<0.025 based on Bonferroni method for multiple comparisons. Results: We identify increased risk of development of either a t(11;14), t(14;16) or t(14;20) in AAs (48.8%) compared to EAs (33.6%) (p-value=0.0051). To explore the genetic basis of increased t(11;14) specifically (37.4% AAs vs. 27.3%, p-value=0.049), we evaluated the frequency of the G risk allele of rs9344 in relation to African ancestry. The frequency of the G risk allele was higher in AAs (0.81) compared to EAs (0.59) (p-value <0.0001) and also higher in t(11;14) cases (0.73) compared to non-t(11;14) controls (0.58) (p-value <0.0001). A multivariate model identified only rs9344 as significantly associated with t(11;14) after adjusting for age, gender and race group, suggesting that it plays a role in the development of t(11;14) (p-value <0.001 for GG, p-value=0.005 for AG). To test if these results are replicable, we studied the MMRF CoMMpass cohort. This cohort includes individuals with newly diagnosed MM along with self-reported race information and translocation data from long insert whole genome sequencing. Although t(11;14) was not enriched in self-reported Black individuals from this cohort (17.5% Black vs. 20.4% White, p-value=0.47), there was an association between rs9344 and self-reported Black race (p-value <0.0001) and with the presence of t(11;14) when the t(11;14) was analyzed by long insert whole genome sequencing (p-value=0.0001). Conclusions: To our knowledge, this study includes the largest group of African individuals with an abnormal plasma cell clone along with uniformly collected FISH, genotyping and ancestry data. 
We have identified in this diverse population the association of the CCND1 870G>A polymorphism (rs9344) with African ancestry and with t(11;14) suggesting that it plays a role in the development of t(11;14) plasma cell dyscrasias.
Authors	Baughn, L., Li, Z., Pearce, K., Vachon, C., Polley, M.Y., Keats, J., Elhaik, E., Baird, M., Therneau, T., Cerhan, J., Bergsagel, P., Dispenzieri, A., Rajkumar, S., Asmann, Y., and Kumar, S.
Keywords	GPS Origins, GPS, Cancer, ancestry, Multiple myeloma, Africans

Mason-Buck, G., Graf, A., Elhaik, E., Robinson, J., Pospiech, E., Oliveira, M., Moser, J., Lee, P.K.H., Githae, D., Ballard, D., Bromberg, Y., Casimiro-Soriguer, C.S., Dhungel, E., Ahn, T., Kawulok, J., Loucera, C., Ryan, F., Walker, A.R., Zhu, C., Mason, C.E., Amorim, A., Syndercombe Court, D., Branicki, W., and Labaj, P. 2020. DNA Based Methods in Intelligence - Moving Towards Metagenomics. Preprints.org.

Abstract	Advancements in DNA methods and biotechnology have enabled forensic scientists to explore the DNA evidence found as part of a criminal investigation on a much more comprehensive and predictive level. This has led to a rise in research into DNA intelligence tools such as phenotypic prediction (i.e., eye and hair colour) and inference of biogeographical ancestry, both of which can be applied to gain further insights about a scene or sample in question. Although microorganisms have played a role in forensics for decades, investigations were focused on the pathogenicity aspect, mainly to determine the cause and time of death. Recent progress in studying the human microbiome has implicated the potential use of this data in forensics. Since each individual, place, or item has its own microbial pattern, a new suite of tools is now available to be exploited in criminal investigations. Although there is much interest and potential for these emerging metagenomic and microbial forensic tools, best practices and reference ranges need to be established before they are implemented. Here, we discuss existing DNA intelligence tools applied to forensic science, the application of microbial forensics and metagenomics along with the challenges and concerns that future developments entail.
Authors	Labaj, P.; Mason-Buck, G.; Graf, A.; Elhaik, E.; Robinson, J.; Pospiech, E.; Oliveira, M.; Moser, J.; Lee, P.K.H.; Githae, D.; Ballard, D.; Bromberg, Y.; Casimiro-Soriguer, C.S.; Dhungel, E.; Ahn, T.; Kawulok, J.; Loucera, C.; Ryan, F.; Walker, A.R.; Zhu, C.; Mason, C.E.; Amorim, A.; Syndercombe Court, D.; and Branicki, W
Keywords	Forensic; Metagenomics

Elhaik, E. 2020. Diverse genetic origins of medieval steppe nomad conquerors – a response to Mikheyev et al. (2019). bioRxiv.
Altmetric score: top 5% of all research outputs scored by Altmetric.

Abstract	Recently, Mikheyev et al. (2019) have produced a preprint study describing the genomes of nine Khazars archeologically dated from the 7th to the 9th centuries found in the Rostov county in modern-day Russia. Skull morphology indicated a mix of "Caucasoid" and "Mongoloid" shapes. The authors compared the samples to ancient and contemporary samples to study the genetic makeup of the Khazars and their genetic legacy and addressed the question of the relationships between the Khazar and Ashkenazic Jews. A careful examination reveals grave concerns regarding all the aspects of the study from the identification of the "Khazar" samples, the choice of environment for ancient DNA sequencing, and the analyses. The authors did not disclose the data used in their study, and their methodology is incoherent. We demonstrate that their analyses yield nonsensical results and argue that none of the claims made in this study are supported by the data unequivocally. Provided the destruction of the bone samples and the irreproducibility of the analyses, even by the forgivable standards of the field, this study is irreplicable, wasteful, and misleading. Overall, this work should be considered a case study of how not to do paleogenomics research.
Authors	Elhaik, E.
Keywords	Ancient DNA, Paleogenomics, Khazars, Jews, Ashkenazic Jews

Nimrod, Danziger, I. 1939. From Wikipedia

2019

47.

GE4GAC group, Thais F. Bartelli, Lais L. Senda de Abrantes, Helano C. Freitas, Andrew M. Thomas, Jordana M. Silva, Gabriela E. Albuquerque, Luiza F. Araújo, Gabriela P. Branco, Maria G. de Amorim, Marianna S. Serpa, Isabella K. T. M. Takenaka, Deborah T. Souza, Lucas O. Monção, Bruno S. Moda, Renan Valieris, Alexandre Defelicibus, Rodrigo Borges, Rodrigo D. Drummond, Francisco I. A. Alves, Monize N. P. Santos, Irina G. Bobrovnitchaia, Eran Elhaik, Luiz G. V. Coelho, André Khayat, Samia Demachki, Paulo P. Assumpção, Karina M. Santiago, Giovana T. Torrezan, Dirce M. Carraro, Stela V. Peres, Vinícius F. Calsavara, Rommel Burbano, Calebe R. Nóbrega, Graziela P. P. Baladão, Ana C. C. Pereira, Camila M. Gatti, Marcela A. Fagundes, Marília S. Araújo, Tayana V. Miranda, Monica S. Barbosa, Daniela M. M. Cardoso, Lilian C. Carneiro, Alexandre M. Brito, Amanda F. P. L. Ramos, Lucas L. L. Silva, Jaqueline C. Pontes, Tatiane Tiengo, Paola E. Arantes, Vilma Santana, Milena Cordeiro, Rosane O. Sant’Ana, Hanna B. Andrade, Ana K. M. Anaissi, Sara V. Sampaio, Emne A. Abdallah, Ludmilla T. D. Chinen, Alexcia C. Braun, Bianca C. T. Flores, Celso A. L. Mello, Laura C. L. Claro, Claudia Z. Sztokfisz, Carlos C. Altamirano, David R. F. Carter, Victor H. F. Jesus, Rachel Riechelmann, Tiago Medina, Kenneth J. Gollob, Vilma R. Martins, João C. Setúbal, Adriane G. Pelosof, Felipe J. Coimbra, Wilson L. Costa-Jr, Israel T. Silva, Diana N. Nunes, Maria P. Curado, and Emmanuel Dias-Neto. 2019. Genomics and epidemiology for gastric adenocarcinomas (GE4GAC): a Brazilian initiative to study gastric cancer. Applied Cancer Research.

Abstract	Gastric cancer (GC) is the fifth most common type of cancer worldwide with high incidences in Asia, Central, and South American countries. This patchy distribution means that GC studies are neglected by large research centers from developed countries. The need for further understanding of this complex disease, including the local importance of epidemiological factors and the rich ancestral admixture found in Brazil, stimulated the implementation of the GE4GAC project. GE4GAC aims to embrace epidemiological, clinical, molecular and microbiological data from Brazilian controls and patients with malignant and pre-malignant gastric disease. In this letter, we summarize the main goals of the project, including subject and sample accrual and current findings.
Authors	GE4GAC group, Thais F. Bartelli, Lais L. Senda de Abrantes, Helano C. Freitas, Andrew M. Thomas, Jordana M. Silva, Gabriela E. Albuquerque, Luiza F. Araújo, Gabriela P. Branco, Maria G. de Amorim, Marianna S. Serpa, Isabella K. T. M. Takenaka, Deborah T. Souza, Lucas O. Monção, Bruno S. Moda, Renan Valieris, Alexandre Defelicibus, Rodrigo Borges, Rodrigo D. Drummond, Francisco I. A. Alves, Monize N. P. Santos, Irina G. Bobrovnitchaia, Eran Elhaik, Luiz G. V. Coelho, André Khayat, Samia Demachki, Paulo P. Assumpção, Karina M. Santiago, Giovana T. Torrezan, Dirce M. Carraro, Stela V. Peres, Vinícius F. Calsavara, Rommel Burbano, Calebe R. Nóbrega, Graziela P. P. Baladão, Ana C. C. Pereira, Camila M. Gatti, Marcela A. Fagundes, Marília S. Araújo, Tayana V. Miranda, Monica S. Barbosa, Daniela M. M. Cardoso, Lilian C. Carneiro, Alexandre M. Brito, Amanda F. P. L. Ramos, Lucas L. L. Silva, Jaqueline C. Pontes, Tatiane Tiengo, Paola E. Arantes, Vilma Santana, Milena Cordeiro, Rosane O. Sant’Ana, Hanna B. Andrade, Ana K. M. Anaissi, Sara V. Sampaio, Emne A. Abdallah, Ludmilla T. D. Chinen, Alexcia C. Braun, Bianca C. T. Flores, Celso A. L. Mello, Laura C. L. Claro, Claudia Z. Sztokfisz, Carlos C. Altamirano, David R. F. Carter, Victor H. F. Jesus, Rachel Riechelmann, Tiago Medina, Kenneth J. Gollob, Vilma R. Martins, João C. Setúbal, Adriane G. Pelosof, Felipe J. Coimbra, Wilson L. Costa-Jr, Israel T. Silva, Diana N. Nunes, Maria P. Curado, and Emmanuel Dias-Neto.
Keywords	Gastric cancer, GE4GAC, Ancestry, GPS, Brazil

Eachus, H., Subramanya, D., Jackson, H.E., Wang, G., Berntsen, K., Ashton, J.P., Esposito, U., Seifuddin, F., Pirooznia, M., Elhaik, E., Krone, M., Baines, R.A., Placzek, M., and Cunliffe, V.T. 2019. Regulation of neuron-specific gene transcription by stress hormone signalling requires synaptic activity in zebrafish. bioRxiv.

Abstract	The Glucocorticoid Receptor (GR) co-ordinates metabolic and behavioural responses to stressors. We hypothesised that GR influences behaviour by modulating specific epigenetic and transcriptional processes in the brain. Using the zebrafish as a model organism, the brain methylomes of wild-type and grs357 mutant adults were analysed and GR-sensitive, differentially methylated regions (GR-DMRs) were identified. Two genes with GR-DMRs exhibited distinct methylation and transcriptional sensitivities to GR: the widely expressed direct GR target fkbp5 and neuron-specific aplp1. In larvae, neural activity is required for GR-mediated transcription of aplp1, but not for that of fkbp5. GR regulates metabotropic glutamate receptor gene expression, the activities of which also modulated aplp1 expression, implicating synaptic neurotransmission as an effector of GR function upstream of aplp1. Our results identify two distinct routes of GR-regulated transcription in the brain, including a pathway through which GR couples endocrine signalling to synaptic activity-regulated transcription by modulating metabotropic glutamate receptor expression.
Authors	Eachus, H., Subramanya, D., Jackson, H.E., Wang, G., Berntsen, K., Ashton, J.P., Esposito, U., Seifuddin, F., Pirooznia, M., Elhaik, E., Krone, M., Baines, R.A., Placzek, M., and Cunliffe, V.T.
Keywords	Glucocorticoid Receptor, aplp1, fkbp5, zebrafish

46.

Elhaik, E. and Ryan, D.M. 2019. Pair Matcher (PaM): fast model-based optimisation of treatment/case-control matches. Bioinformatics.

Abstract	Motivation. In clinical trials, individuals are matched using demographic criteria, paired, and then randomly assigned to treatment and control groups to determine a drug's efficacy. A chief cause for the irreproducibility of results across pilot to Phase III trials is population stratification bias caused by the uneven distribution of ancestries in the treatment and control groups. Results. Pair Matcher (PaM) addresses stratification bias by optimising pairing assignments a priori and/or a posteriori to the trial using both genetic and demographic criteria. Using simulated and real datasets, we show that PaM identifies ideal and near-ideal pairs that are more genetically homogeneous than those identified based on competing methods, including the commonly used principal component analysis (PCA). Homogenising the treatment (or case) and control groups can be expected to improve the accuracy and reproducibility of the trial or genetic study. PaM's ancestral inferences also allow characterizing responders and developing a precision medicine approach to treatment. Availability. PaM is freely available via R (https://github.com/eelhaik/PAM) and a web-interface at http://elhaik-matcher.sheffield.ac.uk/ElhaikLab/.
Authors	Elhaik, E. and Ryan, D.M.
Keywords	population structure, population stratification, clinical trials, randomised controls, Principal Component Analysis (PCA), association studies

Threshold of forest, Magritte, R. 1926. From Wikiart

2018

45.

Elhaik, E. 2018. Neonatal circumcision and prematurity are associated with sudden infant death syndrome (SIDS). Journal of Clinical and Translational Research.
Altmetric score: top 1% of all research outputs ever tracked by Altmetric (bioRxiv version).

Abstract	Background: Sudden Infant Death Syndrome (SIDS) is the most common cause of postneonatal unexplained infant death. The allostatic load hypothesis posits that SIDS is the result of cumulative perinatal painful, stressful, or traumatic exposures that tax neonatal regulatory systems. Aims: To test the predictions of the allostatic load hypothesis, we explored the relationships between SIDS and two common phenotypes, male neonatal circumcision (MNC) and prematurity. Methods: We collated latitudinal data from 15 countries and 40 US states sampled during 2009 and 2013. We used linear regression analyses and likelihood ratio tests to calculate the association between SIDS and the phenotypes. Results: SIDS mortality rate was significantly and positively correlated with MNC. Globally (weighted): Increase of 0.06 (95% CI = 0.01–0.1, t = 2.86, p = 0.01) per 1000 SIDS mortality per 10% increase in circumcision rate. US (weighted): Increase of 0.1 (95% CI = 0.02–0.16, t = 2.81, p = 0.01) per 1000 unexplained mortalities per 10% increase in circumcision rate. US states in which Medicaid covers MNC had significantly higher MNC rates (x̃ = 0.72 vs 0.49; p = 0.007) and male/female ratio of SIDS deaths (x̃ = 1.48 vs 1.125; p = 0.015) than other US states. Prematurity was also significantly and positively correlated with MNC. Globally: Increase of 0.5 (weighted: 95% CI = 0.2–0.86, t = 3.37, p = 0.004) per 1000 SIDS mortality per 10% increase in the prematurity rates. US: Increase of 1.9 (weighted: 95% CI = 0.6–3.2, t = 3.13, p = 0.004) per 1000 unexplained mortalities per 10% increase in the prematurity rates. Combined, the phenotypes increased the likelihood of SIDS. Conclusions: Epidemiological analyses are useful to generate hypotheses but cannot provide strong evidence of causality. Biological plausibility is provided by a growing body of experimental and clinical evidence linking aversive preterm and early-life SIDS events. 
Together with historical and anthropological evidence, our findings emphasize the necessity of cohort studies that consider these phenotypes with the aim of improving the identification of at-risk infants and reducing infant mortality. Relevance for patients: Preterm birth and neonatal circumcision are associated with a greater risk of SIDS, and efforts should be focused on reducing their rates.
Authors	Elhaik, E.
Keywords	Sudden Infant Death Syndrome (SIDS), Allostatic load, Neonatal circumcision, Prematurity, Trauma, Pain, Stress, Lilith, Jews

44.

Esposito, U., Das, R., Syed, S., Pirooznia, M., and Elhaik, E. 2018. Ancient Ancestry Informative Markers for Identifying Fine-Scale Ancient Population Structure in Eurasians. Genes.
Altmetric score: top 1% of all research outputs from Genes.

Abstract	The rapid accumulation of ancient human genomes from various areas and time periods potentially enables the expansion of studies of biodiversity, biogeography, forensics, population history, and epidemiology into past populations. However, most ancient DNA (aDNA) data were generated through microarrays designed for modern-day populations, which are known to misrepresent the population structure. Past studies addressed these problems by using ancestry informative markers (AIMs). It is, thereby, unclear whether AIMs derived from contemporary human genomes can capture ancient population structures, and whether AIM-finding methods are applicable to aDNA, provided that the high missingness rates in ancient—and oftentimes haploid—DNA can also distort the population structure. Here, we define ancient AIMs (aAIMs) and develop a framework to evaluate established and novel AIM-finding methods in identifying the most informative markers. We show that aAIMs identified by a novel principal component analysis (PCA)-based method outperform all of the competing methods in classifying ancient individuals into populations and identifying admixed individuals. In some cases, predictions made using the aAIMs were more accurate than those made with a complete marker set. We discuss the features of the ancient Eurasian population structure and strategies to identify aAIMs. This work informs the design of single nucleotide polymorphism (SNP) microarrays and the interpretation of aDNA results, which enables a population-wide testing of primordialist theories.
Authors	Esposito, U., Das, R., Syed, S., Pirooznia, M., and Elhaik, E.
Keywords	ancient DNA, ancient ancestry informative markers, population structure, principal component analysis, admixture mapping, primordialism

43.

Baughn, L.B., Pearce, K., Larson, D., Polley, M., Elhaik, E., Baird, M., Colby, C., Benson, J., Li, Z., Asmann, Y., Therneau, T., Cerhan, J.R., Vachon, C.M., Stewart, A.K., Bergsagel, P.L., Dispenzieri, A., Kumar, S., and Rajkumar, S.J. 2018. Differences in genomic abnormalities among African individuals with monoclonal gammopathies using calculated ancestry. Blood Cancer Journal.
Altmetric score: top 2% of all research outputs ever tracked by Altmetric.

Abstract	Multiple myeloma (MM) is two- to three-fold more common in African Americans (AAs) compared to European Americans (EAs). This striking disparity, one of the highest of any cancer, may be due to underlying genetic predisposition between these groups. There are multiple unique cytogenetic subtypes of MM, and it is likely that the disparity is associated with only certain subtypes. Previous efforts to understand this disparity have relied on self-reported race rather than genetic ancestry, which may result in bias. To mitigate these difficulties, we studied 881 patients with monoclonal gammopathies who had undergone uniform testing to identify primary cytogenetic abnormalities. DNA from bone marrow samples was genotyped on the Precision Medicine Research Array and biogeographical ancestry was quantitatively assessed using the Geographic Population Structure Origins tool. The probability of having one of three specific subtypes, namely t(11;14), t(14;16), or t(14;20) was significantly higher in the 120 individuals with highest African ancestry (≥80%) compared with the 235 individuals with lowest African ancestry (<0.1%) (51% vs. 33%, respectively, p value = 0.008). Using quantitatively measured African ancestry, we demonstrate a major proportion of the racial disparity in MM is driven by disparity in the occurrence of the t(11;14), t(14;16), and t(14;20) types of MM.
Authors	Baughn, L.B., Pearce, K., Larson, D., Polley, M., Elhaik, E., Baird, M., Colby, C., Benson, J., Li, Z., Asmann, Y., Therneau, T., Cerhan, J.R., Vachon, C.M., Stewart, A.K., Bergsagel, P.L., Dispenzieri, A., Kumar, S., and Rajkumar, S.J.
Keywords	GPS Origins, GPS, Cancer, ancestry, Multiple myeloma, Africans

Ascending and descending, Escher, M. C. 1960. From Digital Commonwealth

2017

42.

Elhaik, E., Yusuf, L., Anderson, A.I., Pirooznia, M., Arnellos, D., Vilshansky, G., Ercal, G., Lu, Y., Webster, T., Baird, M.L., and Esposito, U. 2017. The Diversity of REcent and Ancient huMan (DREAM): a new microarray for genetic anthropology and genealogy, forensics, and personalized medicine. Genome Biology and Evolution.

Additional materials: DREAM SNPs (800k)

Abstract	The human population displays wide variety in demographic history, ancestry, content of DNA derived from hominins or ancient populations, adaptation, traits, copy number variation (CNVs), drug response, and more. These polymorphisms are of broad interest to population geneticists, forensics investigators, and medical professionals. Historically, much of that knowledge was gained from population survey projects. While many commercial arrays exist for genome-wide single-nucleotide polymorphism (SNP) genotyping, their design specifications are limited and they do not allow a full exploration of biodiversity. We thereby aimed to design the Diversity of REcent and Ancient huMan (DREAM) - an all-inclusive microarray that would allow both identification of known associations and exploration of standing questions in genetic anthropology, forensics, and personalized medicine. DREAM includes probes to interrogate ancestry informative markers obtained from over 450 human populations, over 200 ancient genomes, and 10 archaic hominins. DREAM can identify 94% and 61% of all known Y and mitochondrial haplogroups, respectively and was vetted to avoid interrogation of clinically relevant markers. To demonstrate its capabilities, we compared its FST distributions with those of the 1000 Genomes Project and commercial arrays. Although all arrays yielded similarly shaped (inverse J) FST distributions, DREAM's autosomal and X-chromosomal distributions had the highest mean FST, attesting to its ability to discern subpopulations. DREAM performances are further illustrated in biogeographical, identical by descent (IBD), and CNV analyses. In summary, with approximately 800,000 markers spanning nearly 2,000 genes, DREAM is a useful tool for genetic anthropology, forensic, and personalized medicine studies.
Authors	Elhaik, E., Yusuf, L., Anderson, A.I., Pirooznia, M., Arnellos, D., Vilshansky, G., Ercal, G., Lu, Y., Webster, T., Baird, M.L., and Esposito, U.
Keywords	population genetics, biogeography, ancient DNA, archaic DNA, forensics, CNVs

41.

Shamarina, D., Stoyantcheva, I., Mason, C.E., Bibby, K., and Elhaik, E. 2017. Communicating the promise, risks, and ethics of large-scale, open space microbiome and metagenome research. Microbiome.
Altmetric score: top 5% of all research outputs ever tracked

Abstract	The public commonly associates microorganisms with pathogens. This suspicion of microorganisms is understandable, as historically microorganisms have killed more humans than any other agent while remaining largely unknown until the late seventeenth century with the works of van Leeuwenhoek and Kircher. Despite our improved understanding regarding microorganisms, the general public are apt to think of diseases rather than of the majority of harmless or beneficial species that inhabit our bodies and the built and natural environment. As long as microbiome research was confined to labs, the public's exposure to microbiology was limited. The recent launch of global microbiome surveys, such as the Earth Microbiome Project and MetaSUB (Metagenomics and Metadesign of Subways and Urban Biomes) project, has raised ethical, financial, feasibility, and sustainability concerns as to the public's level of understanding and potential reaction to the findings, which, done improperly, risk negative implications for ongoing and future investigations, but done correctly, can facilitate a new vision of "smart cities." To facilitate improved future research, we describe here the major concerns that our discussions with ethics committees, community leaders, and government officials have raised, and we expound on how to address them. We further discuss ethical considerations of microbiome surveys and provide practical recommendations for public engagement.
Authors	Shamarina, D., Stoyantcheva, I., Mason, C.E., Bibby, K., and Elhaik, E.
Keywords	Microbiome, Metagenome, Built environment, Public, MetaSUB, Concerns, Ethics

40.

Elhaik, E. 2017. Editorial: Population Genetics of Worldwide Jewish People. Frontiers in Genetics.

Abstract	Stephen Jay Gould remarked that "the most erroneous stories are those we think we know best-and therefore never scrutinize or question" (Gould, 1996). In the past, shamans and priests were believed to have omnipotence in controlling nature, man, and fate. As guardians of history and memory, they developed captivating narratives that bounded nature, religion, and mythology and aspired humans to continue their efforts to tame the natural and supernatural worlds. Nowadays, scientists have adopted the traditional role of the shamans and, grievously, some of their inclination to narratives (Sand, 2015).
Authors	Elhaik, E.
Keywords	Yiddish, Ashkenazic Jews, Ashkenaz, Geographic population structure (GPS), archaeogenetics, Rhineland hypothesis, Ancient DNA

39.

Das, R., Wexler, P., Pirooznia, M., and Elhaik, E. 2017. The origins of Ashkenaz, Ashkenazic Jews, and Yiddish. Frontiers in Genetics.
Altmetric score: #1 most read in the journal since publication

Abstract	Recently, the geographical origins of Ashkenazic Jews (AJs) and their native language Yiddish were investigated by applying the Geographic Population Structure (GPS) to a cohort of exclusively Yiddish-speaking and multilingual AJs. GPS localized most AJs along major ancient trade routes in northeastern Turkey adjacent to primeval villages with names that resemble the word "Ashkenaz." These findings were compatible with the hypothesis of an Irano-Turko-Slavic origin for AJs and a Slavic origin for Yiddish and at odds with the Rhineland hypothesis advocating a Levantine origin for AJs and German origins for Yiddish. We discuss how these findings advance three ongoing debates concerning 1) the historical meaning of the term "Ashkenaz;" 2) the genetic structure of Ashkenazic Jews and their geographical origins as inferred from multiple studies employing both modern and ancient DNA and an original ancient DNA analysis; and 3) the development of Yiddish. Due to the rising popularity of geo-localization tools to address questions of origin, we briefly discuss the advantages and limitations of popular tools with focus on the GPS approach. Our results reinforce the non-Levantine origins of AJs.
Authors	Das, R., Wexler, P., Pirooznia, M., and Elhaik, E.
Keywords	Yiddish, Ashkenazic Jews, Ashkenaz, Geographic population structure (GPS), archaeogenetics, Rhineland hypothesis, Ancient DNA

Ancient Rock art at Mt. Karkom, Deep Desert Israel-Day Tours (2017)

2016

38.

Marshall, S., Das, R., Pirooznia, M., and Elhaik, E. 2016. Reconstructing Druze population history. Scientific Reports.
Altmetric score: top 2% most-read paper of all papers of similar age

Abstract	The Druze are an aggregate of communities in the Levant and Near East living almost exclusively in the mountains of Syria, Lebanon and Israel whose ~1000 year old religion formally opposes mixed marriages and conversions. Despite increasing interest in genetics of the population structure of the Druze, their population history remains unknown. We investigated the genetic relationships between Israeli Druze and both modern and ancient populations. We evaluated our findings in light of three hypotheses purporting to explain Druze history that posit Arabian, Persian or mixed Near Eastern-Levantine roots. The biogeographical analysis localised proto-Druze to the mountainous regions of southeastern Turkey, northern Iraq and southeast Syria and their descendants clustered along a trajectory between these two regions. The mixed Near Eastern-Middle Eastern localisation of the Druze, shown using both modern and ancient DNA data, is distinct from that of neighbouring Syrians, Palestinians and most of the Lebanese, who exhibit a high affinity to the Levant. Druze biogeographic affinity, migration patterns, time of emergence and genetic similarity to Near Eastern populations are highly suggestive of Armenian-Turkish ancestries for the proto-Druze.
Authors	Marshall, S., Das, R., Pirooznia, M., and Elhaik, E.
Keywords	Druze, genetic isolate, Geographic population structure (GPS), Archaeogenetics, Mountains, Ararat, Lake Van

37.

Elhaik, E. 2016. A "wear and tear" hypothesis to explain sudden infant death syndrome. Frontiers in Neurology.
Altmetric score: #1 most read in the journal since publication

Abstract	Sudden infant death syndrome (SIDS) is the leading cause of death among USA infants under 1 year of age accounting for ~2,700 deaths per year. Although formally SIDS dates back at least 2,000 years and was even mentioned in the Hebrew Bible (Kings 3:19), its etiology remains unexplained prompting the CDC to initiate a sudden unexpected infant death case registry in 2010. Due to their total dependence, the ability of the infant to allostatically regulate stressors and stress responses shaped by genetic and environmental factors is severely constrained. We propose that SIDS is the result of cumulative painful, stressful, or traumatic exposures that begin in utero and tax neonatal regulatory systems incompatible with allostasis. We also identify several putative biochemical mechanisms involved in SIDS. We argue that the important characteristics of SIDS, namely male predominance (60:40), the significantly different SIDS rate among USA Hispanics (80% lower) compared to whites, 50% of cases occurring between 7.6 and 17.6 weeks after birth with only 10% after 24.7 weeks, and seasonal variation with most cases occurring during winter, are all associated with common environmental stressors, such as neonatal circumcision and seasonal illnesses. We predict that neonatal circumcision is associated with hypersensitivity to pain and decreased heart rate variability, which increase the risk for SIDS. We also predict that neonatal male circumcision will account for the SIDS gender bias and that groups that practice high male circumcision rates, such as USA whites, will have higher SIDS rates compared to groups with lower circumcision rates. SIDS rates will also be higher in USA states where Medicaid covers circumcision and lower among people that do not practice neonatal circumcision and/or cannot afford to pay for circumcision. We last predict that winter-born premature infants who are circumcised will be at higher risk of SIDS compared to infants who experienced fewer nociceptive exposures. All these predictions are testable experimentally using animal models or cohort studies in humans. Our hypothesis provides new insights into novel risk factors for SIDS that can reduce its risk by modifying current infant care practices to reduce nociceptive exposures.
Authors	Elhaik, E.
Keywords	sudden infant death syndrome (SIDS), allostatic load, neonatal circumcision, trauma, pain, stress

36.

Elhaik, E. 2016. In search of the judische Typus: a proposed benchmark to test the genetic basis of Jewishness challenges notions of "Jewish biomarkers". Frontiers in Genetics.
Altmetric score: #2 most read in the journal since publication

Abstract	The debate as to whether Jewishness is a biological trait inherent from an "authentic" "Jewish type" (judische Typus) ancestor or a system of beliefs has been raging for over two centuries. While the accumulated biological and anthropological evidence support the latter argument, recent genetic findings, bolstered by the direct-to-consumer genetic industry, purport to identify Jews or quantify one's Jewishness from genomic data. To test the merit of claims that Jews and non-Jews are genetically distinguishable, we propose a benchmark where genomic data of Jews and non-Jews are hybridized over few generations and the observed and predicted Jewishness of the terminal offspring according to either the Orthodox religious law (Halacha) or the Israeli Law of Return are compared. Members of academia, the public, and 23andMe were invited to use the benchmark to test claims that Jews are genetically distinct from non-Jews. Here, we report the findings from these trials. We also compare the genomic similarity of ~300 individuals from nearly thirty Afro-Eurasian Jewish communities to a simulated judische Typus population. The results are discussed in light of modern trends in the genetics of Jews and related fields and provide a tentative answer to the ageless question "who is a Jew?"
Authors	Elhaik, E.
Keywords	Urjudischer Typus, Jewish Urtypus, Jewishness, ancestry, Jews

35.

Morozova, I., Flegontov, P., Mikheyev, A.S., Asgharian, H., Ponomarenko, P., Klyuchnikov, V., ArunKumar, G., Bruskin, S., Prokhorchouk, E., Gankin, Y., Rogaev, E., Nikolsky, Y., Baranova, A., Elhaik, E., and Tatarinova, T.V. 2016. Toward high-resolution population genomics using archaeological samples. DNA Research.
#1 most-read in DNA Research

Abstract	The term 'ancient DNA' (aDNA) is coming of age, with over 1,200 hits in the PubMed database, beginning in the early 1980s with the studies of 'molecular paleontology'. Rooted in cloning and limited sequencing of DNA from ancient remains during the pre-PCR era, the field has made incredible progress since the introduction of PCR and next-generation sequencing. Over the last decade, aDNA analysis ushered in a new era in genomics and became the method of choice for reconstructing the history of organisms, their biogeography, and migration routes, with applications in evolutionary biology, population genetics, archaeogenetics, paleoepidemiology, and many other areas. This change was brought by development of new strategies for coping with the challenges in studying aDNA due to damage and fragmentation, scarce samples, significant historical gaps, and limited applicability of population genetics methods. In this review, we describe the state-of-the-art achievements in aDNA studies, with particular focus on human evolution and demographic history. We present the current experimental and theoretical procedures for handling and analysing highly degraded aDNA. We also review the challenges in the rapidly growing field of ancient epigenomics. Advancement of aDNA tools and methods signifies a new era in population genetics and evolutionary medicine research.
Authors	Morozova, I., Flegontov, P., Mikheyev, A.S., Asgharian, H., Ponomarenko, P., Klyuchnikov, V., ArunKumar, G., Bruskin, S., Prokhorchouk, E., Gankin, Y., Rogaev, E., Nikolsky, Y., Baranova, A., Elhaik, E., and Tatarinova, T.V.
Keywords	ancient DNA, bioinformatics, epigenetics, population genetics, next-generation sequencing

34.

The MetaSUB International Consortium. 2016. The Metagenomics and Metadesign of the Subways and Urban Biomes (MetaSUB) International Consortium inaugural meeting report. Microbiome.
Altmetric score: #3 most read in the journal since publication

Abstract	The Metagenomics and Metadesign of the Subways and Urban Biomes (MetaSUB) International Consortium is a novel, interdisciplinary initiative comprised of experts across many fields, including genomics, data analysis, engineering, public health, and architecture. The ultimate goal of the MetaSUB Consortium is to improve city utilization and planning through the detection, measurement, and design of metagenomics within urban environments. Although continual measures occur for temperature, air pressure, weather, and human activity, including longitudinal, cross-kingdom ecosystem dynamics can alter and improve the design of cities. The MetaSUB Consortium is aiding these efforts by developing and testing metagenomic methods and standards, including optimized methods for sample collection, DNA/RNA isolation, taxa characterization, and data visualization. The data produced by the consortium can aid city planners, public health officials, and architectural designers. In addition, the study will continue to lead to the discovery of new species, global maps of antimicrobial resistance (AMR) markers, and novel biosynthetic gene clusters (BGCs). Finally, we note that engineered metagenomic ecosystems can help enable more responsive, safer, and quantified cities.
Authors	Consortium Lead: Christopher E. Mason Executive Directors: Ebrahim Afshinnekoo and Sofia Ahsanuddin External Advisory Board (EAB): Elodie Ghedin, Timothy Read, Claire Fraser, Joel Dudley, Mark Hernandez, and Christopher Bowler MetaSUB City Principal Investigators: Ariel Chernomoretz and Gustavo Stolovitzky (Buenos Aires, Argentina), Pawel P Labaj & Alexandra B. Graf (Vienna, Austria), Aaron Darling and Catherine Burke (Sydney, Australia), Houtan Noushmehr (Ribeirao Preto, Brasil), Emmanuel Dias-Neto (Sao Paulo, Brazil), Yongli Guo (Beijing, China), Zhi Xie (Guangzhou, China), Patrick Lee (Hong Kong, China), Leming Shi (Shanghai, China), Carlos A. Ruiz-Perez and Maria Mercedes Zambrano (Bogota, Colombia), Rania Siam and Amged Ouf (Cairo, Egypt), Hugues Richard and Ingrid Lafontaine (Paris, France), Lothar H. Wieler and Torsten Semmler (Berlin, Germany), Niyaz Ahmed, Bharath Prithiviraj, and Narasimha Nedunuri (Hyderabad, India), Shaadi Mehr and Kambiz Banihashemi (Tehran, Iran), Florigio Lista and Anna Anselmo (Rome, Italy), Haruo Suzuki, Makoto Kuroda, Riu Yamashita, Yukoto Sato, Eli Kaminuma (Tokyo and Sendai, Japan), Celia M. Alpuche Aranda and Jesus Martinez (Mexico City, Mexico), Christopher Dada (Auckland, New Zealand), Marius Dybwad (Oslo, Norway), Manuela Oliveira (Lisbon, Portugal and Porto, Portugal), Stephan Schuster (Singapore, Singapore), Geoffrey H. Siwo (Johannesburg, South Africa), Soojin Jang, Sung Chul Seo, and Sung Ho Hwang (Seoul, South Korea), Stephan Ossowski and Daniela Bezdan (Barcelona, Spain), Salama Chaker and Aspassia D. Chatziefthimiou (Doha, Qatar), Klas Udekwu and Per Liungdahl (Stockholm, Sweden), Ugur Sezerman and Cem Meydan (Izmir, Turkey), Eran Elhaik (Sheffield, UK), Gaston Gonnet (Montevideo, Uruguay), Lynn M. Schriml and Emmanuel Mongodin (Baltimore, USA and Washington D.C., USA), Curtis Huttenhower (Boston, USA), Jack Gilbert (Chicago, USA), Christopher E. Mason (New York City, USA), Jonathan Eisen (Sacramento and San Francisco, USA), David Hirschberg (Seattle, USA), Mark Hernandez (Denver, USA) Inaugural MetaSUB International Meeting Speakers: Jack Gilbert, Curtis Huttenhower, Andrew Kasarskis*, Patrick Lee, Christopher E. Mason, Julia Maritz, Ellen Jorgensen, Scott Tighe, Russel Neches, Tom Livelli, Leming Shi, Houtan Noushmehr, Haruo Suzuki, Jesus Martinez Barnetche, Catherine Burke, Aaron Darling, Hugues Richard, Zhi Xie, Stephan Ossowski, Edoardo Pasolli, Nick Greenfield, Nur Hasan, Ebrahim Afshinnekoo, Mohamed Donia, John Brownstein, Linda Nozick, Harold Michels, Lynn Schriml, Catherine Brownstein, Jeanne Garbarino, Abby Lyons, and Jeff Zhu
Keywords	Microbiome, Biosynthetic gene clusters, Built environment, Next-generation sequencing, Antimicrobial resistance markers

Das, R., Wexler, P., Pirooznia, M., and Elhaik, E. 2016. Responding to an enquiry concerning the geographic population structure (GPS) approach and the origin of Ashkenazic Jews - a reply to Flegontov et al. arXiv.

Abstract	Recently, we investigated the geographical origins of Ashkenazic Jews (AJs) and their native language Yiddish by applying a biogeographical tool, the Geographic Population Structure (GPS), to a cohort of 367 exclusively Yiddish-speaking and multilingual AJs genotyped on the Genochip microarray. GPS localized most AJs along major ancient trade routes in northeastern Turkey adjacent to primeval villages with names that may be derived from the word "Ashkenaz." These findings were compatible with the hypothesis of an Irano-Turko-Slavic origin for AJs and a Slavic origin for Yiddish and at odds with the Rhineland hypothesis advocating a German origin of both. Our approach has been recently adopted by Flegontov et al. (2016a) to trace the origin of the Siberian Ket people and their language. Recently, Flegontov et al. (2016b) have raised several questions concerning the accuracy of the Genochip microarray and GPS, specifically in relation to AJs and Yiddish. Although many of these issues have been addressed in our previous papers, we take this opportunity to clarify the principles of the GPS approach, review the recent biogeographical and ancient DNA findings regarding AJs, and comment on the origin of Yiddish.
Authors	Das, R., Wexler, P., Pirooznia, M., and Elhaik, E.
Keywords	Archaeogenetics; Yiddish; Ashkenazic Jews; Ashkenaz; Geographic population structure (GPS); Rhineland hypothesis; Citizen Science

33.

Das, R., Wexler, P., Pirooznia, M., and Elhaik, E. 2016. Localizing Ashkenazic Jews to primeval villages in the ancient Iranian lands of Ashkenaz. Genome Biology and Evolution.
Altmetric score (99th percentile of all articles, 1st in GBE)

Abstract	The Yiddish language is over one thousand years old and incorporates German, Slavic, and Hebrew elements. The prevalent view claims Yiddish has a German origin, whereas the opposing view posits a Slavic origin with strong Iranian and weak Turkic substrata. One of the major difficulties in deciding between these hypotheses is the unknown geographical origin of Yiddish speaking Ashkenazic Jews (AJs). An analysis of 393 Ashkenazic, Iranian, and mountain Jews and over 600 non-Jewish genomes demonstrated that Greeks, Romans, Iranians, and Turks exhibit the highest genetic similarity with AJs. The Geographic Population Structure (GPS) analysis localized most AJs along major primeval trade routes in northeastern Turkey adjacent to primeval villages with names that may be derived from "Ashkenaz." Iranian and mountain Jews were localized along trade routes on Turkey's eastern border. Loss of maternal haplogroups was evident in non-Yiddish speaking AJs. Our results suggest that AJs originated from a Slavo-Iranian confederation, which the Jews call "Ashkenazic" (i.e., "Scythian"), though these Jews probably spoke Persian and/or Ossete. This is compatible with linguistic evidence suggesting that Yiddish is a Slavic language created by Irano-Turko-Slavic Jewish merchants along the Silk Roads as a cryptic trade language, spoken only by its originators to gain an advantage in trade. Later, in the 9th century, Yiddish underwent relexification by adopting a new vocabulary that consists of a minority of German and Hebrew and a majority of newly coined Germanoid and Hebroid elements that replaced most of the original Eastern Slavic and Sorbian vocabularies, while keeping the original grammars intact.
Authors	Das, R., Wexler, P., Pirooznia, M., and Elhaik, E.
Keywords	Archaeogenetics; Yiddish; Ashkenazic Jews; Ashkenaz; Geographic population structure (GPS); Rhineland hypothesis; Citizen Science

Departure of the Winged Ship, Vladimir Kush (2000)

2015

32.

Elhaik, E. and Zandi, P. 2015. Dysregulation of the NF-kB pathway as a potential inducer of bipolar disorder. Journal of Psychiatric Research.

Abstract	A century of investigations enhanced our understanding of bipolar disorder although it remains a complex multifactorial disorder with a mostly unknown pathophysiology and etiology. The role of the immune system in this disorder is one of the most controversial notions in genetic psychiatry. Though inflammation has been consistently reported in bipolar patients, it remains unclear how the immunologic processes influence the disorder. One of the core components of the immune system is the NF-kB, a major transcription factor that plays an essential role in the development of innate and adaptive immunity. Remarkably, the NF-kB pathway received only little attention in bipolar studies, in contrast to studies of related psychiatric disorders where dysregulation has been proposed to explain the neurodegeneration in patient conditions. If this is also true for bipolar disorder, it will underscore the role of the immune system in the chronicity and pathophysiology of the disorder and may promote personalized therapeutic strategies. This is the first review to summarize the current knowledge of the pathophysiological functions of NF-kB in bipolar disorder.
Authors	Elhaik, E. and Zandi, P.
Keywords	Bipolar disorder; NF-kB; inflammation; cytokines; psychiatric disorders; autoimmunity

31.

Guo, H., Chamberlain, S., Elhaik, E., Jalli, E., Lynes, A.R., Marczak, L., Sabath, N., Vargas, A., Wieski, K., Zelig, E.M., and Pennings, S.C. 2015. Geographic variation in plant community structure of salt marshes: species, functional and phylogenetic perspectives. PLoS ONE.

Abstract	In general, community similarity is thought to decay with distance; however, this view may be complicated by the relative roles of different ecological processes at different geographical scales, and by the compositional perspective (e.g. species, functional group and phylogenetic lineage) used. Coastal salt marshes are widely distributed worldwide, but no studies have explicitly examined variation in salt marsh plant community composition across geographical scales, and from species, functional and phylogenetic perspectives. Based on studies in other ecosystems, we hypothesized that, in coastal salt marshes, community turnover would be more rapid at local versus larger geographical scales; and that community turnover patterns would diverge among compositional perspectives, with a greater distance decay at the species level than at the functional or phylogenetic levels. We tested these hypotheses in salt marshes of two regions: The southern Atlantic and Gulf Coasts of the United States. We examined the characteristics of plant community composition at each salt marsh site, how community similarity decayed with distance within individual salt marshes versus among sites in each region, and how community similarity differed among regions, using species, functional and phylogenetic perspectives. We found that results from the three compositional perspectives generally showed similar patterns: there was strong variation in community composition within individual salt marsh sites across elevation; in contrast, community similarity decayed with distance four to five orders of magnitude more slowly across sites within each region. Overall, community dissimilarity of salt marshes was lowest on the southern Atlantic Coast, intermediate on the Gulf Coast, and highest between the two regions. Our results indicated that local gradients are relatively more important than regional processes in structuring coastal salt marsh communities. Our results also suggested that in ecosystems with low species diversity, functional and phylogenetic approaches may not provide additional insight over a species-based approach.
Authors	Hongyu Guo, Scott A. Chamberlain, Eran Elhaik, Inder Jalli, Alana-Rose Lynes, Laurie Marczak, Niv Sabath, Amy Vargas, Kazimierz Wieski, Emily M. Zelig, and Steven C. Pennings
Keywords	Biodiversity, community structure, distance-decay of community similarity, functional traits, biogeographic variation, phylogenetic relationship, salt marsh

Elhaik, E., Tatarinova, T., Klyosov, A., and Graur, D. 2015. An extended reply to Mendez et al.: The 'extremely ancient' chromosome that still isn't. arXiv.

Abstract	Earlier this year, we published a scathing critique of a paper by Mendez et al. (2013) in which the claim was made that a Y chromosome was 237,000-581,000 years old. Elhaik et al. (2014) also attacked a popular article in Scientific American by the senior author of Mendez et al. (2013), whose title was "Sex with other human species might have been the secret of Homo sapiens's [sic] success" (Hammer 2013). Five of the 11 authors of Mendez et al. (2013) have now written a "rebuttal," and we were allowed to reply. Unfortunately, our reply was censored for being "too sarcastic and inflamed." References were removed, meanings were castrated, and a dedication in the Acknowledgments was deleted. Now, that the so-called rebuttal by 45% of the authors of Mendez et al. (2013) has been published together with our vasectomized reply, we decided to make public our entire reply to the so called "rebuttal." In fact, we go one step further, and publish a version of the reply that has not even been self-censored.
Authors	Eran Elhaik, Tatiana Tatarinova, Anatole Klyosov, Dan Graur.
Keywords	A00 haplotype, Y chromosome, Albert Perry, Y-chromosomal Adam, TMRCA, Fernando Mendez, Michael Hammer

30.

Elhaik, E., Tatarinova, T., Klyosov, A., and Graur, D. 2015. Reply to Mendez et al.: The 'extremely ancient' chromosome that still isn't. European Journal of Human Genetics.

Abstract	Earlier this year, we discovered that an extreme age estimate for a Y chromosomal haplotype (237,000-581,000 years ago) by Mendez et al was based on analytical choices that consistently inflated its value.
Authors	Eran Elhaik, Tatiana Tatarinova, Anatole Klyosov, Dan Graur.
Keywords	A00 haplotype, Y chromosome, Albert Perry, Y-chromosomal Adam, TMRCA, Fernando Mendez, Michael Hammer

Water Lily Pond, Claude Monet (1899)

2014

29.

Elhaik, E. and Graur, D. 2014. A Comparative Study and a Phylogenetic Exploration of the Compositional Architectures of Mammalian Nuclear Genomes. PLoS Computational Biology.

Abstract	For the past four decades the compositional organization of the mammalian genome posed a formidable challenge to molecular evolutionists attempting to explain it from an evolutionary perspective. Unfortunately, most of the explanations adhered to the "isochore theory," which has long been rebutted. Recently, an alternative compositional domain model was proposed depicting the human and cow genomes as composed mostly of short compositionally homogeneous and nonhomogeneous domains and a few long ones. We test the validity of this model through a rigorous sequence-based analysis of eleven completely sequenced mammalian and avian genomes. Seven attributes of compositional domains are used in the analyses: (1) the number of compositional domains, (2) compositional domain-length distribution, (3) density of compositional domains, (4) genome coverage by the different domain types, (5) degree of fit to a power-law distribution, (6) compositional domain GC content, and (7) the joint distribution of GC content and length of the different domain types. We discuss the evolution of these attributes in light of two competing phylogenetic hypotheses that differ from each other in the validity of clade Euarchontoglires. If valid, the murid genome compositional organization would be a derived state and exhibit a high similarity to that of other mammals. If invalid, the murid genome compositional organization would be closer to an ancestral state. We demonstrate that the compositional organization of the murid genome differs from those of primates and laurasiatherians, a phenomenon previously termed the "murid shift," and in many ways resembles the genome of opossum. We find no support to the "isochore theory." Instead, our findings depict the mammalian genome as a tapestry of mostly short homogeneous and nonhomogeneous domains and few long ones thus providing strong evidence in favor of the compositional domain model and seem to invalidate clade Euarchontoglires.
Authors	Eran Elhaik, Dan Graur.
Keywords	Isochores, murid shift, compositional domains, isochore theory, power-law distribution, Euarchontoglires, opossum

28.

Elhaik, E., Tatarinova, T., Chebotarev, D., Piras, I.S., Calò, C.M., Montis, A., Atzori, M., Marini, M., Tofanelli, S., Francalacci, P., Pagani, L., Tyler-Smith, C., Xue, Y., Cucca, F., Schurr, T.G., Gaieski, J.B., Melendez, C., Vilar, M.G., Owings, A.C., Gómez, R., Fujita, R., Santos, F.R., Comas, D., Balanovsky, O., Balanovska, E., Zalloua, P., Soodyall, H., Pitchappan, R., GaneshPrasad, A., Hammer, M., Matisoo-Smith, L., Wells, S.R., and The Genographic Consortium. 2014. Geographic population structure analysis of worldwide human populations infers their biogeographical origins. Nature Communications.

Altmetric score (99th percentile of all articles, ranked 8th in Nature Communications).
Science highlight: Genetic 'App' Tells You Where You're From.
Nature Communications highlight: Genes give clues to where in the World you came from.
Nature Middle East highlight: Genes can reveal where we come from.
Nature India highlight: Software tools solve mysteries of human origins, crimes.
Nature Asia Genetic: whether your ancestors came from somewhere.

Abstract	The search for a method that utilizes biological information to predict humans' place of origin has occupied scientists for millennia. Over the past four decades, scientists have employed genetic data in an effort to achieve this goal but with limited success. While biogeographical algorithms using next-generation sequencing data have achieved an accuracy of 700 km in Europe, they were inaccurate elsewhere. Here we describe the Geographic Population Structure (GPS) algorithm and demonstrate its accuracy with three data sets using 40,000-130,000 SNPs. GPS placed 83% of worldwide individuals in their country of origin. Applied to over 200 Sardinian villagers, GPS placed a quarter of them in their villages and most of the rest within 50 km of their villages. GPS's accuracy and power to infer the biogeography of worldwide individuals down to their country or, in some cases, village, of origin, underscores the promise of admixture-based methods for biogeography and has ramifications for genetic ancestry testing.
Authors	Eran Elhaik, Tatiana Tatarinova, Dmitri Chebotarev, Ignazio S. Piras, Carla Maria Calò, Antonella De Montis, Manuela Atzori, Monica Marini, Sergio Tofanelli, Paolo Francalacci, Luca Pagani, Chris Tyler-Smith, Yali Xue, Francesco Cucca, Theodore G. Schurr, Jill B. Gaieski, Carlalynne Melendez, Miguel G. Vilar, Amanda C. Owings, Rocío Gómez, Ricardo Fujita, Fabrício R. Santos, David Comas, Oleg Balanovsky, Elena Balanovska, Pierre Zalloua, Himla Soodyall, Ramasamy Pitchappan, ArunKumar GaneshPrasad, Michael Hammer, Lisa Matisoo-Smith, Spencer R. Wells & The Genographic Consortium.
Keywords	GPS, Biogeography, village, island, SPA, PCA, Genochip

27.

Elsik, C.G., Worley, K.C., Bennett, A.K., Beye, M., Camara, F., Childers, C.P., Graaf, D., Debyser, G., Deng, J., Devreese, B., Elhaik, E., et al. 2014. Finding the missing honey bee genes: lessons learned from a genome upgrade. BMC Genomics.
Altmetric score (95th percentile of all articles, 3rd in BMC Genomics)

Abstract	Background The first generation of genome sequence assemblies and annotations have had a significant impact upon our understanding of the biology of the sequenced species, the phylogenetic relationships among species, the study of populations within and across species, and have informed the biology of humans. As only a few Metazoan genomes are approaching finished quality (human, mouse, fly and worm), there is room for improvement of most genome assemblies. The honey bee (Apis mellifera) genome, published in 2006, was noted for its bimodal GC content distribution that affected the quality of the assembly in some regions and for fewer genes in the initial gene set (OGSv1.0) compared to what would be expected based on other sequenced insect genomes. Results Here, we report an improved honey bee genome assembly (Amel_4.5) with a new gene annotation set (OGSv3.2), and show that the honey bee genome contains a number of genes similar to that of other insect genomes, contrary to what was suggested in OGSv1.0. The new genome assembly is more contiguous and complete and the new gene set includes ~5000 more protein-coding genes, 50% more than previously reported. About 1/6 of the additional genes were due to improvements to the assembly, and the remaining were inferred based on new RNAseq and protein data. Conclusions Lessons learned from this genome upgrade have important implications for future genome sequencing projects. Furthermore, the improvements significantly enhance genomic resources for the honey bee, a key model for social behavior and essential to global ecology through pollination.
Authors	Christine G Elsik, Kim C Worley, Anna K Bennett, Martin Beye, Francisco Camara, Christopher P Childers, Dirk C de Graaf, Griet Debyser, Jixin Deng, Bart Devreese, Eran Elhaik, Jay D Evans, Leonard J Foster, Dan Graur, Roderic Guigo, Katharina Jasmin Hoff, Michael E Holder, Matthew E Hudson, Greg J Hunt, Huaiyang Jiang, Vandita Joshi, Radhika S Khetani, Peter Kosarev, Christie L Kovar, Jian Ma, Ryszard Maleszka, Robin F Moritz, Monica C Munoz-Torres, Terence D Murphy, Donna M Muzny et al.
Keywords	Bee, isochores, compositional domains, build 5

26.

Elhaik, E., Tatarinova, T., Klyosov, A., and Graur D. 2014. The 'extremely ancient' chromosome that isn't: a forensic bioinformatic investigation of Albert Perry's X-degenerate portion of the Y chromosome. European Journal of Human Genetics.
Altmetric score (99% of all articles of similar age, 1st in EJHG)

Abstract	Mendez and colleagues reported the identification of a Y chromosome haplotype (the A00 lineage) that lies at the basal position of the Y chromosome phylogenetic tree. Incorporating this haplotype, the authors estimated the time to the most recent common ancestor (TMRCA) for the Y tree to be 338,000 years ago (95% CI: 237,000-581,000). Such an extraordinarily early estimate contradicts all previous estimates in the literature and is over 100,000 years older than the earliest fossils of anatomically modern humans. This estimate raises two astonishing possibilities: either the novel Y chromosome was inherited after ancestral humans interbred with another species, or anatomically modern Homo sapiens emerged earlier than previously estimated and quickly became subdivided into genetically differentiated subpopulations. We demonstrate that the TMRCA estimate was reached through inadequate statistical and analytical methods, each of which contributed to its inflation. We show that the authors ignored previously inferred Y-specific rates of substitution, incorrectly derived the Y-specific substitution rate from autosomal mutation rates, and compared unequal lengths of the novel Y chromosome with the previously recognized basal lineage. Our analysis indicates that the A00 lineage was derived from all the other lineages 208,300 (95% CI: 163,900-260,200) years ago.
Authors	Eran Elhaik, Tatiana Tatarinova, Anatole Klyosov, Dan Graur.
Keywords	A00 haplotype, Y chromosome, Albert Perry, Y-chromosomal Adam, TMRCA, interbreeding

25.

Elhaik, E., Pellegrini, M., and Tatarinova, T. 2014. Gene expression and nucleotide composition are associated with genic methylation level in Oryza sativa. BMC Bioinformatics. Data.

Abstract	Background. The methylation of cytosines at CpG dinucleotides, which plays an important role in gene expression regulation, is one of the most studied epigenetic modifications. Thus far, the detection of DNA methylation has been determined mostly by experimental methods, which are not only prone to bench effects and artifacts but are also time-consuming, expensive, and cannot be easily scaled up to many samples. It is therefore useful to develop computational prediction methods for DNA methylation. Our previous studies highlighted the existence of correlations between the GC content of the third codon position (GC₃), methylation, and gene expression. We thus designed a model to predict methylation in Oryza sativa based on genomic sequence features and gene expression data. Results. We first derive equations to describe the relationship between gene methylation levels, GC₃, expression, length, and other gene compositional features. We next assess gene compositional features involving sixmers and their association with methylation levels and other gene level properties. By applying our sixmer-based approach on rice gene expression data we show that it can accurately predict methylation (Pearson's correlation coefficient r = 0.79) for the majority (79%) of the genes. Matlab code with our model is included. Conclusions. Gene expression variation can be used as predictors of gene methylation levels.
Authors	Elhaik, Pellegrini, and Tatarinova.
Keywords	DNA methylation, gene expression, GC3, rice, Oryza sativa

The Three Skulls, Paul Cézanne (1900)

2013

24.

Elhaik, E., Tatarinova, T., and Pellegrini, M. 2013. Cross-species analysis of genic GC₃ content and DNA methylation patterns. Genome Biology and Evolution.

Abstract	Background. The GC-content in the third codon position (GC₃) exhibits a unimodal distribution in many plant and animal genomes. Interestingly, grasses and homeotherm vertebrates exhibit a unique bimodal distribution. High GC₃ was previously found to be associated with variable expression, higher frequency of upstream TATA boxes, and an increase of GC₃ from 5' to 3'. Moreover, GC₃-rich genes are predominant in certain gene classes and are enriched in CpG dinucleotides that are potential targets for methylation. Based on the GC₃ bimodal distribution we hypothesize that GC₃ has a regulatory role involving methylation and gene expression. To test that hypothesis, we selected diverse taxa (rice, thale cress, bee, and human) that varied in the modality of their GC₃ distribution and tested the association between GC₃, DNA methylation and gene expression. Results. We examine the relationship between cytosine methylation levels and GC₃, gene expression, genome signature, gene length, and other gene compositional features. We find a strong negative correlation (Pearson's correlation coefficient r = -0.67, p-value < 0.0001) between GC₃ and genic CpG methylation. The comparison between 5'-3' gradients of CG3-skew and genic methylation for the taxa in the study suggests an interplay between gene-body methylation and the transcription-coupled cytosine deamination effect. Conclusions. Compositional features are correlated with methylation levels of genes in rice, thale cress, human, bee and fruit fly (which acts as an unmethylated control). These patterns allow us to generate evolutionary hypotheses about the relationship between GC₃ and methylation and how these affect expression patterns. Specifically, we propose that the opposite effects of methylation and compositional gradients along coding regions of GC₃-poor and GC₃-rich genes are the products of several competing processes.
Authors	Elhaik, Tatarinova, and Pellegrini.
Keywords	DNA methylation, gene expression, GC3, grasses, homeotherms, Oryza sativa, Apis mellifera, Homo sapiens, Arabidopsis thaliana

23.

Simola, D.F., Wissler, L., Donahue, G., Waterhouse, R.M., Helmkampf, M., Roux, J., Nygaard, S., Glastad, K., Hagen, D.E., Viljakainen, L., Reese, J.T., Hunt, B.G., Graur, D., Elhaik, E., Kriventseva, E., Wen, J., Parker, B.J., Cash, E., Privman, E., Childers, C.P., Muñoz-Torres, M.C., Boomsma, J.J., Bornberg-Bauer, E., Currie, C., Elsik, C.G., Suen, G., Goodisman, M.A., Keller, L., Liebig, J., Rawls, A., Reinberg, D., Smith, C.D., Smith, C.R., Tsutsui, N., Wurm, Y., Zdobnov, E.M., Berger, S.L., and Gadau, J. 2013. Social insect genomes exhibit dramatic evolution in gene composition and regulation while preserving regulatory features linked to sociality. Genome Research.

Abstract	Genomes of eusocial insects code for dramatic examples of phenotypic plasticity and social organization. We compared the genomes of seven ants, the honeybee, and various solitary insects to examine whether eusocial lineages share distinct features of genomic organization. Each ant lineage contains ~4,000 novel genes, but only 64 of these genes are conserved among all seven ants. Many gene families have been expanded in ants, notably those involved in chemical communication (e.g., desaturases and odorant receptors). Alignment of the ant genomes revealed reduced purifying selection compared to Drosophila without significantly reduced synteny. Correspondingly, ant genomes exhibit dramatic divergence of non-coding regulatory elements; however, extant conserved regions are enriched for novel non-coding RNAs and transcription factor binding sites. Comparison of orthologous gene promoters between eusocial and solitary species revealed significant regulatory evolution in both cis (e.g., CREB) and trans (e.g., Forkhead) for nearly 2000 genes, many of which exhibit phenotypic plasticity. Our results emphasize that genomic changes can occur remarkably fast in ants, as two recently diverged leaf-cutter ant species exhibit faster accumulation of species-specific genes and greater divergence in regulatory elements compared to other ants or Drosophila. Thus, while the "sociogenomes" of ants and the honeybee are broadly characterized by a pervasive pattern of divergence in gene composition and regulation, they preserve lineage-specific regulatory features linked to eusociality. We propose that changes in gene regulation played a key role in the origins of insect eusociality, whereas changes in gene composition were more relevant for lineage-specific eusocial adaptations.
Authors	Simola, Wissler, Donahue, Waterhouse, Helmkampf, Roux, Nygaard, Glastad, Hagen, Viljakainen, Reese, Hunt, Graur, Elhaik, Kriventseva, Wen, Parker, Cash, Privman, Childers, Muñoz-Torres, Boomsma, Bornberg-Bauer, Currie, Elsik, Suen, Goodisman, Keller, Liebig, Rawls, Reinberg, Smith, Smith, Tsutsui, Wurm, Zdobnov, Berger, and Gadau.
Keywords	Ants, social, Isochores, compositional domains, IsoPlotter, compositional maps, insects, Djs

22.

Elhaik, E. and Graur D. 2013. IsoPlotter+: A Tool for Studying the Compositional Architecture of Genomes. ISRN Bioinformatics.

Abstract	Eukaryotic genomes, particularly animal genomes, have a complex, nonuniform, and nonrandom internal compositional organization. The compositional organization of animal genomes can be described as a mosaic of discrete genomic regions, called "compositional domains," each with a distinct GC content that significantly differs from those of its upstream and downstream neighboring domains. A typical animal genome consists of a mixture of compositionally homogeneous and nonhomogeneous domains of varying lengths and nucleotide compositions that are interspersed with one another. We have devised IsoPlotter, an unbiased segmentation algorithm for inferring the compositional organization of genomes. IsoPlotter has become an indispensable tool for describing genomic composition and has been used in the analysis of more than a dozen genomes. Applications include describing new genomes, correlating domain composition with gene composition and their density, studying the evolution of genomes, testing phylogenomic hypotheses, and detecting regions of potential interbreeding between human and extinct hominines. To extend the use of IsoPlotter, we designed a completely automated pipeline, called IsoPlotter+, to carry out all segmentation analyses, including graphical display, and built a repository for compositional domain maps of all fully sequenced vertebrate and invertebrate genomes. The IsoPlotter+ pipeline and repository offer a comprehensive solution to the study of genome compositional architecture. Here, we demonstrate IsoPlotter+ by applying it to human and insect genomes. The computational tools and data repository are available online.
Authors	Elhaik E. and Graur D.
Keywords	IsoPlotter+, Isochores, compositional domains, human genome, Bernardi, IsoFinder, IsoPlotter, compositional maps, ants, insect genome, Djs

21.

Graur D., Zheng Y., Price N., Azevedo R.B.R., Zufall R.A., and Elhaik, E. 2013. On the immortality of television sets: "function" in the human genome according to the evolution-free gospel of ENCODE. Genome Biology and Evolution.
#1 most-cited in GBE.
Altmetric score: #73 most-read paper in 2013 of all journals.

Abstract	A recent slew of ENCODE Consortium publications, specifically the article signed by all Consortium members, put forward the idea that more than 80% of the human genome is functional. This claim flies in the face of current estimates according to which the fraction of the genome that is evolutionarily conserved through purifying selection is under 10%. Thus, according to the ENCODE Consortium, a biological function can be maintained indefinitely without selection, which implies that at least 80 - 10 = 70% of the genome is perfectly invulnerable to deleterious mutations, either because no mutation can ever occur in these "functional" regions, or because no mutation in these regions can ever be deleterious. This absurd conclusion was reached through various means, chiefly (1) by employing the seldom used "causal role" definition of biological function and then applying it inconsistently to different biochemical properties, (2) by committing a logical fallacy known as "affirming the consequent," (3) by failing to appreciate the crucial difference between "junk DNA" and "garbage DNA," (4) by using analytical methods that yield biased errors and inflate estimates of functionality, (5) by favoring statistical sensitivity over specificity, and (6) by emphasizing statistical significance rather than the magnitude of the effect. Here, we detail the many logical and methodological transgressions involved in assigning functionality to almost every nucleotide in the human genome. The ENCODE results were predicted by one of its authors to necessitate the rewriting of textbooks. We agree, many textbooks dealing with marketing, mass-media hype, and public relations may well have to be rewritten.
Authors	Graur D., Zheng Y., Price N., Azevedo R.B.R., Zufall R.A., and Elhaik E.
Keywords	ENCODE, critique, 80%, functionality, function

20.

Elhaik, E., Greenspan E., Staats S., Krahn T., Tyler-Smith C., Xue Y., Tofanelli S., Francalacci P., Cucca F., Pagani L., Jin L., Li H., Schurr T.G., Greenspan B., Wells R.S., and the Genographic Consortium. 2013. The GenoChip: A New Tool for Genetic Anthropology. Genome Biology and Evolution.
Altmetric score (90th percentile, ranked #1 in GBE).

Abstract	The Genographic Project is an international effort aimed at charting human migratory history. The project is non-profit and non-medical, and, through its Legacy Fund, supports locally led efforts to preserve indigenous and traditional cultures. While the first phase of the project was focused on uniparentally-inherited markers on the Y-chromosome and mitochondrial DNA, the current phase focuses on markers from across the entire genome to obtain a more complete understanding of human genetic variation. Although many commercial arrays exist for genomewide SNP genotyping, they were designed for medical genetic studies and contain medically related markers that are inappropriate for global population genetic studies. GenoChip, the Genographic Project's new genotyping array, was designed to resolve these issues and enable higher-resolution research into outstanding questions in genetic anthropology. The GenoChip includes ancestry informative markers obtained for over 450 human populations, an ancient human (Saqqaq), and two archaic hominins (Neanderthal and Denisovan) and was designed to identify all known Y-chromosome and mtDNA haplogroups. The chip was carefully vetted to avoid inclusion of medically relevant markers. To demonstrate its capabilities, we compared the FST distributions of GenoChip SNPs to those of two commercial arrays. While all arrays yielded similarly shaped (inverse J) FST distributions, the GenoChip autosomal and X-chromosomal distributions had the highest mean FST, attesting to its ability to discern subpopulations. The chip performances are illustrated in a principal component analysis for 14 worldwide populations. In summary, the GenoChip is a dedicated genotyping platform for genetic anthropology. With an unprecedented number of ~12,000 Y-chromosomal and ~3,300 mtDNA SNPs and over 130,000 autosomal and X-chromosomal SNPs without any known health, medical, or phenotypic relevance, the GenoChip is a useful tool for genetic anthropology and population genetics.
Authors	Elhaik E., Greenspan E., Staats S., Krahn T., Tyler-Smith C., Xue Y., Tofanelli S., Francalacci P., Cucca F., Pagani L., Jin L., Li H., Schurr T.G., Greenspan B., Wells R.S., and the Genographic Consortium.
Keywords	Population structure, National Geographic, Genographic, GenoChip, Array, Genetic anthropology, AimsFinder, IsoPlotter
Supplementary Materials	Supplementary files

19.

Elhaik, E. 2013. The Missing Link of Jewish European Ancestry: Contrasting the Rhineland and the Khazarian Hypotheses. Genome Biology and Evolution. 5:61-74.
Altmetric score (99th percentile, ranked #2 in GBE).
Corrections.
Venton, D. 2013. Highlight: Out of Khazaria - Evidence for "Jewish Genome" Lacking. Genome Biology and Evolution. 5:75-76.

Abstract	The question of Jewish ancestry has been the subject of controversy for over two centuries and has yet to be resolved. The "Rhineland Hypothesis" depicts Eastern European Jews as a "population isolate" that emerged from a small group of German Jews who migrated eastward and expanded rapidly. Alternatively, the "Khazarian Hypothesis" suggests that Eastern European Jews descended from the Khazars, an amalgam of Turkic clans that settled the Caucasus in the early centuries CE and converted to Judaism in the 8th century. Mesopotamian and Greco-Roman Jews continuously reinforced the Judaized Empire until the 13th century. Following the collapse of their empire, the Judeo-Khazars fled to Eastern Europe. The rise of European Jewry is therefore explained by the contribution of the Judeo-Khazars. Thus far, however, the Khazars' contribution has been estimated only empirically, as the absence of genome-wide data from Caucasus populations precluded testing the Khazarian Hypothesis. Recent sequencing of modern Caucasus populations prompted us to revisit the Khazarian Hypothesis and compare it with the Rhineland Hypothesis. We applied a wide range of population genetic analyses to compare these two hypotheses. Our findings support the Khazarian Hypothesis and portray the European Jewish genome as a mosaic of Caucasus, European, and Semitic ancestries, thereby consolidating previous contradictory reports of Jewish ancestry. We further describe major differences among Caucasus populations explained by the early presence of Judeans in the Southern and Central Caucasus. Our results have important implications on the demographic forces that shaped the genetic diversity in the Caucasus and medical studies.
Authors	Elhaik E.
Keywords	Population structure, Jewish genome, Jews, Khazars, Khazaria, Population genetics, Ashkenazi Jews, population isolate, Judeans
Supplementary Materials	Supplementary files

L'eglise d'Auvers-sur-Oise (The Church at Auvers-sur-Oise), Dr Paul Gachet (1890)

2012

18.

Elhaik, E. 2012. Empirical Distributions of FST from Large-Scale Human Polymorphism Data. PLoS ONE. 7:e49837.

Abstract	Studies of the apportionment of human genetic variation have long established that most human variation is within population groups and that the additional variation between population groups is small but greatest when comparing different continental populations. These studies often used Wright's FST that apportions the standardized variance in allele frequencies within and between population groups. Because local adaptations increase population differentiation, high-FST may be found at closely linked loci under selection and used to identify genes undergoing directional or heterotic selection. We re-examined these processes using HapMap data. We analyzed 3 million SNPs on 602 samples from eight worldwide populations and a consensus subset of 1 million SNPs found in all populations. We identified four major features of the data: First, a hierarchical FST analysis showed that only a paucity (12%) of the total genetic variation is distributed between continental populations and even less genetic variation (1%) is found between intra-continental populations. Second, the global FST distribution closely follows an exponential distribution. Third, although the overall FST distribution is similarly shaped (inverse J), FST distributions vary markedly by allele frequency when divided into non-overlapping groups by allele frequency range. Because the mean allele frequency is a crude indicator of allele age, these distributions mark the time-dependent change in genetic differentiation. Finally, the change in mean-FST of these groups is linear in allele frequency. These results suggest that investigating the extremes of the FST distribution for each allele frequency group is more efficient for detecting selection. Consequently, we demonstrate that such extreme SNPs are more clustered along the chromosomes than expected from linkage disequilibrium for each allele frequency group. These genomic regions are therefore likely candidates for natural selection.
Authors	Elhaik E.
Keywords	Genetic variation, Population genetics, Population statistics, Population structure, Wright Fst, F-statistics, Hierarchical Fst, Mean Fst, Selection, HapMap, Neutral theory, Fst distribution, Geography, Humans
Supplementary Materials	Available at Supplementary materials

17.

Elhaik, E., Chanda P., and Bader J.S. 2012. HAPZIPPER: sharing HapMap populations just got easier. Nucleic Acids Research. gks709.

Abstract	The rapidly growing amount of genomic sequence data being generated and made publicly available necessitates the development of new data storage and archiving methods. The vast amount of data being shared and manipulated also creates new challenges for network resources. Thus, developing advanced data compression techniques is becoming an integral part of data production and analysis. The HapMap project is one of the largest public resources of human single-nucleotide polymorphisms (SNPs), characterizing over 3 million SNPs genotyped in over 1000 individuals. The standard format and biological properties of HapMap data suggest that a dedicated genetic compression method can outperform generic compression tools. We propose a compression methodology for genetic data by introducing HAPZIPPER, a lossless compression tool tailored to compress HapMap data beyond benchmarks defined by generic tools such as GZIP, BZIP2 and LZMA. We demonstrate the usefulness of HAPZIPPER by compressing HapMap 3 populations to <5% of their original sizes. HAPZIPPER is freely downloadable from https://bitbucket.org/pchanda/hapzipper/downloads/HapZipper.tar.bz2.
Authors	Chanda P., Elhaik E., and Bader J.S.
Keywords	Compression algorithm, HapZipper, HapMap.
Supplementary Materials	Supplementary file

Le Moulin de la Galette, Pierre-Auguste Renoir (1876)

2011

16.

Goes F.S., Rongione M., Chen Y.C., Karchin R., Elhaik, E., and Potash J.B. 2011. Exonic DNA Sequencing of ERBB4 in Bipolar Disorder. PLoS ONE. 6:e20242.

Abstract	The Neuregulin-ErbB4 pathway plays a crucial role in brain development and constitutes one of the most biologically plausible signaling pathways implicated in schizophrenia and, to a lesser extent, in bipolar disorder (BP). However, recent genome-wide association analyses have not provided evidence for common variation in NRG1 or ERBB4 influencing schizophrenia or bipolar disorder susceptibility. In this study, we investigate the role of rare coding variants in ERBB4 in BP cases with mood-incongruent psychotic features, a form of BP with arguably the greatest phenotypic overlap with schizophrenia. We performed Sanger sequencing of all 28 exons in ERBB4, as well as part of the promoter and part of the 3'UTR sequence, hypothesizing that rare deleterious variants would be found in 188 cases with mood-incongruent psychosis from the GAIN BP study. We found 42 variants, of which 16 were novel, although none were non-synonymous or clearly deleterious. One of the novel variants, present in 11.2% of cases, is located next to an alternative stop codon, which is associated with a shortened transcript of ERBB4 that is not translated. We genotyped this variant in the GAIN BP case-control samples and found a marginally significant association with mood-incongruent psychotic BP compared with controls (additive model: OR = 1.64, P-value = 0.055; dominant model: OR = 1.73, P-value = 0.039). In conclusion, we found no rare variants of clear deleterious effect, but did uncover a modestly associated novel variant that could affect alternative splicing of ERBB4. However, the modest sample size in this study cannot definitively rule out a role for rare variants in bipolar disorder and studies with larger sample sizes are needed to confirm the observed association.
Authors	Goes F.S., Rongione M., Chen Y.C., Karchin R., Elhaik, E., and Potash J.B.
Keywords	Bipolar Disorder, ERBB4, Exome.

15.

Suen, G., Teiling, C., Li, L., Holt, C., Abouheif, E., Bornberg-Bauer, E., Bouffard, P., Caldera, E.J., Cash, E., Cavanaugh, A., Denas, O., Elhaik, E. et al. 2011. The Genome Sequence of the Leaf-Cutter Ant Atta cephalotes Reveals Insights into Its Obligate Symbiotic Lifestyle. PLoS Genetics. 7:e1002007.

Abstract	Leaf-cutter ants are one of the most important herbivorous insects in the Neotropics, harvesting vast quantities of fresh leaf material. The ants use leaves to cultivate a fungus that serves as the colony's primary food source. This obligate ant-fungus mutualism is one of the few occurrences of farming by non-humans and likely facilitated the formation of their massive colonies. Mature leaf-cutter ant colonies contain millions of workers ranging in size from small garden tenders to large soldiers, resulting in one of the most complex polymorphic caste systems within ants. To begin uncovering the genomic underpinnings of this system, we sequenced the genome of Atta cephalotes using 454 pyrosequencing. One prediction from this ant's lifestyle is that it has undergone genetic modifications that reflect its obligate dependence on the fungus for nutrients. Analysis of this genome sequence is consistent with this hypothesis, as we find evidence for reductions in genes related to nutrient acquisition. These include extensive reductions in serine proteases (which are likely unnecessary because proteolysis is not a primary mechanism used to process nutrients obtained from the fungus), a loss of genes involved in arginine biosynthesis (suggesting that this amino acid is obtained from the fungus), and the absence of a hexamerin (which sequesters amino acids during larval development in other insects). Following recent reports of genome sequences from other insects that engage in symbioses with beneficial microbes, the A. cephalotes genome provides new insights into the symbiotic lifestyle of this ant and advances our understanding of host-microbe symbioses.
Authors	Suen, G., Teiling, C., Li, L., Holt, C., Abouheif, E., Bornberg-Bauer, E., Bouffard, P., Caldera, E.J., Cash, E., Cavanaugh, A., Denas, O., Elhaik, E., Fave, M., Gadau, J., Gibson, J.D., Graur, D., Grubbs, K.J., Hagen, D.E., Harkins, T.T., Helmkampf, M., Hu, H., Johnson, B.R., Kim, J., Marsh, S.E., Moeller, J.A., Muñoz-Torres, M.C., Murphy, M.C., Naughton, M.C., Nigam, S., Overson, R., Rajakumar, R., Reese, J.T., Scott, J.J., Smith, C.R., Tao, S., Tsutsui, N.D., Viljakainen, L., Wissler, L., Yandell, M.D., Zimmer, F., Taylor, J., Slater, S.C., Clifton, S.W., Warren, W.C., Elsik, C.G., Smith, C.D., Weinstock, G.M., Gerardo, N.M., and Currie, C.R.
Keywords	IsoPlotter, Isochores, GC content, Leaf-cutter ant, Atta cephalotes, Genome composition, Genome organization.

14.

Smith, C. D., Zimin, A., Holt, C., Abouheif, E., Benton, R., Cash, E., Croset, V., Currie, C. R., Elhaik, E. et al. 2011. Draft genome of the globally widespread and invasive Argentine ant (Linepithema humile). PNAS 108: 5667-5672.
PNAS highlight: The birth of ant genomics.

Authors	Smith, C. D., Zimin, A., Holt, C., Abouheif, E., Benton, R., Cash, E., Croset, V., Currie, C. R., Elhaik, E., Elsik, C. G., Fave, M. J., Fernandes, V., Gadau, J., Gibson, J. D., Graur, D., Grubbs, K. J., Hagen, D. E., Helmkampf, M., Holley, J. A., Hu, H., Viniegra, A. S., Johnson, B. R., Johnson, R. M., Khila, A., Kim, J. W., Laird, J., Mathis, K. A., Moeller, J. A., Munoz-Torres, M. C., Murphy, M. C., Nakamura, R., Nigam, S., Overson, R. P., Placek, J. E., Rajakumar, R., Reese, J. T., Robertson, H. M., Smith, C. R., Suarez, A. V., Suen, G., Suhr, E. L., Tao, S., Torres, C. W., van Wilgenburg, E., Viljakainen, L., Walden, K. K., Wild, A. L., Yandell, M., Yorke, J. A., and Tsutsui, N. D.
Keywords	IsoPlotter, Isochores, GC content, Argentine ant, Linepithema humile, Genome composition, Genome organization.

13.

Smith, C. R., Smith, C. D., Robertson, H. M., Helmkampf, M., Zimin, A., Yandell, M., Holt, C., Hu, H., Abouheif, E., Benton, R., Cash, E., Croset, V., Currie, C. R., Elhaik, E. et al. 2011. Draft genome of the red harvester ant Pogonomyrmex barbatus. PNAS. Early publication.
PNAS highlight: The birth of ant genomics.

Abstract	We report the draft genome sequence of the red harvester ant, Pogonomyrmex barbatus. The genome was sequenced using 454 pyrosequencing, and the current assembly and annotation were completed in less than 1 y. Analyses of conserved gene groups (more than 1,200 manually annotated genes to date) suggest a high-quality assembly and annotation comparable to recently sequenced insect genomes using Sanger sequencing. The red harvester ant is a model for studying reproductive division of labor, phenotypic plasticity, and sociogenomics. Although the genome of P. barbatus is similar to other sequenced hymenopterans (Apis mellifera and Nasonia vitripennis) in GC content and compositional organization, and possesses a complete CpG methylation toolkit, its predicted genomic CpG content differs markedly from the other hymenopterans. Gene networks involved in generating key differences between the queen and worker castes (e.g., wings and ovaries) show signatures of increased methylation and suggest that ants and bees may have independently co-opted the same gene regulatory mechanisms for reproductive division of labor. Gene family expansions (e.g., 344 functional odorant receptors) and pseudogene accumulation in chemoreception and P450 genes compared with A. mellifera and N. vitripennis are consistent with major life-history changes during the adaptive radiation of Pogonomyrmex spp., perhaps in parallel with the development of the North American deserts.
Authors	Smith, C. R., Smith, C. D., Robertson, H. M., Helmkampf, M., Zimin, A., Yandell, M., Holt, C., Hu, H., Abouheif, E., Benton, R., Cash, E., Croset, V., Currie, C. R., Elhaik, E., Elsik, C. G., Fave, M. J., Fernandes, V., Gibson, J. D., Graur, D., Gronenberg, W., Grubbs, K. J., Hagen, D. E., Viniegra, A. S., Johnson, B. R., Johnson, R. M., Khila, A., Kim, J. W., Mathis, K. A., Munoz-Torres, M. C., Murphy, M. C., Mustard, J. A., Nakamura, R., Niehuis, O., Nigam, S., Overson, R. P., Placek, J. E., Rajakumar, R., Reese, J. T., Suen, G., Tao, S., Torres, C. W., Tsutsui, N. D., Viljakainen, L., Wolschin, F., and Gadau, J.
Keywords	IsoPlotter, Isochores, GC content, red harvester ant, Pogonomyrmex barbatus, Genome composition, Genome organization.

The Elder Sister, William Bouguereau (1869)

2010

12.

Kirkness, E. F., Haas, B. J., Sun, W., Braig, H. R., Perotti, M. A., Clark, J. M., Lee, S. H., Robertson, H. M., Kennedy, R. C., Elhaik, E. et al. 2010. Genome sequences of the human body louse and its primary endosymbiont provide insights into the permanent parasitic lifestyle. PNAS, 107: 12168-12173.

Abstract	As an obligatory parasite of humans, the body louse (Pediculus humanus humanus) is an important vector for human diseases, including epidemic typhus, relapsing fever, and trench fever. Here, we present genome sequences of the body louse and its primary bacterial endosymbiont Candidatus Riesia pediculicola. The body louse has the smallest known insect genome, spanning 108 Mb. Despite its status as an obligate parasite, it retains a remarkably complete basal insect repertoire of 10,773 protein-coding genes and 57 microRNAs. Representing hemimetabolous insects, the genome of the body louse thus provides a reference for studies of holometabolous insects. Compared with other insect genomes, the body louse genome contains significantly fewer genes associated with environmental sensing and response, including odorant and gustatory receptors and detoxifying enzymes. The unique architecture of the 18 minicircular mitochondrial chromosomes of the body louse may be linked to the loss of the gene encoding the mitochondrial single-stranded DNA binding protein. The genome of the obligatory louse endosymbiont Candidatus Riesia pediculicola encodes less than 600 genes on a short, linear chromosome and a circular plasmid. The plasmid harbors a unique arrangement of genes required for the synthesis of pantothenate, an essential vitamin deficient in the louse diet. The human body louse, its primary endosymbiont, and the bacterial pathogens that it vectors all possess genomes reduced in size compared with their free-living close relatives. Thus, the body louse genome project offers unique information and tools to use in advancing understanding of coevolution among vectors, symbionts, and pathogens.
Authors	Kirkness, E. F., Haas, B. J., Sun, W., Braig, H. R., Perotti, M. A., Clark, J. M., Lee, S. H., Robertson, H. M., Kennedy, R. C., Elhaik, E., Gerlach, D., Kriventseva, E. V., Elsik, C. G., Graur, D., Hill, C. A., Veenstra, J. A., Walenz, B., Tubio, J. M., Ribeiro, J. M., Rozas, J., Johnston, J. S., Reese, J. T., Popadic, A., Tojo, M., Raoult, D., Reed, D. L., Tomoyasu, Y., Krause, E., Mittapalli, O., Margam, V. M., Li, H. M., Meyer, J. M., Johnson, R. M., Romero-Severson, J., Vanzee, J. P., Alvarez-Ponce, D., Vieira, F. G., Aguade, M., Guirao-Rico, S., Anzola, J. M., Yoon, K. S., Strycharz, J. P., Unger, M. F., Christley, S., Lobo, N. F., Seufferheld, M. J., Wang, N., Dasch, G. A., Struchiner, C. J., Madey, G., Hannick, L. I., Bidwell, S., Joardar, V., Caler, E., Shao, R., Barker, S. C., Cameron, S., Bruggner, R. V., Regier, A., Johnson, J., Viswanathan, L., Utterback, T. R., Sutton, G. G., Lawson, D., Waterhouse, R. M., Venter, J. C., Strausberg, R. L., Berenbaum, M. R., Collins, F. H., Zdobnov, E. M., and Pittendrigh, B. R.
Keywords	IsoPlotter, Isochores, GC content, Human body louse, Genome composition, Genome organization.

11.

Elhaik, E., Graur, D., Josic, K., and Landan, G. 2010. Identifying compositionally homogeneous domains within the human genome using a novel segmentation algorithm. Nucleic Acids Research, e158.

Abstract	It has been suggested that the mammalian genome is composed mainly of long compositionally homogeneous domains. Such domains are frequently identified using recursive segmentation algorithms based on the Jensen-Shannon divergence. However, a common difficulty with such methods is deciding when to halt the recursive partitioning and what criteria to use in deciding whether a detected boundary between two segments is real or not. We demonstrate that commonly used halting criteria are intrinsically biased, and propose IsoPlotter, a parameter-free segmentation algorithm that overcomes such biases by using a simple dynamic halting criterion and tests the homogeneity of the inferred domains. IsoPlotter was compared with an alternative segmentation algorithm, D_JS, using two sets of simulated genomic sequences. Our results show that IsoPlotter was able to infer both long and short compositionally homogeneous domains with low GC content dispersion, whereas D_JS failed to identify short compositionally homogeneous domains and sequences with low compositional dispersion. By segmenting the human genome with IsoPlotter, we found that one-third of the genome is composed of compositionally nonhomogeneous domains and the remaining is a mixture of many short compositionally homogeneous domains and relatively few long ones.
Authors	Eran Elhaik, Dan Graur, Kresimir Josic, and Giddy Landan.
Keywords	IsoPlotter, Isochores, GC content, Segmentation algorithms, Jensen-Shannon divergence statistic, Entropy, Genome composition.
Software	Available here.
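
Illustration (not the published software): the recursive segmentation algorithms discussed in the abstract above repeatedly look for the split point that maximizes the Jensen-Shannon divergence between the two resulting segments. The following is a minimal sketch of one such split step, assuming a two-letter (GC vs. AT) alphabet. The function names gc_entropy and best_split are hypothetical, and IsoPlotter's dynamic halting criterion and homogeneity test are not reproduced here.

```python
import math


def gc_entropy(gc_count, length):
    """Shannon entropy (in bits) of a GC-vs-AT composition."""
    if length == 0:
        return 0.0
    p = gc_count / length
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def best_split(seq):
    """Return (position, D_JS) for the split that maximizes the
    Jensen-Shannon divergence between the left and right segments.

    D_JS = H(whole) - (n_left / n) * H(left) - (n_right / n) * H(right)
    Returns (None, 0.0) if no split yields a positive divergence.
    """
    n = len(seq)
    is_gc = [base in "GCgc" for base in seq]
    total_gc = sum(is_gc)
    h_whole = gc_entropy(total_gc, n)

    best_pos, best_djs = None, 0.0
    left_gc = 0
    for i in range(1, n):  # the left segment is seq[:i]
        left_gc += is_gc[i - 1]
        right_gc = total_gc - left_gc
        djs = (h_whole
               - (i / n) * gc_entropy(left_gc, i)
               - ((n - i) / n) * gc_entropy(right_gc, n - i))
        if djs > best_djs:
            best_pos, best_djs = i, djs
    return best_pos, best_djs
```

A full segmentation would apply best_split recursively to each resulting segment until some halting criterion is met; the choice of that criterion is exactly the difficulty the paper addresses.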

10.

Elhaik, E., Graur, D., and Josic, K. 2010. Comparative testing of DNA segmentation algorithms using benchmark simulations. Molecular Biology and Evolution 27: 1015-1024.

Abstract	Numerous segmentation methods for the detection of compositionally homogeneous domains within genomic sequences have been proposed. Unfortunately, these methods yield inconsistent results. Here, we present a benchmark consisting of two sets of simulated genomic sequences for testing the performances of segmentation algorithms. Sequences in the first set are composed of fixed-sized homogeneous domains, distinct in their between-domain GC-content variability. The sequences in the second set are composed of a mosaic of many short domains and a few long ones, distinguished by sharp GC content boundaries between neighboring domains. We use these sets to test the performance of seven segmentation algorithms in the literature. Our results show that recursive segmentation algorithms based on the Jensen-Shannon divergence outperform all other algorithms. However, even these algorithms perform poorly in certain instances because of the arbitrary choice of a segmentation-stopping criterion.
Authors	Eran Elhaik, Dan Graur, and Kresimir Josic.
Keywords	Isochores, GC content, Segmentation algorithms, Jensen-Shannon divergence statistic, Entropy, Genome composition, Benchmark simulations.
Software	Available here.
Supplementary materials	Available here.

Elhaik, E., Graur, D., and Josic, K. 2010. 'Genome order index' should not be used for defining compositional constraints in nucleotide sequences - a case study of the Z-curve. Biology Direct. 5:10.

Abstract	Background: The Z-curve is a three-dimensional representation of DNA sequences proposed over a decade ago and has been extensively applied to sequence segmentation, horizontal gene transfer detection, and sequence analysis. Based on the Z-curve, a "genome order index" was proposed, which is defined as S = a^2 + c^2 + t^2 + g^2, where a, c, t, and g are the nucleotide frequencies of A, C, T, and G, respectively. This index was found to be smaller than 1/3 for almost all tested genomes, which was taken as support for the existence of a constraint on genome composition. A geometric explanation for this constraint has been suggested. Each genome was represented by a point P whose distance from the four faces of a regular tetrahedron was given by the frequencies a, c, t, and g. It was claimed that an inscribed sphere of radius r = 1/3^0.5 contains almost all points corresponding to various genomes, implying that S < r^2. The distribution of the points P obtained by S was studied using the Z-curve. Results: In this work, we studied the basic properties of the Z-curve using the "genome order index" as a case study. We show that (1) the calculation of the radius of the inscribed sphere of a regular tetrahedron is incorrect, (2) the S index is narrowly distributed, (3) based on the second parity rule, the S index can be derived directly from the Shannon entropy and is, therefore, redundant, and (4) the Z-curve suffers from over-dimensionality, and the dimension that stands for GC content alone suffices to represent any given genome. Conclusion: The "genome order index" S does not represent a constraint on nucleotide composition. Moreover, S can be easily computed from the Gini-Simpson index and directly derived from entropy and is, therefore, redundant. Overall, the Z-curve and S are over-complicated measures of GC content and the Shannon H index, respectively. Reviewers: This article was reviewed by Claus Wilke, Joel Bader, Marek Kimmel and Uladzislau Hryshkevich (nominated by Itai Yanai).
Authors	Eran Elhaik, Dan Graur, and Kresimir Josic.
Keywords	Nucleotide composition; Genomic GC content; Shannon H function; Z-curve; Genome order index; Gini-Simpson index
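
Illustration (not from the paper): the index S defined in the abstract above is the sum of squared nucleotide frequencies, so it equals one minus the Gini-Simpson index. Under the second parity rule (a = t and c = g), it reduces to a function of GC content alone. The sketch below makes both relations concrete; the function names are hypothetical.

```python
def genome_order_index(a, c, g, t):
    """S = a^2 + c^2 + g^2 + t^2 for nucleotide frequencies summing to 1."""
    return a * a + c * c + g * g + t * t


def gini_simpson(a, c, g, t):
    """Gini-Simpson index, 1 - sum(p_i^2); by definition equal to 1 - S."""
    return 1.0 - genome_order_index(a, c, g, t)


def s_from_gc(gc):
    """S under the second parity rule (a = t = (1 - gc) / 2, c = g = gc / 2).

    Substituting into S gives S = (gc^2 + (1 - gc)^2) / 2.
    """
    at = 1.0 - gc
    return (gc * gc + at * at) / 2.0
```

For example, a genome with 40% GC content obeying the parity rule has a = t = 0.3 and c = g = 0.2, and both genome_order_index(0.3, 0.2, 0.2, 0.3) and s_from_gc(0.4) evaluate to 0.26 (up to floating-point rounding).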

Werren J. H., Richards S., Desjardins C. A., Niehuis O., Gadau J., John K. J. K., Beukeboom L. W., Desplan C., Elsik C. G., Grimmelikhuijzen C. J. P., Kitts P., Lynch J., Murphy T., Oliveira D. C. S. G., Smith C. D., Zande L., Worley K. C., Zdobnov E. M., Aerts M., Albert S., Anaya V. H., Anzola J. M., Angel R., Barchuk A. R., Behura S. K., Bera A. N., Berenbaum M. R., Bertossa R. C., Bitondi M. M. G., Bordenstein S. R., Bork P., Bornberg-Bauer E., Brunain M., Cazzamali G., Chaboub L., Chacko J., Chavez D., Childers C. P., Choi J-H., Clark M. E., Claudianos C., Clinton R. A., Cree A. G., Cristino A. S., Dang P. M., Darby A. C., de Graaf D. C., Devreese B., Dinh H. H., Edwards R., Elango N., Elhaik E. et al. 2010. Functional and evolutionary insights from the genomes of three parasitoid Nasonia species. Science, 327: 343-348.
Science highlight: The Little Wasp That Could.
Science highlight: Podcast.

Abstract	We report here genome sequences and comparative analyses of three closely related parasitoid wasps: Nasonia vitripennis, N. giraulti, and N. longicornis. Parasitoids are important regulators of arthropod populations, including major agricultural pests and disease vectors, and Nasonia is an emerging genetic model, particularly for evolutionary and developmental genetics. Key findings include the identification of a functional DNA methylation tool kit; hymenopteran-specific genes including diverse venoms; lateral gene transfers among Pox viruses, Wolbachia, and Nasonia; and the rapid evolution of genes involved in nuclear-mitochondrial interactions that are implicated in speciation. Newly developed genome resources advance Nasonia for genetic research, accelerate mapping and cloning of quantitative trait loci, and will ultimately provide tools and knowledge for further increasing the utility of parasitoids as pest insect-control agents.
Authors	Werren J. H., Richards S., Desjardins C. A., Niehuis O., Gadau J., John K. J. K., Beukeboom L. W., Desplan C., Elsik C. G., Grimmelikhuijzen C. J. P., Kitts P., Lynch J., Murphy T., Oliveira D. C. S. G., Smith C. D., Zande L., Worley K. C., Zdobnov E. M., Aerts M., Albert S., Anaya V. H., Anzola J. M., Angel R., Barchuk A. R., Behura S. K., Bera A. N., Berenbaum M. R., Bertossa R. C., Bitondi M. M. G., Bordenstein S. R., Bork P., Bornberg-Bauer E., Brunain M., Cazzamali G., Chaboub L., Chacko J., Chavez D., Childers C. P., Choi J-H., Clark M. E., Claudianos C., Clinton R. A., Cree A. G., Cristino A. S., Dang P. M., Darby A. C., de Graaf D. C., Devreese B., Dinh H. H., Edwards R., Elango N., Elhaik E., Ermolaeva O., Evans J. D., Foret S., Fowler G. R., Gerlach D., Gibson J. D., Gilbert D. G., Graur D., Grunder S., Hagen D. E., Han Y., Hauser F., Hultmark D., Hunter H. C., Hurst G. D. D., Jhangiani S. N., Jiang H., Johnson R. M., Jones A. K., Junier T., Kadowaki T., Kamping A., Kapustin Y., Kechavarzi B., Kim J., Kim J., Kiryutin B., Koevoets T., Kovar C. L., Kriventseva E. V., Kucharski R., Lee H., Lee S. L., Lees K., Lewis L. R., Loehlin D. W., Logsdon J. M., Lopez J. A., Lozado R. J., Maglott D., Maleszka R., Mayampurath A., Mazur D. J., McClure M. A., Moore A. D., Morgan M. B., Muller J., Munoz-Torres M. C., Muzny D. M., Nazareth L. V., Neupert S., Nguyen N. B., Nunes F. M. F., Oakeshott J. G., Okwuonu G. O., Pannebakker B. A., Pejaver V. R., Peng Z., Pratt S. C., Predel R., Pu L-L., Ranson H., Raychoudhury R., Rechtsteiner A., Reese J. T., Reid J. G., Riddle M., Robertson H. M., Romero-Severson J., Rosenberg M., Sackton T. B., Sattelle D. B., Schluns H., Schmitt T., Schneider M., Schuler A., Schurko A. M., Shuker D. M., Simoes Z. L. P., Sinha S., Smith Z., Solovyev V., Souvorov A., Springauf A., Stafflinger E., Stage D. E., Stanke M., Tanaka Y., Telschow A., Trent C., Vattathil S., Verhulst E. C., Viljakainen L., Wanner K. W., Waterhouse R. M., Whitfield J. B., Wilkes T. E., Williamson M., Willis J. H., Wolschin F., Wyder S., Yamada T., Yi S. V., Zecher C. N., Zhang L., Gibbs R. A.
Keywords	Nasonia, Isochores, GC content, Segmentation algorithms, Jensen-Shannon divergence statistic, Entropy, Genome composition.

The Nightmare, Henry Fuseli (1781)

2009

Elhaik, E., Landan, G., and Graur, D. 2009. Can GC Content at Third-Codon Positions Be Used as a Proxy for Isochore Composition? Molecular Biology and Evolution, 26: 1829-1833.

Abstract	The isochore theory depicts the genomes of warm-blooded vertebrates as a mosaic of long genomic regions that are characterized by relatively homogeneous GC content. In the absence of genomic data, the GC content at third-codon positions of protein-coding genes (GC3) was commonly used as a proxy for the GC content of isochores. Oddly, in the postgenomic era, GC3 is still sometimes used as a proxy for the GC composition of isochores. Here, we use genic and genomic sequences from human, chimpanzee, cow, mouse, rat, chicken, and zebrafish to show that GC3 only explains a very small proportion of the variation in GC content of long genomic sequences flanking the genes (GCf), and what little correlation there is between GC3 and GCf was found to decay rapidly with distance from the gene. The coefficient of variation of GC3 was found to be much larger than that of GCf and, therefore, GC3 and GCf values are not comparable with each other. Comparisons of orthologous gene pairs from 1) human and chimpanzee and 2) mouse and rat show strong correlations between their GC3 values, but very weak correlations between their GCf values. We conclude that the GC content of third-codon positions cannot be used as a stand-in for isochoric composition.
Authors	Eran Elhaik, Giddy Landan, and Dan Graur.
Keywords	Isochores, GC3, GC content, Flanking regions, Genome composition, Compositional patterns
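
Illustration (not from the paper): GC3, as described in the abstract above, is the GC content of the third positions of the codons in a protein-coding sequence, whereas GCf is the ordinary GC content of the flanking genomic sequence. The sketch below assumes the coding sequence is supplied in frame, starting at the first codon position. The function names gc3 and gc_content are hypothetical.

```python
def gc_content(seq):
    """Fraction of G or C bases in a sequence (e.g., a flanking region)."""
    if not seq:
        raise ValueError("empty sequence")
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def gc3(cds):
    """Fraction of G or C at third-codon positions of an in-frame coding sequence.

    Trailing bases that do not form a complete codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3   # length covered by complete codons
    thirds = cds[2:usable:3]           # bases at positions 3, 6, 9, ...
    if not thirds:
        raise ValueError("sequence shorter than one codon")
    return sum(base in "GC" for base in thirds) / len(thirds)
```

For the toy sequence ATGGCCAAATAA (codons ATG, GCC, AAA, TAA), the third-position bases are G, C, A, and A, so gc3 returns 0.5.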

Elsik, C. G., Tellam, R. L., Worley, K. C., Gibbs, R. A., Muzny, D. M., Weinstock, G. M., Adelson, D. L., Eichler, E. E., Elnitski, E., Guigo, G., Hamernik, D. L., Kappes, S. M., Lewin, H. A., Lynn, D. J., Nicholas, F. W., Reymond, R., Rijnkels, R., Skow, L. C., Zdobnov, E. M., Schook, S., Womack, W., Alioto, A., Antonarakis, S. E., Astashyn, A., Chapple, C. E., Chen, C., Chrast, C., Camara, C., Ermolaeva, E., Henrichsen, C. N., Hlavina, H., Kapustin, K., Kiryutin, K., Kitts, K., Kokocinski, K., Landrum, L., Maglott, M., Pruitt, P., Sapojnikov, S., Searle, S. M., Solovyev, S., Souvorov, S., Ucla, U., Wyss, W., Anzola, J. M., Gerlach, G., Elhaik, E. et al. 2009. The Genome Sequence of Taurine Cattle: A Window to Ruminant Biology and Evolution. Science, 324: 522-528.
Science Cover: Livestock decoded.
Science highlight: Podcast.

Abstract	To understand the biology and evolution of ruminants, the cattle genome was sequenced to about sevenfold coverage. The cattle genome contains a minimum of 22,000 genes, with a core set of 14,345 orthologs shared among seven mammalian species of which 1217 are absent or undetected in noneutherian (marsupial or monotreme) genomes. Cattle-specific evolutionary breakpoint regions in chromosomes have a higher density of segmental duplications, enrichment of repetitive elements, and species-specific variations in genes associated with lactation and immune responsiveness. Genes involved in metabolism are generally highly conserved, although five metabolic genes are deleted or extensively diverged from their human orthologs. The cattle genome sequence thus provides a resource for understanding mammalian evolution and accelerating livestock genetic improvement for milk and meat production.
Authors	Elsik, C. G., Tellam, R. L., Worley, K. C., Gibbs, R. A., Muzny, D. M., Weinstock, G. M., Adelson, D. L., Eichler, E. E., Elnitski, E., Guigo, G., Hamernik, D. L., Kappes, S. M., Lewin, H. A., Lynn, D. J., Nicholas, F. W., Reymond, R., Rijnkels, R., Skow, L. C., Zdobnov, E. M., Schook, S., Womack, W., Alioto, A., Antonarakis, S. E., Astashyn, A., Chapple, C. E., Chen, C., Chrast, C., Camara, C., Ermolaeva, E., Henrichsen, C. N., Hlavina, H., Kapustin, K., Kiryutin, K., Kitts, K., Kokocinski, K., Landrum, L., Maglott, M., Pruitt, P., Sapojnikov, S., Searle, S. M., Solovyev, S., Souvorov, S., Ucla, U., Wyss, W., Anzola, J. M., Gerlach, G., Elhaik, E., Graur, G., Reese, J. T., Edgar, R. C., Mcewan, J. C., Payne, G. M., Raison, J. M., Junier, J., Kriventseva, E. V., Eyras, E., Plass, P., Donthu, D., Larkin, D. M., Reecy, R., Yang, M. Q., Chen, C., Cheng, C., Chitko-Mckown, C. G., Liu, G. E., Matukumalli, L. K., Song, S., Zhu, Z., Bradley, D. G., Brinkman, F. S., Lau, L. P., Whiteside, M. D., Walker, W., Wheeler, T. T., Casey, C., German, B. J., Lemay, D. G., Maqbool, N. J., Molenaar, A. J., Seo, S., Stothard, S., Baldwin, C. L., Baxter, B., Brinkmeyer-Langford, C. L., Brown, W. C., Childers, C. P., Connelley, C., Ellis, S. A., Fritz, F., Glass, E. J., Herzig, C. T., Iivanainen, I., Lahmers, K. K., Bennett, A. K., Dickens, M. C., Gilbert, J. G., Hagen, D. E., Salih, S., Aerts, A., Caetano, A. R., Dalrymple, D., Garcia, J. F., Gill, C. A., Hiendleder, S. G., Memili, M., Spurlock, S., Williams, J. L., Alexander, A., Brownstein, M. J., Guan, G., Holt, R. A., Jones, S. J., Marra, M. A., Moore, M., Moore, S. S., Roberts, R., Taniguchi, T., Waterman, R. C., Chacko, C., Chandrabose, M. M., Cree, C., Dao, M. D., Dinh, H. H., Gabisi, R. A., Hines, H., Hume, H., Jhangiani, S. N., Joshi, J., Kovar, C. L., Lewis, L. R., Liu, L., Lopez, L., Morgan, M. B., Nguyen, N. B., Okwuonu, G. O., Ruiz, S. J., Santibanez, S., Wright, R. A., Buhay, B., Ding, D., Dugan-Rocha, D., Herdandez, H., Holder, H., Sabo, S., Egan, E., Goodell, G., Wilczek-Boney, W., Fowler, G. R., Hitchens, M. E., Lozado, R. J., Moen, M., Steffen, S., Warren, J. T., Zhang, Z., Chiu, C., Schein, J. E., Durbin, J. K., Havlak, H., Jiang, J., Liu, L., Qin, Q., Ren, R., Shen, S., Song, S., Bell, S. N., Davis, D., Johnson, A. J., Lee, L., Nazareth, L. V., Patel, B. M., Pu, P., Vattathil, V., Williams, R. L., Curry, C., Hamilton, H., Sodergren, S., Wheeler, D. A., Barris, B., Bennett, G. L., Eggen, E., Green, R. D., Harhay, G. P., Hobbs, H., Jann, J., Keele, J. W., Kent, M. P., Lien, L., Mckay, S. D., Mcwilliam, M., Ratnakumar, R., Schnabel, R. D., Smith, S., Snelling, W. M., Sonstegard, T. S., Stone, R. T., Sugimoto, S., Takasuga, T., Taylor, J. F., Van Tassell, C. P., Macneil, M. D., Abatepaulo, A. R., Abbey, C. A., Ahola, A., Almeida, I. G., Amadio, A. F., Anatriello, A., Bahadue, S. M., Biase, F. H., Boldt, C. R., Carroll, J. A., Carvalho, W. A., Cervelatti, E. P., Chacko, C., Chapin, J. E., Cheng, C., Choi, C., Colley, A. J., de Campos, T. A., Donato, M. D., Santos, I. K., de Oliveira, C. J., Deobald, D., Devinoy, D., Donohue, K. E., Dovc, D., Eberlein, E., Fitzsimmons, C. J., Franzin, A. M., Garcia, G. R., Genini, G., Gladney, C. J., Grant, J. R., Greaser, M. L., Green, J. A., Hadsell, D. L., Hakimov, H. A., Halgren, H., Harrow, J. L., Hart, E. A., Hastings, H., Hernandez, H., Hu, H., Ingham, I., Iso-Touru, I., Jamis, J., Jensen, J., Kapetis, K., Kerr, K., Khalil, S. S., Khatib, K., Kolbehdari, K., Kumar, C. G., Kumar, K., Leach, L., Lee, J. C., Li, L., Logan, K. M., Malinverni, M., Marques, M., Martin, W. F., Martins, N. F., Maruyama, S. R., Mazza, M., Mclean, K. L., Medrano, J. F., Moreno, B. T., More, D. D., Muntean, C. T., Nandakumar, H. P., Nogueira, M. F., Olsaker, O., Pant, S. D., Panzitta, P., Pastor, R. C., Poli, M. A., Poslusny, P., Rachagani, R., Ranganathan, R., Razpet, R., Riggs, P. K., Rincon, R., Rodriguez-Osorio, R., Rodriguez-Zas, S. L., Romero, N. E., Rosenwald, R., Sando, S., Schmutz, S. M., Shen, S., Sherman, S., Southey, B. R., Lutzow, Y. S., Sweedler, J. V., Tammen, T., Telugu, B. P., Urbanski, J. M., Utsunomiya, Y. T., Verschoor, C. P., Waardenberg, A. J., Wang, W., Ward, W., Weikard, W., Welsh, T. H., White, S. N., Wilming, L. G., Wunderlich, K. R., Yang, Y., and Zhao, Z.
Keywords	Cow, Cattle, Isochores, GC content, Segmentation algorithms, Jensen-Shannon divergence statistic, Entropy, Genome composition.

Tower of Babel, Pieter Bruegel the Elder (1563)

2008

Richards S., Gibbs R. A., Weinstock G. M., Brown S. J., Denell R. E., Beeman R. W., Bucher G., Friedrich M., Grimmelikhuijzen C. J. P., Klingler M., Lorenzen M., Roth S., Schröder R., Tautz D., Zdobnov E. M., Muzny D., Attaway T., Bell S., Buhay C. J., Chandrabose M. N., Chavez D., Clerk-Blankenburg K. P., Cree A., Dao M., Davis C., Chacko J., Dinh H., Dugan-Rocha S., Fowler G., Garner T. T., Garnes J., Gnirke A., Hawes A., Hernandez J., Hines S., Holder M., Hume J., Jhangiani S. N., Joshi V., Mohid Khan Z., Jackson L., Kovar C., Kowis A., Lee S., Lewis L. R., Margolis J., Morgan M., Nazareth L. V., Nguyen N., Okwuonu G., Parker D., Ruiz S-J., Santibanez J., Savard J., Scherer S. E., Schneider B., Sodergren E., Vattathil S., Villasana D., White C. S., Wright R., Park Y., Lord J., Oppert B., Wang L., Liu Y., Worley K., Elsik C. G., Reese J. T., Elhaik E. et al. 2008. The genome of the model beetle and pest Tribolium castaneum. Nature, 452: 949-955.

Abstract	Tribolium castaneum is a member of the most species-rich eukaryotic order, a powerful model organism for the study of generalized insect development, and an important pest of stored agricultural products. We describe its genome sequence here. This omnivorous beetle has evolved the ability to interact with a diverse chemical environment, as shown by large expansions in odorant and gustatory receptors, as well as P450 and other detoxification enzymes. Development in Tribolium is more representative of other insects than is Drosophila, a fact reflected in gene content and function. For example, Tribolium has retained more ancestral genes involved in cell-cell communication than Drosophila, some being expressed in the growth zone crucial for axial elongation in short-germ development. Systemic RNA interference in T. castaneum functions differently from that in Caenorhabditis elegans, but nevertheless offers similar power for the elucidation of gene function and identification of targets for selective insect control.
Authors	Richards S., Gibbs R. A., Weinstock G. M., Brown S. J., Denell R. E., Beeman R. W., Bucher G., Friedrich M., Grimmelikhuijzen C. J. P., Klingler M., Lorenzen M., Roth S., Schröder R., Tautz D., Zdobnov E. M., Muzny D., Attaway T., Bell S., Buhay C. J., Chandrabose M. N., Chavez D., Clerk-Blankenburg K. P., Cree A., Dao M., Davis C., Chacko J., Dinh H., Dugan-Rocha S., Fowler G., Garner T. T., Garnes J., Gnirke A., Hawes A., Hernandez J., Hines S., Holder M., Hume J., Jhangiani S. N., Joshi V., Mohid Khan Z., Jackson L., Kovar C., Kowis A., Lee S., Lewis L. R., Margolis J., Morgan M., Nazareth L. V., Nguyen N., Okwuonu G., Parker D., Ruiz S-J., Santibanez J., Savard J., Scherer S. E., Schneider B., Sodergren E., Vattathil S., Villasana D., White C. S., Wright R., Park Y., Lord J., Oppert B., Wang L., Liu Y., Worley K., Elsik C. G., Reese J. T., Elhaik E., Landan G., Graur D., Arensburger P., Atkinson P., Beidler J., Demuth J. P., Drury D. W., Du Y-Z., Fujiwara H., Maselli V., Osanai M., Robertson H. M., Tu Z., Wang J-J., Wang S., Song H., Zhang L., Werner D., Stanke M., Morgenstern B., Solovyev V., Kosarev P., Brown G., Chen H-C., Ermolaeva O., Hlavina W., Kapustin Y., Kiryutin B., Kitts P., Maglott D., Pruitt K., Sapojnikov V., Souvorov A., Mackey A. J., Waterhouse R. M., Wyder S., Kriventseva E. V., Kadowaki T., Bork P., Aranda M., Bao R., Beermann A., Berns N., Bolognesi R., Bonneton F., Bopp D., Butts T., Chaumot A., Ferrier D. E. K., Gordon C. M., Jindra M., Lan Q., Lattorff H. M. G., Laudet V., von Levetsow C., Liu Z., Lutz R., Lynch J. A., Nunes da Fonseca R., Posnien N., Reuter R., Schinko J. B., Schmitt C., Schoppmeier M., Shippy T. D., Simonnet F., Marques-Souza H., Tomoyasu Y., Trauner J., Van der Zee M., Vervoort M., Wittkopp N., Wimmer E. A., Yang X., Jones A. K., Sattelle D. B., Ebert P. R., Nelson D., Scott J. G., Muthukrishnan S., Kramer K. J., Arakane Y., Zhu Q., Hogenkamp D., Dixit R., Jiang H., Zou Z., Marshall J., Elpidina E., Vinokurov K., Oppert C., Evans J., Lu Z., Zhao P., Sumathipala N., Altincicek B., Vilcinskas A., Williams M., Hultmark D., Hetru C., Hauser F., Cazzamali G., Williamson M., Li B., Tanaka Y., Predel R., Neupert S., Schachtner J., Verleyen P., Raible F., Walden K. K. O., Angeli S., Foret S., Schuetz S., Maleszka R., Miller S. C., and Grossmann D.
Keywords	Tribolium, Isochores, GC content, Segmentation algorithms, Jensen-Shannon divergence statistic, Entropy, Genome composition.

Elhaik, E., Graur, D., and Josić, K. 2008. 'Genome order index' should not be used for defining compositional constraints in nucleotide sequences. Computational Biology and Chemistry, 32: 147.

Authors	Eran Elhaik, Dan Graur, and Kresimir Josic.
Keywords	Nucleotide composition; Genomic G+C content; Shannon H-function; Genome order index; Isochores; Z-curve

Springtime at Giverny, Claude Monet (1886)

2006

Weinstock, G. M., Robinson, G. E., Gibbs, R. A., Worley, K. C., Evans, J. D., Maleszka, R., Robertson, H. M., Weaver, D. B., Beye, M., Bork, P., Elsik, C. G., Hartfelder, K., Hunt, G. J., Zdobnov, E. M., Amdam, G. V., Bitondi, M. M. G., Collins, A. M., Cristino, A. S., Lattorff, H. M. G., Lobo, C. H., Moritz, R. F. A., Nunes, F. M. F., Page Jr., R. E., Simões, Z. L. P., Wheeler, Diana, Carninci, P., Fukuda, S., Hayashizaki, Y., Kai, C., Kawai, J., Sakazume, Sasaki, D., Tagami, M., Albert, S., Baggerman, G., Beggs, K. T., Bloch, G., Cazzamali, G., Cohen, M., Drapeau, M. D., Eisenhardt, D., Emore, C., Ewing, M. A., Fahrbach, S. E., Forêt, S., Grimmelikhuijzen, C. J. P., Hauser, F., Hummon, A. B., Huybrechts, J., Jones, A. K., Kadowaki, T., Kaplan, N., Kucharski, R., Leboulle, G., Linial, M., Littleton, J. T., Mercer, A. R., Richmond, T. A., Rodriguez-Zas, S. L., Rubin, E. B., Sattelle, D. B., Schlipalius, D., Schoofs, L., Shemesh, Y., Sweedler, J. V., Velarde, R., Verleyen, P., Vierstraete, E., Williamson, M. R., Ament, S. A., Brown, S. J., Corona, M., Dearden, P. K., Dunn, W. A., Elekonich, M. M., Fujiyuki, T., Gattermeier, I., Gempe, T., Hasselmann, M., Kage, E., Kamikouchi, A., Kubo, T., Kunieda, T., Lorenzen, M., Milshina, N. V., Morioka, M., Ohashi, K., Overbeek, R., Ross, C. A., Schioett, M., Shippy, T., Takeuchi, H., Toth, A. L., Willis, J. H., Wilson, M. J., Gordon, K. H. J., Letunic, I., Hackett, K., Peterson, J., Felsenfeld, A., Guyer, M., Solignac, M., Agarwala, R., Cornuet, J. M., Monnerot, M., Mougel, F., Reese, J. T., Vautrin, D., Gillespie, J. J., Cannone, J. J., Gutell, R. R., Johnston, J. S., Eisen, M. B., Iyer, V. N., Iyer, V., Kosarev, P., Mackey, A. J., Solovyev, V., Souvorov, A., Aronstein, K. A., Bilikova, K., Chen, Y. P., Clark, A. G., Decanini, L. I., Gelbart, W. M., Hetru, C., Hultmark, D., Imler, J.-L., Jiang, H., Kanost, M., Kimura, K., Lazzaro, B. P., Lopez, D. L., Simuth, J., Thompson, G. J., Zou, Z., de Jong, P., Sodergren, E., Csuros, M., Milosavljevic, A., Osoegawa, K., Richards, S., Shu, C.-L., Duret, L., Elhaik, E. et al. 2006. Insights into social insects from the genome of the honeybee Apis mellifera. Nature, 443: 931-949.
Nature Cover: Honeybee genome.
Nature highlight: Plan Bee.
Nature highlight: From hive minds to humans.
Nature highlight: How to make a social insect.

Abstract	Here we report the genome sequence of the honeybee Apis mellifera, a key model for social behaviour and essential to global ecology through pollination. Compared with other sequenced insect genomes, the A. mellifera genome has high A+T and CpG contents, lacks major transposon families, evolves more slowly, and is more similar to vertebrates for circadian rhythm, RNA interference and DNA methylation genes, among others. Furthermore, A. mellifera has fewer genes for innate immunity, detoxification enzymes, cuticle-forming proteins and gustatory receptors, more genes for odorant receptors, and novel genes for nectar and pollen utilization, consistent with its ecology and social organization. Compared to Drosophila, genes in early developmental pathways differ in Apis, whereas similarities exist for functions that differ markedly, such as sex determination, brain function and behaviour. Population genetics suggests a novel African origin for the species A. mellifera and insights into whether Africanized bees spread throughout the New World via hybridization or displacement.
Authors	Weinstock, G. M., Robinson, G. E., Gibbs, R. A., Worley, K. C., Evans, J. D., Maleszka, R., Robertson, H. M., Weaver, D. B., Beye, M., Bork, P., Elsik, C. G., Hartfelder, K., Hunt, G. J., Zdobnov, E. M., Amdam, G. V., Bitondi, M. M. G., Collins, A. M., Cristino, A. S., Lattorff, H. M. G., Lobo, C. H., Moritz, R. F. A., Nunes, F. M. F., Page Jr., R. E., Simões, Z. L. P., Wheeler, Diana, Carninci, P., Fukuda, S., Hayashizaki, Y., Kai, C., Kawai, J., Sakazume, Sasaki, D., Tagami, M., Albert, S., Baggerman, G., Beggs, K. T., Bloch, G., Cazzamali, G., Cohen, M., Drapeau, M. D., Eisenhardt, D., Emore, C., Ewing, M. A., Fahrbach, S. E., Forêt, S., Grimmelikhuijzen, C. J. P., Hauser, F., Hummon, A. B., Huybrechts, J., Jones, A. K., Kadowaki, T., Kaplan, N., Kucharski, R., Leboulle, G., Linial, M., Littleton, J. T., Mercer, A. R., Richmond, T. A., Rodriguez-Zas, S. L., Rubin, E. B., Sattelle, D. B., Schlipalius, D., Schoofs, L., Shemesh, Y., Sweedler, J. V., Velarde, R., Verleyen, P., Vierstraete, E., Williamson, M. R., Ament, S. A., Brown, S. J., Corona, M., Dearden, P. K., Dunn, W. A., Elekonich, M. M., Fujiyuki, T., Gattermeier, I., Gempe, T., Hasselmann, M., Kage, E., Kamikouchi, A., Kubo, T., Kunieda, T., Lorenzen, M., Milshina, N. V., Morioka, M., Ohashi, K., Overbeek, R., Ross, C. A., Schioett, M., Shippy, T., Takeuchi, H., Toth, A. L., Willis, J. H., Wilson, M. J., Gordon, K. H. J., Letunic, I., Hackett, K., Peterson, J., Felsenfeld, A., Guyer, M., Solignac, M., Agarwala, R., Cornuet, J. M., Monnerot, M., Mougel, F., Reese, J. T., Vautrin, D., Gillespie, J. J., Cannone, J. J., Gutell, R. R., Johnston, J. S., Eisen, M. B., Iyer, V. N., Iyer, V., Kosarev, P., Mackey, A. J., Solovyev, V., Souvorov, A., Aronstein, K. A., Bilikova, K., Chen, Y. P., Clark, A. G., Decanini, L. I., Gelbart, W. M., Hetru, C., Hultmark, D., Imler, J.-L., Jiang, H., Kanost, M., Kimura, K., Lazzaro, B. P., Lopez, D. L., Simuth, J., Thompson, G. J., Zou, Z., de Jong, P., Sodergren, E., Csuros, M., Milosavljevic, A., Osoegawa, K., Richards, S., Shu, C.-L., Duret, L., Elhaik, E., Graur, D., Anzola, J. M., Campbell, K. S., Childs, K. L., Collinge, D., Crosby, M. A., Dickens, C. M., Grametes, L. S., Grozinger, C. M., Jones, P. L., Jorda, M., Ling, X., Matthews, B. B., Miller, J., Mizzen, C., Peinado, M. A., Reid, J. G., Russo, S. M., Schroeder, A. J., St Pierre, S. E., Wang, Y., Zhou, P., Kitts, P., Ruef, B., Venkatraman, A., Zhang, L., Aquino-Perez, G., Whitfield, C. W., Behura, S. K., Berlocher, S. H., Sheppard, W. S., Smith, D. R., Suarez, A. V., Tsutsui, N. D., Wei, X., Wheeler, David, Havlak, P., Li, B., Liu, Y., Jolivet, A., Lee, S., Nazareth, L. V., Pu, L.-L., Thorn, R., Stolc, V., Newman, T., Samanta, M., Tongprasit, W. A., Claudianos, C., Berenbaum, M. R., Biswas, S., de Graaf, D. C., Feyereisen, R., Johnson, R. M., Oakeshott, J. G., Ranson, H., Schuler, M. A., Muzny, D., Chacko, J., Davis, C., Dinh, H., Gill, R., Hernandez, J., Hines, S., Hume, J., Jackson, L., Kovar, C., Lewis, L., Miner, G., Morgan, M., Nguyen, N., Okwuonu, G., Paul, H., Santibanez, J., Savery, G., Svatek, A., Villasana, D., and Wright, R.
Keywords	Bee, Isochores, GC content, Segmentation algorithms, Jensen-Shannon divergence statistic, Entropy, Genome composition.

Sodergren, E., Weinstock, G. M., Davidson, E. H., Cameron, R. A., Gibbs, R. A., Angerer, R. C., Angerer, L. M., Arnone, M. I., Burgess, D. R., Burke, R. D., Coffman, J. A., Dean, M., Elphick, M. R., Ettensohn, C. A., Foltz, K. R., Hamdoun, A., Hynes, R. O., Klein, W. H., Marzluff, W., McClay, D. R., Morris, R. L., Mushegian, A., Rast, J. P., Smith, L. C., Thorndyke, M. C., Vacquier, V. D., Wessel, G. M., Wray, G., Zhang, L., Elsik, C. G., Ermolaeva, O., Hlavina, W., Hofmann, G., Kitts, P., Landrum, M. J., Mackey, A. J., Maglott, D., Panopoulou, G., Poustka, A. J., Pruitt, K., Sapojnikov, V., Song, X., Souvorov, A., Solovyev, V., Wei, Z., Whittaker, C. A., Worley, K., Durbin, K. J., Shen, Y., Fedrigo, O., Garfield, D., Haygood, R., Primus, A., Satija, R., Severson, T., Gonzalez-Garay, M. L., Jackson, A. R., Milosavljevic, A., Tong, M., Killian, C. E., Livingston, B. T., Wilt, F. H., Adams, N., Belle, R., Carbonneau, S., Cheung, R., Cormier, P., Cosson, B., Croce, J., Fernandez-Guerra, A., Geneviere, A.-M., Goel, M., Kelkar, H., Morales, J., Mulner-Lorillon, O., Robertson, A. J., Goldstone, J. V., Cole, B., Epel, D., Gold, B., Hahn, M. E., Howard-Ashby, M., Scally, M., Stegeman, J. J., Allgood, E. L., Cool, J., Judkins, K. M., McCafferty, S. S., Musante, A. M., Obar, R. A., Rawson, A. P., Rossetti, B. J., Gibbons, I. R., Hoffman, M. P., Leone, A., Istrail, S., Materna, S. C., Samanta, M. P., Stolc, V., Tongprasit, W., Tu, Q., Bergeron, K.-F., Brandhorst, B. P., Whittle, J., Berney, K., Bottjer, D. J., Calestani, C., Peterson, K., Chow, E., Yuan, Q. A., Elhaik, E. et al. 2006. The genome of the sea urchin Strongylocentrotus purpuratus. Science, 314: 941-952.
Science Cover: The sea urchin.
Science highlight: The Glorious Sea Urchin.
Science highlight: Poster: The Sea Urchin.
Science highlight: The Sea Urchin Genome: Where Will It Lead Us?
Science highlight: Ecological Role of Purple Sea Urchins.

Abstract	We report the sequence and analysis of the 814-megabase genome of the sea urchin Strongylocentrotus purpuratus, a model for developmental and systems biology. The sequencing strategy combined whole-genome shotgun and bacterial artificial chromosome (BAC) sequences. This use of BAC clones, aided by a pooling strategy, overcame difficulties associated with high heterozygosity of the genome. The genome encodes about 23,300 genes, including many previously thought to be vertebrate innovations or known only outside the deuterostomes. This echinoderm genome provides an evolutionary outgroup for the chordates and yields insights into the evolution of deuterostomes.
Authors	Sodergren, E., Weinstock, G. M., Davidson, E. H., Cameron, R. A., Gibbs, R. A., Angerer, R. C., Angerer, L. M., Arnone, M. I., Burgess, D. R., Burke, R. D., Coffman, J. A., Dean, M., Elphick, M. R., Ettensohn, C. A., Foltz, K. R., Hamdoun, A., Hynes, R. O., Klein, W. H., Marzluff, W., McClay, D. R., Morris, R. L., Mushegian, A., Rast, J. P., Smith, L. C., Thorndyke, M. C., Vacquier, V. D., Wessel, G. M., Wray, G., Zhang, L., Elsik, C. G., Ermolaeva, O., Hlavina, W., Hofmann, G., Kitts, P., Landrum, M. J., Mackey, A. J., Maglott, D., Panopoulou, G., Poustka, A. J., Pruitt, K., Sapojnikov, V., Song, X., Souvorov, A., Solovyev, V., Wei, Z., Whittaker, C. A., Worley, K., Durbin, K. J., Shen, Y., Fedrigo, O., Garfield, D., Haygood, R., Primus, A., Satija, R., Severson, T., Gonzalez-Garay, M. L., Jackson, A. R., Milosavljevic, A., Tong, M., Killian, C. E., Livingston, B. T., Wilt, F. H., Adams, N., Bellé, R., Carbonneau, S., Cheung, R., Cormier, P., Cosson, B., Croce, J., Fernandez-Guerra, A., Genevière, A.-M., Goel, M., Kelkar, H., Morales, J., Mulner-Lorillon, O., Robertson, A. J., Goldstone, J. V., Cole, B., Epel, D., Gold, B., Hahn, M. E., Howard-Ashby, M., Scally, M., Stegeman, J. J., Allgood, E. L., Cool, J., Judkins, K. M., McCafferty, S. S., Musante, A. M., Obar, R. A., Rawson, A. P., Rossetti, B. J., Gibbons, I. R., Hoffman, M. P., Leone, A., Istrail, S., Materna, S. C., Samanta, M. P., Stolc, V., Tongprasit, W., Tu, Q., Bergeron, K.-F., Brandhorst, B. P., Whittle, J., Berney, K., Bottjer, D. J., Calestani, C., Peterson, K., Chow, E., Yuan, Q. A., Elhaik, E., Graur, D., Reese, J. T., Bosdet, I., Heesun, S., Marra, M. A., Schein, J., Anderson, M. K., Brockton, V., Buckley, K. M., Cohen, A. H., Fugmann, S. D., Hibino, T., Loza-Coll, M., Majeske, A. J., Messier, C., Nair, S. V., Pancer, Z., Terwilliger, D. P., Agca, C., Arboleda, E., Chen, N., Churcher, A. M., Hallböök, F., Humphrey, G. W., Idris, M. M., Kiyama, T., Liang, S., Mellott, D., Mu, X., Murray, G., Olinski, R. P., Raible, F., Rowe, M., Taylor, J. S., Tessmar-Raible, K., Wang, D., Wilson, K. H., Yaguchi, S., Gaasterland, T., Galindo, B. E., Gunaratne, H. J., Juliano, C., Kinukawa, M., Moy, G. W., Neill, A. T., Nomura, M., Raisch, M., Reade, A., Roux, M. M., Song, J. L., Su, Y.-H., Townley, I. K., Voronina, E., Wong, J. L., Amore, G., Branno, M., Brown, E. R., Cavalieri, V., Duboc, V., Duloquin, L., Flytzanis, C., Gache, C., Lapraz, F., Lepage, T., Locascio, A., Martinez, P., Matassi, G., Matranga, V., Range, R., Rizzo, F., Röttinger, E., Beane, W., Bradham, C., Byrum, C., Glenn, T., Hussain, S., Manning, G., Miranda, E., Thomason, R., Walton, K., Wikramanayke, A., Wu, S.-Y., Xu, R., Brown, C. T., Chen, L., Gray, R. F., Lee, P. Y., Nam, J., Oliveri, P., Smith, J., Muzny, D., Bell, S., Chacko, J., Cree, A., Curry, S., Davis, C., Dinh, H., Dugan-Rocha, S., Fowler, J., Gill, R., Hamilton, C., Hernandez, J., Hines, S., Hume, J., Jackson, L., Jolivet, A., Kovar, C., Lee, S., Lewis, L., Miner, G., Morgan, M., Nazareth, L. V., Okwuonu, G., Parker, D., Pu, L.-L., Thorn, R., and Wright, R.
Keywords	Sea urchin, Isochores, GC content, Segmentation algorithms, Jensen-Shannon divergence statistic, Entropy, Genome composition.

Elhaik, E., Sabath, N., and Graur, D. 2006. The "inverse relationship between evolutionary rate and age of mammalian genes" is an artifact of increased genetic distance with rate of evolution and time of divergence. Molecular Biology and Evolution, 23: 1-3.

Abstract	It has recently been claimed that older genes tend to evolve more slowly than newer ones (Alba and Castresana 2005). By simulation of genes of equal age, we show that the inverse correlation between age and rate is an artifact caused by our inability to detect homology when evolutionary distances are large. Since evolutionary distance increases with time of divergence and rate of evolution, homologs of fast-evolving genes are frequently undetected in distantly related taxa and are, hence, misclassified as "new." This misclassification causes the mean genetic distance of "new" genes to be overestimated and the mean genetic distance of "old" genes to be underestimated.
Authors	Eran Elhaik, Niv Sabath, and Dan Graur.
Keywords	Nonsynonymous substitutions; Novel genes; Divergence times.

Book Chapters

Elhaik, E. 2017. A "Wear and Tear" Hypothesis to Explain Sudden Infant Death Syndrome. New approaches to the pathogenesis of sudden intrauterine unexplained death and sudden infant death syndrome Free e-Book (Editors A.M. Lavezzi and C.E. Johanson)

Chapter Abstract	Sudden infant death syndrome (SIDS) is the leading cause of death among USA infants under 1 year of age accounting for ~2,700 deaths per year. Although formally SIDS dates back at least 2,000 years and was even mentioned in the Hebrew Bible (Kings 3:19), its etiology remains unexplained prompting the CDC to initiate a sudden unexpected infant death case registry in 2010. Due to their total dependence, the ability of the infant to allostatically regulate stressors and stress responses shaped by genetic and environmental factors is severely constrained. We propose that SIDS is the result of cumulative painful, stressful, or traumatic exposures that begin in utero and tax neonatal regulatory systems incompatible with allostasis. We also identify several putative biochemical mechanisms involved in SIDS. We argue that the important characteristics of SIDS, namely male predominance (60:40), the significantly different SIDS rate among USA Hispanics (80% lower) compared to whites, 50% of cases occurring between 7.6 and 17.6 weeks after birth with only 10% after 24.7 weeks, and seasonal variation with most cases occurring during winter, are all associated with common environmental stressors, such as neonatal circumcision and seasonal illnesses. We predict that neonatal circumcision is associated with hypersensitivity to pain and decreased heart rate variability, which increase the risk for SIDS. We also predict that neonatal male circumcision will account for the SIDS gender bias and that groups that practice high male circumcision rates, such as USA whites, will have higher SIDS rates compared to groups with lower circumcision rates. SIDS rates will also be higher in USA states where Medicaid covers circumcision and lower among people that do not practice neonatal circumcision and/or cannot afford to pay for circumcision. 
Finally, we predict that winter-born premature infants who are circumcised will be at higher risk of SIDS compared to infants who experienced fewer nociceptive exposures. All these predictions are testable experimentally using animal models or cohort studies in humans. Our hypothesis provides new insights into novel risk factors for SIDS that can reduce its risk by modifying current infant care practices to reduce nociceptive exposures.
Book Abstract	Sudden Infant Death Syndrome (SIDS) is the leading cause of death among infants in the first year of age. The more known definition of SIDS is the sudden unexpected death of an infant less than 1 year of age, with onset of the fatal episode apparently occurring during sleep, that remains unexplained after a thorough investigation, including performance of a complete autopsy and review of the circumstances of death and the clinical history.
Keywords	sudden infant death syndrome (SIDS), allostatic load, neonatal circumcision, trauma, pain, stress
Authors	Eran Elhaik

Elhaik, E. and Tatarinova, T. 2012. GC3 Biology in Eukaryotes and Prokaryotes. DNA Methylation-From Genomics to Technology Free e-Book. (Editor T. Tatarinova)

Chapter Introduction	In this chapter we describe the distributions of guanine and cytosine (GC) content in the third codon position (GC₃) in different species, analyze evolutionary trends, and discuss differences between genes and organisms with distinct GC₃ levels. We scrutinize previously published theoretical frameworks and construct a unified view of GC₃ biology in eukaryotes and prokaryotes.
Book Abstract	Epigenetics is one of the most exciting and rapidly developing areas of modern genetics with applications in many disciplines from medicine to agriculture. The most common form of epigenetic modification is DNA methylation, which plays a key role in fundamental developmental processes such as embryogenesis and also in the response of organisms to a wide range of environmental stimuli. Indeed, epigenetics is increasingly regarded as one of the major mechanisms used by animals and plants to modulate their genome and its expression to adapt to a wide range of environmental factors. This book brings together a group of experts at the cutting edge of research into DNA methylation and highlights recent advances in methodology and knowledge of underlying mechanisms of this most important of genetic processes. The reader will gain an understanding of the impact, significance and recent advances within the field of epigenetics with a focus on DNA methylation.
Keywords	GC3, Methylation
Authors	Eran Elhaik and Tatiana Tatarinova
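
As a rough illustration of the GC₃ quantity named in the chapter introduction above, the following minimal sketch computes the fraction of G or C bases at third codon positions of a coding sequence. The function name `gc3` and the assumption that the input is an in-frame DNA coding sequence are illustrative only and are not taken from the chapter.

```python
def gc3(cds: str) -> float:
    """Fraction of G or C bases at third codon positions.

    Assumes `cds` is an in-frame DNA coding sequence (hypothetical input
    convention); any trailing partial codon is ignored.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3   # drop an incomplete final codon
    third = seq[2:usable:3]            # bases at positions 3, 6, 9, ...
    if not third:
        raise ValueError("sequence is shorter than one codon")
    return sum(base in "GC" for base in third) / len(third)


if __name__ == "__main__":
    # Third positions of ATG AAA TTT are G, A, T, so one of three is G or C.
    print(gc3("ATGAAATTT"))
```

Ambiguous bases such as N are counted as non-GC in this sketch; a real analysis would probably exclude them from the denominator.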

Book Review

Elhaik, E. 2017. Selected advances in genetics-cream of the crop. European Journal of Human Genetics.

Abstract	The lives of scientists can be almost as complicated as those of the organisms they study. One particular challenge relates to the old generalists versus specialists argument. The choice between the two depends on the scientist's personality and skills, and there is no right answer. Specialists may forever disapprove of the way generalists misunderstand science at the micro-level, whereas a generalist may criticize the way a specialist's work lacks adequate frame of reference, is too niche, and lacks applicability to other fields. For this reason, a series like The Annual Reviews provides a rare neutral platform where both generalists and specialists can find useful coverage of the knowledge accumulated in their fields.
Authors	Elhaik E.
Keywords	Genetics, Genomics, Patents

Opinions

Third-codon position - genomics magic eight-ball

Abstracts

Baughn, L.B.... Elhaik, E. ... and Kumar S.K. (2019) Flipping bioinformatics for the NHS. Blood annual meeting 2019 November 13th 2019

Abstract	Purpose: Multiple myeloma (MM) is a plasma cell (PC) malignancy with an increasing incidence in the US. Epidemiological studies demonstrate a 2-3 fold higher incidence of the pre-malignant monoclonal gammopathy of undetermined significance (MGUS) and MM with a ~4-year younger age of onset among African Americans (AAs) compared to European Americans (EAs) (Fonseca, Leukemia, 2017). With equal access to care, AAs have better overall survival compared to EAs (Waxman, Blood, 2010). This disparity may be explained by ancestral-associated genetic predisposition of AAs to development of monoclonal gammopathies and to specific acquired, cytogenetically-defined subtypes. Using calculated genetic ancestry data, we have previously identified a higher prevalence of IgH translocations t(11;14), t(14;16) and t(14;20) in individuals with >80% African ancestry (Baughn, BCJ, 2018). Since SNP rs9344 encoding the CCND1 870G>A polymorphism has been reported in association with increased risk of t(11;14) (Weinhold, Nature Genetics, 2013), we investigated whether rs9344 correlates with African ancestry and with t(11;14) in our cohort of patients with plasma cell dyscrasias.
Keywords	alleles, cyclin d1, genes, bcl-1, plasma cell disorder, genetic predisposition to disease, acute aortic syndrome, anabolic steroids, dna, monoclonal gammopathy of undetermined significance, paraproteinemias
Authors	Linda B Baughn, Zhou Li, Kathryn E. Pearce, Celine M. Vachon, Mei-Yin Polley, Jonathan J Keats, Eran Elhaik, Michael L. Baird, Terry Therneau, James R. Cerhan, P. Leif Bergsagel, Angela Dispenzieri, S. Vincent Rajkumar, Yan Asmann, Shaji K. Kumar

Wang D., Dunning M., Parker M., and Elhaik, E. 2018. Flipping bioinformatics for the NHS. TELFest - The Technology Enhanced Learning Festival 2018 Tuesday 26th June

Abstract	Background: The Genomics Education Programme of Health Education England developed a set of learning objectives and competencies for NHS staff to obtain in order to transform the NHS to include genomic diagnostics and personalised medicine. As providers of an MSc course in Genomic Medicine, we were tasked with upgrading the skills of NHS staff in genomic and clinical data analysis. I will detail the development of flipped learning modules in clinical bioinformatics (first of their kind anywhere in the world). Insights will be given on how to combine online teaching with in-class practicals. Given the diverse cohort of students from different healthcare professions (nurses to consultant clinicians) and age groups (20-60 years old), a number of surprising challenges emerged from teaching these computer-based modules. I will show examples of new software tools used in these modules, such as Google Classroom, Kaltura Video, Skype, Markdown pages, GitHub, etc., that help cater to distance learners.
Keywords	Teaching, Medical genomics, Bioinformatics, NHS
Authors	Dennis Wang, Mark Dunning, Matthew Parker, Eran Elhaik.

Elhaik, E., Pirooznia M., Goes F.S., Parla J., Karchin R., Chakravarti A., Zandi P.P., McCombie R.W., and Potash J.B. 2012. Whole-exome sequencing study of four families with bipolar disorder. The 62nd Annual Meeting of the American Society of Human Genetics Program #2305T

Abstract	Background: Bipolar disorder (BP) is a common mental disorder often associated with lifelong disability and premature mortality. We are conducting a whole-exome study of BP using next generation sequencing to examine the whole exomes of up to 100 multiplex BP families (with at least 6 individuals from 2-3 generations of each family), 1,800 BP cases, and 1,800 controls with the goal of identifying rare and common genetic variants associated with the disease. Methods: Exome sequencing was performed in a pilot sample of 22 individuals from four multiplex BP families using solution-based capture and paired-end sequencing on the Illumina GA II. Alignment and variant calling were performed with BWA, SAMtools, and GATK. SNVs were annotated with the SIFT and PolyPhen tools. Families were analyzed separately for the segregation of functionally relevant variants with disease. Results: We identified a single common (MAF=0.2) deleterious splice site variant (rs8373) in a zinc-finger protein gene (ZFP91) that segregated with all affected relatives and none of the unaffected married-in relatives in all four families. A family-based test indicated the variant was significantly associated with BP in these families (p=0.0026). ZFP91 is involved in the non-canonical nuclear factor kB (NF-kB) signaling pathway, which regulates the canonical NF-kB pathway. The non-canonical pathway is associated with adaptive immunity and protection against inflammation and apoptosis. Conclusions: Our initial analysis of four multiplex families with bipolar disorder revealed a common splice-site polymorphism in ZFP91 that segregates with disease in all pedigrees. Mutations inhibiting the non-canonical NF-kB, such as the one identified here, have been shown to induce apoptosis and inflammation due to the continuous activation of the complementary pathway. 
This variant was imputed in the Psychiatric GWAS Consortium (PGC) mega-analysis of Bipolar Disorder, but it was not significantly associated with illness. However, if the current finding can be replicated in other sequenced families, it may provide evidence of a potential inflammatory etiology in bipolar disorder. Such replication efforts are ongoing.
Keywords	Complex Traits, Polygenic Disorders, brain/nervous system, candidate gene, Nf-kb
Authors	Eran Elhaik, Pirooznia M., Goes F.S., Parla J., Karchin R., Chakravarti A., Zandi P.P., McCombie R.W., and Potash J.B.

Wells S., Greenspan E., Staats S., Krahn T., Tyler-Smith C., Xue Y., Tofanelli S., Francalacci P., Cucca F., Pagani L., Jin L., Li H., Schurr T.G., Gaieski J.B., Melendez C., Vilar M.G., Owings A.C., Gomez R., Fujita R., Santos F., Comas D., Balanovsky O., Balanovska E., Zalloua P., Soodyall H., Pitchappan R., Kumar G.A., Hammer M.F., Greenspan B., and Elhaik, E. 2012. The GenoChip: a new tool for genetic anthropology. The 62nd Annual Meeting of the American Society of Human Genetics Program #3377W

Abstract	Background: The Genographic Project is an international effort aimed at charting human history using genetic data. The project is non-profit and non-medical, and through the sale of its public participation kits it supports cultural preservation efforts in indigenous and traditional communities. To extend our knowledge of the human journey, interbreeding with ancient hominins, and modern human demographic history, we designed a genotyping chip optimized for genetic anthropology research. Methods: Our goal was to design, produce, and validate a SNP array dedicated to genetic anthropology. The GenoChip is an Illumina HD iSelect genotyping bead array with over 130,000 highly informative autosomal and X-chromosomal SNPs ascertained from over 450 worldwide populations, ~13,000 Y-chromosomal SNPs, and ~3,000 mtDNA SNPs. To determine the extent of gene flow from archaic hominins to modern humans, we included over 25,000 SNPs from candidate regions of interbreeding between extinct hominins (Neanderthal and Denisovan) and modern humans. To avoid any inadvertent medical testing we filtered out all SNPs that have known or suspected health or functional associations. We validated the chip by genotyping over 1,000 samples from 1000 Genomes, Family Tree DNA, and Genographic Project populations. Results: The concordance between the GenoChip and the 1000 Genomes data was over 99.5%. The GenoChip has a SNP density of approximately (1/100,000) bases over 92% of the human genome and is highly compatible with Illumina and Affymetrix commercial platforms. The ~10,000 novel Y SNPs included on the chip have greatly refined our understanding of the Y-chromosome phylogenetic tree. By including Y and mtDNA SNPs on an unprecedented scale, the GenoChip is able to delineate extremely detailed human migratory paths. The autosomal and X-chromosomal markers included on the GenoChip have revealed novel patterns of ancestry that shed a detailed new light on human history. 
Interbreeding analysis with extinct hominids confirmed some previous reports and allowed us to describe the modern geographical distribution of these markers in detail. Conclusions: The GenoChip is the first genotyping chip completely dedicated to genetic anthropology with no known medically relevant markers. We anticipate that the large-scale application of the GenoChip using the Genographic Project's diverse sample collection will provide new insights into genetic anthropology and human history.
Keywords	Evolutionary and Population Genetics, population genetics, population structure, SNP analysis/discovery, genomic methodologies, microarrays
Authors	Wells S., Greenspan E., Staats S., Krahn T., Tyler-Smith C., Xue Y., Tofanelli S., Francalacci P., Cucca F., Pagani L., Jin L., Li H., Schurr T.G., Gaieski J.B., Melendez C., Vilar M.G., Owings A.C., Gomez R., Fujita R., Santos F., Comas D., Balanovsky O., Balanovska E., Zalloua P., Soodyall H., Pitchappan R., Kumar G.A., Hammer M.F., Greenspan B., and Elhaik, E.

Gaieski J.B., Elhaik E., Owings A.C., Vilar M.G., Walia A.T., Gaieski D.F., Wells R.S., Schurr T.G., and The Genographic Consortium. 2012. Genetic ancestry and admixture analysis in a Bermudian population reveals evidence of Native American origins consistent with oral histories and genealogies. The 62nd Annual Meeting of the American Society of Human Genetics Program #3329W

Abstract	Background: Shortly after its colonization in the early 17th century, Bermuda became the first English-speaking dependency to forcibly import its labor by trafficking in enslaved Africans, European ethnic minorities, and indigenous Americans. Unlike the many ethnic groups that now call the island home, Bermuda's St. David's Islanders claim to be linked to Native American ancestors. In particular, their use of oral traditions and complex genealogies helps to reinforce their Native American identity. To elucidate the influence of historical events on genetic ancestry and native cultural identity among St. David's Islanders, we examined mtDNA and Y-chromosomal variation in over 100 individuals. We found that the majority of their mtDNA and Y-chromosome haplotypes (greater than 98%) were African and West Eurasian in origin. However, due to the limitations of this approach in reconstructing the genetic history of admixed populations, and because most participants were interested in learning more about their genetic genealogies, we expanded our analysis to include autosomal markers using a novel genotyping platform. Methods: To identify genetic contributions of putative indigenous American ancestors among the St. David's Islanders, we used the GenoChip to genotype Bermudians along with 200 samples from ~20 worldwide populations. Developed by Genographic Project scientists, the GenoChip is a SNP array ascertained from over 450 worldwide populations, and is dedicated to enhancing our knowledge of genetic anthropology. Results: Principal component analysis of the autosomal SNP data separated our participants into three discrete clusters. An admixture analysis identified up to 9% ancestry associated with Native Americans overall. The two largest clusters overlapped with African Americans and Puerto Ricans, and distributed evenly amongst the two main clusters (mean of 3% each). 
Samples from the third cluster averaged an unusually high Native American ancestry (mean of 6%). Conclusions: The GenoChip enabled us to detect otherwise elusory Native American ancestry among the Bermudians of St. David's Island. We speculate that the uneven distribution of this ancestry is due to admixture of Africans, Europeans, and Native Americans in varying degrees in the different source populations for modern-day St. David's Islanders. Application of this novel genotyping platform has provided new insights into the complex history of the Bermudian population.
Keywords	Evolutionary and Population Genetics, genetic diversity, SNP analysis/discovery, ethical, legal and social issues, genome sequencing, genomic methodologies
Authors	J.B. Gaieski, E. Elhaik, A.C. Owings, M.G. Vilar, A.T. Walia, D.F. Gaieski, R.S. Wells, T.G. Schurr, The Genographic Consortium

Elhaik, E. and Chakravarti, A. 2010. Empirical distributions of F_ST from large-scale polymorphism data. The 60th Annual Meeting of the American Society of Human Genetics Program #1521

Abstract	Apportionment of human genetic variation has long established that most human variation is within groups and that the additional variation between groups is small but greatest when comparing continental populations. These studies have used Wright's F_ST that apportions the standardized variance in allele frequencies within and between groups in a hierarchical manner. High values of F_ST are unlikely in humans due to genetic drift and migration and are consequently used to identify genes undergoing directional or heterotic selection. The availability of the HapMap data from phases I - III now allows us to reexamine these questions. We analyzed data on ~3 million autosomal, X-linked, Y-linked, and mitochondrial SNPs from the HapMap database on 602 samples from 8 populations and a common subset of ~1 million autosomal and X-linked SNPs that have been genotyped in all populations. We identified two major features of the data. First, only a paucity (12%) of the total genetic variation is among populations of different continents and even a lesser (1%) amount among populations of the same continent. These data are remarkably consistent with the early observations of Lewontin in 1972. Second, we demonstrate that, although the overall distribution is similarly shaped (inverse J), the distribution of F_ST varies significantly by mean allele frequency. Since the mean allele frequency is a crude indicator of allele age, these distributions mark the time-dependent change in genetic differentiation. The change in mean F_ST of these distributions is linear in mean allele frequency suggesting the nature of allele frequency dynamics. These observations are true for autosomal, X-linked, and mitochondrial SNPs, but not Y-linked SNPs. These results suggest that investigating the extremes of the F_ST distribution for each allele frequency class may be more efficient for detection of selection. 
Consequently, we demonstrate that such extreme SNPs are more clustered than expected from linkage disequilibrium for each allele frequency class. These genomic regions are likely candidates for natural selection.
Keywords	Wright's F_ST, human genetic variation, HapMap 3
Authors	Eran Elhaik and Aravinda Chakravarti
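
To make the F_ST statistic discussed in the abstract above more concrete, here is a minimal sketch of the textbook heterozygosity-based definition for a single biallelic SNP, F_ST = (H_T - H_S) / H_T. The abstract does not say which estimator was used, so this is only an illustration under stated assumptions: subpopulations are weighted equally, sample-size corrections are ignored, and the function name `wright_fst` is hypothetical.

```python
from typing import Sequence


def wright_fst(allele_freqs: Sequence[float]) -> float:
    """Textbook F_ST for one biallelic SNP (assumption: equal weights).

    `allele_freqs` holds the frequency of one allele in each
    subpopulation. This is not necessarily the estimator used in the
    abstract above.
    """
    if not allele_freqs:
        raise ValueError("need at least one subpopulation")
    n = len(allele_freqs)
    p_bar = sum(allele_freqs) / n
    h_total = 2.0 * p_bar * (1.0 - p_bar)   # expected heterozygosity, pooled
    if h_total == 0.0:
        return 0.0                          # monomorphic site: no variation to apportion
    h_within = sum(2.0 * p * (1.0 - p) for p in allele_freqs) / n
    return (h_total - h_within) / h_total


if __name__ == "__main__":
    print(wright_fst([0.5, 0.5]))   # identical frequencies, no differentiation
    print(wright_fst([0.0, 1.0]))   # fixed for different alleles
```

Applying such a function per SNP and then grouping SNPs by mean allele frequency would give one distribution per frequency class, which is the kind of comparison the abstract describes.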

Berlinger, M.J., Lebiush-Mordechi, S., Fridja, D., Khasdan, V., Elhaik, E., and Rodman R. 1997. Potato Tuber Moth Parasites in Potato and Processing Tomato Fields: Preliminary Results. The 10th Conference of the Entomological Society of Israel (Abstracts), p. 154.

Abstract	The potato tuberworm, Phthorimaea operculella (Zeller), is a major pest of potato in Israel. To control the larvae that bore into the exposed tubers, commercial fields are usually treated with insecticides, irrespective of pest density. It is therefore likely that there is excessive use of harmful chemicals against the pest. The ultimate goal of this research project is to develop an integrated pest control program for the potato tuberworm in potatoes and to reduce the use of insecticides. During the last 2 years the pest was studied in two agricultural systems in the western Negev in Israel: cv. 'Kara' in sandy soil and cv. Desiree in loessial soil. We examined (i) the pest's phenology; (ii) the infestation level in the tubers; (iii) the pest distribution pattern in the field; and (iv) the larval parasitism rate. Additionally, we tested the importance of roller treatment and volunteer host plants on infestation levels. Catches of adult moths peaked consistently at the end of April. Larval infestation in the foliage and in tubers was significantly higher in the edge rows of the field than in its center. Infestation rates in the exposed (green) tubers were generally higher than in the unexposed (white) tubers. Finally, the presence of nearby volunteer potato plants and the timing of roller treatments had a major influence on pest populations. (P)
Keywords	Potato Tuber Moth; Phthorimaea operculella; Biological control
Authors	Menachem J. Berlinger, Sara Lebiush-Mordechi, Dvora Fridja, Vadim Khasdan, Eran Elhaik, and Rafi Rodman.

Posters:

	McCarthy, L., Mason, C.E., MetaSUB International Consortium, and Elhaik, E. Microbiome predicts geography on a global scale. MetaSUB conference in Istanbul MetaSUB 2019, Istanbul, Turkey.
	Mason-Buck, G., Graf, A., Oliveira, M., Githae, D., Pospiech, E., Lee, P., Ballard, D., Syndercombe Court, D., Elhaik, E., Branicki, W., and Labaj, P. Metagenomics for Intelligence – a forensic perspective. MetaSUB conference in Istanbul MetaSUB 2019, Istanbul, Turkey.
	Johansen, M. and Elhaik, E. The ancient biogeographical history of the Roma. SMBE 2019, Manchester, UK.
	McCarthy, L., Mason, C.E., MetaSUB International Consortium, and Elhaik, E. Microbiome predicts geography on a global scale. SMBE 2019, Manchester, UK.
	Spencer, A. and Elhaik, E. Multiway Matcher: a novel R package for personalised medicine applicable to all clinical trials and epidemiological studies. Insigneo Showcase 2019, Sheffield, UK.
	Elhaik, E. and Desmond M. Ryan. A novel precision medicine approach to improve the outcome of clinical trials. Insigneo Showcase 2017, Sheffield, UK.
	Elhaik, E. and Desmond M. Ryan. A novel precision medicine approach to improve the outcome of clinical trials. HPC@Sheffield 2017, Sheffield, UK.
	Elhaik, E. and Desmond M. Ryan. A novel precision medicine approach to improve the outcome of clinical trials. Festival of Genomics 2017, London, UK.
	Elhaik, E., Das, R., Pirooznia, M., and Wexler, P. Localizing Ashkenazic Jews to primeval villages in the ancient Iranian lands of Ashkenaz. SMBE 2016, Queensland, Australia.
	Isiaq, A.J., Elhaik, E., and Chakravarti, A. Size Matters - Examining Mutation Enrichment In Relation To Gene Size. 2011. SIP program, Johns Hopkins University, MD.
	Elhaik, E. European and Asian Jews are proto-Khazars in origin. 2011. Johns Hopkins 3rd Annual Postdoc Symposium, Baltimore, MD.
	Elhaik, E. Empirical distributions of F_ST from large-scale polymorphism data. 2010. The 60th Annual Meeting of the American Society of Human Genetics (ASHG), Washington, DC.
	McCoy, E., Elhaik, E., and Chakravarti, A. The Extent of Genetic Variation in Human Genes. 2010. SIP program, Johns Hopkins University, MD.
	Sabath, N., Elhaik, E., and Graur, D. Absence of similarity does not equal absence of homology: On the purported relationship between evolutionary rate and gene age. 2007. The Annual Meeting of the Society for Molecular Biology and Evolution, Dalhousie University, Halifax, Nova Scotia, Canada.
	Sabath, N., Elhaik, E., and Graur, D. Is there a relationship between evolutionary rate and age of genes? 2007. Texas Genetics Society conference, San Antonio, Texas.
	Elhaik, E. and Graur, D. Compositional heterogeneity and GC-content domains in animal genomes. 2007. Texas Genetics Society conference, San Antonio, Texas.
	Elhaik, E., Graur, D., and Josic, K. Nucleotides homogeneity within eukaryotes genomes: A comparison of three methods. 2006. SMBE conference, Arizona State University, Tempe, Arizona.
	Elhaik, E., Graur, D., and Josic, K. An improved Haar wavelet analysis of the human genome. 2006. Sigma Xi Research Day, University of Houston, Houston, TX.

Popular Science

YouTube Videos

4. Dating ancient genomes using their DNA with TPS. 2022.

3. Unearthing Ancient Ashkenaz and the origin of Yiddish. 2019.

2. How does the Geographic Population Structure (GPS) work? 2016.

1. Newly found Y-Chromosomal Adam - not so ancient as you may think! 2014.