Correcting for the bias due to expression specificity improves the estimation of constrained evolution of expression between mouse and human

Piasecka, Barbara ; Robinson-Rechavi, Marc ; Bergmann, Sven

In: Bioinformatics, 2012, vol. 28, no. 14, p. 1865-1872

Add to personal list
    Summary
    Motivation: Comparative analyses of gene expression data from different species have become an important component of the study of molecular evolution. Thus methods are needed to estimate evolutionary distances between expression profiles, as well as a neutral reference to estimate selective pressure. Divergence between expression profiles of homologous genes is often calculated with Pearson's or Euclidean distance. Neutral divergence is usually inferred from randomized data. Despite being widely used, neither of these two steps has been well studied. Here, we analyze these methods formally and on real data, highlight their limitations and propose improvements. Results: It has been demonstrated that Pearson's distance, in contrast to Euclidean distance, leads to underestimation of the expression similarity between homologous genes with a conserved uniform pattern of expression. Here, we first extend this study to genes with conserved, but specific pattern of expression. Surprisingly, we find that both Pearson's and Euclidean distances used as a measure of expression similarity between genes depend on the expression specificity of those genes. We also show that the Euclidean distance depends strongly on data normalization. Next, we show that the randomization procedure that is widely used to estimate the rate of neutral evolution is biased when broadly expressed genes are abundant in the data. To overcome this problem, we propose a novel randomization procedure that is unbiased with respect to expression profiles present in the datasets. Applying our method to the mouse and human gene expression data suggests significant gene expression conservation between these species. Contact: marc.robinson-rechavi@unil.ch; sven.bergmann@unil.ch Supplementary information: Supplementary data are available at Bioinformatics online