Evolutionary dynamics of conserved non-coding DNA elements: Big bang or gradual accretion?
Thesis submitted for the MSc Informatics degree at the University of Edinburgh, 2007.
Abstract:
Background: Previous studies have found DNA elements that are highly conserved across species of the same lineage, even though they do not code for proteins or RNA. One proposed function of such conserved non-coding elements (CNEs) is that they are cis-regulatory sequences for developmental genes which act as an abstraction of genetic regulatory networks, thus allowing new animal body plans to be specified in a modular way. This thesis tests the specific proposal of a previous study that CNEs arose in a big bang in the Precambrian, approximately 600 million years ago.
Results: The evolutionary dynamics of CNEs were studied by first identifying the elements and then examining their levels of identity over time. Pairwise comparative sequence analysis of five contemporary nematode species provided a window into the past, because these species diverged at different points in time over approximately the last 700 million years. The number of CNEs and their basic properties for the three most recently diverged species match the results obtained by other researchers, although no clear trend is visible in the change in identity of CNEs with respect to time since divergence. When two more species were added to the analysis, no such elements could be identified for species pairs with deep divergences.
Conclusions: The absence of CNEs in pairwise comparisons of the species that diverged earliest indicates that CNEs did not arise in a big bang. The CNEs that were found for the three Caenorhabditis species, which diverged relatively recently (approximately 100 million years ago), seem to be specific to that clade. However, the big bang hypothesis cannot be conclusively discarded, because it is possible that the elements exist but are short, or have multiple components spread across the genome, and are therefore difficult to detect. The missing CNEs could therefore reflect a limitation of computational approaches to discovering CNEs, and this study also suggests some ways to overcome those limitations.
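The detection limitation described in the conclusions can be illustrated with a minimal sketch. It is not the pipeline used in the thesis: the function names, the fixed-length sliding window, and the thresholds (30 columns at 95% identity) are all assumptions chosen for illustration. The sketch scores the identity of two already-aligned sequences and shows how a short, perfectly conserved element goes unreported when the minimum window length is longer than the element itself.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of alignment columns where both sequences carry the same base.

    Columns containing a gap ('-') never count as matches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        return 0.0
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)


def conserved_windows(aligned_a, aligned_b, min_len=30, min_identity=95.0):
    """Return half-open (start, end) column ranges that pass the thresholds.

    min_len and min_identity are hypothetical values, not the thesis settings.
    Overlapping or adjacent passing windows are merged into one range.
    """
    hits = []
    for start in range(len(aligned_a) - min_len + 1):
        end = start + min_len
        identity = percent_identity(aligned_a[start:end], aligned_b[start:end])
        if identity >= min_identity:
            if hits and start <= hits[-1][1]:
                hits[-1] = (hits[-1][0], end)  # extend the previous range
            else:
                hits.append((start, end))
    return hits


# Toy alignment: a 10-base conserved element flanked by diverged sequence.
element = "ACGTACGTAC"
species_a = "T" * 20 + element + "T" * 20
species_b = "G" * 20 + element + "G" * 20

# A 30-column window never reaches 95% identity, so nothing is reported.
print(conserved_windows(species_a, species_b, min_len=30, min_identity=95.0))
# A 10-column window at 100% identity recovers the short element.
print(conserved_windows(species_a, species_b, min_len=10, min_identity=100.0))
```

In this toy case the element is present in both sequences, yet whether it is reported depends entirely on the chosen length threshold, which is the kind of limitation the conclusions point to.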