Enhancer Prediction in Proboscis Monkey Genome: A Comparative Study

Norshafarina Omar, Yu Shiong Wong, Xi Li, Yee Ling Chong, Mohd Tajuddin Abdullah, Nung Kion Lee

Abstract


Genome annotation is an essential task for understanding and analyzing a whole genome and its function. We have sequenced the complete proboscis monkey (Nasalis larvatus) genome because of its importance for medical and evolutionary studies, and we have performed an initial annotation of its genes using the MAKER gene annotation pipeline. Using six eukaryotic model species, 3084 genes were predicted on chromosome 18. Intergenic regions possibly enriched with enhancers were then screened using five different tools: DeepBind, LS-GKM, GMFR-CNN, CSI-ANN and iEnhancer-2L. These tools search the complex intergenic regions for enhancers on the basis of epigenetic features, treating an intergenic region as a candidate enhancer region when certain epigenetic features are associated with it. Empirical results demonstrate competitive performance across the different prediction tools, using multiple epigenetic features, in predicting enhancers on chromosome 18 of the proboscis monkey. The predicted enhancers can be used for further scientific and genomic discoveries.
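The step from predicted genes to intergenic regions can be illustrated with a minimal sketch. It is not the authors' implementation: the reference list suggests that tools such as GFF-Ex and BEDTools were involved, whereas the code below is plain Python with hypothetical function names (`merge_intervals`, `intergenic_regions`) and made-up toy coordinates. It assumes 0-based, half-open intervals, as in the BED format, and treats everything on the chromosome that is not covered by a predicted gene as intergenic.

```python
from typing import List, Tuple

# Assumption: 0-based, half-open [start, end) coordinates, as in BED files.
Interval = Tuple[int, int]


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching gene intervals into disjoint blocks."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous block: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def intergenic_regions(genes: List[Interval], chrom_length: int) -> List[Interval]:
    """Return the parts of [0, chrom_length) not covered by any gene."""
    gaps: List[Interval] = []
    cursor = 0  # end of the last gene block seen so far
    for start, end in merge_intervals(genes):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < chrom_length:
        gaps.append((cursor, chrom_length))
    return gaps


if __name__ == "__main__":
    # Hypothetical toy gene coordinates, not taken from the study.
    toy_genes = [(100, 250), (200, 400), (600, 800)]
    print(intergenic_regions(toy_genes, chrom_length=1000))
```

In a real analysis the gene intervals would come from the annotation output (for example a GFF file) and each resulting interval would be passed, as sequence, to the enhancer predictors.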

Keywords


Enhancer Annotation; Enhancer Prediction; Motif Discovery; Proboscis Monkey

References


B. Liu, L. Fang, R. Long, X. Lan and K.-C. Chou, "iEnhancer-2L: a two-layer predictor for identifying enhancers and their strength by pseudo k-tuple nucleotide composition," Bioinformatics, vol. 32, pp. 362-369, 2016.

Y. Dai, J. Xu and H. Hu, "LMethyR-SVM: Predict human enhancers using low methylated regions based on weighted support vector machines," bioRxiv, p. 054221, 2016.

D. Lee, R. Karchin and M. A. Beer, "Discriminative prediction of mammalian enhancers from DNA sequence," Genome research, vol. 21, pp. 2167-2180, 2011.

M. Fernández and D. Miranda-Saavedra, "Genome-wide enhancer prediction from epigenetic signatures using genetic algorithm-optimized support vector machines," Nucleic acids research, vol. 40, pp. e77-e77, 2012.

N. Dogan, W. Wu, C. S. Morrissey, K.-B. Chen, A. Stonestrom, M. Long, C. A. Keller, Y. Cheng, D. Jain and A. Visel, "Occupancy by key transcription factors is a more accurate predictor of enhancer activity than histone modifications or chromatin accessibility," Epigenetics & chromatin, vol. 8, p. 1, 2015.

H. A. Firpi, D. Ucar and K. Tan, "Discover regulatory DNA elements using chromatin signatures and artificial neural network," Bioinformatics, vol. 26, pp. 1579-1586, 2010.

Y. Zhu, L. Sun, Z. Chen, J. W. Whitaker, T. Wang and W. Wang, "Predicting enhancer transcription and activity from chromatin modifications," Nucleic acids research, vol. 41, pp. 10032-10043, 2013.

B. E. Blaisdell, "A measure of the similarity of sets of sequences not requiring sequence alignment," Proceedings of the National Academy of Sciences, vol. 83, pp. 5155-5159, 1986.

Y. Lu, W. Qu, G. Shan and C. Zhang, "DELTA: a distal enhancer locating tool based on AdaBoost algorithm and shape features of chromatin modifications," PLoS ONE, vol. 10, p. e0130622, 2015.

B. Alipanahi, A. Delong, M. T. Weirauch and B. J. Frey, "Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning," Nature biotechnology, 2015.

Y. S. Wong, N. K. Lee and N. Omar, "GMFR-CNN: an integration of gapped motif feature representation and deep learning approach for enhancer prediction," 7th International Conference on Computational Systems-Biology and Bioinformatics, 2016 (in press).

M. S. Campbell, C. Holt, B. Moore and M. Yandell, "Genome annotation and curation using MAKER and MAKER-P," Current Protocols in Bioinformatics, pp. 4.11.1-4.11.39.

B. L. Cantarel, I. Korf, S. M. Robb, G. Parra, E. Ross, B. Moore, C. Holt, A. S. Alvarado and M. Yandell, "MAKER: an easy-to-use annotation pipeline designed for emerging model organism genomes," Genome research, vol. 18, pp. 188-196, 2008.

L. Coombe, R. L. Warren, S. D. Jackman, C. Yang, B. P. Vandervalk, R. A. Moore, S. Pleasance, R. J. Coope, J. Bohlmann and R. A. Holt, "Assembly of the Complete Sitka Spruce Chloroplast Genome Using 10X Genomics' GemCode Sequencing Data," PLoS ONE, vol. 11, p. e0163059, 2016.

M. Campbell, K. F. Oakeson, M. Yandell, J. R. Halpert and D. Dearing, "The draft genome sequence and annotation of the desert woodrat Neotoma lepida," Genomics Data, vol. 9, pp. 58-59, 2016.

V. L. Sork, S. T. Fitz-Gibbon, D. Puiu, M. Crepeau, P. F. Gugger, R. Sherman, K. Stevens, C. H. Langley, M. Pellegrini and S. L. Salzberg, "First Draft Assembly and Annotation of the Genome of a California Endemic Oak Quercus lobata Née (Fagaceae)," G3: Genes|Genomes|Genetics, vol. 6, pp. 3485-3495, 2016.

P. M. De León-Medina, R. Elizondo-González, L. C. Damas-Buenrostro, J.-M. Geertman, M. Van den Broek, L. J. Galán-Wong, R. Ortiz-López and B. Pereyra-Alférez, "Genome annotation of a Saccharomyces sp. lager brewer's yeast," Genomics Data, vol. 9, pp. 25-29, 2016.

S. W. Choo, M. Rayko, T. K. Tan, R. Hari, A. Komissarov, W. Y. Wee, A. A. Yurchenko, S. Kliver, G. Tamazian and A. Antunes, "Pangolin genomes and the evolution of mammalian scales and immunity," Genome research, vol. 26, pp. 1312-1322, 2016.

D. Gupta, "GFF-Ex: A genome feature extraction package," Journal of Natural Science, Biology and Medicine, vol. 2, p. 90, 2011.

M. Ghandi, D. Lee, M. Mohammad-Noori and M. A. Beer, "Enhanced regulatory sequence prediction using gapped k-mer features," PLoS Computational Biology, vol. 10, p. e1003711, 2014.

S. J. B. Holwerda and W. de Laat, "CTCF: the protein, the binding partners, the binding sites and their chromatin loops," Phil. Trans. R. Soc. B, vol. 368, p. 20120369, 2013.

A. R. Quinlan and I. M. Hall, "BEDTools: a flexible suite of utilities for comparing genomic features," Bioinformatics, vol. 26, pp. 841-842, 2010.


Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.

ISSN: 2180-1843

eISSN: 2289-8131