Please use this identifier to cite or link to this item: http://nopr.niscair.res.in/handle/123456789/13251
Title: Automatic prediction of non-coding RNA genes in prokaryotes based on compositional statistics
Authors: Tong, Hao
Guo, Feng-Biao
Ye, Yuan-Nong
Keywords: Automatic gene prediction
Non-coding RNA genes
Sulfolobus solfataricus
E. coli
Nucleotide composition
Support vector machine
Issue Date: Dec-2011
Publisher: NISCAIR-CSIR, India
Abstract: Although non-coding RNA (ncRNA) genes do not encode proteins, they play vital roles in cells by producing functionally important RNAs. In this paper, we present a novel method for predicting ncRNA genes based on compositional features extracted directly from gene sequences. Our method consists of two Support Vector Machine (SVM) models: the Codon model, which uses codon usage features derived from ncRNA genes and protein-coding genes, and the Kmer model, which uses nucleotide and dinucleotide frequency features extracted from ncRNA genes and randomly chosen genome sequences. The 10-fold cross-validation accuracy of the two models is 92% and 91%, respectively. Thus, ncRNA genes in a genome can be predicted automatically, without manual filtration of protein-coding genes. Applying our method to the Sulfolobus solfataricus genome generated 25 prediction results, corresponding to 25 cut-off pairs. We also applied the approach to E. coli and found our results comparable to those of previous studies. In general, our method enables automatic identification of ncRNA genes in newly sequenced prokaryotic genomes. Datasets and program code used in this work are available at http://cobi.uestc.edu.cn/resource/SS_ncRNA/
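Illustration (not part of the original record): the abstract names two kinds of compositional features, codon usage and nucleotide/dinucleotide frequency. The sketch below shows one plausible way such feature vectors could be computed from a DNA sequence. The function names, the normalization to relative frequencies, and the handling of ambiguous bases are assumptions made for illustration; the abstract does not specify the authors' exact feature definitions or SVM settings.

```python
from itertools import product

BASES = "ACGT"


def kmer_frequencies(seq, k):
    """Relative frequencies of all 4**k k-mers, in lexicographic (A, C, G, T) order."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product(BASES, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # assumption: skip windows containing N or other symbols
            counts[window] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


def kmer_features(seq):
    """Hypothetical Kmer-model input: 4 nucleotide + 16 dinucleotide frequencies."""
    return kmer_frequencies(seq, 1) + kmer_frequencies(seq, 2)


def codon_features(seq):
    """Hypothetical Codon-model input: frequencies of the 64 in-frame triplets."""
    seq = seq.upper()
    codons = ["".join(p) for p in product(BASES, repeat=3)]
    counts = dict.fromkeys(codons, 0)
    total = 0
    for i in range(0, len(seq) - 2, 3):  # step through the sequence in frame 0
        codon = seq[i:i + 3]
        if codon in counts:
            counts[codon] += 1
            total += 1
    return [counts[c] / total if total else 0.0 for c in codons]
```

Vectors of this kind could then be passed to any SVM implementation, with one classifier per feature set; which library, kernel, and cut-off values the authors used is not stated in the abstract.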
Description: 416-421
URI: http://hdl.handle.net/123456789/13251
ISSN: 0975-0959 (Online); 0301-1208 (Print)
Appears in Collections: IJBB Vol.48(6) [December 2011]

Files in This Item:
File: IJBB 48(6) 416-421.pdf
Size: 64.92 kB
Format: Adobe PDF


Items in NOPR are protected by copyright, with all rights reserved, unless otherwise indicated.