Categorization of Species Based on Their MicroRNAs Employing Sequence Motifs, Information-Theoretic Sequence Feature Extraction, and k-mers

Open Access Color

GOLD

Green Open Access

Yes

Publicly Funded

No
Impulse: Top 10%
Influence: Average
Popularity: Top 10%

Abstract

Background: Diseases like cancer can manifest themselves through changes in protein abundance, and microRNAs (miRNAs) play a key role in the modulation of protein quantity. MicroRNAs are employed across all kingdoms and have been shown to be exploited by viruses to modulate their host environment. Since the experimental detection of miRNAs is difficult, computational methods have been developed. Many such tools employ machine learning for pre-miRNA detection, and many features for miRNA parameterization have been proposed. Training machine learning models requires negative data, which is hard to come by; therefore, we recently started to employ pre-miRNAs from one species as positive data and another species' pre-miRNAs as negative examples, based on sequence motifs and k-mers. Here, we introduce the additional use of information-theoretic (IT) features.

Results: Pre-miRNAs from one species were used as positive and another species' pre-miRNAs as negative training data for machine learning. The categorization capability of IT and k-mer features was investigated. Both feature sets and their combinations yielded very high accuracy, on par with the previously suggested method based on sequence motifs and k-mers. However, high performance requires a sufficiently large phylogenetic distance between the species and a sufficiently high number of pre-miRNAs in the training set. To examine the contribution of the IT and k-mer features, an information gain-based feature ranking was performed. Although the top 3 features are IT features, 80% of the top 100 features are k-mers. A comparison of all three individual approaches (motifs, IT, and k-mers) shows that k-mers alone are sufficient to distinguish species based on their pre-miRNAs.

Conclusions: IT sequence feature extraction enables the distinction among species and is less computationally expensive than motif calculations. However, since IT features need larger amounts of data to provide enough statistics for highly accurate results, future categorization into species can be done effectively using k-mers only. The biological reasoning for this is the existence of a codon bias between species which can, at least, be observed in exonic miRNAs. Future work in this direction will address the ab initio detection of pre-miRNAs. In addition, pre-miRNAs can be predicted from RNA-seq data.
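
The k-mer features and the information gain-based ranking mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the choice of k = 2, the mean-value threshold used to binarize each feature, and the toy sequences are all assumptions made for illustration only.

```python
from collections import Counter
from itertools import product
from math import log2

ALPHABET = "ACGU"  # RNA alphabet; DNA-style T is mapped to U below


def kmer_frequencies(seq, k=2):
    """Return normalized frequencies of all 4**k k-mers in one sequence."""
    seq = seq.upper().replace("T", "U")
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)
    return {m: (counts[m] / total if total else 0.0) for m in kmers}


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction from splitting the labels at `threshold`."""
    left = [y for v, y in zip(values, labels) if v <= threshold]
    right = [y for v, y in zip(values, labels) if v > threshold]
    n = len(labels)
    conditional = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - conditional


def rank_kmers(seqs, labels, k=2):
    """Rank k-mers by information gain; each feature is split at its mean (assumption)."""
    feats = [kmer_frequencies(s, k) for s in seqs]
    scores = {}
    for m in feats[0]:
        vals = [f[m] for f in feats]
        scores[m] = information_gain(vals, labels, sum(vals) / len(vals))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data: two made-up "species" with different base composition.
toy_seqs = ["GGGGGGCC", "GGGCGGGG", "AAAAUUAA", "AUAAAAAA"]
toy_labels = ["species_1", "species_1", "species_2", "species_2"]
ranking = rank_kmers(toy_seqs, toy_labels, k=2)
```

In this sketch, `ranking` lists the k-mers from most to least discriminative between the two label groups. A real analysis would use actual pre-miRNA sequences and a proper classifier, and the thresholding step here is a simplification of how information gain is usually computed for continuous features.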

Description

Keywords

Information theory, MicroRNAs, Pre-microRNA, Machine learning, Sequence motifs, RNA, k-mer, miRNA categorization, Differentiate miRNAs among species, Telecommunication, Electronics, TK5101-6720, TK7800-8360

Fields of Science

0301 basic medicine, 0206 medical engineering, 02 engineering and technology, 03 medical and health sciences

Citation

Yousef, M., Nigatu, D., Levy, D., Allmer, J., and Henkel, W. (2017). Categorization of species based on their microRNAs employing sequence motifs, information-theoretic sequence feature extraction, and k-mers. EURASIP Journal on Advances in Signal Processing, 2017(1). doi:10.1186/s13634-017-0506-8

Volume

2017

Issue

1

Sustainable Development Goals

3: Good Health and Well-Being