Data Mining for MicroRNA Gene Prediction: On the Impact of Class Imbalance and Feature Number for MicroRNA Gene Prediction

Open Access Color

Green Open Access

Yes

Publicly Funded

No
Impulse
Top 10%
Influence
Average
Popularity
Top 10%

Abstract

MicroRNAs (miRNAs) are small, non-coding RNAs that are involved in the post-transcriptional modulation of gene expression. Their short (18-24 nucleotide), single-stranded mature sequences are involved in targeting specific genes. Experimental methods are limited, and it is difficult, if not impossible, to establish all miRNAs and their targets experimentally. Therefore, many tools for the prediction of miRNA genes and miRNA targets have been proposed. Most of these tools are based on machine learning, and within that area two-class classification is mostly employed. Unfortunately, truly negative data are impossible to obtain, and only approximations of negative data are currently available. Also, we recently showed that the available positive data are not flawless. Here we investigate the impact of class imbalance on learner accuracy and find a difference of up to 50% between the best and worst precision and recall values. In addition, we looked at an increasing number of features and found a performance curve that peaks at 0.97 recall and 0.91 precision, with performance decaying quickly once more than 100 features are included. © 2013 IEEE.
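
As a rough illustration of why precision is sensitive to class imbalance, the following Python sketch computes expected precision and recall for a classifier with a fixed true-positive rate and false-positive rate while the negative-to-positive ratio grows. It does not reproduce the experiments summarized in the abstract: the function names, the rates (0.9 and 0.05), the class sizes, and the ratios are all hypothetical values chosen for the example, and it only covers imbalance at evaluation time.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall


def expected_counts(n_pos, n_neg, tpr, fpr):
    """Expected TP, FP, FN for a classifier with fixed TPR and FPR."""
    tp = tpr * n_pos   # positives correctly recognized
    fn = n_pos - tp    # positives that were missed
    fp = fpr * n_neg   # negatives wrongly called positive
    return tp, fp, fn


def precision_recall_by_ratio(tpr, fpr, n_pos, ratios):
    """Map each negative:positive ratio to its expected (precision, recall)."""
    results = {}
    for ratio in ratios:
        tp, fp, fn = expected_counts(n_pos, n_pos * ratio, tpr, fpr)
        results[ratio] = precision_recall(tp, fp, fn)
    return results


# Hypothetical settings, not taken from the paper.
results = precision_recall_by_ratio(tpr=0.9, fpr=0.05, n_pos=1000,
                                    ratios=[1, 10, 100])
for ratio, (precision, recall) in results.items():
    print(f"neg:pos = {ratio}:1  precision = {precision:.3f}  recall = {recall:.3f}")
```

Under these assumptions recall stays at the true-positive rate, while precision falls as negatives outnumber positives, because the false-positive count scales with the size of the negative class.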

Description

8th International Symposium on Health Informatics and Bioinformatics, HIBIT 2013; Ankara; Turkey; 25 September 2013 through 27 September 2013

Keywords

Class imbalance, Data mining, Feature selection, Machine learning, MicroRNAs, MiRNA gene prediction

Fields of Science

0301 basic medicine, 03 medical and health sciences, 0206 medical engineering, 02 engineering and technology

Citation

Saçar, M. D., and Allmer, J. (2013, September 25-27). Data mining for microRNA gene prediction: On the impact of class imbalance and feature number for microRNA gene prediction. Paper presented at the 8th International Symposium on Health Informatics and Bioinformatics. doi:10.1109/HIBIT.2013.6661685

OpenCitations Citation Count
12

Start Page

1

End Page

6
PlumX Metrics
Citations

CrossRef : 1

Scopus : 19

Captures

Mendeley Readers : 15

OpenAlex FWCI
1.04579925
