Improving the Quality of Positive Datasets for the Establishment of Machine Learning Models for Pre-microRNA Detection

Date

2017

Publisher

Informationsmanagement in der Biotechnologie e.V. (IMBio e.V.)

Open Access Color

GOLD

Green Open Access

Yes

Publicly Funded

No

Abstract

MicroRNAs (miRNAs) are involved in the post-transcriptional regulation of protein abundance and thus have a great impact on the resulting phenotype. It is therefore no surprise that they have been implicated in many diseases, ranging from viral infections to cancer. This impact on the phenotype has generated great interest in establishing the miRNAs of an organism. Experimental methods are complicated, which has led to the development of computational methods for pre-miRNA detection. Such methods generally employ machine learning to establish models that discriminate between miRNAs and other sequences. Positive training data for model establishment stems, for the most part, from miRBase, the miRNA registry. The quality of the entries in miRBase has been questioned, however. This uncertainty has led to the development of filtering strategies that attempt to produce high-quality positive datasets, but such filtering can lead to a scarcity of positive data. To analyze the quality of filtered data, we developed a machine learning model and found that it is well able to establish data quality based on intrinsic measures. Additionally, we analyzed which features describing pre-miRNAs could discriminate between low- and high-quality data. Both models are applicable to data from miRBase and can be used to establish high-quality positive data. This will facilitate the development of better miRNA detection tools, which will make the prediction of miRNAs in disease states more accurate. Finally, we applied both models to all miRBase data and provide the list of high-quality hairpins.

Keywords

MicroRNAs, Machine learning, Confidence, High quality, Positive data, miRBase, MirGeneDB, Datasets as Topic, Humans, Registries, TP248.13-248.65, Research Articles, Biotechnology

Fields of Science

0301 basic medicine, 03 medical and health sciences

WoS Q

Q3

Scopus Q

Q1

Source

Journal of Integrative Bioinformatics

Volume

14

Issue

2

Sustainable Development Goals

SDG 3: Good Health and Well-Being