Improving the Quality of Positive Datasets for the Establishment of Machine Learning Models for Pre-microRNA Detection

dc.contributor.author Saçar Demirci, Müşerref Duygu
dc.contributor.author Allmer, Jens
dc.coverage.doi 10.1515/jib-2017-0032
dc.date.accessioned 2020-07-25T22:09:24Z
dc.date.available 2020-07-25T22:09:24Z
dc.date.issued 2017
dc.description.abstract MicroRNAs (miRNAs) are involved in the post-transcriptional regulation of protein abundance and thus have a great impact on the resulting phenotype. It is, therefore, no wonder that they have been implicated in many diseases ranging from virus infections to cancer. This impact on the phenotype leads to a great interest in establishing the miRNAs of an organism. Experimental methods are complicated, which led to the development of computational methods for pre-miRNA detection. Such methods generally employ machine learning to establish models that discriminate between miRNAs and other sequences. Positive training data for model establishment stem, for the most part, from miRBase, the miRNA registry. The quality of the entries in miRBase has been questioned, though. This uncertainty about quality led to the development of filtering strategies that attempt to produce high-quality positive datasets, which can lead to a scarcity of positive data. To analyze the quality of filtered data, we developed a machine learning model and found that it is well able to establish data quality based on intrinsic measures. Additionally, we analyzed which features describing pre-miRNAs could discriminate between low- and high-quality data. Both models are applicable to data from miRBase and can be used for establishing high-quality positive data. This will facilitate the development of better miRNA detection tools, which will make the prediction of miRNAs in disease states more accurate. Finally, we applied both models to all miRBase data and provide the list of high-quality hairpins. en_US
dc.identifier.doi 10.1515/jib-2017-0032
dc.identifier.issn 1613-4516
dc.identifier.scopus 2-s2.0-85044694577
dc.identifier.uri https://doi.org/10.1515/jib-2017-0032
dc.identifier.uri https://hdl.handle.net/11147/9314
dc.language.iso en en_US
dc.publisher Informationsmanagement in der Biotechnologie e.V. (IMBio e.V.) en_US
dc.relation.ispartof Journal of Integrative Bioinformatics en_US
dc.rights info:eu-repo/semantics/openAccess en_US
dc.subject MicroRNAs en_US
dc.subject Machine learning en_US
dc.subject Confidence en_US
dc.subject High quality en_US
dc.subject Positive data en_US
dc.title Improving the Quality of Positive Datasets for the Establishment of Machine Learning Models for Pre-microRNA Detection en_US
dc.type Article en_US
dspace.entity.type Publication
gdc.author.institutional Saçar Demirci, Müşerref Duygu
gdc.author.institutional Allmer, Jens
gdc.bip.impulseclass C5
gdc.bip.influenceclass C5
gdc.bip.popularityclass C5
gdc.coar.access open access
gdc.coar.type text::journal::journal article
gdc.collaboration.industrial false
gdc.description.department İzmir Institute of Technology. Molecular Biology and Genetics en_US
gdc.description.issue 2 en_US
gdc.description.publicationcategory Article - International Peer-Reviewed Journal - Institutional Faculty Member en_US
gdc.description.scopusquality Q1
gdc.description.volume 14 en_US
gdc.description.wosquality Q3
gdc.identifier.openalex W2740761319
gdc.identifier.pmid 28753538
gdc.identifier.wos WOS:000406931200011
gdc.index.type WoS
gdc.index.type Scopus
gdc.index.type PubMed
gdc.oaire.accesstype GOLD
gdc.oaire.diamondjournal false
gdc.oaire.downloads 0
gdc.oaire.impulse 1.0
gdc.oaire.influence 2.7929838E-9
gdc.oaire.isgreen true
gdc.oaire.keywords microrna
gdc.oaire.keywords mirbase
gdc.oaire.keywords Datasets as Topic
gdc.oaire.keywords MicroRNA
gdc.oaire.keywords Confidence
gdc.oaire.keywords High quality
gdc.oaire.keywords Machine Learning
gdc.oaire.keywords high quality
gdc.oaire.keywords MicroRNAs
gdc.oaire.keywords machine learning
gdc.oaire.keywords Positive data
gdc.oaire.keywords positive data
gdc.oaire.keywords MirGeneDB
gdc.oaire.keywords Machine learning
gdc.oaire.keywords miRBase
gdc.oaire.keywords Humans
gdc.oaire.keywords Registries
gdc.oaire.keywords confidence
gdc.oaire.keywords mirgenedb
gdc.oaire.keywords TP248.13-248.65
gdc.oaire.keywords Research Articles
gdc.oaire.keywords Biotechnology
gdc.oaire.popularity 3.1449903E-9
gdc.oaire.publicfunded false
gdc.oaire.sciencefields 0301 basic medicine
gdc.oaire.sciencefields 03 medical and health sciences
gdc.oaire.views 2
gdc.openalex.collaboration National
gdc.openalex.fwci 0.10096901
gdc.openalex.normalizedpercentile 0.43
gdc.opencitations.count 2
gdc.plumx.mendeley 17
gdc.plumx.pubmedcites 3
gdc.plumx.scopuscites 4
gdc.scopus.citedcount 4
gdc.wos.citedcount 4
relation.isAuthorOfPublication.latestForDiscovery bf9f97a4-6d62-49cd-a7c8-1bc8463d14d2
relation.isOrgUnitOfPublication.latestForDiscovery 9af2b05f-28ac-4013-8abe-a4dfe192da5e

Files

Original bundle

Name: Improving-the-Quality.pdf
Size: 1.62 MB
Format: Adobe Portable Document Format