Data Mining for MicroRNA Gene Prediction: On the Impact of Class Imbalance and Feature Number for MicroRNA Gene Prediction

dc.contributor.author Saçar, Müşerref Duygu
dc.contributor.author Allmer, Jens
dc.coverage.doi 10.1109/HIBIT.2013.6661685
dc.date.accessioned 2017-04-17T11:24:11Z
dc.date.available 2017-04-17T11:24:11Z
dc.date.issued 2013
dc.description 8th International Symposium on Health Informatics and Bioinformatics, HIBIT 2013; Ankara; Turkey; 25 September 2013 through 27 September 2013 en_US
dc.description.abstract MicroRNAs (miRNAs) are small, non-coding RNAs that are involved in the post-transcriptional modulation of gene expression. Their short (18-24 nucleotides), single-stranded mature sequences target specific genes. Experimental methods are limited, and it is difficult, if not impossible, to establish all miRNAs and their targets experimentally. Therefore, many tools for the prediction of miRNA genes and miRNA targets have been proposed. Most of these tools are based on machine learning, and within that area two-class classification is mostly employed. Unfortunately, truly negative data is impossible to obtain, and only approximations of negative data are currently available. We also recently showed that the available positive data is not flawless. Here we investigate the impact of class imbalance on learner accuracy and find a difference of up to 50% between the best and worst precision and recall values. In addition, we examined an increasing number of features and found a performance curve that peaks at 0.97 recall and 0.91 precision, with performance decaying quickly once more than 100 features are included. © 2013 IEEE. en_US
dc.identifier.citation Saçar, M. D., and Allmer, J. (2013, September 25-27). Data mining for microRNA gene prediction: On the impact of class imbalance and feature number for microRNA gene prediction. Paper presented at the 8th International Symposium on Health Informatics and Bioinformatics. doi:10.1109/HIBIT.2013.6661685 en_US
dc.identifier.doi 10.1109/HIBIT.2013.6661685 en_US
dc.identifier.isbn 9781479907014
dc.identifier.scopus 2-s2.0-84892650223
dc.identifier.uri http://doi.org/10.1109/HIBIT.2013.6661685
dc.identifier.uri https://hdl.handle.net/11147/5322
dc.language.iso en en_US
dc.publisher Institute of Electrical and Electronics Engineers Inc. en_US
dc.relation.ispartof 8th International Symposium on Health Informatics and Bioinformatics, HIBIT 2013 en_US
dc.rights info:eu-repo/semantics/openAccess en_US
dc.subject Class imbalance en_US
dc.subject Data mining en_US
dc.subject Feature selection en_US
dc.subject Machine learning en_US
dc.subject MicroRNAs en_US
dc.subject MiRNA gene prediction en_US
dc.title Data Mining for MicroRNA Gene Prediction: On the Impact of Class Imbalance and Feature Number for MicroRNA Gene Prediction en_US
dc.type Conference Object en_US
dspace.entity.type Publication
gdc.author.institutional Saçar, Müşerref Duygu
gdc.author.institutional Allmer, Jens
gdc.author.yokid 114170
gdc.author.yokid 107974
gdc.bip.impulseclass C4
gdc.bip.influenceclass C5
gdc.bip.popularityclass C4
gdc.coar.access open access
gdc.coar.type text::conference output
gdc.collaboration.industrial false
gdc.description.department İzmir Institute of Technology. Molecular Biology and Genetics en_US
gdc.description.endpage 6
gdc.description.publicationcategory Conference Item - International - Institutional Faculty Member en_US
gdc.description.scopusquality N/A
gdc.description.startpage 1
gdc.description.wosquality N/A
gdc.identifier.openalex W2081525771
gdc.index.type Scopus
gdc.oaire.diamondjournal false
gdc.oaire.impulse 5.0
gdc.oaire.influence 3.352486E-9
gdc.oaire.isgreen true
gdc.oaire.keywords MicroRNAs
gdc.oaire.keywords Class imbalance
gdc.oaire.keywords Feature selection
gdc.oaire.keywords Machine learning
gdc.oaire.keywords Data mining
gdc.oaire.keywords MiRNA gene prediction
gdc.oaire.popularity 5.9006227E-9
gdc.oaire.publicfunded false
gdc.oaire.sciencefields 0301 basic medicine
gdc.oaire.sciencefields 03 medical and health sciences
gdc.oaire.sciencefields 0206 medical engineering
gdc.oaire.sciencefields 02 engineering and technology
gdc.openalex.collaboration National
gdc.openalex.fwci 1.04579925
gdc.openalex.normalizedpercentile 0.78
gdc.openalex.toppercent TOP 10%
gdc.opencitations.count 12
gdc.plumx.crossrefcites 1
gdc.plumx.mendeley 15
gdc.plumx.scopuscites 19
gdc.scopus.citedcount 19
relation.isAuthorOfPublication.latestForDiscovery bf9f97a4-6d62-49cd-a7c8-1bc8463d14d2
relation.isOrgUnitOfPublication.latestForDiscovery 9af2b05f-28ac-4013-8abe-a4dfe192da5e

Files

Original bundle

Name:
5322.pdf
Size:
880.1 KB
Format:
Adobe Portable Document Format
Description:
Conference Paper

License bundle

Name:
license.txt
Size:
1.71 KB
Format:
Item-specific license agreed upon to submission