The Impact of Feature Selection on One and Two-Class Classification Performance for Plant MicroRNAs

dc.contributor.author Khalifa, Waleed
dc.contributor.author Yousef, Malik
dc.contributor.author Saçar Demirci, Müşerref Duygu
dc.contributor.author Allmer, Jens
dc.coverage.doi 10.7717/peerj.2135
dc.date.accessioned 2017-06-28T08:33:00Z
dc.date.available 2017-06-28T08:33:00Z
dc.date.issued 2016
dc.description.abstract MicroRNAs (miRNAs) are short nucleotide sequences that form a typical hairpin structure, which is recognized by a complex enzyme machinery. This recognition ultimately leads to the incorporation of 18–24 nt long mature miRNAs into RISC, where they act as recognition keys that aid in the regulation of target mRNAs. Determining miRNAs experimentally is laborious and, therefore, machine learning is used to complement such endeavors. The success of machine learning depends mostly on proper input data and on appropriate features for parameterizing the data. Two-class classification (TCC) is generally used in the field, but because negative examples are hard to come by, one-class classification (OCC) has also been tried for pre-miRNA detection. Since both positive and negative examples are currently somewhat limited, feature selection can prove vital for furthering the field of pre-miRNA detection. In this study, we compare the performance of OCC and TCC using eight feature selection methods and seven different plant species providing positive pre-miRNA examples. Feature selection was very successful for OCC: the best feature selection method achieved an average accuracy of 95.6%, ~29 percentage points better than the worst method, which achieved 66.9% accuracy. While OCC performance is comparable to TCC, which performs up to 3% better than OCC, TCC is much less affected by feature selection; its largest performance gap, ~13%, occurs for only two of the feature selection methodologies. We conclude that feature selection is crucially important for OCC and that OCC can perform on par with TCC given the proper set of features. en_US
dc.description.sponsorship The Scientific and Technological Research Council of Turkey (grant number 113E326) en_US
dc.identifier.citation Khalifa, W., Yousef, M., Saçar Demirci, M. D., and Allmer, J. (2016). The impact of feature selection on one and two-class classification performance for plant microRNAs. PeerJ, 2016(6). doi:10.7717/peerj.2135 en_US
dc.identifier.doi 10.7717/peerj.2135 en_US
dc.identifier.issn 2167-8359
dc.identifier.scopus 2-s2.0-84977103713
dc.identifier.uri http://doi.org/10.7717/peerj.2135
dc.identifier.uri https://hdl.handle.net/11147/5794
dc.language.iso en en_US
dc.publisher PeerJ Inc. en_US
dc.relation info:eu-repo/grantAgreement/TUBITAK/EEEAG/113E326 en_US
dc.relation.ispartof PeerJ en_US
dc.rights info:eu-repo/semantics/openAccess en_US
dc.subject Feature selection en_US
dc.subject Machine learning en_US
dc.subject MicroRNAs en_US
dc.subject Plant genetics en_US
dc.subject Classification en_US
dc.title The Impact of Feature Selection on One and Two-Class Classification Performance for Plant MicroRNAs en_US
dc.type Article en_US
dspace.entity.type Publication
gdc.author.institutional Saçar Demirci, Müşerref Duygu
gdc.author.institutional Allmer, Jens
gdc.author.yokid 114170
gdc.author.yokid 107974
gdc.bip.impulseclass C4
gdc.bip.influenceclass C5
gdc.bip.popularityclass C4
gdc.coar.access open access
gdc.coar.type text::journal::journal article
gdc.collaboration.industrial false
gdc.description.department İzmir Institute of Technology. Molecular Biology and Genetics en_US
gdc.description.issue 6 en_US
gdc.description.publicationcategory Article - International Peer-Reviewed Journal - Institutional Faculty Member en_US
gdc.description.scopusquality Q3
gdc.description.volume 2016 en_US
gdc.description.wosquality Q2
gdc.identifier.openalex W2472053654
gdc.identifier.pmid 27366641
gdc.identifier.wos WOS:000378351000002
gdc.index.type WoS
gdc.index.type Scopus
gdc.index.type PubMed
gdc.oaire.accesstype GOLD
gdc.oaire.diamondjournal false
gdc.oaire.downloads 0
gdc.oaire.impulse 9.0
gdc.oaire.influence 3.289088E-9
gdc.oaire.isgreen true
gdc.oaire.keywords One-class classification
gdc.oaire.keywords Plant genetics
gdc.oaire.keywords QH301-705.5
gdc.oaire.keywords Bioinformatics
gdc.oaire.keywords Two-class classification
gdc.oaire.keywords R
gdc.oaire.keywords MicroRNA
gdc.oaire.keywords Plant
gdc.oaire.keywords Classification
gdc.oaire.keywords MicroRNAs
gdc.oaire.keywords Machine learning
gdc.oaire.keywords Feature selection
gdc.oaire.keywords Medicine
gdc.oaire.keywords Biology (General)
gdc.oaire.popularity 4.30585E-9
gdc.oaire.publicfunded false
gdc.oaire.sciencefields 0301 basic medicine
gdc.oaire.sciencefields 0303 health sciences
gdc.oaire.sciencefields 03 medical and health sciences
gdc.oaire.views 5
gdc.openalex.collaboration International
gdc.openalex.fwci 1.10538283
gdc.openalex.normalizedpercentile 0.78
gdc.opencitations.count 12
gdc.plumx.crossrefcites 6
gdc.plumx.facebookshareslikecount 23
gdc.plumx.mendeley 27
gdc.plumx.pubmedcites 5
gdc.plumx.scopuscites 12
gdc.scopus.citedcount 12
gdc.wos.citedcount 14
relation.isAuthorOfPublication.latestForDiscovery bf9f97a4-6d62-49cd-a7c8-1bc8463d14d2
relation.isOrgUnitOfPublication.latestForDiscovery 9af2b05f-28ac-4013-8abe-a4dfe192da5e

Files

Original bundle

Name: 5794.pdf
Size: 709.24 KB
Format: Adobe Portable Document Format
Description: Article

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed to upon submission