PubMed Indexed Publications Collection

Permanent URI for this collection: https://hdl.handle.net/11147/7645

  • Article
    Citation - WoS: 1
    Citation - Scopus: 1
    DNMSO; an Ontology for Representing De Novo Sequencing Results from Tandem-MS Data
    (PeerJ Inc., 2020) Takan, Savaş; Allmer, Jens
    For the identification and sequencing of proteins, mass spectrometry (MS) has become the tool of choice and, as such, drives proteomics. MS/MS spectra need to be assigned a peptide sequence, for which two strategies exist: either database search or de novo sequencing can be employed to establish peptide spectrum matches. For database search, mzIdentML is the current community standard for data representation. There is no community standard for representing de novo sequencing results, but we previously proposed the de novo markup language (DNML). At the moment, each de novo sequencing solution uses a different data representation, complicating downstream data integration, which is crucial since ensemble predictions may be more useful than predictions of a single tool. Here we propose the de novo MS Ontology (DNMSO), which can, for example, provide many-to-many mappings between spectra and peptide predictions. Additionally, an application programming interface (API) has been implemented that supports any file operation necessary for de novo sequencing, from spectra input to reading, writing, and creating files in the DNMSO format, as well as conversion from many other file formats. This API removes all overhead from the production of de novo sequencing tools and allows developers to concentrate completely on algorithm development. We make the API and formal descriptions of the format freely available at https://github.com/savastakan/dnmso.
  • Article
    Citation - WoS: 20
    Citation - Scopus: 24
    Newly Developed SSR Markers Reveal Genetic Diversity and Geographical Clustering in Spinach (Spinacia oleracea)
    (Springer Verlag, 2017) Göl, Şurhan; Göktay, Mehmet; Allmer, Jens; Doğanlar, Sami; Frary, Anne
    Spinach is a popular leafy green vegetable due to its nutritional composition. It contains high concentrations of vitamins A, E, C, and K, and folic acid. Development of genetic markers for spinach is important for diversity and breeding studies. In this work, Next Generation Sequencing (NGS) technology was used to develop genomic simple sequence repeat (SSR) markers. After cleaning and contig assembly, the sequence encompassed 2.5% of the 980 Mb spinach genome. The contigs were mined for SSRs, and a total of 3852 SSRs were detected. Of these, 100 primer pairs were tested and 85% were found to yield clear, reproducible amplicons. These 85 markers were then applied to 48 spinach accessions from worldwide origins, resulting in 389 alleles with 89% polymorphism. The average gene diversity (GD) value of the markers (based on a GD calculation that ranges from 0 to 0.5) was 0.25. Our results demonstrated that the newly developed SSR markers are suitable for assessing genetic diversity and population structure of spinach germplasm. The markers also revealed clustering of the accessions based on geographical origin, with clear separation of Far Eastern accessions, which had the overall highest genetic diversity when compared with accessions from Persia, Turkey, Europe, and the USA. Thus, the SSR markers have good potential to provide valuable information for spinach breeding and germplasm management. They will also be helpful for genome mapping and core collection establishment.
  • Article
    Citation - WoS: 14
    Citation - Scopus: 13
    Delineating the Impact of Machine Learning Elements in Pre-MicroRNA Detection
    (PeerJ Inc., 2017) Saçar Demirci, Müşerref Duygu; Allmer, Jens
    Gene regulation modulates RNA expression via transcription factors. Posttranscriptional gene regulation in turn influences the amount of protein product through, for example, microRNAs (miRNAs). Experimental establishment of miRNAs and their effects is complicated and even futile when aiming to establish the entirety of miRNA target interactions. Therefore, computational approaches have been proposed. Many such tools rely on machine learning (ML), which involves example selection, feature extraction, model training, algorithm selection, and parameter optimization. Different ML algorithms have been used for model training on various example sets, more than 1,000 features describing pre-miRNAs have been proposed, and different training and testing schemes have been used for model establishment. For pre-miRNA detection, negative examples cannot easily be established, which causes a problem for two-class classification algorithms. There is also no consensus on which ML approach works best; therefore, we set out to establish the impact of the different parts involved in ML on model performance. Furthermore, we established two new negative datasets and analyzed the impact of using them for training and testing. It was our aim to attach an order of importance to the parts involved in ML for pre-miRNA detection, but instead we found that all parts are intricately connected and their contributions cannot be easily untangled, leading us to suggest that many scenarios need to be explored when attempting ML-based pre-miRNA detection.
  • Article
    Citation - WoS: 14
    Citation - Scopus: 12
    The Impact of Feature Selection on One and Two-Class Classification Performance for Plant MicroRNAs
    (PeerJ Inc., 2016) Khalifa, Waleed; Yousef, Malik; Saçar Demirci, Müşerref Duygu; Allmer, Jens
    MicroRNAs (miRNAs) are short nucleotide sequences that form a typical hairpin structure which is recognized by a complex enzyme machinery. This ultimately leads to the incorporation of 18-24 nt long mature miRNAs into RISC, where they act as recognition keys to aid in regulation of target mRNAs. Determining miRNAs experimentally is involved and, therefore, machine learning is used to complement such endeavors. The success of machine learning mostly depends on proper input data and appropriate features for parameterization of the data. Although two-class classification (TCC) is generally used in the field, one-class classification (OCC) has been tried for pre-miRNA detection because negative examples are hard to come by. Since both positive and negative examples are currently somewhat limited, feature selection can prove to be vital for furthering the field of pre-miRNA detection. In this study, we compare the performance of OCC and TCC using eight feature selection methods and seven different plant species providing positive pre-miRNA examples. Feature selection was very successful for OCC, where the best feature selection method achieved an average accuracy of 95.6%, thereby being ~29% better than the worst method, which achieved 66.9% accuracy. While OCC performance is comparable to that of TCC, which performs up to 3% better than OCC, TCC is much less affected by feature selection, and its largest performance gap is ~13%, which only occurs for two of the feature selection methodologies. We conclude that feature selection is crucially important for OCC and that it can perform on par with TCC given the proper set of features.