Molecular Biology and Genetics / Moleküler Biyoloji ve Genetik
Permanent URI for this collection: https://hdl.handle.net/11147/9
6 results
Search Results
Conference Object
Citation - Scopus: 19
Data Mining for MicroRNA Gene Prediction: On the Impact of Class Imbalance and Feature Number for MicroRNA Gene Prediction
(Institute of Electrical and Electronics Engineers Inc., 2013) Saçar, Müşerref Duygu; Allmer, Jens; 04.03. Department of Molecular Biology and Genetics; 04. Faculty of Science; 01. Izmir Institute of Technology
MicroRNAs (miRNAs) are small, non-coding RNAs that are involved in the posttranscriptional modulation of gene expression. Their short (18-24 nucleotide), single-stranded mature sequences are involved in targeting specific genes. Experimental methods are limited, and it is difficult, if not impossible, to establish all miRNAs and their targets experimentally. Therefore, many tools for the prediction of miRNA genes and miRNA targets have been proposed. Most of these tools are based on machine learning methods, and within that area two-class classification is mostly employed. Unfortunately, truly negative data is impossible to attain and only approximations of negative data are currently available. Also, we recently showed that the available positive data is not flawless. Here we investigate the impact of class imbalance on learner accuracy and find that there is a difference of up to 50% between the best and worst precision and recall values. In addition, we looked at an increasing number of features and found a curve maximizing at 0.97 recall and 0.91 precision, with quickly decaying performance after inclusion of more than 100 features. © 2013 IEEE.

Conference Object
Citation - Scopus: 1
Ranking Tandem Mass Spectra: and the Impact of Database Size and Scoring Function on Peptide Spectrum Matches
(Institute of Electrical and Electronics Engineers Inc., 2013) Has, Canan; Kundakçı, Cemal Ulaş; Allmer, Jens; 04.03. Department of Molecular Biology and Genetics; 04. Faculty of Science; 01. Izmir Institute of Technology
Proteomics is currently driven by mass spectrometry.
For the analysis of tandem mass spectra, many computational algorithms have been proposed. There are two approaches: one assigns a peptide sequence to a tandem mass spectrum directly, and the other employs a sequence database to look up possible solutions. The former method needs high-quality spectra, while the latter can tolerate lower-quality spectra. Since both methods are computationally expensive, it is sensible to establish spectral quality using an independent, fast algorithm. In this study, we first establish proper settings for database search algorithms for the analysis of spectra in our gold benchmark dataset and then analyze the performance of ScanRanker, an algorithm for quality assessment of tandem MS spectra, on this ground truth data. We found that OMSSA and MSGFDB have limitations in their scoring functions, but we were able to form a proper consensus prediction using majority vote for our benchmark data. Unfortunately, ScanRanker's results do not correlate well with the consensus, and ScanRanker is also too slow to be used for its intended purpose. © 2013 IEEE.

Conference Object
Citation - Scopus: 1
De Novo Markup Language, a Standard To Represent De Novo Sequencing Results From MS/MS Data
(Institute of Electrical and Electronics Engineers Inc., 2012) Takan, Savaş; Allmer, Jens; 04.03. Department of Molecular Biology and Genetics; 03.04. Department of Computer Engineering; 03. Faculty of Engineering; 04. Faculty of Science; 01. Izmir Institute of Technology
Proteomics is the study of the proteins that can be derived from a genome. For the identification and sequencing of proteins, mass spectrometry has become the tool of choice. Within mass spectrometry-based proteomics, proteins can be identified or sequenced by either database search or de novo sequencing. Both methods have certain advantages and drawbacks, but in the long run we envision de novo sequencing becoming the predominant tool.
Currently, de novo sequencing results are stored in arbitrary file formats, depending on the developers of the algorithms. We identified this as a large and unnecessary obstacle when integrating results from multiple de novo sequencing algorithms. Therefore, we designed a standard file format for the representation of de novo sequencing results. We further developed an application programming interface, since we identified the lack of proper APIs as another obstacle that imposes a needlessly steep learning curve on developers. © 2012 IEEE.

Conference Object
Citation - Scopus: 1
Removing Contamination From Genomic Sequences Based on Vector Reference Libraries
(Institute of Electrical and Electronics Engineers Inc., 2012) Bağcı, Caner; Allmer, Jens; 04.03. Department of Molecular Biology and Genetics; 04. Faculty of Science; 01. Izmir Institute of Technology
DNA is often sequenced after being cloned into a vector, since this makes it possible to use standard primers and removes the need to develop custom primers. In this way, a certain amount of vector is sequenced along with the sequence of interest. Unfortunately, these contaminating vector sequences occasionally find their way into public databases as part of submitted sequences. It has been pointed out that SeqClean, a program used to remove vector contamination from sequences, does not take into account that vectors are circular structures. A workaround has been presented before, but we were able to simplify the process and, additionally, we provide an implementation. We further applied our method to a test set of EST sequences and also analyzed the amount of contamination found in the EST sequences available on NCBI. © 2012 IEEE.

Conference Object
Citation - Scopus: 17
Systematic Computational Analysis of Potential RNAi Regulation in Toxoplasma gondii
(Institute of Electrical and Electronics Engineers Inc., 2010) Çakır, Mehmet Volkan; Allmer, Jens; 04.03. Department of Molecular Biology and Genetics; 04. Faculty of Science; 01. Izmir Institute of Technology
RNA interference (RNAi) is the mechanism through which RNA interferes with the production of other RNAs in a sequence-specific manner. MicroRNA (miRNA) is a type of RNA that is transcribed as pri-miRNAs and processed to pre-miRNAs in the nucleus. These pre-miRNAs are then exported from the nucleus and processed in the cytoplasm to double-stranded RNA, with one strand providing target specificity. Toxoplasma gondii is a parasitic apicomplexan which causes several diseases. T. gondii is a good candidate for computational efforts, given its small and publicly available genome files and the extensive information about its gene structure. Although the existence of RNA interference in T. gondii is being debated, establishing its complete potential RNAi regulatory network may be beneficial for further investigations into the topic. © 2009 IEEE.

Conference Object
Relative Protein Quantitation With Post Translational Modifications in Mass Spectrometry Based Proteomics
(Institute of Electrical and Electronics Engineers Inc., 2010) Allmer, Jens; 04.03. Department of Molecular Biology and Genetics; 04. Faculty of Science; 01. Izmir Institute of Technology
Mass spectrometry has become the tool of choice for most investigations in proteomics. Identification of proteins from complex mixtures has long been achieved and is now routinely used in countless high-throughput studies. Quantitation by mass spectrometry is comparatively newer, and many different strategies have been proposed. One such strategy quantitates the difference in protein expression level among samples via extracted ion chromatograms, spectral counts, or a combination thereof. Another strategy involves mass modifications of the analytes in one or more of the samples under investigation.
MSMAG has been developed as an extension to 2DB, and it has been shown that it can aid in the quantitation of data from experiments employing label-free quantitation. Recently, it has been extended to allow for analysis of data based on labelling strategies. This also makes it possible to quickly visualize and investigate inherent mass differences such as those presented by post-translational modifications. © 2009 IEEE.
