Scopus İndeksli Yayınlar Koleksiyonu / Scopus Indexed Publications Collection

Permanent URI for this collection: https://hdl.handle.net/11147/7148

Search Results

Now showing 1 - 10 of 11
  • Conference Object
    Citation - WoS: 3
    Citation - Scopus: 8
    Distinguishing Between Microrna Targets From Diverse Species Using Sequence Motifs and K-Mers
    (SCITEPRESS, 2017) Yousef, Malik; Khalifa, Waleed; Acar, İlhan Erkin; Allmer, Jens
    A disease phenotype is often due to dysregulation of gene expression. Post-transcriptional regulation of protein abundance by microRNAs (miRNAs) is, therefore, of high importance in, for example, cancer studies. MicroRNAs provide a complementary sequence to their target messenger RNA (mRNA) as part of a complex molecular machinery. Known miRNAs and targets are listed in miRTarBase for a variety of organisms. The experimental detection of such pairs is convoluted; their computational detection is therefore desirable but is complicated by missing negative data. For machine learning, many features for parameterization of the miRNA targets are available, and k-mers and sequence motifs have previously been used. Unrelated organisms like intracellular pathogens and their hosts may communicate via miRNAs and, therefore, we investigated whether miRNA targets from one species can be differentiated from miRNA targets of another. To this end, we employed target information of one species as positive and the other as negative training and testing data. Models of species with higher evolutionary distance generally achieved better results of up to 97% average accuracy (mouse versus Caenorhabditis elegans) while more closely related species did not lead to successful models (human versus mouse; 60%). In the future, when more targeting data becomes available, models can be established which will be able to more precisely determine miRNA targets in host-pathogen systems using this approach.
  • Book Part
    Citation - Scopus: 9
    Differential Expression of Toxoplasma Gondii Micrornas in Murine and Human Hosts
    (Springer, 2016) Allmer, Jens; Saçar Demirci, Müşerref Duygu; Bağcı, Caner
    MicroRNAs are short RNA sequences involved in post-transcriptional gene regulation. MicroRNAs are known for a wide variety of species ranging from bacteria to plants. It has become clear that some cross-kingdom regulation is possible, especially between viruses and their hosts. We hypothesized that intracellular parasites like Toxoplasma gondii would, similar to viruses, be able to modulate their host's gene expression. We were able to show that T. gondii produces many putative pre-miRNAs which are actually transcribed. Furthermore, some of these expressed pre-miRNAs have a striking resemblance to host mature miRNAs. Previous studies indicated that T. gondii infection coincides with increased abundance of some miRNAs. Here we were able to show that many of these miRNAs have close relatives in T. gondii which may not be distinguishable using PCR. Taken together, the similarity to host miRNAs, their confirmed expression, and their upregulation during infection suggest that T. gondii actively transfers miRNAs to regulate its host. We conclude that this type of cross-kingdom regulation may be possible, but that targeted analysis is necessary to consolidate our computational findings. © Springer International Publishing Switzerland 2016. All rights are reserved.
  • Conference Object
    Citation - Scopus: 13
    Feature Selection for Microrna Target Prediction: Comparison of One-Class Feature Selection Methodologies
    (Hindawi Publishing Corporation, 2016) Yousef, Malik; Allmer, Jens; Khalifa, Waleed
    Traditionally, machine learning algorithms build classification models from positive and negative examples. Recently, one-class classification (OCC) has received increasing attention in machine learning for problems where the negative class cannot be defined unambiguously. This is specifically problematic in bioinformatics since, for some important biological problems, the target class (positive class) is easy to obtain while the negative one cannot be measured. Artificially generating the negative class data can be based on unreliable assumptions. Several studies have applied two-class machine learning to predict microRNAs (miRNAs) and their targets. Different approaches for the generation of an artificial negative class have been applied, but may lead to a biased performance estimate. Feature selection has been well studied for the two-class classification problem, while fewer methods are available for feature selection with respect to OCC. In this study, we present a feature selection approach for applying one-class classification to the prediction of miRNA targets. A comparison between one-class and two-class approaches is presented to highlight that their performance is similar, while one-class classification is not based on questionable artificial data for training and performance evaluation. We further show that the feature selection method we tried works to a degree but needs improvement in the future, perhaps by combining it with other approaches.
  • Article
    Citation - Scopus: 19
    Feature Selection Has a Large Impact on One-Class Classification Accuracy for Micrornas in Plants
    (Hindawi Publishing Corporation, 2016) Yousef, Malik; Demirci, Müşerref Duygu Saçar; Khalifa, Waleed; Allmer, Jens
    MicroRNAs (miRNAs) are short RNA sequences involved in posttranscriptional gene regulation. Their experimental analysis is complicated and, therefore, needs to be supplemented with computational miRNA detection. Currently, computational miRNA detection is mainly performed using machine learning and in particular two-class classification. For machine learning, the miRNAs need to be parametrized, and more than 700 features have been described. Positive training examples for machine learning are readily available, but negative data is hard to come by. Therefore, it seems preferable to use one-class classification instead of two-class classification. Previously, we were able to almost reach two-class classification accuracy using one-class classifiers. In this work, we employ feature selection procedures in conjunction with one-class classification and show that there is up to 36% difference in accuracy among these feature selection methods. The best feature set allowed the training of a one-class classifier which achieved an average accuracy of 95.6%, thereby outperforming previous two-class-based plant miRNA detection approaches by about 0.5%. We believe that this can be improved upon in the future by rigorous filtering of the positive training examples and by improving current feature clustering algorithms to better target pre-miRNA feature selection.
  • Conference Object
    Citation - Scopus: 19
    Data Mining for Microrna Gene Prediction: on the Impact of Class Imbalance and Feature Number for Microrna Gene Prediction
    (Institute of Electrical and Electronics Engineers Inc., 2013) Saçar, Müşerref Duygu; Allmer, Jens
    MicroRNAs (miRNAs) are small, non-coding RNAs which are involved in the posttranscriptional modulation of gene expression. Their short (18-24 nucleotides), single-stranded mature sequences are involved in targeting specific genes. Experimental methods are limited, and it is difficult, if not impossible, to establish all miRNAs and their targets experimentally. Therefore, many tools for the prediction of miRNA genes and miRNA targets have been proposed. Most of these tools are based on machine learning methods, and within that area mostly two-class classification is employed. Unfortunately, truly negative data is impossible to obtain and only approximations of negative data are currently available. Also, we recently showed that the available positive data is not flawless. Here we investigate the impact of class imbalance on the learner accuracy and find that there is a difference of up to 50% between the best and worst precision and recall values. In addition, we examined an increasing number of features and found that performance peaked at 0.97 recall and 0.91 precision, decaying quickly after inclusion of more than 100 features. © 2013 IEEE.
  • Conference Object
    Citation - Scopus: 1
    Ranking Tandem Mass Spectra: and the Impact of Database Size and Scoring Function on Peptide Spectrum Matches
    (Institute of Electrical and Electronics Engineers Inc., 2013) Has, Canan; Kundakçı, Cemal Ulaş; Altay, Aybuge; Allmer, Jens
    Proteomics is currently driven by mass spectrometry. For the analysis of tandem mass spectra, many computational algorithms have been proposed. There are two approaches: one assigns a peptide sequence to a tandem mass spectrum directly, and the other employs a sequence database for looking up possible solutions. The former method needs high quality spectra while the latter can tolerate lower quality spectra. Since both methods are computationally expensive, it is sensible to establish spectral quality using an independent fast algorithm. In this study, we first establish proper settings for database search algorithms for the analysis of spectra in our gold benchmark dataset and then analyze the performance of ScanRanker, an algorithm for quality assessment of tandem MS spectra, on this ground truth data. We found that OMSSA and MSGFDB have limitations in their scoring functions but were able to form a proper consensus prediction using majority vote for our benchmark data. Unfortunately, ScanRanker's results do not correlate well with the consensus, and ScanRanker is also too slow to serve its intended purpose. © 2013 IEEE
  • Conference Object
    Citation - WoS: 6
    Citation - Scopus: 8
    Comparison of Four Ab Initio Microrna Prediction Tools
    (SciTePress, 2013) Saçar, Müşerref Duygu; Allmer, Jens
    MicroRNAs are small RNA sequences of 18-24 nucleotides in length, which serve as templates to drive post-transcriptional gene silencing. The canonical microRNA pathway starts with transcription from DNA and is followed by processing by the Microprocessor complex, yielding a hairpin structure. This is then exported into the cytosol where it is processed by Dicer and next incorporated into the RNA-induced silencing complex. All of these biogenesis steps add to the overall specificity of miRNA production and effect. Unfortunately, experimental detection of miRNAs is cumbersome and therefore computational tools are necessary. Homology-based miRNA prediction tools are limited by fast miRNA evolution and by the fact that they are template driven. Ab initio miRNA prediction methods have been proposed, but they have not been analyzed comparatively, so their relative performance is largely unknown. Here we implement the features proposed in four miRNA ab initio studies and evaluate them on two data sets. Using the features described in Bentwich 2008 leads to the highest accuracy but still does not provide enough confidence in the results to warrant experimental validation of all predictions in a larger genome like the human genome. Copyright © 2013 SCITEPRESS - Science and Technology Publications.
  • Conference Object
    Citation - Scopus: 1
    De Novo Markup Language, a Standard To Represent De Novo Sequencing Results From Ms/Ms Data
    (Institute of Electrical and Electronics Engineers Inc., 2012) Takan, Savaş; Allmer, Jens
    Proteomics is the study of the proteins that can be derived from a genome. For the identification and sequencing of proteins, mass spectrometry has become the tool of choice. Within mass spectrometry-based proteomics, proteins can be identified or sequenced by either database search or de novo sequencing. Both methods have certain advantages and drawbacks, but in the long run we expect de novo sequencing to become the predominant tool. Currently, de novo sequencing results are stored in arbitrary file formats, depending on the developers of the algorithms. We identified this as a large and unnecessary obstacle while integrating results from multiple de novo sequencing algorithms. Therefore, we designed a standard file format for the representation of de novo sequencing results. We further developed an application programming interface since we identified the lack of proper APIs as another obstacle, introducing a needlessly steep learning curve for developers. © 2012 IEEE.
  • Conference Object
    Citation - Scopus: 1
    Removing Contamination From Genomic Sequences Based on Vector Reference Libraries
    (Institute of Electrical and Electronics Engineers Inc., 2012) Bağcı, Caner; Allmer, Jens
    DNA is often sequenced after being cloned into a vector since this provides the possibility for using standard primers and removes the need to develop custom primers. In this way a certain amount of vector is sequenced along with the sequence of interest. Unfortunately, occasionally these contaminating vector sequences find their way into public databases as part of submitted sequences. It has been pointed out that SeqClean, a program used to remove vector contamination from sequences, does not take into account that vectors are circular structures. A workaround has been presented before, but we were able to simplify the process and, additionally, we provide an implementation. We further applied our method to a test set of EST sequences and also analyzed the amount of contamination found in the EST sequences available on NCBI. © 2012 IEEE.
  • Conference Object
    Citation - Scopus: 17
    Systematic Computational Analysis of Potential Rnai Regulation in Toxoplasma Gondii
    (Institute of Electrical and Electronics Engineers Inc., 2010) Çakır, Mehmet Volkan; Allmer, Jens
    RNA interference (RNAi) is the mechanism through which RNA interferes with the production of other RNAs in a sequence-specific manner. MicroRNA (miRNA) is a type of RNA which is transcribed as pri-miRNAs and processed to pre-miRNAs in the nucleus. These pre-miRNAs are then exported from the nucleus and processed in the cytoplasm to double-stranded RNA with one strand providing target specificity. Toxoplasma gondii is a parasitic apicomplexan which causes several diseases. T. gondii is a good candidate for computational efforts with its small and publicly available genome files and extensive information about its gene structure. Although the existence of RNA interference in T. gondii is being debated, establishment of its complete potential RNAi regulatory network may be beneficial for further investigations into the topic. © 2009 IEEE.