Molecular Biology and Genetics / Moleküler Biyoloji ve Genetik

Permanent URI for this collection: https://hdl.handle.net/11147/9

Search Results

  • Conference Object
    Preparing Sequence Databases for Application in Proteogenomics
    (Springer, 2016) Has, Canan; Mungan, Mehmet Direnç; Çiftçi, Cansu; Allmer, Jens
    Proteomics involves the identification of proteins from complex mixtures, which is performed using mass spectrometry (MS) followed by computational data analysis. MS/MS spectra can either be sequenced de novo, if no sequence is available for the proteins in the mixture, or be identified using database search algorithms such as OMSSA, X!Tandem, and MSGF+.
  • Conference Object
    Database Normalization Is Crucial for Reliable Protein Identification in Mass Spectrometry-Based Proteomics
    (Springer, 2016) Has, Canan; Mungan, Mehmet Direnç; Çiftçi, Cansu; Allmer, Jens
    Research in proteomics is driven by mass spectrometry, especially the identification of proteins from complex samples. Computational analysis of the resulting data determines the peptide sequences of the recorded spectra and integrates identifications into proteins. For this, database search algorithms can be employed, but they need a list of amino acid sequences that are expected to exist in the sample. Many algorithms have been proposed and consensus scoring has been performed. While the comparison and integration of results from different algorithms is important, there has been no attempt to integrate the results from searching multiple databases. Such integration is important, however, because simply concatenating all databases needed for a study poses technical problems. Unfortunately, it has been shown that databases of different sizes influence scoring and prohibit the direct comparison of results.
  • Article
    Citation - WoS: 10
    Citation - Scopus: 11
    Intersection of MicroRNA and Gene Regulatory Networks and Their Implication in Cancer
    (Bentham Science Publishers B.V., 2014) Yousef, Malik; Trinh, Hung V.; Allmer, Jens
    MicroRNAs (miRNAs) have attracted heightened attention for their role as post-transcriptional regulators of gene expression. It has become clear that miRNAs can both up- and downregulate protein expression. According to current estimates, most human genes harbor miRNAs and/or are regulated by them. Thus, miRNAs form a complex network of expression regulation which tightly interacts with known gene regulatory networks. Similar to some transcription factors, some miRNAs can have hundreds of target transcripts whose expression they modulate. Thus, miRNAs can form complex regulatory networks by themselves, but because their expression is often tightly coordinated with gene expression, they form an intertwined regulatory network with many possible interactions among gene and miRNA regulatory pathways. In this review, we first consider gene regulatory networks. Then we discuss microRNAs and their implication in cancer and how they may form regulatory networks. Finally, we give our perspective and provide an outlook including the aspect of personalized medicine.
  • Article
    Citation - WoS: 11
    Citation - Scopus: 14
    Categorization of Species Based on Their MicroRNAs Employing Sequence Motifs, Information-Theoretic Sequence Feature Extraction, and K-Mers
    (Springer Verlag, 2017) Yousef, Malik; Nigatu, Dawit; Levy, Dalit; Allmer, Jens; Henkel, Werner
    Background: Diseases like cancer can manifest themselves through changes in protein abundance, and microRNAs (miRNAs) play a key role in the modulation of protein quantity. MicroRNAs are used throughout all kingdoms and have been shown to be exploited by viruses to modulate their host environment. Since the experimental detection of miRNAs is difficult, computational methods have been developed. Many such tools employ machine learning for pre-miRNA detection, and many features for miRNA parameterization have been proposed. To train machine learning models, negative data is of importance yet hard to come by; therefore, we recently started to employ pre-miRNAs from one species as positive data versus another species’ pre-miRNAs as negative examples based on sequence motifs and k-mers. Here, we introduce the additional usage of information-theoretic (IT) features. Results: Pre-miRNAs from one species were used as positive and another species’ pre-miRNAs as negative training data for machine learning. The categorization capability of IT and k-mer features was investigated. Both feature sets and their combinations yielded a very high accuracy, which is as good as the previously suggested sequence motif and k-mer based method. However, for obtaining a high performance, a sufficiently large phylogenetic distance between the species and a sufficiently high number of pre-miRNAs in the training set are required. To examine the contribution of the IT and k-mer features, an information gain-based feature ranking was performed. Although the top 3 are IT features, 80% of the top 100 features are k-mers. The comparison of all three individual approaches (motifs, IT, and k-mers) shows that k-mers are sufficient for distinguishing species based on their pre-miRNAs. Conclusions: IT sequence feature extraction enables the distinction among species and is less computationally expensive than motif calculations. However, since IT features need larger amounts of data to have enough statistics for producing highly accurate results, future categorization into species can be effectively done using k-mers only. The biological reasoning for this is the existence of a codon bias between species which can, at least, be observed in exonic miRNAs. Future work in this direction will be the ab initio detection of pre-miRNA. In addition, prediction of pre-miRNA from RNA-seq can be done.
  • Article
    Citation - WoS: 20
    Citation - Scopus: 25
    MicroRNA Categorization Using Sequence Motifs and K-Mers
    (BioMed Central Ltd., 2017) Yousef, Malik; Khalifa, Waleed; Acar, İlhan Erkin; Allmer, Jens
    Background: Post-transcriptional gene dysregulation can be a hallmark of diseases like cancer, and microRNAs (miRNAs) play a key role in the modulation of translation efficiency. Known pre-miRNAs are listed in miRBase, and they have been discovered in a variety of organisms ranging from viruses and microbes to eukaryotic organisms. The computational detection of pre-miRNAs is of great interest, and such approaches usually employ machine learning to discriminate between miRNAs and other sequences. Many features have been proposed describing pre-miRNAs, and we have previously introduced the use of sequence motifs and k-mers as useful ones. There have been reports of xeno-miRNAs detected via next generation sequencing. However, they may be contaminations, and to aid in making that important distinction, we aimed to establish a means to differentiate pre-miRNAs from different species. Results: To achieve distinction into species, we used one species' pre-miRNAs as the positive and another species' pre-miRNAs as the negative training and test data for the establishment of machine learned models based on sequence motifs and k-mers as features. This approach resulted in higher accuracy values between distantly related species, while more closely related species produced lower accuracy values. Conclusions: We were able to differentiate among species with increasing success as the evolutionary distance increases. This conclusion is supported by previous reports of fast evolutionary changes in miRNAs since even in relatively closely related species a fairly good discrimination was possible.
  • Article
    Citation - Scopus: 29
    Computational Methods for Ab Initio Detection of MicroRNAs
    (Frontiers Media S.A., 2012) Allmer, Jens; Yousef, Malik
    MicroRNAs are small RNA sequences of 18-24 nucleotides in length, which serve as templates to drive post-transcriptional gene silencing. The canonical microRNA pathway starts with transcription from DNA and is followed by processing via the microprocessor complex, yielding a hairpin structure, which is then exported into the cytosol where it is processed by Dicer and then incorporated into the RNA-induced silencing complex. All of these biogenesis steps add to the overall specificity of miRNA production and effect. Unfortunately, their modes of action are just beginning to be elucidated, and therefore computational prediction algorithms cannot model the process but are usually forced to employ machine learning approaches. This work focuses on ab initio prediction methods throughout, and therefore homology-based miRNA detection methods are not discussed. Current ab initio prediction algorithms, their ties to data mining, and their prediction accuracy are detailed.
  • Article
    Citation - WoS: 91
    Citation - Scopus: 106
    Algorithms for the De Novo Sequencing of Peptides From Tandem Mass Spectra
    (Taylor & Francis, 2011) Allmer, Jens
    Proteomics is the study of proteins, their time- and location-dependent expression profiles, as well as their modifications and interactions. Mass spectrometry is useful to investigate many of the questions asked in proteomics. Database search methods are typically employed to identify proteins from complex mixtures. However, databases are often not available or, despite their availability, some sequences are not readily found therein. To overcome this problem, de novo sequencing can be used to directly assign a peptide sequence to a tandem mass spectrometry spectrum. Many algorithms have been proposed for de novo sequencing, and a selection of them is detailed in this article. Although a standard accuracy measure has not been agreed upon in the field, relative algorithm performance is discussed. The current state of de novo sequencing is assessed thereafter and, finally, examples are used to construct possible future perspectives of the field. © 2011 Expert Reviews Ltd.
  • Article
    Citation - WoS: 4
    Citation - Scopus: 12
    Existing Bioinformatics Tools for the Quantitation of Post-Translational Modifications
    (Springer Verlag, 2012) Allmer, Jens
    Mass spectrometry (MS)-based proteomics, by itself, is a vast and complex area encompassing various mass spectrometers, different spectra, and search result representations. When the aim is quantitation performed in different scanning modes at different MS levels, matters become additionally complex. Quantitation of post-translational modifications (PTM) represents the greatest challenge among these endeavors. Many different approaches to quantitation have been described and some of these can be directly applied to the quantitation of PTMs. The amount of data produced via MS, however, makes manual data interpretation impractical. Therefore, specialized software tools meet this challenge. Any software currently able to quantitate differentially labeled samples may theoretically be adapted to quantitate differential PTM expression among samples as well. Due to the heterogeneity of mass spectrometry-based proteomics, this review will focus on quantitation of PTM using liquid chromatography followed by one or more stages of mass spectrometry. Currently available free software, which either allows analysis of PTM or is easily adaptable for this purpose, is briefly reviewed in this paper. Selected studies, especially those related to phosphoproteomics, shall be used to highlight the current ability to quantitate PTMs. © Springer-Verlag 2010
  • Article
    Citation - WoS: 1
    Citation - Scopus: 2
    Label-Free Quantitation, an Extension to 2DB
    (Springer Verlag, 2010) Allmer, Jens
    Determining the differential expression of proteins under different conditions is of major importance in proteomics. Since mass spectrometry-based proteomics is often used to quantify proteins, several labelling strategies have been developed. While these are generally more precise than label-free quantitation approaches, they imply specifically designed experiments which also require knowledge about peptides that are expected to be measured and need to be modified. We recently designed the 2DB database, which aids storage, analysis, and publication of data from mass spectrometric experiments to identify proteins. This database can aid in identifying peptides which can be used for quantitation. Here an extension to the database application, named MSMAG, is presented which allows for more detailed analysis of the distribution of peptides and their associated proteins over the fractions of an experiment. Furthermore, given several biological samples in the database, label-free quantitation can be performed. Thus, interesting proteins, which may warrant further investigation, can be identified en passant while performing high-throughput proteomics studies. © 2009 Springer-Verlag.
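
Note on k-mer features: the two microRNA categorization abstracts listed above use k-mers as sequence features for machine learning. The abstracts do not give the authors' exact feature definition, so the following is only a minimal sketch of one common way such features might be computed, assuming normalized frequencies over an RNA alphabet. The function name `kmer_frequencies`, its parameters, and the choice to convert T to U are illustrative assumptions, not part of the cited work.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequence, k=2, alphabet="ACGU"):
    """Return normalized k-mer frequencies for one sequence.

    Hypothetical helper for illustration only; the cited studies may
    define their k-mer features differently.
    """
    # Assumption: treat DNA-style input as RNA by mapping T to U.
    seq = sequence.upper().replace("T", "U")

    # Enumerate every possible k-mer so the feature vector has a fixed
    # length (len(alphabet) ** k) regardless of the input sequence.
    all_kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    valid = set(all_kmers)

    # Slide a window of width k over the sequence and count only
    # windows made up entirely of alphabet characters.
    counts = Counter(
        seq[i:i + k] for i in range(len(seq) - k + 1) if seq[i:i + k] in valid
    )

    total = sum(counts.values())
    if total == 0:
        # Sequence shorter than k, or no valid windows at all.
        return {kmer: 0.0 for kmer in all_kmers}
    return {kmer: counts[kmer] / total for kmer in all_kmers}
```

A feature vector built this way for each pre-miRNA (for example with k = 1, 2, and 3 concatenated) could then be passed to any standard classifier, with one species' sequences labeled positive and another's negative, as the abstracts describe.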