Scopus İndeksli Yayınlar Koleksiyonu / Scopus Indexed Publications Collection

Permanent URI for this collection: https://hdl.handle.net/11147/7148

  • Article
    Citation - WoS: 4
    Citation - Scopus: 9
    PGMiner Reloaded, Fully Automated Proteogenomic Annotation Tool Linking Genomes to Proteomes
    (Informationsmanagement in der Biotechnologie e.V. (IMBio e.V.), 2016) Has, Canan; Lashin, Sergey A.; Kochetov, Alexey; Allmer, Jens
    Improvements in genome sequencing technology have increased the availability of full genomes and transcriptomes for many organisms. However, the major benefit of massively parallel sequencing is a better understanding of the organization and function of genes, which in turn leads to an understanding of phenotypes. Several tools are currently available for interpreting genomic data through automated gene annotation. Even though the accuracy of computational gene annotation is increasing, multiple lines of experimental evidence should be combined. Mass spectrometry allows the identification and sequencing of proteins, the major gene products, and only these proteins conclusively show whether a part of a genome is a coding region that contributes to the phenotype. Therefore, in the field of proteogenomics, computational methods are validated using mass spectrometric data. As a result, novel protein-coding regions can be identified, current gene models validated, and the upstream and downstream regions of genes determined. In this paper, we present new functionality for our proteogenomic tool, PGMiner, which performs all proteogenomic steps: acquisition of mass spectrometric data, peptide identification against preprocessed sequence databases, assignment of statistical confidence to identified peptides, mapping of confident peptides to gene models, and result visualization. The extensions cover the determination of proteotypic peptides and thus unambiguous protein identification. Furthermore, peptides that conflict with gene models can now be assessed automatically in the context of predicted alternative open reading frames.
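The idea of proteotypic peptides can be illustrated with a minimal sketch. It assumes an in-silico tryptic digest (cleavage after K or R, not before P, no missed cleavages) and treats a peptide as proteotypic when it maps to exactly one protein in the database. The function names, the minimum peptide length of 6, and the toy database are hypothetical; this is not PGMiner's implementation.

```python
import re
from collections import defaultdict


def tryptic_digest(protein: str, min_len: int = 6) -> list[str]:
    """In-silico trypsin digest: cleave after K or R unless followed by P (no missed cleavages)."""
    # Zero-width split at each cleavage site (requires Python 3.7+).
    peptides = re.split(r"(?<=[KR])(?!P)", protein)
    return [p for p in peptides if len(p) >= min_len]


def proteotypic_peptides(proteins: dict[str, str], min_len: int = 6) -> dict[str, set[str]]:
    """Map each protein to the peptides that occur in no other protein of the database."""
    owners = defaultdict(set)  # peptide -> names of proteins that produce it
    for name, seq in proteins.items():
        for pep in tryptic_digest(seq, min_len):
            owners[pep].add(name)
    unique = defaultdict(set)
    for pep, names in owners.items():
        if len(names) == 1:  # the peptide maps to exactly one protein
            unique[next(iter(names))].add(pep)
    return dict(unique)


# Hypothetical two-protein database sharing the N-terminal peptide MSTEVLK.
db = {"A": "MSTEVLKGGHHWPAEKLLR", "B": "MSTEVLKQQPPWWYRAAAAAAK"}
print(proteotypic_peptides(db))
```

The shared peptide MSTEVLK is excluded because it cannot tell protein A from protein B; only the remaining peptides support an unambiguous identification.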
  • Article
    Citation - WoS: 7
    Citation - Scopus: 5
    A Machine Learning Approach for MicroRNA Precursor Prediction in Retro-Transcribing Virus Genomes
    (Informationsmanagement in der Biotechnologie e.V. (IMBio e.V.), 2016) Saçar Demirci, Müşerref Duygu; Toprak, Mustafa; Allmer, Jens
    Identification of microRNA (miRNA) precursors has seen increased efforts in recent years. The difficulty of detecting pre-miRNAs experimentally has increased the use of computational approaches. Most of these approaches rely on machine learning, especially classification. Successful classification depends on many parameters, such as data quality, choice of classifier settings, and feature selection. For the latter, we developed a distributed genetic algorithm on HTCondor to perform feature selection. Moreover, we employed two widely used classification algorithms, libSVM and random forest, with different settings to analyze their influence on overall classification performance. In this study, we analyzed 5 human retro-transcribing virus genomes: Human endogenous retrovirus K113, Hepatitis B virus (strain ayw), Human T lymphotropic virus 1, Human T lymphotropic virus 2, Human immunodeficiency virus 2, and Human immunodeficiency virus 1. We then predicted pre-miRNAs using information from known viral and human pre-miRNAs. Our results indicate that these viruses produce novel, previously unknown miRNA precursors that warrant further experimental validation.
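Genetic-algorithm feature selection can be pictured with a minimal, single-machine sketch. It assumes bit-mask individuals (1 = feature used), truncation selection that keeps the better half unchanged, one-point crossover, and bit-flip mutation. The `toy_fitness` function is hypothetical and only stands in for the cross-validated libSVM or random forest performance a real run would evaluate; nothing here reproduces the HTCondor distribution.

```python
import random
from typing import Callable


def evolve_feature_mask(n_features: int,
                        fitness: Callable[[list[int]], float],
                        pop_size: int = 20,
                        generations: int = 40,
                        mutation_rate: float = 0.05,
                        seed: int = 0) -> list[int]:
    """Return the best bit mask found by a simple elitist generational GA."""
    rng = random.Random(seed)
    # Each individual is a bit mask: 1 means the feature is passed to the classifier.
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: the better half survives unchanged (elitism).
        parents = sorted(population, key=fitness, reverse=True)[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = parents + children
    return max(population, key=fitness)


# Hypothetical toy fitness: features 0, 3 and 7 are "informative";
# every selected feature costs a little, so small, accurate subsets win.
INFORMATIVE = {0, 3, 7}


def toy_fitness(mask: list[int]) -> int:
    hits = sum(1 for i in INFORMATIVE if mask[i])
    return 10 * hits - sum(mask)


best = evolve_feature_mask(10, toy_fitness)
print([i for i, bit in enumerate(best) if bit])
```

Because each fitness evaluation (training a classifier) is independent, the population can be scored in parallel, which is what makes distributing the work across a cluster attractive.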
  • Article
    Citation - WoS: 4
    Citation - Scopus: 4
    Improving the Quality of Positive Datasets for the Establishment of Machine Learning Models for Pre-MicroRNA Detection
    (Informationsmanagement in der Biotechnologie e.V. (IMBio e.V.), 2017) Saçar Demirci, Müşerref Duygu; Allmer, Jens
    MicroRNAs (miRNAs) are involved in the post-transcriptional regulation of protein abundance and thus have a great impact on the resulting phenotype. It is, therefore, no wonder that they have been implicated in many diseases ranging from virus infections to cancer. This impact on the phenotype has led to great interest in identifying the miRNAs of an organism. Experimental methods are complicated, which led to the development of computational methods for pre-miRNA detection. Such methods generally employ machine learning to establish models that discriminate between miRNAs and other sequences. Positive training data for model establishment mostly stem from miRBase, the miRNA registry. The quality of the entries in miRBase has, however, been questioned. This uncertain quality led to filtering strategies that attempt to produce high-quality positive datasets, which can in turn lead to a scarcity of positive data. To analyze the quality of filtered data, we developed a machine learning model and found that it can reliably establish data quality based on intrinsic measures. Additionally, we analyzed which features describing pre-miRNAs can discriminate between low- and high-quality data. Both models are applicable to data from miRBase and can be used to establish high-quality positive data. This will facilitate the development of better miRNA detection tools, which will make the prediction of miRNAs in disease states more accurate. Finally, we applied both models to all miRBase data and provide the list of high-quality hairpins.
  • Article
    Citation - WoS: 9
    Citation - Scopus: 11
    Development of Simple Sequence Repeat Markers in Hazelnut (Corylus avellana L.) by Next-Generation Sequencing and Discrimination of Turkish Hazelnut Cultivars
    (Springer, 2018) Özturk, Süleyman Can; Göktay, Mehmet; Doğanlar, Sami; Allmer, Jens; Frary, Anne
    European hazelnut (Corylus avellana) is a diploid tree species whose nuts are widely used in confections. Hazelnuts are largely produced in Turkey, where the cultivar "Tombul" is widely grown in the Black Sea region. In this work, the "Tombul" genome was partially sequenced by next-generation sequencing technology, yielding 29.2% (111.85 Mb) of the ~385 Mb (1C) genome. This sequence information was used to develop genetic markers that enable differentiation of material before the long maturation process and facilitate future breeding strategies. A total of 90,142 simple sequence repeats (SSRs) were identified in the contigs, giving a frequency of 1 SSR per 1240 nt in the assembly. Mononucleotide repeats were the most abundant SSR type (60.9%), followed by di- and trinucleotide repeats. Primer pairs were designed for 75,139 (83.3%) of the SSRs. Fifty SSR primer pairs were applied to 47 hazelnut accessions from nine countries to test their effectiveness and polymorphism. The markers amplified an average of 3.2 fragments. The highest polymorphism information content (PIC) value was for cavSSR11062 (0.97) and the lowest (0.04) was for cavSSR13386. Two markers were monomorphic: cavSSR12855 and cavSSR13267. Single-copy SSR primers were also assessed for their ability to discriminate 19 Turkish cultivars, and seven primer pairs (Cav4217, Cav14875, Cav14418, Cav2704, Cav12862, Cav3909, Cav1361) were found to be sufficient for this task. Thus, this study developed new SSR markers for use in hazelnut breeding and genetic studies and also provides a method to distinguish and identify true-to-type Turkish cultivars.
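Mining contigs for SSRs can be sketched as a left-to-right scan for perfect tandem repeats of 1 to 6 nt motifs, trying the shortest motif first so that, for example, an (AG)n run is not also reported as (AGAG)n. The minimum repeat numbers (10 for mono-, 6 for di-, 5 for tri- to hexanucleotides) are assumed MISA-style thresholds, not values taken from the study, and `find_ssrs` is a hypothetical helper.

```python
# Assumed MISA-style minimum repeat numbers per motif length (not from the study).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq: str) -> list[tuple[int, str, int]]:
    """Return (start, motif, repeat_count) for perfect SSRs, scanning left to right."""
    seq = seq.upper()
    hits = []
    i = 0
    while i < len(seq):
        for k in range(1, 7):  # try the shortest motif first
            motif = seq[i:i + k]
            if len(motif) < k or "N" in motif:
                continue  # too close to the end, or ambiguous base
            reps = 1
            while seq[i + reps * k:i + (reps + 1) * k] == motif:
                reps += 1
            if reps >= MIN_REPEATS[k]:
                hits.append((i, motif, reps))
                i += reps * k  # jump past the whole repeat
                break
        else:
            i += 1  # no SSR starts at this position
    return hits


print(find_ssrs("GC" + "A" * 12 + "GC" + "AG" * 7 + "TT"))
# Density reported in the abstract: assembly length divided by number of SSRs.
print(round(111_850_000 / 90_142, 1))
```

The second line reproduces the reported density: 111.85 Mb divided by 90,142 SSRs gives about 1,240.8 nt per SSR.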
  • Article
    Citation - WoS: 10
    Citation - Scopus: 11
    Intersection of MicroRNA and Gene Regulatory Networks and Their Implication in Cancer
    (Bentham Science Publishers B.V., 2014) Yousef, Malik; Trinh, Hung V.; Allmer, Jens
    MicroRNAs (miRNAs) have attracted heightened attention for their role as post-transcriptional regulators of gene expression. It has become clear that miRNAs can both up- and downregulate protein expression. According to current estimates, most human genes harbor miRNAs and/or are regulated by them. Thus, miRNAs form a complex network of expression regulation that interacts tightly with known gene regulatory networks. Similar to some transcription factors, some miRNAs can have hundreds of target transcripts whose expression they modulate. Hence, miRNAs can form complex regulatory networks by themselves, but because their expression is often tightly coordinated with gene expression, they form an intertwined regulatory network with many possible interactions among gene and miRNA regulatory pathways. In this review, we first consider gene regulatory networks. We then discuss microRNAs, their implication in cancer, and how they may form regulatory networks. Finally, we give our perspective and provide an outlook that includes the aspect of personalized medicine.
  • Article
    Citation - WoS: 11
    Citation - Scopus: 14
    Categorization of Species Based on Their MicroRNAs Employing Sequence Motifs, Information-Theoretic Sequence Feature Extraction, and k-Mers
    (Springer Verlag, 2017) Yousef, Malik; Nigatu, Dawit; Levy, Dalit; Allmer, Jens; Henkel, Werner
    Background: Diseases like cancer can manifest themselves through changes in protein abundance, and microRNAs (miRNAs) play a key role in the modulation of protein quantity. MicroRNAs are used throughout all kingdoms and have been shown to be exploited by viruses to modulate their host environment. Since the experimental detection of miRNAs is difficult, computational methods have been developed. Many such tools employ machine learning for pre-miRNA detection, and many features for miRNA parameterization have been proposed. Negative data are important for training machine learning models, yet hard to come by; therefore, we recently started to use pre-miRNAs from one species as positive data versus another species’ pre-miRNAs as negative examples, based on sequence motifs and k-mers. Here, we introduce the additional use of information-theoretic (IT) features. Results: Pre-miRNAs from one species were used as positive and another species’ pre-miRNAs as negative training data for machine learning. The categorization capability of IT and k-mer features was investigated. Both feature sets and their combinations yielded very high accuracy, as good as that of the previously suggested method based on sequence motifs and k-mers. However, high performance requires a sufficiently large phylogenetic distance between the species and a sufficiently high number of pre-miRNAs in the training set. To examine the contribution of the IT and k-mer features, an information gain-based feature ranking was performed. Although the top 3 features are IT features, 80% of the top 100 features are k-mers. The comparison of all three individual approaches (motifs, IT, and k-mers) shows that the k-mers of pre-miRNAs are sufficient to distinguish species. Conclusions: IT sequence feature extraction enables the distinction among species and is less computationally expensive than motif calculations. However, since IT features need larger amounts of data to provide enough statistics for highly accurate results, future categorization into species can be done effectively using k-mers only. The biological reasoning for this is the existence of a codon bias between species, which can, at least, be observed in exonic miRNAs. Future work in this direction will be the ab initio detection of pre-miRNAs. In addition, pre-miRNAs can be predicted from RNA-seq data.
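The k-mer features can be sketched as relative frequencies of all 4^k possible words counted over overlapping windows of a pre-miRNA sequence; the entropy function shows one simple information-theoretic measure computed from that distribution. Both functions, the default k = 3, and the choice of Shannon entropy are assumptions for illustration; the study's actual IT feature set is not reproduced here.

```python
import math
from itertools import product


def kmer_features(seq: str, k: int = 3, alphabet: str = "ACGU") -> dict[str, float]:
    """Relative frequency of every possible k-mer over overlapping windows."""
    seq = seq.upper().replace("T", "U")  # treat DNA input as RNA
    counts = {"".join(p): 0 for p in product(alphabet, repeat=k)}
    total = 0
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip windows containing N or other symbols
            counts[word] += 1
            total += 1
    return {w: (c / total if total else 0.0) for w, c in counts.items()}


def shannon_entropy(freqs: dict[str, float]) -> float:
    """Shannon entropy (bits) of a k-mer distribution, one simple IT-style feature."""
    return -sum(p * math.log2(p) for p in freqs.values() if p > 0)


features = kmer_features("ACGU", k=2)
print(features["AC"], shannon_entropy(features))
```

Each sequence becomes a fixed-length vector (16 values for k = 2, 64 for k = 3), so pre-miRNAs of different lengths can be fed to the same classifier.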
  • Article
    Citation - WoS: 20
    Citation - Scopus: 24
    Newly Developed SSR Markers Reveal Genetic Diversity and Geographical Clustering in Spinach (Spinacia oleracea)
    (Springer Verlag, 2017) Göl, Şurhan; Göktay, Mehmet; Allmer, Jens; Doğanlar, Sami; Frary, Anne
    Spinach is a popular leafy green vegetable due to its nutritional composition. It contains high concentrations of vitamins A, E, C, and K, and folic acid. Development of genetic markers for spinach is important for diversity and breeding studies. In this work, next-generation sequencing (NGS) technology was used to develop genomic simple sequence repeat (SSR) markers. After cleaning and contig assembly, the sequence encompassed 2.5% of the 980 Mb spinach genome. The contigs were mined for SSRs, and a total of 3852 SSRs were detected. Of these, 100 primer pairs were tested, and 85% were found to yield clear, reproducible amplicons. These 85 markers were then applied to 48 spinach accessions of worldwide origin, resulting in 389 alleles with 89% polymorphism. The average gene diversity (GD) value of the markers (based on a GD calculation that ranges from 0 to 0.5) was 0.25. Our results demonstrate that the newly developed SSR markers are suitable for assessing the genetic diversity and population structure of spinach germplasm. The markers also revealed clustering of the accessions by geographical origin, with clear separation of Far Eastern accessions, which had the highest overall genetic diversity compared with accessions from Persia, Turkey, Europe, and the USA. Thus, the SSR markers have good potential to provide valuable information for spinach breeding and germplasm management. They will also be helpful for genome mapping and core collection establishment.
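One common form of the gene diversity statistic is Nei's GD = 1 - sum(p_i^2), where p_i is the frequency of state i at a marker. With two equally frequent states it reaches 0.5, which would match the 0 to 0.5 range quoted above if bands are scored as biallelic presence/absence. That scoring, the helper name, and the counts below are assumptions; the abstract does not state the exact formula.

```python
from statistics import mean


def gene_diversity(allele_counts: list[int]) -> float:
    """Nei's gene diversity, 1 - sum(p_i^2), from allele (or band-state) counts at one marker."""
    total = sum(allele_counts)
    return 1.0 - sum((c / total) ** 2 for c in allele_counts)


# Hypothetical presence/absence counts for three markers scored on 20 accessions.
markers = [[10, 10], [15, 5], [20, 0]]
print([gene_diversity(m) for m in markers])  # -> [0.5, 0.375, 0.0]
print(mean(gene_diversity(m) for m in markers))
```

A monomorphic marker scores 0, and the reported average GD is simply the mean of the per-marker values.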
  • Article
    Citation - WoS: 6
    Citation - Scopus: 5
    Development of Genomic Simple Sequence Repeat Markers in Faba Bean by Next-Generation Sequencing
    (Springer Verlag, 2017) Abuzayed, Mazen A.; Göktay, Mehmet; Allmer, Jens; Doğanlar, Sami; Frary, Anne
    Faba bean (Vicia faba L.) is an important food legume crop with a huge genome. Development of genetic markers for faba bean is important for diversity studies and molecular breeding. In this study, we used next-generation sequencing (NGS) technology to develop genomic simple sequence repeat (SSR) markers. A total of 14,027,500 sequence reads comprising 4,208 Mb were obtained. From these reads, 56,063 contigs were assembled (16,367 Mb) and 2138 SSRs were identified. Mono- and dinucleotide repeats were the most abundant, accounting for 57.5% and 20.9% of all SSRs, respectively. A total of 430 primer pairs were designed from contigs longer than 350 nucleotides, and 50 primer pairs were tested to validate SSR locus amplification. Nearly all (96%) of the markers produced clear, reproducible amplicons. Thirty-nine SSR markers were then applied to 46 faba bean accessions of worldwide origin, resulting in 161 alleles with 87.5% polymorphism and an average of 4.1 alleles per marker. Gene diversity (GD) of the markers ranged from 0 to 0.48, with an average of 0.27. Testing showed that the markers were useful in determining genetic relationships and population structure in faba bean accessions.
  • Article
    Citation - WoS: 25
    Citation - Scopus: 21
    Can miRBase Provide Positive Data for Machine Learning for the Detection of miRNA Hairpins?
    (Informationsmanagement in der Biotechnologie e.V. (IMBio e.V.), 2013) Demirci, Müşerref Duygu Saçar; Hamzeiy, Hamid; Allmer, Jens
    Experimental detection and validation of miRNAs is a tedious, time-consuming, and expensive process. Computational methods for miRNA gene detection are being developed to reduce the number of candidates that need experimental validation to a manageable amount. Computational methods comprise homology-based and ab initio algorithms. Both approaches depend on positive and negative training examples. Positive examples are usually derived from miRBase, the main resource for experimentally validated miRNAs. We encountered some problems with miRBase, which we report here. Among them, the folds presented in miRBase are not always the minimum free energy folds, some entries do not seem to conform to expectations of miRNAs, and some external accession numbers are not valid. In addition, we compared the prediction accuracy for the same negative dataset when the positive data came from miRBase or miRTarBase and found that the latter led to more precise prediction models. We suggest that miRBase introduce automated facilities for ensuring data quality to overcome these problems.
  • Article
    Citation - WoS: 4
    Citation - Scopus: 12
    Existing Bioinformatics Tools for the Quantitation of Post-Translational Modifications
    (Springer Verlag, 2012) Allmer, Jens
    Mass spectrometry (MS)-based proteomics is by itself a vast and complex area encompassing various mass spectrometers, different spectra, and different search result representations. When the aim is quantitation performed in different scanning modes at different MS levels, matters become even more complex. Quantitation of post-translational modifications (PTMs) represents the greatest challenge among these endeavors. Many different approaches to quantitation have been described, and some of these can be directly applied to the quantitation of PTMs. The amount of data produced via MS, however, makes manual data interpretation impractical, so specialized software tools are needed to meet this challenge. Any software currently able to quantitate differentially labeled samples may theoretically be adapted to quantitate differential PTM expression among samples as well. Due to the heterogeneity of mass spectrometry-based proteomics, this review focuses on quantitation of PTMs using liquid chromatography followed by one or more stages of mass spectrometry. Currently available free software that either allows analysis of PTMs or is easily adaptable for this purpose is briefly reviewed. Selected studies, especially those related to phosphoproteomics, are used to highlight the current ability to quantitate PTMs. © Springer-Verlag 2010
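The point that software for differentially labeled samples can be adapted to PTMs can be made concrete with a minimal sketch. For each peptide, modified or not, the heavy/light intensity ratio is log2-transformed and then median-centred, assuming that most peptides do not change between samples. The function name and the (light, heavy) input format are hypothetical; real tools add isotope-cluster extraction, outlier handling, and statistical testing.

```python
import math
import statistics


def normalized_log2_ratios(pairs: list[tuple[float, float]]) -> list[float]:
    """log2(heavy/light) per (light, heavy) intensity pair, median-centred.

    Centring on the median assumes most peptides are unchanged between samples.
    Pairs with a non-positive intensity are skipped; at least one valid pair is required.
    """
    raw = [math.log2(heavy / light) for light, heavy in pairs if light > 0 and heavy > 0]
    centre = statistics.median(raw)
    return [r - centre for r in raw]


# Hypothetical (light, heavy) intensities for four peptides; the last could be a
# phosphopeptide whose modified form increases in the heavy-labeled sample.
print(normalized_log2_ratios([(100, 200), (100, 100), (50, 50), (10, 80)]))
# -> [0.5, -0.5, -0.5, 2.5]
```

After centring, the modified peptide stands out with a log2 change of 2.5 while the other peptides stay near zero.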