Master Degree / Yüksek Lisans Tezleri
Permanent URI for this collection: https://hdl.handle.net/11147/3008
Master Thesis Mining the Toxoplasma Gondii Genome for MicroRNA Regulatory Patterns (Izmir Institute of Technology, 2017) Acar, İlhan Erkin; Allmer, Jens
Toxoplasma gondii is a parasite that causes mental retardation, blindness or near-blindness, and decreased psychomotor performance if the patient is congenitally infected. There have been efforts to vaccinate humans against this parasite, but none has succeeded so far. A better understanding of Toxoplasma gondii can therefore be gained by examining its microRNA regulation. MicroRNAs are known to regulate messenger RNAs and prevent translation, which has different effects in different biological pathways. In this study, the Toxoplasma gondii genome was used to predict precursor and mature microRNAs, while experimentally validated microRNAs were taken into consideration. The predictions were further explored in terms of microRNA targeting, using the known genes of Toxoplasma gondii. Furthermore, RNA sequencing data of this organism were obtained and analysed in terms of gene expression and possible microRNA expression outcomes. Combining gene expression analyses with targeting predictions made it possible to create a microRNA-gene interaction network. Gene expression analyses showed that there were no differentially expressed genes, microRNAs or interactions between two developmental stages of Toxoplasma gondii, tachyzoite and bradyzoite. This result was added to the interactions to determine up- and down-regulation. All of these interactions were then connected where they intersect to create a regulation network of microRNAs. This network was further explored and compared to random networks of the same size. The biological network was found to contain many larger-sized cliques.
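The comparison against random networks of the same size can be illustrated with a minimal sketch. It is not the thesis's method: the toy edge list, the function names, and the choice of a uniform random graph with the same node and edge counts as the null model are all assumptions made for illustration, and the clique search is a plain Bron-Kerbosch recursion that is only practical for small graphs.

```python
import random
from itertools import combinations


def adjacency(edges):
    """Build an undirected adjacency map from an edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def max_clique_size(adj):
    """Size of the largest clique, via basic Bron-Kerbosch (no pivoting)."""
    best = 0

    def expand(r_size, p, x):
        nonlocal best
        if not p and not x:
            best = max(best, r_size)  # the current clique is maximal
            return
        for v in list(p):
            expand(r_size + 1, p & adj[v], x & adj[v])
            p.remove(v)
            x.add(v)

    expand(0, set(adj), set())
    return best


def random_graph(nodes, n_edges, rng):
    """Uniform random graph with the given nodes and edge count (assumed null model)."""
    adj = {v: set() for v in nodes}
    for a, b in rng.sample(list(combinations(nodes, 2)), n_edges):
        adj[a].add(b)
        adj[b].add(a)
    return adj


# Hypothetical toy network: a 4-clique (a, b, c, d) attached to a short path.
observed = adjacency([
    ("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "d"),
    ("d", "e"), ("e", "f"), ("f", "g"),
])
observed_size = max_clique_size(observed)
n_edges = sum(len(nbrs) for nbrs in observed.values()) // 2

rng = random.Random(0)  # fixed seed so the comparison is repeatable
null_sizes = [
    max_clique_size(random_graph(sorted(observed), n_edges, rng))
    for _ in range(200)
]
# Fraction of random graphs whose largest clique is at least as large as observed.
fraction_at_least = sum(s >= observed_size for s in null_sizes) / len(null_sizes)
print(observed_size, fraction_at_least)
```

A small fraction suggests that cliques of the observed size are unusual for a random network with the same number of nodes and edges. A degree-preserving null model would be a stricter comparison, but it is not sketched here.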
These findings can be analysed further in future work to create drug leads that target vital pathways of Toxoplasma gondii.

Master Thesis Importance of Database Normalization for Reliable Protein Identification in Mass Spectrometry-Based Proteomics (Izmir Institute of Technology, 2016) Mungan, Mehmet Direnç; Allmer, Jens; Yalçın, Talat
One of the revolutionary steps in proteomics was the introduction of mass spectrometry to protein inference analysis. Its strengths, such as speed and accuracy in identifying and quantifying proteins, have made it the first choice for obtaining high-throughput data. Owing to the development of a variety of fragmentation techniques, mass spectrometry-based analysis has even made it possible to acquire knowledge about single polymorphisms and modifications of the amino acids of a peptide. Although this technology provides enormous amounts of data, identifying the proteins remains a hard challenge because of the shortcomings of computational methods. Herein, a novel methodology is offered to better analyze mass spectrometry data and to overcome the deficiencies of protein identification algorithms in terms of speed and accuracy. When spectral data are acquired from an organism by mass spectrometry, database search algorithms are used for protein identification if the protein sequences of the organism are known. These algorithms compare the experimental data from the mass spectrometry analysis with theoretical data derived from known databases for the organism and try to find the best match by ranking the peptide-spectrum matches (PSMs) via scoring functions. Since the databases can be too large to search, and multiple databases of different sizes can contain the peptides of the experimental data, database search algorithms may fail to produce fair, fast or complete results.
In this work, a methodology is presented to overcome the unfair scoring of peptides in databases of different sizes and to enable database search algorithms to use relatively large entries such as six-frame translations of human chromosomes. In terms of speed and accuracy, the method was found to be better than some of the existing methods.

Master Thesis Automatic, Fast and Accurate Sequence Decontamination (Izmir Institute of Technology, 2016) Bağcı, Caner; Allmer, Jens; Tekir, Selma
The introduction of massively parallel sequencing technologies was a revolutionary step in genomics. Their decreasing cost and powerful features have put them increasingly in demand over the last decade. With these technologies, even small laboratories around the world can now sequence complete genomes of organisms. However, the power of this technology comes with challenges, on both the technological and the computational side of the work. In this work, one of these computational challenges is addressed and a novel algorithm is offered to solve the problem. Sequencing by synthesis is one of the methods used in many different massively parallel sequencing instruments. This method exploits the biological process of DNA replication and, with the help of different means of detection, allows a DNA molecule to be sequenced while it is replicated. Since DNA polymerase requires a primer to start the replication reaction, short oligonucleotide adapters are used in sequencing-by-synthesis methods to initiate the reaction. However, under certain circumstances these adapters contaminate the final sequence reads. Several tools have been offered to trim adapters from reads, but all of them depend on the bioinformatician knowing the adapter sequence in advance. In this work, an algorithm is offered that detects and trims adapters using only the sequences of the reads, without relying on prior knowledge of adapter sequences.
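To make the idea of adapter detection without prior knowledge concrete, here is a naive frequency-based sketch. It is not the algorithm of the thesis, which the abstract does not describe: the heuristic (the k-mer present in the most reads is taken as the adapter seed), the function names, the value of k and the toy reads are all assumptions. Real data would also need error tolerance and protection against genomic repeats, which this sketch ignores.

```python
from collections import Counter


def guess_adapter_seed(reads, k=8):
    """Return the k-mer found in the most reads, with that read count.

    Assumption: inserts differ between reads while the adapter is shared,
    so the start of the adapter is the most widely shared k-mer.
    """
    counts = Counter()
    for read in reads:
        # Count each k-mer once per read so repeats inside one read do not dominate.
        kmers = {read[i:i + k] for i in range(len(read) - k + 1)}
        counts.update(kmers)
    return counts.most_common(1)[0]


def trim(read, seed):
    """Cut the read at the first occurrence of the seed, if any."""
    pos = read.find(seed)
    return read if pos == -1 else read[:pos]


# Hypothetical toy data: three reads run into the same adapter at different
# positions (fully, or truncated at the 3' end); one read is clean.
adapter = "AGATCGGAAGAGC"
reads = [
    "ACGTTGCATGCCTAGGAT" + adapter,
    "TTGACCGTAAGC" + adapter[:10],
    "GGCATTACGGATCCATGA" + adapter[:8],
    "CCATGGTTAACGCGTATTCAGG",
]

seed, support = guess_adapter_seed(reads)
trimmed = [trim(r, seed) for r in reads]
print(seed, support)
print(trimmed)
```

Because truncated adapters at the 3' end still contain the first k bases of the adapter, the leading k-mer tends to be present in more reads than k-mers further inside the adapter, which is why the seed lands at the adapter start in this toy example.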
The proposed algorithm was shown to perform better than or on par with existing methods in terms of speed and efficiency.

Master Thesis A Lattice-Based Approach for News Chain Construction (Izmir Institute of Technology, 2015) Toprak, Mustafa; Tekir, Selma; Allmer, Jens
Each news article and column can be made part of a news story or chain created manually by journalists and columnists. However, the increasing amount of data published by news companies each year makes manual analysis, and thus the creation of news stories and chains, almost impossible. Given the amount of data, automated support is clearly vital to journalists, columnists and intelligence analysts. A news chain is a set of news articles that form a connected and coherent whole. In the traditional “connecting the dots” approach, news chains are constructed from two given articles that serve as the start and end of the chain. In this study, a method is proposed to create coherent news chains without predetermining the start and end articles of the chain. The intuition behind the method comes from the partial order relation among news articles. We try to show that a lattice structure can represent the relation or hierarchy among news articles, which are partially ordered by nature. The concept lattice is built from the inverted index structure of the news articles, which is one of the main contributions of the study. In the experimental work, an artificial dataset is processed to show the steps of the method. After that, we also provide an evaluation using results on a real dataset.

Master Thesis Systematic Computational Analysis of Potential RNA Interference Regulation in Toxoplasma Gondii (Izmir Institute of Technology, 2009) Çakır, Mehmet Volkan; Allmer, Jens
RNA-mediated silencing was first described in plants and became famous through studies in Caenorhabditis elegans. RNA interference (RNAi) is the mechanism through which an RNA interferes with the production of other RNAs in a sequence-specific manner.
MiRNAs are a type of RNA that originates from the genome; their active form is a single-stranded RNA (ss-RNA) of 21-23 nucleotides in length. They are transcribed as pri-miRNAs and then processed in the nucleus by Drosha into pre-miRNAs, which have a stem-loop structure and are 70 nucleotides in length. The stem-loop-containing pre-miRNA is then processed in the cytoplasm into a double-stranded RNA (ds-RNA), one strand of which serves as the interfering RNA. Toxoplasma gondii is a species of parasitic protozoa that causes several diseases. T. gondii emerges as a good candidate for computational efforts because of its small genome size, publicly available genome files and extensive information about its gene structure, based either on experimental data or on predictions made with several gene finders in parallel. Therefore, it seems important to establish the RNAi regulatory network, which may be beneficial for the Toxoplasma community. Within this context, a pool of possible stem-loop-forming transcripts is produced, this pool is analysed further for the desired 2D structure, and possible RNAi regulation is mapped to the T. gondii genome. In connection with the computational assessment and mapping, the derived information is provided as a database for quick lookup through a convenient web interface for experimental studies of RNAi regulation in Toxoplasma, thus reducing the time and money spent on such studies.

Master Thesis An Integrative Data Mining Approach for MicroRNA Detection in Human (Izmir Institute of Technology, 2013) Saçar, Müşerref Duygu; Allmer, Jens
MicroRNAs (miRNAs) are single-stranded, small, usually non-coding RNAs of about 22 nucleotides in length that control gene expression at the post-transcriptional level through translational inhibition, degradation, adenylation, or destabilization of their target mRNAs. Although hundreds of miRNAs have been identified in various species, many more may still remain unknown.
Therefore, the discovery of new miRNA genes is an important step towards understanding miRNA-mediated post-transcriptional regulation mechanisms. The first attempts to identify novel miRNA genes were almost exclusively based on directional cloning of endogenous small RNAs and high-throughput sequencing of large numbers of cDNA clones. However, conventional forward genetic screening is known to be biased towards abundantly and/or ubiquitously expressed miRNAs that can dominate the cloned products. Hence, such biological approaches might be limited in their ability to detect rare miRNAs, and restricted to the tissues and the developmental stage of the organism under examination. These limitations have led to the development of sophisticated computational approaches that attempt to identify possible miRNAs in silico. Nevertheless, the programs designed to predict possible miRNAs in a genome are not sensitive or accurate enough to warrant sufficient confidence for validating all their predictions experimentally. With this study, we aim to solve these problems by developing a new and sensitive machine learning-based approach to predict potential miRNAs in the human genome.

Master Thesis Quality Assessment of De Novo Sequence Assembly Tools (Izmir Institute of Technology, 2012) Gültekin, Visam; Allmer, Jens
High-throughput next-generation sequencing technologies have progressed very rapidly and revolutionized genomics by providing a robust working field for new studies and promising to facilitate achievements that were extremely challenging before. Although the massive output of these instruments is becoming more accurate, it still delivers the real sequence only as very short fragments, which necessitates a further process of merging and ordering those fragments to reconstruct the larger sequences. This process is performed by sequence assemblers; in the absence of a reference genome, it is a de novo sequence assembly.
Since assembling millions of fragments poses many obvious challenges from a biological perspective, many studies have focused specifically on developing tools that can adapt to newly announced sequencing technologies and take full advantage of achievements in computer science and advances in computer hardware. But these sequence assemblers also need to justify the gains they claim. We took five commonly used assemblers and assembled two genomic datasets, then mined both statistics that have not been reported before and commonly used statistics that are thought to be representative of assembly quality. On top of that, we also used experimentally validated data known to be part of the organisms’ genomes and traced them in the assemblies.

Master Thesis Ray: a Profile-Based Approach for Homology Matching of Tandem-MS Spectra to Sequence Databases (Izmir Institute of Technology, 2012) Yılmaz, Şule; Allmer, Jens; Karaçalı, Bilge
Mass spectrometry is a tool that is commonly used in proteomics to identify and quantify proteins. Thousands of spectra can be obtained in just a few hours. Computational methods enable the analysis of such high-throughput studies. There are two main strategies: database search and de novo sequencing. Most researchers prefer database search as a first choice, but any slight change in a protein can prevent identification. In such cases, de novo sequencing can be used. However, this approach depends heavily on spectral quality, and it is difficult to obtain predictions that cover the full-length sequence. Peptide sequence tags (PSTs) allow some flexibility in database searches. A PST is a short amino acid sequence with certain mass information, but obtaining accurate PSTs is still arduous. If a sequence is missing from the database, homology searches can be useful. There are some homology search algorithms, such as MS-BLAST, MS-Shotgun and FASTS.
However, they are altered versions of existing algorithms; for example, BLAST was modified for mass spectrometric data and became MS-BLAST. Besides, they are usually coupled with de novo sequencing, which still has limitations. Therefore, there is a need for novel algorithms in order to increase the scope of homology searches. For this purpose, a novel approach based on sequence profiles has been implemented. A sequence profile is like a table that contains the frequencies of all possible amino acids for a given MS/MS spectrum. The profiles are then aligned to sequences in the database. Profiles are more specific than PSTs, and the requirement for precursor mass restrictions or enzyme information can be removed.

Master Thesis Evaluation of Protein Secondary Structure Prediction Algorithms on a New Advanced Benchmark Dataset (Izmir Institute of Technology, 2011) Has, Canan; Allmer, Jens
Researchers have been studying secondary structure prediction since the 1970s. However, the accuracy of state-of-the-art methods reaches only approximately 80-85%. One of the reasons for this is the limitations of the datasets used for training or testing the algorithms. A number of databases with n experimentally determined proteins, which also contain knowledge of the functionality, biochemical properties and location annotation of proteins, will directly show us how the algorithms perform on certain groups of proteins. This also gives users the opportunity to determine the quality of algorithms on those datasets and to decide which algorithm can be used for which type of protein. In this thesis, the objective is the development of a new and advanced protein benchmark database that contains functional and biochemical information on 64872 experimentally determined proteins in the S2C database, which is derived from the Protein Data Bank (PDB).
With this database, seven available predictors are evaluated with respect to their performance on different datasets in terms of the functionality and subcellular localization of proteins in the benchmark database. Comparing the results obtained on the proposed benchmark datasets with the results on an existing dataset, RS126, showed that grouping proteins by function within their subcellular localizations has a great impact on determining the accuracies of existing algorithms.

Master Thesis Exploiting Fragment-Ion Complementarity for Peptide De Novo Sequencing From Collision Induced Dissociation Tandem Mass Spectra (Izmir Institute of Technology, 2011) Aytun, Belgin; Allmer, Jens
Peptide identification from mass spectrometric data is a key step in proteomics because this field provides sequence, quantitative, and modification data of actually expressed proteins. Two approaches are generally deployed to interpret experimental MS/MS data: database searching and de novo sequencing. The database search method has been used successfully in proteomics projects for organisms with well-studied genomes. However, it is not applicable in situations where a target sequence is not in the protein database. This can happen for a number of reasons, including novel proteins, protein mutations and post-translational modifications. Because of these disadvantages of database searching, a lot of research has focused on de novo sequencing, which assigns amino acid sequences to MS/MS spectra without the need for a database. The aim of this study is to enhance the accuracy of de novo sequencing tools. One step commonly employed in all de novo sequencing tools is the naming of fragment ions. It is essential to know which peak represents which ion type in order to traverse a spectrum graph and find the amino acid sequence that best explains the MS/MS spectrum. Different approaches have been tried to name ions, and some success has been achieved in naming b-type and y-type ions.
We present a new approach that enables the naming of not only b- and y-type ions but other arbitrary ion types as well. This enabled the detection of the b-ion ladder. In the latter case, missing fragments were determined by using other named ion types. Furthermore, unexplained data in tandem mass spectra were reduced as much as possible. Therefore, a complete sequence can be derived by the new approach.
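The fragment-ion complementarity named in the title can be sketched as follows. For singly charged fragments from the same cleavage site, the m/z of the b ion and the m/z of the matching y ion add up to the neutral peptide mass plus two proton masses. The code below is only an illustration of that relation, not the method of the thesis: the function names, the tolerance, the restriction to singly charged ions and the toy spectrum are assumptions.

```python
PROTON = 1.007276   # mass of a proton in Da
WATER = 18.010565   # monoisotopic mass of H2O in Da

# Monoisotopic residue masses in Da (only the residues used in the example).
RESIDUE = {
    "P": 97.05276, "E": 129.04259, "T": 101.04768,
    "I": 113.08406, "D": 115.02694,
}


def theoretical_b_y(peptide):
    """Singly charged b and y ion m/z values for every cleavage site."""
    masses = [RESIDUE[aa] for aa in peptide]
    total = sum(masses)
    b_ions, y_ions = [], []
    prefix = 0.0
    for m in masses[:-1]:
        prefix += m
        b_ions.append(prefix + PROTON)
        y_ions.append(total - prefix + WATER + PROTON)
    return b_ions, y_ions


def complementary_pairs(peaks, neutral_mass, tol=0.01):
    """Find peak pairs whose m/z values sum to neutral_mass + 2 protons."""
    target = neutral_mass + 2 * PROTON
    peaks = sorted(peaks)
    pairs = []
    lo, hi = 0, len(peaks) - 1
    while lo < hi:  # two-pointer scan over the sorted peak list
        total = peaks[lo] + peaks[hi]
        if abs(total - target) <= tol:
            pairs.append((peaks[lo], peaks[hi]))
            lo += 1
            hi -= 1
        elif total < target:
            lo += 1
        else:
            hi -= 1
    return pairs


# Hypothetical toy spectrum: all b and y ions of PEPTIDE plus one noise peak.
peptide = "PEPTIDE"
neutral_mass = sum(RESIDUE[aa] for aa in peptide) + WATER
b_ions, y_ions = theoretical_b_y(peptide)
spectrum = b_ions + y_ions + [300.0]

pairs = complementary_pairs(spectrum, neutral_mass)
print(len(pairs))
```

A peak that finds a partner this way is likely to be a b or y ion, and the noise peak in the toy spectrum finds none. Deciding which member of a pair is the b ion and which is the y ion needs further evidence and is outside this sketch.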
