GCRIS

Automatic, Fast and Accurate Sequence Decontamination
(Izmir Institute of Technology, 2016) Bağcı, Caner; Allmer, Jens; Tekir, Selma
The introduction of massively parallel sequencing technologies was a revolutionary step in genomics. Their decreasing cost and powerful features have put them increasingly in demand over the last decade. Even small laboratories around the world can now sequence the complete genomes of organisms using these technologies. However, this power comes with challenges, both technological and computational. This work addresses one of these computational challenges and offers a novel algorithm to solve it. Sequencing by synthesis is a method used in many different massively parallel sequencing instruments. It exploits the biological process of DNA replication and, with the help of different means of detection, allows a DNA molecule to be sequenced while it is replicated. Since DNA polymerase requires a primer to start the replication reaction, short oligonucleotide adapters are used in sequencing-by-synthesis methods to initiate the reaction. However, under certain circumstances these adapters contaminate the final sequence reads. Several tools have been offered to trim adapters from reads, but all depend on the bioinformatician's prior knowledge of the adapter sequence. In this work, an algorithm is offered that detects and trims adapters using only the sequences of the reads, without relying on prior knowledge of adapter sequences. The algorithm was shown to perform better than, or on par with, existing methods in terms of speed and efficiency.
Ray: a Profile-Based Approach for Homology Matching of Tandem-MS Spectra to Sequence Databases
(Izmir Institute of Technology, 2012) Yılmaz, Şule; Allmer, Jens; Karaçalı, Bilge
Mass spectrometry is a tool commonly used in proteomics to identify and quantify proteins. Thousands of spectra can be obtained in just a few hours, and computational methods enable the analysis of such high-throughput studies. There are two main strategies: database search and de novo sequencing. Most researchers prefer database search as a first choice, but even slight changes to a protein can prevent identification. In such cases, de novo sequencing can be used. However, this approach depends heavily on spectral quality, and it is difficult to obtain full-length sequence predictions. Peptide sequence tags (PSTs) allow some flexibility in database searches. A PST is a short amino acid sequence with certain mass information, but obtaining accurate PSTs is still arduous. When a sequence is missing from the database, homology searches can be useful. There are some homology search algorithms, such as MS-BLAST, MS-Shotgun, and FASTS. However, they are altered versions of existing algorithms; for example, BLAST was modified for mass spectrometric data to become MS-BLAST. In addition, they are usually coupled with de novo sequencing, which still has limitations. Therefore, novel algorithms are needed to increase the scope of homology searches. For this purpose, a novel approach based on sequence profiles has been implemented. A sequence profile is like a table that contains the frequencies of all possible amino acids for a given MS/MS spectrum. The profiles are then aligned to sequences in the database. Profiles are more specific than PSTs, and the requirement for precursor mass restrictions or enzyme information can be removed.
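The idea of a profile as a per-position frequency table that is matched against database sequences can be illustrated with a minimal sketch. This is not the thesis's implementation: the scoring rule (summing frequencies), the ungapped sliding window, the function names, and the example frequencies are all assumptions made here for illustration.

```python
from typing import Dict, List, Tuple

# A profile is a list of columns; each column maps an amino acid letter to a
# frequency. The values used below are invented for illustration only.
Profile = List[Dict[str, float]]


def score_window(profile: Profile, window: str) -> float:
    """Sum the profile frequency of the residue observed at each position."""
    return sum(col.get(aa, 0.0) for col, aa in zip(profile, window))


def best_match(profile: Profile, sequence: str) -> Tuple[int, float]:
    """Slide the profile along a sequence without gaps.

    Returns (offset, score) of the best window, or (-1, -inf) when the
    sequence is shorter than the profile.
    """
    width = len(profile)
    best = (-1, float("-inf"))
    for offset in range(len(sequence) - width + 1):
        s = score_window(profile, sequence[offset:offset + width])
        if s > best[1]:
            best = (offset, s)
    return best


# Hypothetical three-column profile and a hypothetical database sequence.
profile = [
    {"L": 0.5, "I": 0.5},  # leucine and isoleucine have the same mass
    {"K": 0.7, "Q": 0.3},  # lysine and glutamine have nearly the same mass
    {"E": 1.0},
]
offset, score = best_match(profile, "MALKEQ")
```

Because each column keeps several candidate residues, a window can still score well when the spectrum cannot distinguish near-isobaric amino acids, which is one way to read the abstract's remark that profiles are more flexible than a single de novo sequence.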
Evaluation of Protein Secondary Structure Prediction Algorithms on a New Advanced Benchmark Dataset
(Izmir Institute of Technology, 2011) Has, Canan; Allmer, Jens
Researchers have been studying secondary structure prediction since the 1970s. However, the accuracy of state-of-the-art methods reaches only approximately 80-85%. One reason is the limitations of the datasets used for training or testing the algorithms. A number of databases of experimentally determined proteins, which also contain knowledge of the functionality, biochemical properties, and location annotation of proteins, will directly show how the algorithms perform on certain groups of proteins. This also gives users the opportunity to assess the quality of algorithms on those datasets and to decide which algorithm can be used for which type of protein. The objective of this thesis is the development of a new, advanced protein benchmark database containing functional and biochemical information for 64872 experimentally determined proteins in the S2C database, which is derived from the Protein Data Bank (PDB). With this database, seven available predictors are evaluated with respect to their performance on different datasets, defined by the functionality and subcellular localization of the proteins in the benchmark database. Comparing the results obtained on the proposed benchmark datasets with the results on an existing dataset, RS126, showed that grouping proteins by function and subcellular localization has a great impact on the measured accuracies of existing algorithms.
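Evaluating a predictor separately on groups of proteins can be sketched as follows. The abstract does not name its accuracy measure; this sketch assumes the common three-state per-residue accuracy (often called Q3), and the group labels and structure strings are hypothetical examples, not data from the thesis.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def q3_by_group(records: Iterable[Tuple[str, str, str]]) -> Dict[str, float]:
    """Compute per-residue accuracy for each group.

    Each record is (group, predicted, observed), where the two strings use
    one letter per residue: H (helix), E (strand), or C (coil).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, predicted, observed in records:
        if len(predicted) != len(observed):
            raise ValueError("prediction and observation differ in length")
        total[group] += len(observed)
        correct[group] += sum(p == o for p, o in zip(predicted, observed))
    # Skip groups with no residues to avoid dividing by zero.
    return {g: correct[g] / total[g] for g in total if total[g] > 0}


# Hypothetical predictions grouped by subcellular localization.
records = [
    ("membrane", "HHHHCC", "HHHCCC"),
    ("membrane", "EEEC", "EEEC"),
    ("cytoplasm", "CCHH", "CCEE"),
]
accuracy = q3_by_group(records)
```

Pooling residues within a group, rather than averaging per-protein scores, weights long proteins more heavily; either choice is defensible, and the sketch simply picks one.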

Master Degree / Yüksek Lisans Tezleri
