Master Degree / Yüksek Lisans Tezleri

Permanent URI for this collection: https://hdl.handle.net/11147/3008

Quasi-Supervised Strategies for Compound-Protein Interaction Prediction
(01. Izmir Institute of Technology, 2021) Çakı, Onur; Karaçalı, Bilge
In-silico prediction of compound-protein interactions remains important in various pharmacology applications because wet-lab experiments are time-consuming, laborious, and costly. Most machine learning methods proposed to that end use supervised learning strategies in which known interactions are labeled as positive and the rest as negative. However, treating all unknown interactions as negative instances may lead to inaccuracies in practice, since some of the unknown interactions are bound to be positive interactions waiting to be identified as such. In this study, we propose to address this problem using the Quasi-Supervised Learning algorithm. In this framework, potential interactions are predicted by estimating the overlap between two datasets: a true positive dataset, which consists of compound-protein pairs with known interactions, and an unknown dataset, which consists of all the remaining compound-protein pairs. The potential interactions are then identified as those pairs in the unknown dataset that overlap with the interacting pairs in the true positive dataset in terms of the associated similarity structure. Experimental results on GPCR and Nuclear Receptor datasets show that the proposed method can identify actual interactions from among all possible combinations.
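The overlap idea described in this abstract can be illustrated with a minimal sketch. It assumes each compound-protein pair has already been encoded as a numeric feature vector, and it simplifies the estimation to repeated nearest-neighbor voting against small random reference subsets. The function names, parameters, and toy data are hypothetical and are not taken from the thesis.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def overlap_scores(positives, unknowns, n_ref=3, n_rounds=200, seed=0):
    """For each unknown pair, estimate how often its nearest reference
    point comes from the positive set rather than from the other unknowns.
    Scores near 1 suggest overlap with the known interactions."""
    rng = random.Random(seed)
    scores = []
    for i, u in enumerate(unknowns):
        others = unknowns[:i] + unknowns[i + 1:]  # leave the query point out
        hits = 0
        for _ in range(n_rounds):
            ref_pos = rng.sample(positives, n_ref)
            ref_unk = rng.sample(others, n_ref)
            d_pos = min(sq_dist(u, p) for p in ref_pos)
            d_unk = min(sq_dist(u, q) for q in ref_unk)
            if d_pos < d_unk:
                hits += 1
        scores.append(hits / n_rounds)
    return scores

# Toy data (assumed): one unknown pair lies inside the positive cluster.
positives = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.15, 0.05)]
unknowns = [(0.1, 0.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1), (5.1, 5.3)]
scores = overlap_scores(positives, unknowns)
```

In this toy setting the first unknown pair scores much higher than the rest, which is the kind of candidate the abstract describes as a potential interaction.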
Bioinformatic Analysis and Biostatistical Modelling of Genetic Interactions Between Microbiota and Host
(01. Izmir Institute of Technology, 2020) Musa, Farid; Sezgin, Efe
Advances in genome sequencing technology have revolutionized the study of microorganisms. Recent genome-wide association studies (GWAS) on gut microbiota have revealed how the microbiota affects our health. In this thesis, Drosophila melanogaster samples were used to investigate the associations between the host's genotype and microbiota. The meta-analysis of microbiota data was performed using PhyloMAF, a novel and comprehensive microbiome meta-analysis framework. The resulting microbial abundance tables were analyzed using alpha and phylogenetic beta biodiversity metrics, which were then used in the microbiome GWAS. Significant variant associations were further analyzed in the post-GWAS analysis. The results of our study show that several genomic variants are significantly associated with biodiversity estimates. Among the identified variants, a few were found to be associated with more specific phenotypes. In particular, a gene involved in folate transport and linked to folate malabsorption was found to be associated with Proteobacteria, which in turn was found to be one of the primary phyla containing the highest number of genes responsible for de novo folate synthesis. Similarly, a fly gene related to immune function, whose human homolog is linked to inflammatory gut disease, was found to be associated with the Acetobacter genus. According to the literature survey, this genus is associated with immune deficiency in the fruit fly. In summary, this research identified genetic factors associated with the fruit fly microbiota. Limitations and future directions are stated to provide a basis for future prospective studies.
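As an illustration of an alpha diversity metric computed from one row of a microbial abundance table, the sketch below uses the Shannon index. The abstract does not name the specific metrics used, so the choice of Shannon diversity, the function name, and the sample counts are assumptions made for illustration only.

```python
import math

def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("abundance counts must sum to a positive value")
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in proportions)

# Two hypothetical samples with four taxa each.
even_sample = [10, 10, 10, 10]    # evenly distributed community
skewed_sample = [97, 1, 1, 1]     # dominated by a single taxon

h_even = shannon_index(even_sample)
h_skewed = shannon_index(skewed_sample)
```

An evenly distributed sample reaches the maximum value for its number of taxa (ln 4 here), while a sample dominated by one taxon scores lower. Per-sample values of this kind are what a microbiome GWAS would treat as the phenotype.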
Bioinformatics Based Approach To Design a Thermophilic P450 for Industrial Biocatalysis
(Izmir Institute of Technology, 2019) Kestevur Doğru, Ekin; Sürmeli Eraltuğ, Nur Başak
Enzyme-catalyzed biosynthesis of steroidal drugs is an important process in pharmaceutical manufacturing. Cytochrome P450 (P450) monooxygenases are important for the hydroxylation of steroid structures because they can catalyze the oxidation of inactive carbon bonds with high selectivity and efficiency. CYP119 is an acidothermophilic P450 from Sulfolobus acidocaldarius, which has the potential to be used as a biocatalyst for industrial production since it shows activity at high temperature and low pH. In this work, we aim to use CYP119 for selective hydroxylation of progesterone, which is not the original substrate of CYP119, to produce precursor molecules of important hormones such as cortisone and aldosterone. The crystal structure of CYP119 (PDB ID: 1F4T) was used to select residues for mutation, based on structural alignment with other CYPs that naturally catalyze progesterone hydroxylation. Progesterone docking was performed with CYP119 to identify residues that clash with the substrate. Finally, the 12 selected residues (Leu69, Val151, Phe153, Leu155, Leu205, Ile208, Ala209, Thr213, Thr214, Val254, Thr257, Leu354) were mutated with the PyRosetta program to Gly, Glu, Phe, Met, Ala, His, Arg, and Ile. Progesterone docking was performed using the DockMCM Protocol of PyRosetta. We used two different starting coordinates of progesterone for docking, and results were filtered according to their energy scores. The best mutants were used to create double and triple mutants, and a second round of docking and filtering was performed with these enzymes. A final set of 11 mutants with the best scores was selected, and their possible products were identified.
Developing a Guide of Bioinformatic Database for Probiotic Products
(Izmir Institute of Technology, 2019) Yılmaz, Melike; Harsa, Hayriye Şebnem; Sezgin, Efe
Recently, probiotic use has expanded rapidly, as probiotics have potential health effects on the microbiota that help protect homeostasis in the human body. Bioinformatics is generally defined as collecting and analysing biological data. Establishing a bioinformatic system for probiotics would have the potential to highlight their beneficial impacts on human health, while enabling cross-examination of diseases and products. In this study, new information about probiotics was collected from in vitro and in vivo studies, clinical trials, and meta-analyses to develop a comprehensive guide. Meta-analyses of sixteen and seventeen randomized, controlled trials of S. boulardii (Sb) against diarrhea reported pooled relative risks of 0.51 (95% CI [0.40-0.64]) in adults and 0.55 (95% CI [0.42-0.72]) in children, respectively. These results demonstrated that Sb was effective for preventing and treating different types of diarrhea in adult and pediatric patients. An in silico gene expression study conducted at Tecnico Lisboa, comparing Sb probiotic and non-probiotic Saccharomyces cerevisiae (Sc) strains, showed transcription regulation differences in 26 genes. An in silico pipeline was developed and used as the basis for a new query in the ProBioYeastract database. A cross-strain promoter analysis comparing the Sb CNCM I-745 and Unique28 strains with the Sc S288C strain showed that the expression of 26 probiotic-related genes was predicted to be controlled by different transcription factors in probiotic versus non-probiotic strains. Among the six selected genes that were evaluated, EFG1, a gene involved in biofilm formation, aggregation, and adhesion, was found to be up-regulated in Sb CNCM I-745 compared to Sc BY4741.
Automatic, Fast and Accurate Sequence Decontamination
(Izmir Institute of Technology, 2016) Bağcı, Caner; Allmer, Jens; Tekir, Selma
The introduction of massively parallel sequencing technologies was a revolutionary step in genomics. Their decreasing cost and powerful features have put them increasingly in demand over the last decade. Even small laboratories around the world can now sequence complete genomes of organisms using these technologies. However, this power comes with challenges, both technological and computational. In this work, one of these computational challenges is addressed, and a novel algorithm is offered to solve the problem. Sequencing by synthesis is one of the methods used in many different massively parallel sequencing instruments. This method utilizes the biological process of DNA replication and, with the help of different means of detection, allows a DNA molecule to be sequenced while it is replicated. Since DNA polymerase requires a primer to start the replication reaction, short oligonucleotide adapters are used in sequencing by synthesis methods to initiate the reaction. However, under certain circumstances these adapters contaminate the final sequence reads. Several tools have been offered to trim adapters from reads, but all depend on the bioinformatician's prior knowledge of the adapter sequence. In this work, an algorithm is offered to detect and trim adapters using only the sequences of the reads, without relying on prior knowledge of adapter sequences. The algorithm was shown to perform better than or on par with existing methods in terms of speed and efficiency.
Ray: a Profile-Based Approach for Homology Matching of Tandem-MS Spectra To Sequence Databases
(Izmir Institute of Technology, 2012) Yılmaz, Şule; Allmer, Jens; Karaçalı, Bilge
Mass spectrometry is a tool commonly used in proteomics to identify and quantify proteins. Thousands of spectra can be obtained in just a few hours, and computational methods enable the analysis of such high-throughput studies. There are two main strategies: database search and de novo sequencing. Most researchers prefer database search as a first choice, but even slight changes to a protein can prevent identification. In such cases, de novo sequencing can be used. However, this approach depends heavily on spectral quality, and it is difficult to obtain full-length sequence predictions. Peptide sequence tags (PSTs) allow some flexibility in database searches. A PST is a short amino acid sequence with certain mass information, but obtaining accurate PSTs is still arduous. When a sequence is missing from the database, homology searches can be useful. There are some homology search algorithms, such as MS-BLAST, MS-Shotgun, and FASTS, but they are altered versions of existing algorithms; for example, BLAST was modified for mass spectrometric data and became MS-BLAST. Moreover, they are usually coupled with de novo sequencing, which still has limitations. Therefore, there is a need for novel algorithms that increase the scope of homology searches. For this purpose, a novel approach based on sequence profiles has been implemented. A sequence profile is a table that contains the frequencies of all possible amino acids for a given MS/MS spectrum. These profiles are then aligned to sequences in the database. Profiles are more specific than PSTs, and the requirement for precursor mass restrictions or enzyme information can be removed.
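The notion of aligning a sequence profile to a database sequence can be sketched as follows. This is a simplified, ungapped sliding-window illustration under assumed data, not the Ray implementation: the profile is a list of per-position amino acid frequency tables, and each window of the database sequence is scored by summing the frequencies of its residues. All names and values are hypothetical.

```python
def score_window(profile, window):
    """Sum the profile frequency of each residue at its position."""
    return sum(column.get(aa, 0.0) for column, aa in zip(profile, window))

def best_match(profile, sequence):
    """Slide the profile along the sequence without gaps and return
    (best score, start index) of the highest-scoring window."""
    width = len(profile)
    best_score, best_start = float("-inf"), -1
    for start in range(len(sequence) - width + 1):
        score = score_window(profile, sequence[start:start + width])
        if score > best_score:
            best_score, best_start = score, start
    return best_score, best_start

# Hypothetical profile: ambiguous positions keep several candidates,
# e.g. L/I (isobaric) and K/Q (nearly identical masses).
profile = [
    {"L": 0.5, "I": 0.5},
    {"K": 0.6, "Q": 0.4},
    {"S": 1.0},
    {"A": 0.7, "G": 0.3},
]
sequence = "MTEIQSAVLKSGR"
score, start = best_match(profile, sequence)
matched = sequence[start:start + len(profile)]
```

Because every position keeps its alternative residues, a homologous peptide that differs from the best de novo guess can still score well, which is the flexibility the abstract attributes to profiles.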
Evaluation of Protein Secondary Structure Prediction Algorithms on a New Advanced Benchmark Dataset
(Izmir Institute of Technology, 2011) Has, Canan; Allmer, Jens
Researchers have been studying secondary structure prediction since the 1970s. However, the accuracy of state-of-the-art methods reaches only approximately 80-85%. One reason is the limitations of the datasets used for training or testing the algorithms. Databases of experimentally determined proteins that also contain functional, biochemical, and location annotations can directly show how the algorithms perform on certain groups of proteins. They also allow users to assess the quality of algorithms on those datasets and to decide which algorithm to use for which type of protein. The objective of this thesis is to develop a new and advanced protein benchmark database that contains functional and biochemical information for 64872 experimentally determined proteins in the S2C database, which is derived from the Protein Data Bank (PDB). With this database, seven available predictors are evaluated on different datasets, grouped by the functionality and subcellular localization of the proteins in the benchmark database. Comparing the results on the proposed benchmark datasets with those on an existing dataset, RS126, showed that grouping proteins by function and subcellular localization has a great impact on the measured accuracies of existing algorithms.
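The kind of per-group evaluation described here can be sketched with a simple per-residue accuracy over three states (helix H, strand E, coil C). The abstract does not specify the accuracy measure, so the three-state per-residue accuracy, the group labels, and the example strings below are assumptions for illustration.

```python
from collections import defaultdict

def residue_accuracy(predicted, observed):
    """Fraction of residues whose predicted state matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)

def accuracy_by_group(records):
    """Pool residues per group (e.g. function or localization) and
    return the per-residue accuracy for each group."""
    totals = defaultdict(lambda: [0, 0])  # group -> [matches, residues]
    for group, predicted, observed in records:
        totals[group][0] += sum(p == o for p, o in zip(predicted, observed))
        totals[group][1] += len(observed)
    return {group: m / n for group, (m, n) in totals.items()}

# Hypothetical records: (group label, predicted states, observed states)
records = [
    ("enzyme", "HHHHCC", "HHHHCC"),
    ("membrane", "EEEECC", "HHHHCC"),
    ("enzyme", "CCEE", "CCEC"),
]
overall = residue_accuracy("HHHCEEEECCHC", "HHHHEEEECCCC")
per_group = accuracy_by_group(records)
```

Reporting accuracy per group rather than as a single pooled number makes it visible when a predictor does well on one class of proteins and poorly on another.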
