Sürdürülebilir Yeşil Kampüs Koleksiyonu / Sustainable Green Campus Collection

Permanent URI for this collectionhttps://hdl.handle.net/11147/7755

Browse

Search Results

Now showing 1 - 2 of 2
  • Master Thesis
    Estimation of Low Sucrose Concentrations and Classification of Bacteria Concentrations With Machine Learning on Spectroscopic Data
    (Izmir Institute of Technology, 2019) Mezgil, Bahadır; Baştanlar, Yalın; Baştanlar, Yalın
    Spectroscopy can be used to identify elements. In a similar way, there are recent studies that use optical spectroscopy to measure the material concentrations in chemical solutions. In this study, we employ machine learning techniques on collected ultraviolet-visible spectra to estimate the level of sucrose concentrations in solutions and to classify bacteria concentrations. Some metal nanoparticles are very sensitive to refraction index changes in the environment and this helps to detect small refraction index changes in the solution. In our study, gold nanoparticles are used and we benefited from this property to estimate sucrose concentrations. The samples in different low sucrose concentration solutions are obtained by mixing the sucrose measured with precision scales with pure water and then the UV-Vis spectrum of each sample is measured. For the bacteria concentration solutions, spectra for six different bacteria concentrations are captured. Spectra of the same solutions are also captured before adding the bacteria. For each of these solutions, four sets are prepared where gold nanoparticles are not grown (minute 0) and grown for 4 minutes, 10 minutes and 12 minutes. After the dataset preparation, these spectrum measurements are transferred into MATLAB environment as sucrose concentration dataset and bacteria solution dataset. Then the necessary preprocessing steps are performed in order to get the most informative and distinguishing information from these datasets. The raw measurement values and processed spectrum measurements are trained with shallow Artificial Neural Networks (ANN) on MATLAB Deep Learning Toolbox and Support Vector Machine (SVM) on MATLAB Statistics and Machine Learning Toolbox. When the results of the conducted machine learning experiments are examined, success rate is promising for the estimation of sucrose concentrations and very high for classification of bacteria concentrations in pure water solution.
  • Article
    Citation - WoS: 11
    Citation - Scopus: 14
    Categorization of Species Based on Their Micrornas Employing Sequence Motifs, Information-Theoretic Sequence Feature Extraction, and K-Mers
    (Springer Verlag, 2017) Yousef, Malik; Nigatu, Dawit; Levy, Dalit; Allmer, Jens; Henkel, Werner
    Background: Diseases like cancer can manifest themselves through changes in protein abundance, and microRNAs (miRNAs) play a key role in the modulation of protein quantity. MicroRNAs are used throughout all kingdoms and have been shown to be exploited by viruses to modulate their host environment. Since the experimental detection of miRNAs is difficult, computational methods have been developed. Many such tools employ machine learning for pre-miRNA detection, and many features for miRNA parameterization have been proposed. To train machine learning models, negative data is of importance yet hard to come by; therefore, we recently started to employ pre-miRNAs from one species as positive data versus another species’ pre-miRNAs as negative examples based on sequence motifs and k-mers. Here, we introduce the additional usage of information-theoretic (IT) features. Results: Pre-miRNAs from one species were used as positive and another species’ pre-miRNAs as negative training data for machine learning. The categorization capability of IT and k-mer features was investigated. Both feature sets and their combinations yielded a very high accuracy, which is as good as the previously suggested sequence motif and k-mer based method. However, for obtaining a high performance, a sufficiently large phylogenetic distance between the species and sufficiently high number of pre-miRNAs in the training set is required. To examine the contribution of the IT and k-mer features, an information gain-based feature ranking was performed. Although the top 3 are IT features, 80% of the top 100 features are k-mers. The comparison of all three individual approaches (motifs, IT, and k-mers) shows that the distinction of species based on their pre-miRNAs k-mers are sufficient. Conclusions: IT sequence feature extraction enables the distinction among species and is less computationally expensive than motif calculations. However, since IT features need larger amounts of data to have enough statistics for producing highly accurate results, future categorization into species can be effectively done using k-mers only. The biological reasoning for this is the existence of a codon bias between species which can, at least, be observed in exonic miRNAs. Future work in this direction will be the ab initio detection of pre-miRNA. In addition, prediction of pre-miRNA from RNA-seq can be done.