Machine-learning-assisted de novo design of molybdenum disulfide binding peptides
Abstract
Peptides, short chains of amino acids, are indispensable molecules for biological processes and high-technology applications. Among their many uses, building bio-nano interfaces through molecular recognition has become a research topic of considerable interest. Prior work has established directed evolution methodologies and made it possible to identify functional peptides that bind to a variety of targets, such as enzymes, antigens, or inorganic structures; however, this conventional approach is weak in scalability and in revealing relationships within sequence space. Together with advances in high-throughput sequencing and computational efficiency, these weaknesses have motivated the development of more powerful technologies such as deep-directed evolution. The large datasets produced by this method have paved the way for modeling sequence-function relationships with machine learning. The aim of this thesis is to build a machine learning workflow suited to these datasets. To this end, the Random Forest algorithm and deep neural networks were used; when the binding score predictions of the trained models were combined, a mean absolute error of 0.0304 and a Pearson correlation coefficient of 0.904 were obtained. Using these models, an example strongly binding peptide was designed through random search and iterative optimization. The findings highlight the role of domain knowledge in training machine learning models, and the sample weights and semantic amino acid vectors used were observed to contribute substantially to performance. This study lays out the key points for building a platform capable of designing peptides with diverse functions.
Peptides are molecular entities with a diverse set of functionalities vital for biological processes and biotechnological applications. Among their roles, the ability of peptides to bind to solid materials has attracted attention, particularly as building blocks for constructing bio-nano interfaces and molecular linkers. Directed evolution techniques such as iterative phage display have emerged as capable tools for identifying peptides and proteins with specific affinities for various targets, despite their constraints, particularly their low throughput. These limitations have motivated work on more advanced methodologies such as deep-directed evolution, which integrates high-throughput sequencing. By collecting massive amounts of data, deep-directed evolution provides a broad landscape of sequence information, enabling computational modeling and optimization of peptide sequences. This thesis aims to develop machine learning workflows that capture the sequence-function relationship from the data, allowing the design of peptides with desired functionalities. Two machine learning approaches were employed: the Random Forest algorithm (RF) and deep neural networks (DNN). By aggregating binding score predictions from the two models, the predictor achieved a Pearson correlation coefficient of 0.904 and a mean absolute error of 0.0304 on the high-confidence test set, and it was used to design a candidate peptide as a proof of principle. Our findings emphasize the importance of including domain knowledge, via peptide abundance weighting and the choice of amino acid encoding, when designing training strategies. The procedures outlined in this work demonstrate key steps towards a peptide sequence-function prediction platform with broad implications for bio-nanotechnology and engineering.
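The aggregation and evaluation step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that "aggregating" means a plain element-wise average of the two models' binding score predictions; the abstract does not state the actual rule, so the thesis may use a different one. The function names and the toy numbers below are hypothetical and are not taken from the thesis data.

```python
from math import sqrt


def aggregate(pred_a, pred_b):
    """Average two models' predictions element-wise (assumed aggregation rule)."""
    if len(pred_a) != len(pred_b):
        raise ValueError("prediction lists must have the same length")
    return [(a + b) / 2.0 for a, b in zip(pred_a, pred_b)]


def mean_absolute_error(y_true, y_pred):
    """Mean of the absolute differences between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


# Toy binding scores, invented for illustration only.
y_true = [0.10, 0.40, 0.70, 0.90]
rf_pred = [0.20, 0.30, 0.80, 0.80]   # stand-in for Random Forest output
dnn_pred = [0.00, 0.50, 0.60, 1.00]  # stand-in for neural network output

ensemble = aggregate(rf_pred, dnn_pred)
print("RF MAE:", mean_absolute_error(y_true, rf_pred))
print("Ensemble MAE:", mean_absolute_error(y_true, ensemble))
print("Ensemble Pearson r:", pearson_r(y_true, ensemble))
```

In this toy case the two models err in opposite directions, so averaging cancels their errors; real predictors would only partially benefit in this way.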
Description
Includes bibliographical references (leaves 48-56)
Thesis (Master)--İzmir Institute of Technology, Bioengineering, Izmir, 2024
Text in English; Abstract: Turkish and English
Keywords
Two-dimensional materials, Peptides, Neural networks, Deep learning, Biomimetic materials, Machine learning methods, Artificial intelligence
End Page
75
