Scopus Indexed Publications Collection
Permanent URI for this collection: https://hdl.handle.net/11147/7148
Article
Citation - Scopus: 1
Dynamic Frequent Subgraph Mining Algorithms Over Evolving Graphs: a Survey
(PeerJ Inc., 2024) Bostanoğlu, B.E.; Abuzayed, N.
Frequent subgraph mining (FSM) is an essential and challenging graph mining task used in several applications of modern data science. Some FSM algorithms aim to find all frequent subgraphs, whereas others focus on discovering frequent subgraphs approximately. On the other hand, modern applications employ evolving graphs where the increments are small graphs or streams of nodes and edges. In such cases, the FSM task becomes more challenging due to the growing data size and the complexity of the base algorithms. Recently, frequent subgraph mining algorithms have been designed for dynamic graph data. However, there is no comparative review of the dynamic subgraph mining algorithms focusing on the discovery of frequent subgraphs over evolving graph data. This article focuses on the characteristics of dynamic frequent subgraph mining algorithms over evolving graphs. We first introduce and compare dynamic frequent subgraph mining algorithms, highlighting attributes such as increment type, graph type, graph representation, internal data structure, algorithmic approach, programming approach, base algorithm and output type. Secondly, we introduce and compare the approximate frequent subgraph mining algorithms for dynamic graphs with additional attributes such as their sampling strategy, data in the sample, statistical guarantees on the sample and their main objective. Finally, we highlight research opportunities in this specific domain from our perspective. Overall, we aim to introduce the research area of frequent subgraph mining over evolving graphs with the hope that this can serve as a reference and inspiration for the researchers of the field. © (2024), (PeerJ Inc.).
All rights reserved.

Article
Citation - WoS: 12
Citation - Scopus: 12
Quantitative Real-Time PCR Analysis of Bacterial Biomarkers Enable Fast and Accurate Monitoring in Inflammatory Bowel Disease
(PeerJ Inc., 2022) Sezgin, Efe; Terlemez, Gamze; Bozkurt, Berkay; Bengi, Göksel; Akpınar, Hale; Büyüktorun, İlker
Inflammatory bowel diseases (IBD) affect millions of people worldwide with increasing incidence. Ulcerative colitis (UC) and Crohn’s disease (CD) are the two most common IBDs. There is no definite cure for IBD, and response to treatment varies greatly among patients. Therefore, there is an urgent need for biomarkers to monitor therapy efficacy and disease prognosis. We aimed to test whether qPCR analysis of common candidate bacteria identified from a patient’s individual fecal microbiome can be used as a fast and reliable personalized microbial biomarker for efficient monitoring of disease course in IBD. Next generation sequencing (NGS) of the 16S rRNA gene region identified species-level microbiota profiles for a subset of UC, CD, and control samples. Common high-abundance bacterial species observed in all three groups and reported to be associated with IBD were chosen as candidate marker species. These species and the total bacterial amount were quantified in all samples with qPCR. Relative abundance of anti-inflammatory, beneficial Faecalibacterium prausnitzii, Akkermansia muciniphila, and Streptococcus thermophilus was significantly lower in IBD compared to control samples. Moreover, the relative abundance of the examined common species was correlated with the severity of IBD. The variance in qPCR data was much lower compared to NGS data, and showed much higher statistical power for clinical utility.
The qPCR analysis of target common bacterial species can be a powerful, cost- and time-efficient approach for monitoring disease status and identifying better personalized treatment options for IBD patients.

Article
Citation - WoS: 1
Creation of Mutants by Using Centrality Criteria in Social Network Analysis
(PeerJ Inc., 2020) Takan, Savaş
Mutation testing is a method widely used to evaluate the effectiveness of the test suite in hardware and software tests or to design new software tests. In mutation testing, the original model is systematically mutated using certain error assumptions. Mutation testing is based on well-defined mutation operators that imitate typical programming errors or that form highly successful test suites. The success of test suites is determined by the rate of killing mutants created through mutation operators. Because of the high number of mutants in mutation testing, the calculation cost increases in the testing of finite state machines (FSM). Under the assumption that each mutant is of equal value, random selection can be a practical method of mutant reduction. However, in this study, it was assumed that mutants do not all have equal value. Starting from this point of view, a new mutant reduction method was proposed by using the centrality criteria in social network analysis. It was assumed that the central regions selected within this frame were the regions through which test cases pass most often. To evaluate the proposed method, the W method was chosen because it is widely used and detects all failures related to the model. Random and proposed mutant reduction methods were compared with respect to their success by using test suites. As a result of the evaluations, it was discovered that mutants selected via the proposed reduction technique performed better.
Furthermore, it was observed that the proposed method reduced the cost of mutation testing.

Article
Citation - WoS: 1
Citation - Scopus: 1
DNMSO; an Ontology for Representing De Novo Sequencing Results From Tandem-MS Data
(PeerJ Inc., 2020) Takan, Savaş; Allmer, Jens
For the identification and sequencing of proteins, mass spectrometry (MS) has become the tool of choice and, as such, drives proteomics. MS/MS spectra need to be assigned a peptide sequence, for which two strategies exist. Either database search or de novo sequencing can be employed to establish peptide spectrum matches. For database search, mzIdentML is the current community standard for data representation. There is no community standard for representing de novo sequencing results, but we previously proposed the de novo markup language (DNML). At the moment, each de novo sequencing solution uses a different data representation, complicating downstream data integration, which is crucial since ensemble predictions may be more useful than predictions of a single tool. We here propose the de novo MS Ontology (DNMSO), which can, for example, provide many-to-many mappings between spectra and peptide predictions. Additionally, an application programming interface (API) has been implemented that supports any file operation necessary for de novo sequencing, from spectra input to reading, writing, and creating the DNMSO format, as well as conversion from many other file formats. This API removes all overhead from the production of de novo sequencing tools and allows developers to concentrate completely on algorithm development. We make the API and formal descriptions of the format freely available at https://github.com/savastakan/dnmso.

Article
Citation - WoS: 14
Citation - Scopus: 13
Delineating the Impact of Machine Learning Elements in Pre-MicroRNA Detection
(PeerJ Inc., 2017) Saçar Demirci, Müşerref Duygu; Allmer, Jens
Gene regulation modulates RNA expression via transcription factors.
Posttranscriptional gene regulation in turn influences the amount of protein product through, for example, microRNAs (miRNAs). Experimental establishment of miRNAs and their effects is complicated and even futile when aiming to establish the entirety of miRNA target interactions. Therefore, computational approaches have been proposed. Many such tools rely on machine learning (ML), which involves example selection, feature extraction, model training, algorithm selection, and parameter optimization. Different ML algorithms have been used for model training on various example sets, more than 1,000 features describing pre-miRNAs have been proposed, and different training and testing schemes have been used for model establishment. For pre-miRNA detection, negative examples cannot easily be established, causing a problem for two-class classification algorithms. There is also no consensus on what ML approach works best and, therefore, we set forth and established the impact of the different parts involved in ML on model performance. Furthermore, we established two new negative datasets and analyzed the impact of using them for training and testing. It was our aim to attach an order of importance to the parts involved in ML for pre-miRNA detection, but instead we found that all parts are intricately connected and their contributions cannot be easily untangled, leading us to suggest that when attempting ML-based pre-miRNA detection many scenarios need to be explored.

Article
Citation - WoS: 14
Citation - Scopus: 12
The Impact of Feature Selection on One and Two-Class Classification Performance for Plant MicroRNAs
(PeerJ Inc., 2016) Khalifa, Waleed; Yousef, Malik; Saçar Demirci, Müşerref Duygu; Allmer, Jens
MicroRNAs (miRNAs) are short nucleotide sequences that form a typical hairpin structure which is recognized by a complex enzyme machinery.
It ultimately leads to the incorporation of 18-24 nt long mature miRNAs into RISC where they act as recognition keys to aid in regulation of target mRNAs. Determining miRNAs experimentally is involved and, therefore, machine learning is used to complement such endeavors. The success of machine learning mostly depends on proper input data and appropriate features for parameterization of the data. Although two-class classification (TCC) is generally used in the field, one-class classification (OCC) has been tried for pre-miRNA detection because negative examples are hard to come by. Since both positive and negative examples are currently somewhat limited, feature selection can prove to be vital for furthering the field of pre-miRNA detection. In this study, we compare the performance of OCC and TCC using eight feature selection methods and seven different plant species providing positive pre-miRNA examples. Feature selection was very successful for OCC, where the best feature selection method achieved an average accuracy of 95.6%, thereby being ~29% better than the worst method, which achieved 66.9% accuracy. While the performance is comparable to TCC, which performs up to 3% better than OCC, TCC is much less affected by feature selection and its largest performance gap is ~13%, which only occurs for two of the feature selection methodologies. We conclude that feature selection is crucially important for OCC and that it can perform on par with TCC given the proper set of features.
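
Note on the accuracy figures in the abstract above: the "~29%" gap between the best and worst OCC feature selection methods appears to be an absolute difference in percentage points rather than a relative improvement. The short Python check below uses only the two accuracies quoted in the abstract; the variable names are illustrative and are not taken from the paper.

```python
# Accuracies quoted in the abstract above, in percent.
best_occ_accuracy = 95.6
worst_occ_accuracy = 66.9

# Absolute gap, in percentage points.
gap_points = best_occ_accuracy - worst_occ_accuracy

# Relative gain of the best method over the worst, in percent.
relative_gain = gap_points / worst_occ_accuracy * 100

print(f"absolute gap: {gap_points:.1f} percentage points")
print(f"relative gain: {relative_gain:.1f}%")
```

The absolute gap rounds to the abstract's "~29", which suggests the comparison is stated in percentage points; read as a relative gain, the same two numbers would give a noticeably larger figure.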
