Identifying Promoter and Enhancer Sequences by Graph Convolutional Networks

dc.contributor.author Tenekeci,S.
dc.contributor.author Tekir,S.
dc.date.accessioned 2024-05-05T14:59:36Z
dc.date.available 2024-05-05T14:59:36Z
dc.date.issued 2024
dc.description.abstract Identifying promoters, enhancers, and their interactions helps in understanding genetic regulation. This study proposes a graph-based semi-supervised learning model (GCN4EPI) for the enhancer-promoter classification problem. We adopt a graph convolutional network (GCN) architecture to integrate interaction information with sequence features. Nodes of the constructed graph hold word embeddings of DNA sequences, while edges hold the Enhancer-Promoter Interaction (EPI) information. Owing to semi-supervised learning, much less data (16%) and time are needed for model training. Comparisons on a benchmark dataset of six human cell lines show that the proposed approach outperforms the state-of-the-art methods by a large margin (10% higher F1 score) and trains the fastest (up to 3 times faster). Moreover, GCN4EPI's performance on cross-cell line data is also better than that of the baselines (3% higher F1 score). Our qualitative analyses with graph explainability models show that GCN4EPI learns from both text and graph structure. The results suggest that integrating interaction information with sequence features improves predictive performance and compensates for a smaller number of training instances. © 2024 en_US
dc.identifier.doi 10.1016/j.compbiolchem.2024.108040
dc.identifier.issn 1476-9271
dc.identifier.scopus 2-s2.0-85186593399
dc.identifier.uri https://doi.org/10.1016/j.compbiolchem.2024.108040
dc.identifier.uri https://hdl.handle.net/11147/14420
dc.language.iso en en_US
dc.publisher Elsevier Ltd en_US
dc.relation.ispartof Computational Biology and Chemistry en_US
dc.rights info:eu-repo/semantics/closedAccess en_US
dc.subject Enhancer en_US
dc.subject Graph convolutional networks en_US
dc.subject Natural language processing en_US
dc.subject Promoter en_US
dc.subject Sequence analysis en_US
dc.title Identifying Promoter and Enhancer Sequences by Graph Convolutional Networks en_US
dc.type Article en_US
dspace.entity.type Publication
gdc.author.scopusid 57340107000
gdc.author.scopusid 16234844500
gdc.bip.impulseclass C5
gdc.bip.influenceclass C5
gdc.bip.popularityclass C5
gdc.coar.access metadata only access
gdc.coar.type text::journal::journal article
gdc.collaboration.industrial false
gdc.description.department Izmir Institute of Technology en_US
gdc.description.departmenttemp Tenekeci S., Department of Computer Engineering, Izmir Institute of Technology, Izmir, 35430, Turkey; Tekir S., Department of Computer Engineering, Izmir Institute of Technology, Izmir, 35430, Turkey en_US
gdc.description.publicationcategory Article - International Refereed Journal - Institutional Faculty Member en_US
gdc.description.scopusquality Q3
gdc.description.volume 110 en_US
gdc.description.wosquality Q1
gdc.identifier.openalex W4392301339
gdc.identifier.pmid 38430611
gdc.identifier.wos WOS:001209523700001
gdc.index.type WoS
gdc.index.type Scopus
gdc.oaire.diamondjournal false
gdc.oaire.impulse 1.0
gdc.oaire.influence 2.6398488E-9
gdc.oaire.isgreen false
gdc.oaire.keywords Enhancer Elements, Genetic
gdc.oaire.keywords Humans
gdc.oaire.keywords Neural Networks, Computer
gdc.oaire.keywords Promoter Regions, Genetic
gdc.oaire.popularity 3.0011467E-9
gdc.oaire.publicfunded false
gdc.openalex.collaboration National
gdc.openalex.fwci 0.96049445
gdc.openalex.normalizedpercentile 0.64
gdc.opencitations.count 0
gdc.plumx.mendeley 6
gdc.plumx.pubmedcites 1
gdc.plumx.scopuscites 2
gdc.scopus.citedcount 2
gdc.wos.citedcount 1
relation.isAuthorOfPublication.latestForDiscovery 57639474-3954-4f77-a84c-db8a079648a8
relation.isOrgUnitOfPublication.latestForDiscovery 9af2b05f-28ac-4003-8abe-a4dfe192da5e

Files

Original bundle

Name: 1-s2.0-S1476927124000288-main.pdf
Size: 1.46 MB
Format: Adobe Portable Document Format
Description: Article