Recognition of Counterfactual Statements in Turkish

dc.contributor.author Acar, Ali
dc.contributor.author Tekir, Selma
dc.date.accessioned 2025-02-25T20:01:08Z
dc.date.available 2025-02-25T20:01:08Z
dc.date.issued 2025
dc.description.abstract Counterfactual statements are examples of causal reasoning: they describe events that did not happen and, optionally, the consequences those events would have had. SemEval-2020 introduced the counterfactual detection (CFD) task and shared an English dataset. Since then, further datasets built from Amazon product reviews have been released in English, German, and Japanese. This work releases the first Turkish corpus of counterfactuals (TRCD). The data collection process is driven by a list of counterfactual clue phrases, mainly in the form of Turkish verb inflections. We use clue phrase-based filtering to collect sentences from the Turkish National Corpus (TNC). To avoid selection bias due to clue phrases, half of the collection is instead subject to random word filtering. After human annotation with an inter-annotator agreement of 0.65, the corpus contains 5000 sentences, of which 12.8% contain counterfactual statements. Furthermore, we provide a comprehensive baseline of transformer-based models by testing the effect of clue phrases, comparing performance across languages using the available CFD datasets, and running zero-shot cross-lingual classification experiments with models fine-tuned on different combinations of the existing datasets. The results confirm that TRCD is compatible with the other CFD datasets. Moreover, a fine-tuned Turkish-specific model (BERTurk) performs better than the multilingual alternatives (mBERT and XLM-R) and is more robust to clue phrase masking. This result emphasizes the importance of a language-specific tokenizer for contextual understanding, especially for low-resource languages. Finally, our qualitative analysis gives insights into the errors made by different models. en_US
dc.identifier.doi 10.1145/3706105
dc.identifier.issn 2375-4699
dc.identifier.issn 2375-4702
dc.identifier.scopus 2-s2.0-85216341068
dc.identifier.uri https://doi.org/10.1145/3706105
dc.identifier.uri https://hdl.handle.net/11147/15405
dc.language.iso en en_US
dc.publisher Assoc Computing Machinery en_US
dc.relation.ispartof ACM Transactions on Asian and Low-Resource Language Information Processing
dc.rights info:eu-repo/semantics/openAccess en_US
dc.subject Turkish en_US
dc.subject Corpus en_US
dc.subject Counterfactual Detection en_US
dc.subject Multilingual Transformers en_US
dc.subject BERTurk en_US
dc.title Recognition of Counterfactual Statements in Turkish en_US
dc.type Article en_US
dspace.entity.type Publication
gdc.bip.impulseclass C5
gdc.bip.influenceclass C5
gdc.bip.popularityclass C5
gdc.coar.access open access
gdc.coar.type text::journal::journal article
gdc.collaboration.industrial false
gdc.description.department İzmir Institute of Technology en_US
gdc.description.departmenttemp [Acar, Ali; Tekir, Selma] Izmir Inst Technol, Urla, Izmir, Turkiye en_US
gdc.description.endpage 26
gdc.description.issue 1 en_US
gdc.description.publicationcategory Makale - Uluslararası Hakemli Dergi - Kurum Öğretim Elemanı en_US
gdc.description.scopusquality Q2
gdc.description.startpage 1
gdc.description.volume 24 en_US
gdc.description.woscitationindex Science Citation Index Expanded
gdc.description.wosquality Q3
gdc.identifier.openalex W4404766634
gdc.identifier.wos WOS:001416741200007
gdc.index.type WoS
gdc.index.type Scopus
gdc.oaire.accesstype HYBRID
gdc.oaire.diamondjournal false
gdc.oaire.impulse 0.0
gdc.oaire.influence 2.635068E-9
gdc.oaire.isgreen false
gdc.oaire.popularity 2.1091297E-10
gdc.oaire.publicfunded false
gdc.oaire.sciencefields 0301 basic medicine
gdc.oaire.sciencefields 03 medical and health sciences
gdc.oaire.sciencefields 0202 electrical engineering, electronic engineering, information engineering
gdc.oaire.sciencefields 02 engineering and technology
gdc.openalex.collaboration National
gdc.openalex.fwci 0.0
gdc.openalex.normalizedpercentile 0.29
gdc.opencitations.count 0
gdc.plumx.mendeley 4
gdc.plumx.scopuscites 0
gdc.scopus.citedcount 0
gdc.wos.citedcount 0
relation.isAuthorOfPublication.latestForDiscovery bd1d2d24-79ff-4a37-b824-c8f2888a1389
relation.isOrgUnitOfPublication.latestForDiscovery 9af2b05f-28ac-4014-8abe-a4dfe192da5e
