Scopus İndeksli Yayınlar Koleksiyonu / Scopus Indexed Publications Collection
Permanent URI for this collection: https://hdl.handle.net/11147/7148
Article
Recognition of Counterfactual Statements in Turkish
(Assoc Computing Machinery, 2025) Acar, Ali; Tekir, Selma

Counterfactual statements are examples of causal reasoning: they describe events that did not happen and, optionally, the consequences those events would have had. SemEval-2020 introduced the counterfactual detection (CFD) task and shared an English dataset. Since then, further datasets have been released in English, German, and Japanese, drawn from Amazon product reviews. This work releases the first Turkish corpus of counterfactuals (TRCD). The data collection process is driven by a list of counterfactual clue phrases, mainly in the form of Turkish verb inflections. We use clue phrase-based filtering to collect sentences from the Turkish National Corpus (TNC). To avoid selection bias due to clue phrases, half of the collection is instead gathered with random word filtering. After human annotation with an inter-annotator agreement of 0.65, the corpus contains 5000 sentences, of which 12.8% contain counterfactual statements. Furthermore, we provide a comprehensive baseline of transformer-based models by testing the effect of clue phrases, comparing cross-lingual performance on the available CFD datasets, and running zero-shot cross-lingual classification experiments with fine-tuning on different combinations of the existing datasets. The results confirm that TRCD is compatible with the other CFD datasets. Moreover, fine-tuning a Turkish-specific model (BERTurk) performs better than the multilingual alternatives (mBERT and XLM-R), and BERTurk is more robust to clue phrase masking. This result emphasizes the importance of a language-specific tokenizer for contextual understanding, especially for low-resource languages. Finally, our qualitative analysis gives insights into the errors made by different models.

Conference Object
Türkçe Manzara Metni Veri Kümesi [Turkish Scene Text Dataset]
(IEEE, 2017) Erdogmus, Nesli
Citation - WoS: 2; Citation - Scopus: 2

Scene text localization and recognition continue to attract increasing interest from researchers because of their value in extracting content from real-world images and in image retrieval via text search. Nevertheless, because most of the image datasets commonly used in this field consist of text in English, the related studies have mostly been limited to a single language. To apply the technologies developed for scene text detection and recognition to Turkish scene text, to analyze their performance, and to develop Turkish language-specific algorithms, a Turkish scene text database is collected for the first time in the literature. In this paper, the contents of this database, called STRIT (Scene Text Recognition In Turkish) for short, are detailed. Additionally, two baseline methods are tested to detect and recognize scene text in Turkish, and the preliminary results are presented.
