Quote Detection: A New Task and Dataset for NLP

Tekir, S.; Güzel, A.; Tenekeci, S.; Haman, B.U.

Date

2023

Authors

Tekir, S.

Güzel, A.

Tenekeci, S.

Haman, B.U.

Publisher

Association for Computational Linguistics

Abstract

Quotes are universally appealing. Humans recognize good quotes and save them for later reference. However, recognizing them may pose a challenge for machines. In this work, we build a new corpus of quotes and propose a new task, quote detection, as a type of span detection. We retrieve the quote set from Goodreads and collect the spans through a custom search on the Gutenberg Book Corpus. We run two types of baselines for quote detection: a conditional random field (CRF), and summarization with pointer-generator networks and Bidirectional and Auto-Regressive Transformers (BART). The results show that the neural sequence-to-sequence models perform substantially better than CRF. From the viewpoint of neural extractive summarization, quote detection seems easier than news summarization. Moreover, fine-tuning the models on our corpus and the Cornell Movie-Quotes Corpus yields incremental performance gains. Finally, we provide a qualitative analysis to gain insight into the performance. © 2023 Association for Computational Linguistics.
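To make the span-detection framing more concrete, the following is a minimal Python sketch of how a known quote might be located in a book passage and turned into token-level labels of the kind a CRF tagger consumes. The abstract does not describe the custom Gutenberg search or the labeling scheme, so the whitespace-tolerant regex match and the BIO (begin/inside/outside) tags below are illustrative assumptions, and the function names `find_quote_span` and `bio_labels` are hypothetical.

```python
import re
from typing import List, Optional, Tuple

Span = Tuple[int, int]


def find_quote_span(passage: str, quote: str) -> Optional[Span]:
    """Return (start, end) character offsets of quote in passage, or None.

    Hypothetical helper: whitespace is matched loosely so that line breaks
    in the book text do not block a match (an assumption, not the paper's method).
    """
    tokens = quote.split()
    if not tokens:
        return None
    # Escape each quote token and allow any run of whitespace between them.
    pattern = r"\s+".join(re.escape(t) for t in tokens)
    match = re.search(pattern, passage)
    return (match.start(), match.end()) if match else None


def bio_labels(passage: str, span: Optional[Span]) -> List[str]:
    """Tag each whitespace-delimited token of passage as B, I, or O.

    Hypothetical helper: a token counts as part of the quote only if it lies
    entirely inside the span, so partially overlapping tokens get "O".
    """
    labels: List[str] = []
    inside = False
    for m in re.finditer(r"\S+", passage):
        if span is not None and m.start() >= span[0] and m.end() <= span[1]:
            labels.append("I" if inside else "B")
            inside = True
        else:
            labels.append("O")
            inside = False
    return labels


if __name__ == "__main__":
    # The quote spans a line break in the passage, as often happens in book text.
    passage = "He paused. So it\ngoes. Then he left."
    span = find_quote_span(passage, "So it goes.")
    print(span, bio_labels(passage, span))
```

Exact string matching is the simplest choice here. A real corpus-building pipeline would likely need to tolerate punctuation and spelling variation between Goodreads quotes and Gutenberg texts.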

Description

7th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, LaTeCH-CLfL 2023 -- 5 May 2023 -- 192793

Keywords

Computational linguistics, Natural language processing systems, Auto-regressive, Extractive summarizations, Fine tuning, Gain insight, News summarization, Performance, Qualitative analysis, Random fields, Sequence models, Random processes

WoS Q

N/A

Scopus Q

N/A

Source

EACL 2023 - 7th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, Proceedings of LaTeCH-CLfL 2023

Start Page

21

End Page

27

URI

https://hdl.handle.net/11147/14206

Collections

Scopus İndeksli Yayınlar Koleksiyonu / Scopus Indexed Publications Collection

Page Views

191

checked on Apr 27, 2026
