Quote Detection: A New Task and Dataset for NLP

dc.contributor.author Tekir, S.
dc.contributor.author Güzel, A.
dc.contributor.author Tenekeci, S.
dc.contributor.author Haman, B.U.
dc.date.accessioned 2024-01-06T07:22:37Z
dc.date.available 2024-01-06T07:22:37Z
dc.date.issued 2023
dc.description 7th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, LaTeCH-CLfL 2023 -- 5 May 2023 -- 192793 en_US
dc.description.abstract Quotes are universally appealing. Humans recognize good quotes and save them for later reference. However, recognizing them may pose a challenge for machines. In this work, we build a new corpus of quotes and propose a new task, quote detection, as a type of span detection. We retrieve the quote set from Goodreads and collect the spans through a custom search on the Gutenberg Book Corpus. We run two types of baselines for quote detection: conditional random field (CRF) and summarization with pointer-generator networks and Bidirectional and Auto-Regressive Transformers (BART). The results show that the neural sequence-to-sequence models perform substantially better than CRF. From the viewpoint of neural extractive summarization, quote detection seems easier than news summarization. Moreover, model fine-tuning on our corpus and the Cornell Movie-Quotes Corpus introduces incremental performance boosts. Finally, we provide a qualitative analysis to gain insight into the performance. © 2023 Association for Computational Linguistics. en_US
dc.identifier.isbn 9781959429548
dc.identifier.scopus 2-s2.0-85175428867
dc.identifier.uri https://hdl.handle.net/11147/14206
dc.language.iso en en_US
dc.publisher Association for Computational Linguistics en_US
dc.relation.ispartof EACL 2023 - 7th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, Proceedings of LaTeCH-CLfL 2023 en_US
dc.rights info:eu-repo/semantics/closedAccess en_US
dc.subject Computational linguistics en_US
dc.subject Natural language processing systems en_US
dc.subject Auto-regressive en_US
dc.subject Extractive summarizations en_US
dc.subject Fine tuning en_US
dc.subject Gain insight en_US
dc.subject News summarization en_US
dc.subject Performance en_US
dc.subject Qualitative analysis en_US
dc.subject Random fields en_US
dc.subject Sequence models en_US
dc.subject Random processes en_US
dc.title Quote Detection: A New Task and Dataset for NLP en_US
dc.type Conference Object en_US
dspace.entity.type Publication
gdc.author.institutional
gdc.author.scopusid 16234844500
gdc.author.scopusid 58675151700
gdc.author.scopusid 57340107000
gdc.author.scopusid 58675886200
gdc.coar.access metadata only access
gdc.coar.type text::conference output
gdc.description.department İzmir Institute of Technology en_US
gdc.description.departmenttemp Tekir, S., Izmir Institute of Technology, Dept. of Computer Engineering, Izmir, 35430, Turkey; Güzel, A., Izmir Institute of Technology, Dept. of Computer Engineering, Izmir, 35430, Turkey; Tenekeci, S., Izmir Institute of Technology, Dept. of Computer Engineering, Izmir, 35430, Turkey; Haman, B.U., Izmir Institute of Technology, Dept. of Computer Engineering, Izmir, 35430, Turkey en_US
gdc.description.endpage 27 en_US
gdc.description.publicationcategory Conference Item - International - Institutional Faculty Member en_US
gdc.description.scopusquality N/A
gdc.description.startpage 21 en_US
gdc.description.wosquality N/A
gdc.index.type Scopus
gdc.scopus.citedcount 0
relation.isAuthorOfPublication.latestForDiscovery 57639474-3954-4f77-a84c-db8a079648a8
relation.isOrgUnitOfPublication.latestForDiscovery 9af2b05f-28ac-4014-8abe-a4dfe192da5e
