Quote Detection: A New Task and Dataset for NLP

Tekir, S.; Güzel, A.; Tenekeci, S.; Haman, B.U.

dc.contributor.author	Tekir, S.
dc.contributor.author	Güzel, A.
dc.contributor.author	Tenekeci, S.
dc.contributor.author	Haman, B.U.
dc.date.accessioned	2024-01-06T07:22:37Z
dc.date.available	2024-01-06T07:22:37Z
dc.date.issued	2023
dc.description	7th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, LaTeCH-CLfL 2023 -- 5 May 2023 -- 192793	en_US
dc.description.abstract	Quotes are universally appealing. Humans recognize good quotes and save them for later reference. However, recognizing quotes may pose a challenge for machines. In this work, we build a new corpus of quotes and propose a new task, quote detection, as a type of span detection. We retrieve the quote set from Goodreads and collect the spans through a custom search on the Gutenberg Book Corpus. We run two types of baselines for quote detection: a conditional random field (CRF) and summarization with pointer-generator networks and Bidirectional and Auto-Regressive Transformers (BART). The results show that the neural sequence-to-sequence models perform substantially better than the CRF. From the viewpoint of neural extractive summarization, quote detection seems easier than news summarization. Moreover, fine-tuning the models on our corpus and the Cornell Movie-Quotes Corpus yields incremental performance gains. Finally, we provide a qualitative analysis to gain insight into model performance. © 2023 Association for Computational Linguistics.	en_US
dc.identifier.isbn	9781959429548
dc.identifier.scopus	2-s2.0-85175428867
dc.identifier.uri	https://hdl.handle.net/11147/14206
dc.language.iso	en	en_US
dc.publisher	Association for Computational Linguistics	en_US
dc.relation.ispartof	EACL 2023 - 7th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, Proceedings of LaTeCH-CLfL 2023	en_US
dc.rights	info:eu-repo/semantics/closedAccess	en_US
dc.subject	Computational linguistics	en_US
dc.subject	Natural language processing systems	en_US
dc.subject	Auto-regressive	en_US
dc.subject	Extractive summarizations	en_US
dc.subject	Fine tuning	en_US
dc.subject	Gain insight	en_US
dc.subject	News summarization	en_US
dc.subject	Performance	en_US
dc.subject	Qualitative analysis	en_US
dc.subject	Random fields	en_US
dc.subject	Sequence models	en_US
dc.subject	Random processes	en_US
dc.title	Quote Detection: A New Task and Dataset for NLP	en_US
dc.type	Conference Object	en_US
dspace.entity.type	Publication
gdc.author.institutional	…
gdc.author.scopusid	16234844500
gdc.author.scopusid	58675151700
gdc.author.scopusid	57340107000
gdc.author.scopusid	58675886200
gdc.coar.access	metadata only access
gdc.coar.type	text::conference output
gdc.description.department	İzmir Institute of Technology	en_US
gdc.description.departmenttemp	Tekir, S., Izmir Institute of Technology, Dept. of Computer Engineering, Izmir, 35430, Turkey; Güzel, A., Izmir Institute of Technology, Dept. of Computer Engineering, Izmir, 35430, Turkey; Tenekeci, S., Izmir Institute of Technology, Dept. of Computer Engineering, Izmir, 35430, Turkey; Haman, B.U., Izmir Institute of Technology, Dept. of Computer Engineering, Izmir, 35430, Turkey	en_US
gdc.description.endpage	27	en_US
gdc.description.publicationcategory	Conference Item - International - Institutional Faculty Member	en_US
gdc.description.scopusquality	N/A
gdc.description.startpage	21	en_US
gdc.description.wosquality	N/A
gdc.index.type	Scopus
gdc.scopus.citedcount	0
relation.isAuthorOfPublication.latestForDiscovery	57639474-3954-4f77-a84c-db8a079648a8
relation.isOrgUnitOfPublication.latestForDiscovery	9af2b05f-28ac-4014-8abe-a4dfe192da5e

Collections

Scopus Indexed Publications Collection
