Quote Detection: A New Task and Dataset for NLP
| dc.contributor.author | Tekir, S. | |
| dc.contributor.author | Güzel, A. | |
| dc.contributor.author | Tenekeci, S. | |
| dc.contributor.author | Haman, B.U. | |
| dc.date.accessioned | 2024-01-06T07:22:37Z | |
| dc.date.available | 2024-01-06T07:22:37Z | |
| dc.date.issued | 2023 | |
| dc.description | 7th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, LaTeCH-CLfL 2023 -- 5 May 2023 -- 192793 | en_US |
| dc.description.abstract | Quotes are universally appealing. Humans recognize good quotes and save them for later reference. However, recognizing them may pose a challenge for machines. In this work, we build a new corpus of quotes and propose a new task, quote detection, as a type of span detection. We retrieve the quote set from Goodreads and collect the spans through a custom search on the Gutenberg Book Corpus. We run two types of baselines for quote detection: conditional random field (CRF) and summarization with pointer-generator networks and Bidirectional and Auto-Regressive Transformers (BART). The results show that the neural sequence-to-sequence models perform substantially better than CRF. From the viewpoint of neural extractive summarization, quote detection seems easier than news summarization. Moreover, model fine-tuning on our corpus and the Cornell Movie-Quotes Corpus introduces incremental performance boosts. Finally, we provide a qualitative analysis to gain insight into the performance. © 2023 Association for Computational Linguistics. | en_US |
| dc.identifier.isbn | 9781959429548 | |
| dc.identifier.scopus | 2-s2.0-85175428867 | |
| dc.identifier.uri | https://hdl.handle.net/11147/14206 | |
| dc.language.iso | en | en_US |
| dc.publisher | Association for Computational Linguistics | en_US |
| dc.relation.ispartof | EACL 2023 - 7th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, Proceedings of LaTeCH-CLfL 2023 | en_US |
| dc.rights | info:eu-repo/semantics/closedAccess | en_US |
| dc.subject | Computational linguistics | en_US |
| dc.subject | Natural language processing systems | en_US |
| dc.subject | Auto-regressive | en_US |
| dc.subject | Extractive summarizations | en_US |
| dc.subject | Fine tuning | en_US |
| dc.subject | Gain insight | en_US |
| dc.subject | News summarization | en_US |
| dc.subject | Performance | en_US |
| dc.subject | Qualitative analysis | en_US |
| dc.subject | Random fields | en_US |
| dc.subject | Sequence models | en_US |
| dc.subject | Random processes | en_US |
| dc.title | Quote Detection: A New Task and Dataset for NLP | en_US |
| dc.type | Conference Object | en_US |
| dspace.entity.type | Publication | |
| gdc.author.institutional | … | |
| gdc.author.scopusid | 16234844500 | |
| gdc.author.scopusid | 58675151700 | |
| gdc.author.scopusid | 57340107000 | |
| gdc.author.scopusid | 58675886200 | |
| gdc.coar.access | metadata only access | |
| gdc.coar.type | text::conference output | |
| gdc.description.department | İzmir Institute of Technology | en_US |
| gdc.description.departmenttemp | Tekir, S., Izmir Institute of Technology, Dept. of Computer Engineering, Izmir, 35430, Turkey; Güzel, A., Izmir Institute of Technology, Dept. of Computer Engineering, Izmir, 35430, Turkey; Tenekeci, S., Izmir Institute of Technology, Dept. of Computer Engineering, Izmir, 35430, Turkey; Haman, B.U., Izmir Institute of Technology, Dept. of Computer Engineering, Izmir, 35430, Turkey | en_US |
| gdc.description.endpage | 27 | en_US |
| gdc.description.publicationcategory | Conference Item - International - Institutional Faculty Member | en_US |
| gdc.description.scopusquality | N/A | |
| gdc.description.startpage | 21 | en_US |
| gdc.description.wosquality | N/A | |
| gdc.index.type | Scopus | |
| gdc.scopus.citedcount | 0 | |
| relation.isAuthorOfPublication.latestForDiscovery | 57639474-3954-4f77-a84c-db8a079648a8 | |
| relation.isOrgUnitOfPublication.latestForDiscovery | 9af2b05f-28ac-4014-8abe-a4dfe192da5e |
