...

A Chinese Dialogue Dataset Towards Multi-turn Topic-driven Conversation


View a PDF of the paper titled NaturalConv: A Chinese Dialogue Dataset Towards Multi-turn Topic-driven Conversation, by Xiaoyang Wang and three other authors

Abstract: In this paper, we propose a Chinese multi-turn topic-driven conversation dataset, NaturalConv, which allows the participants to chat about anything they want as long as any element from the topic is mentioned and the topic shift is smooth. Our corpus contains 19.9K conversations from six domains, and 400K utterances with an average turn number of 20.1. These conversations contain in-depth discussions on related topics or widely natural transitions between multiple topics. We believe either way is normal for human conversation. To facilitate research on this corpus, we provide results of several benchmark models. Comparative results show that for this dataset, our current models are not able to provide significant improvement by introducing background knowledge/topic. Therefore, the proposed dataset should be a good benchmark for further research to evaluate the validity and naturalness of multi-turn conversation systems. Our dataset is available at this https URL.
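As a quick consistency check on the corpus statistics quoted above, the following minimal sketch divides the utterance count by the conversation count. It assumes the rounded figures from the abstract (19.9K and 400K); the exact counts are not given here, so the result is only approximate.

```python
# Rounded figures quoted in the abstract; exact counts are not stated there,
# so these values are assumptions for illustration.
conversations = 19_900   # "19.9K conversations"
utterances = 400_000     # "400K utterances"

# Average number of utterances (turns) per conversation.
avg_turns = utterances / conversations
print(f"{avg_turns:.1f}")
```

With these rounded inputs the ratio agrees with the per-conversation turn figure reported in the abstract to one decimal place.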

Submission history

From: Xiaoyang Wang
[v1] Wed, 3 Mar 2021 17:38:33 UTC (27 KB)
[v2] Fri, 5 Mar 2021 17:12:20 UTC (571 KB)
[v3] Thu, 7 Nov 2024 01:08:46 UTC (571 KB)
