Automatic Recognition and Classification of Future Work Sentences from Academic Articles in a Specific Domain

Chengzhi Zhang, Yi Xiang, Wenke Hao, Zhicheng Li, Yuchen Qian, Yuzhuo Wang
Dec 2022
Future work sentences (FWS) are the sentences in academic papers that contain the authors' description of their proposed follow-up research directions. This paper presents methods to automatically extract FWS from academic papers and classify them according to the different future directions embodied in the paper's content. FWS recognition methods will enable subsequent researchers to locate future work sentences more accurately and quickly and reduce the time and cost of acquiring the corpus. Existing work on automatic identification of future work sentences is limited; it cannot accurately identify FWS from academic papers, and thus cannot support data mining on a large scale. Furthermore, future work content has many aspects, and subdividing that content is conducive to the analysis of specific development directions. In this paper, Natural Language Processing (NLP) is used as a case study, and FWS are extracted from academic papers and classified into different types. We manually build an annotated corpus with six different types of FWS. Then, automatic recognition and classification of FWS are implemented using machine learning models, and the performance of these models is compared based on the evaluation metrics. The results show that the Bernoulli Bayesian model has the best performance in the automatic recognition task, with the Macro F1 reaching 90.73%, and the SciBERT model has the best performance in the automatic classification task, with the weighted average F1 reaching 72.63%. Finally, we extract keywords from FWS to gain a deep understanding of the key content described in FWS, and we also demonstrate that the content described in FWS is reflected in subsequent research work by measuring the similarity between future work sentences and the abstracts.
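The abstract reports a Macro F1 for the recognition task and a weighted average F1 for the classification task. As a minimal sketch of how these two averages differ, the Python below computes both from scratch using the standard definitions. The label names `"fws"` and `"other"` and the toy predictions are hypothetical illustrations, not data or code from the paper.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1}, scoring each label one-vs-rest."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the label never occurs.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: every class counts equally."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Mean of per-class F1 weighted by each class's number of true instances."""
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[label] * support[label] for label in scores) / len(y_true)


# Hypothetical toy labels: two FWS sentences and four non-FWS sentences.
y_true = ["fws", "fws", "other", "other", "other", "other"]
y_pred = ["fws", "other", "other", "other", "other", "fws"]

print(per_class_f1(y_true, y_pred))  # {'fws': 0.5, 'other': 0.75}
print(macro_f1(y_true, y_pred))      # 0.625
print(weighted_f1(y_true, y_pred))   # about 0.667
```

On imbalanced data the two averages diverge: the macro average gives the rarer class the same weight as the common one, while the weighted average follows the class distribution.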