Towards Robust Bangla Complex Named Entity Recognition

HAZ Sameen Shahgir, Ramisa Alam, Md. Zarif Ul Alam
Mar 2023
Abstract
Named Entity Recognition (NER) is a fundamental task in natural language processing that involves identifying and classifying named entities in text. However, little work has been done on complex named entity recognition (CNER) in Bangla, despite it being the seventh most spoken language globally. CNER is a more challenging task than traditional NER because it involves identifying and classifying complex and compound entities, which are not common in the Bangla language. In this paper, we present the winning solution of the Bangla Complex Named Entity Recognition Challenge, addressing the CNER task on the BanglaCoNER dataset using two different approaches: Conditional Random Fields (CRF) and fine-tuning transformer-based deep learning models such as BanglaBERT. The dataset consisted of 15300 sentences for training and 800 sentences for validation, in the .conll format. Exploratory Data Analysis (EDA) revealed that the dataset had 7 different NER tags, with a notable presence of English words, suggesting that the dataset is synthetic and likely a product of translation. We experimented with a variety of feature combinations including Part of Speech (POS) tags, word suffixes, gazetteers, and cluster information from embeddings, while also fine-tuning the BanglaBERT (large) model for NER. We found that not all linguistic patterns are immediately apparent or even intuitive to humans, which is why deep learning based models have proved to be more effective in NLP, including the CNER task. Our fine-tuned BanglaBERT (large) model achieves an F1 score of 0.79 on the validation set. Overall, our study highlights the importance of Bangla Complex Named Entity Recognition, particularly in the context of synthetic datasets. Our findings also demonstrate the efficacy of deep learning models such as BanglaBERT for NER in the Bangla language.
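The CRF approach mentioned in the abstract relies on hand-crafted per-token features. The following is a minimal sketch of what such a pipeline could look like, not the authors' implementation: the function names (`read_conll`, `token_features`, `sentence_features`), the column layout (token first, tag last, `#` comment lines), the tag names in the sample, and the specific features chosen are all assumptions made for illustration. POS tags and embedding clusters are omitted because they would need external resources.

```python
from typing import Dict, List, Set, Tuple


def read_conll(text: str) -> List[List[Tuple[str, str]]]:
    """Parse CoNLL-style text into sentences of (token, tag) pairs.

    Assumed layout (not confirmed by the abstract): one token per line with
    the token in the first column and the tag in the last column, blank
    lines between sentences, and optional comment lines starting with '#'.
    """
    sentences: List[List[Tuple[str, str]]] = []
    current: List[Tuple[str, str]] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            # A blank or comment line closes the sentence in progress.
            if current:
                sentences.append(current)
                current = []
            continue
        parts = line.split()
        current.append((parts[0], parts[-1]))
    if current:
        sentences.append(current)
    return sentences


def token_features(tokens: List[str], i: int, gazetteer: Set[str]) -> Dict[str, object]:
    """Build an illustrative feature dictionary for the token at position i."""
    word = tokens[i]
    feats: Dict[str, object] = {
        "bias": 1.0,
        "word": word,
        "suffix2": word[-2:],               # word-suffix features
        "suffix3": word[-3:],
        "is_ascii": word.isascii(),         # rough cue for English words
        "in_gazetteer": word in gazetteer,  # gazetteer membership
        "BOS": i == 0,
        "EOS": i == len(tokens) - 1,
    }
    if i > 0:
        feats["prev_word"] = tokens[i - 1]
    if i < len(tokens) - 1:
        feats["next_word"] = tokens[i + 1]
    return feats


def sentence_features(tokens: List[str], gazetteer: Set[str]) -> List[Dict[str, object]]:
    """Feature dictionaries for every token in a sentence."""
    return [token_features(tokens, i, gazetteer) for i in range(len(tokens))]


if __name__ == "__main__":
    # Hypothetical sample; the tag names are placeholders, not the dataset's tag set.
    sample = "# id 1\nঢাকা _ _ B-LOC\nশহর _ _ O\n\nGoogle _ _ B-CORP\n"
    for sent in read_conll(sample):
        words = [w for w, _ in sent]
        print(sentence_features(words, gazetteer={"ঢাকা"}))
```

Lists of per-token dictionaries like these are the kind of input that CRF toolkits such as sklearn-crfsuite accept; which toolkit the authors used is not stated in the abstract.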