Towards Robust Bangla Complex Named Entity Recognition

HAZ Sameen Shahgir, Ramisa Alam, Md. Zarif Ul Alam
Mar 2023
Named Entity Recognition (NER) is a fundamental task in natural language processing that involves identifying and classifying named entities in text. However, little work has been done on complex named entity recognition (CNER) in Bangla, even though Bangla is the seventh most spoken language globally. CNER is a more challenging task than traditional NER as it involves identifying and classifying complex and compound entities, which are not common in the Bangla language. In this paper, we present the winning solution of the Bangla Complex Named Entity Recognition Challenge, addressing the CNER task on the BanglaCoNER dataset using two different approaches, namely Conditional Random Fields (CRF) and fine-tuning transformer-based Deep Learning models such as BanglaBERT. The dataset consisted of 15,300 sentences for training and 800 sentences for validation, in the .conll format. Exploratory Data Analysis (EDA) revealed that the dataset had 7 different NER tags, with a notable presence of English words, suggesting that the dataset is synthetic and likely a product of translation. We experimented with a variety of feature combinations including Part of Speech (POS) tags, word suffixes, gazetteers, and cluster information from embeddings, while also fine-tuning the BanglaBERT (large) model for NER. We found that not all linguistic patterns are immediately apparent or even intuitive to humans, which is why Deep Learning based models have proved to be the more effective approach in NLP, including the CNER task. Our fine-tuned BanglaBERT (large) model achieves an F1 score of 0.79 on the validation set. Overall, our study highlights the importance of Bangla Complex Named Entity Recognition, particularly in the context of synthetic datasets. Our findings also demonstrate the efficacy of Deep Learning models such as BanglaBERT for NER in the Bangla language.
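
As an illustration of the kind of hand-crafted features the CRF approach relies on, the following is a minimal sketch in Python, not the authors' implementation. It assumes a CoNLL-style layout with one token per line, the token in the first column, the tag in the last column, and a blank line between sentences. The function names `read_conll` and `token_features`, the feature names, the sample tags, and the toy gazetteer are all hypothetical. POS tags and embedding clusters are left out because they would require external resources.

```python
from typing import Dict, List, Set, Tuple

Sentence = List[Tuple[str, str]]  # (token, tag) pairs for one sentence


def read_conll(text: str) -> List[Sentence]:
    """Parse CoNLL-style text into sentences of (token, tag) pairs.

    Assumption: first column is the token, last column is the tag,
    and lines starting with '#' are metadata (a literal '#' token
    would also be skipped by this simple rule).
    """
    sentences: List[Sentence] = []
    current: Sentence = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line closes the current sentence, if any.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue
        cols = line.split()
        current.append((cols[0], cols[-1]))
    if current:
        sentences.append(current)
    return sentences


def token_features(tokens: List[str], i: int, gazetteer: Set[str]) -> Dict[str, object]:
    """Build a feature dict for tokens[i] (hypothetical feature set)."""
    word = tokens[i]
    feats: Dict[str, object] = {
        "bias": 1.0,
        "word": word,
        "suffix2": word[-2:],               # word-suffix features
        "suffix3": word[-3:],
        "is_ascii": word.isascii(),         # flags English words in Bangla text
        "is_digit": word.isdigit(),
        "in_gazetteer": word in gazetteer,  # gazetteer lookup
    }
    # Context features from the neighbouring tokens.
    if i > 0:
        feats["prev_word"] = tokens[i - 1]
    else:
        feats["BOS"] = True
    if i < len(tokens) - 1:
        feats["next_word"] = tokens[i + 1]
    else:
        feats["EOS"] = True
    return feats


# Illustrative sample; the tags here are placeholders, not the dataset's tag set.
sample = "# id demo\nঢাকা _ _ B-LOC\nuniversity _ _ I-LOC\n\nরবীন্দ্রনাথ _ _ B-PER\n"
sentences = read_conll(sample)
tokens = [tok for tok, _ in sentences[0]]
features = [token_features(tokens, i, {"ঢাকা"}) for i in range(len(tokens))]
```

Per-token dictionaries of this shape are the usual input to CRF toolkits, where each sentence becomes a list of feature dicts paired with a list of tags. The `is_ascii` flag is one simple way to expose the English words that the exploratory analysis found in the dataset.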