A BERT-Based Hybrid Short Text Classification Model Incorporating CNN and Attention-Based BiGRU

Tong Bao, Ni Ren, Rui Luo, Baojia Wang, Gengyu Shen, Ting Guo
Copyright © 2021 | Volume 33 | Issue 6 | Pages 1-21
ISSN: 1546-2234 | EISSN: 1546-5012 | EISBN13: 9781799867494 | DOI: 10.4018/JOEUC.294580
Cite Article

MLA

Bao, Tong, et al. "A BERT-Based Hybrid Short Text Classification Model Incorporating CNN and Attention-Based BiGRU." Journal of Organizational and End User Computing (JOEUC), vol. 33, no. 6, 2021, pp. 1-21. http://doi.org/10.4018/JOEUC.294580

APA

Bao, T., Ren, N., Luo, R., Wang, B., Shen, G., & Guo, T. (2021). A BERT-Based Hybrid Short Text Classification Model Incorporating CNN and Attention-Based BiGRU. Journal of Organizational and End User Computing (JOEUC), 33(6), 1-21. http://doi.org/10.4018/JOEUC.294580

Chicago

Bao, Tong, et al. "A BERT-Based Hybrid Short Text Classification Model Incorporating CNN and Attention-Based BiGRU." Journal of Organizational and End User Computing (JOEUC) 33, no. 6 (2021): 1-21. http://doi.org/10.4018/JOEUC.294580

Abstract

Short text classification is a research focus in natural language processing (NLP) and is widely used in news classification, sentiment analysis, mail filtering, and other fields. In recent years, deep learning techniques have been applied to text classification and have made considerable progress. Unlike ordinary text, short text suffers from limited vocabulary and feature sparsity, which places higher demands on semantic feature representation. To address this issue, this paper proposes a feature fusion framework based on Bidirectional Encoder Representations from Transformers (BERT). In this hybrid method, BERT is used to train word vector representations, a convolutional neural network (CNN) captures local static features, and, as a complement, a bidirectional gated recurrent unit (BiGRU) network captures contextual features. Furthermore, an attention mechanism is introduced to assign higher weights to salient words. The experimental results confirm that the proposed model significantly outperforms state-of-the-art baseline methods.
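
The abstract describes the architecture only at a high level. The following is a minimal PyTorch sketch of how such a BERT-CNN-BiGRU-attention hybrid could be assembled, assuming Hugging Face `transformers`; it is an illustrative reconstruction, not the authors' implementation. The layer sizes, kernel widths, checkpoint name (`bert-base-uncased`), and concatenation-based feature fusion are all assumptions, since the paper's configuration is not given here.

```python
# Illustrative sketch of a BERT + CNN + attention-based BiGRU hybrid.
# Hyperparameters and fusion strategy are assumptions, not the paper's.
import torch
import torch.nn as nn
from transformers import BertModel

class BertCnnBiGruAttention(nn.Module):
    def __init__(self, num_classes, hidden=768, gru_hidden=128,
                 n_filters=100, kernel_sizes=(2, 3, 4)):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        # CNN branch: captures local (static) n-gram features.
        self.convs = nn.ModuleList(
            [nn.Conv1d(hidden, n_filters, k) for k in kernel_sizes])
        # BiGRU branch: captures contextual features in both directions.
        self.bigru = nn.GRU(hidden, gru_hidden, batch_first=True,
                            bidirectional=True)
        # Additive attention over BiGRU states to weight salient words.
        self.attn = nn.Linear(2 * gru_hidden, 1)
        fused = n_filters * len(kernel_sizes) + 2 * gru_hidden
        self.classifier = nn.Linear(fused, num_classes)

    def forward(self, input_ids, attention_mask):
        # Contextual word vectors from BERT: (batch, seq_len, hidden).
        h = self.bert(input_ids=input_ids,
                      attention_mask=attention_mask).last_hidden_state
        # CNN branch: convolve over time, then max-pool each feature map.
        c = h.transpose(1, 2)  # (batch, hidden, seq_len)
        cnn_feats = torch.cat(
            [torch.relu(conv(c)).max(dim=2).values for conv in self.convs],
            dim=1)
        # BiGRU branch with attention pooling over the word positions.
        g, _ = self.bigru(h)                 # (batch, seq_len, 2*gru_hidden)
        scores = self.attn(g).squeeze(-1)    # (batch, seq_len)
        scores = scores.masked_fill(attention_mask == 0, -1e9)
        alpha = torch.softmax(scores, dim=1).unsqueeze(-1)
        gru_feats = (alpha * g).sum(dim=1)   # attention-weighted state sum
        # Feature fusion by concatenation, then classification.
        return self.classifier(torch.cat([cnn_feats, gru_feats], dim=1))
```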