Big Data and Cognitive Information Network, 2025, 1(1); doi: xxx.
Naeem Ghabussi
Khwaja Fareed University of Engineering and Information Technology, Rahim Yar Khan
The widespread propagation of fake news via social media platforms has raised serious concerns about misinformation and its potential impact on society. This study compares the Albert-base-v2 transformer and CNN-BiLSTM models for fake news detection. The models were trained and evaluated on the Fake News Sample (Pontes) dataset from Kaggle, which contains over 45,000 news articles labeled as real or fake according to predefined criteria. The dataset is preprocessed by removing punctuation and non-English characters and applying tokenization to improve model performance. Five deep learning architectures are evaluated: 2-CNN 2-BiLSTM, 3-CNN 1-BiLSTM, 1-CNN 3-BiLSTM, DistilBERT, and Albert-base-v2. The models are trained using a 75%-20%-5% data split, with an embedding size of 300 for the CNN-BiLSTM architectures. Performance is assessed using accuracy, precision, recall, F1-score, and AUC-ROC metrics. Among the models, Albert-base-v2 performs best, achieving 90.8% accuracy and a 0.908 F1-score, outperforming 2-CNN 2-BiLSTM (86.1% accuracy, 0.861 F1-score) and DistilBERT (85.0% accuracy, 0.850 F1-score). Statistical significance is established using t-tests, and class-wise performance is analyzed with a confusion matrix. The results highlight the superiority of transformer-based models over conventional deep learning methods for fake news detection. In addition, limitations, ethical considerations, and future directions for enhancing model interpretability and efficiency are discussed.
Albert-base-v2, transformer, deep learning, Convolutional Neural Network (CNN), BiLSTM, fake news classification
Naeem Ghabussi. Fake News Detection Using Albert-base-v2 Transformer and CNN-BiLSTM Architectures: A Comparative Analysis of Transformer-Based and Deep Learning Approaches. Big Data and Cognitive Information Network (2025), Vol. 1, Issue 1: 1-21.