The Author Profiling and Deception Detection in Arabic shared task consists of two tasks:
Task 1. Author Profiling in Arabic Tweets
Author profiling distinguishes between classes of authors by studying how language is shared by people. This helps to identify profiling aspects such as age, gender, and language variety, among others. The focus of this task is to identify the age, gender, and language variety of Arabic Twitter users.
NOTE: Although we encourage participation in all the subtasks, it is possible to participate in only some of them.
Task 2. Deception Detection in Arabic Texts
A message can be considered deceptive when it is intentionally written to sound authentic. The focus of this task is deception detection in Arabic across two different genres: Twitter and news headlines.
- Francisco Rangel, Paolo Rosso, Bilal Ghanem, Javier Sánchez, PRHLT Research Center, Universitat Politècnica de València, Spain
- Anis Charfi, Carnegie Mellon University Qatar
- Wajdi Zaghouani, Hamad Bin Khalifa University, Qatar
Arabic Author Profiling for Cyber-Security (ARAP)
In the framework of the project Arabic Author Profiling for Cyber-Security (ARAP), we aim at preventing cyber-threats using machine learning (see next figure). To this end, we monitor social media to detect threatening messages early and, when one is found, to profile the authors behind it. Profiling potential terrorists from messages shared on social media may allow detecting communities whose aim is to undermine the security of others. Nonetheless, we must be aware of false positives, i.e., potentially threatening messages that are actually deceptive, ironic or humorous.
The research project has been funded under grant NPRP 9-175-1-033 from the Qatar National Research Fund (a member of Qatar Foundation).
- Rosso, P., Rangel, F., Hernández, I., Cagnina, L., Zaghouani, W., Charfi, A. A Survey on Author Profiling, Deception, and Irony Detection for the Arabic Language. In: Language and Linguistics Compass, Wiley Online Library, pp. 1-20, DOI: 10.1111/lnc3.12275
- Rangel, F., Rosso, P., Charfi, A., Zaghouani, W. Detecting Deceptive Tweets in Arabic for Cyber-Security. In: Proceedings of the 17th IEEE International Conference on Intelligence and Security Informatics (ISI), 2019
- Zaghouani, W., and Charfi, A. ArapTweet: A Large Multi-Dialect Twitter Corpus for Gender, Age and Language Variety Identification. In Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC), Miyazaki, Japan, 2018
- Zaghouani, W., and Charfi, A. Guidelines and Annotation Framework for Arabic Author Profiling. In Proceedings of the 3rd Workshop on Open-Source Arabic Corpora and Processing Tools, 11th International Conference on Language Resources and Evaluation (LREC), Miyazaki, Japan, 2018
- Rangel, F., Rosso, P., Potthast, M., Stein, B. Overview of the 5th Author Profiling Task at PAN 2017: Gender and Language Variety Identification in Twitter. In: Cappellato L., Ferro N., Goeuriot L., Mandl T. (Eds.) CLEF 2017 Labs and Workshops, Notebook Papers. CEUR Workshop Proceedings. CEUR-WS.org, vol. 1866, 2017
- Rangel, F., Rosso, P., Chugur, I., Potthast, M., Trenkmann, M., Stein, B., Verhoeven, B., Daelemans, W. Overview of the 2nd Author Profiling Task at PAN 2014. In: Cappellato L., Ferro N., Halvey M., Kraaij W. (Eds.) CLEF 2014 Labs and Workshops, Notebook Papers. CEUR-WS.org, vol. 1180, pp. 898-927, 2014
- Cagnina, L., Rosso, P. Detecting Deceptive Opinions: Intra and Cross-Domain Classification Using an Efficient Representation. In: International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 25, Suppl. 2, pp. 151–174, World Scientific, 2017
- Sánchez-Junquera, J., Villaseñor-Pineda, L., Montes-y-Gómez, M., Rosso, P. Character n-grams for detecting deceptive controversial opinions. In Experimental IR Meets Multilinguality, Multimodality, and Interaction – Proc. of the 9th Int. Conf. of the CLEF Association. Springer-Verlag, LNCS (11018), pp. 135–140, 2018
- Rangel, F., Rosso, P. On the Impact of Emotions on Author Profiling. In: Information Processing & Management 52(1):73-92, 2016
- Rangel, F., Rosso, P., Franco, M. A Low Dimensionality Representation for Language Variety Identification. In: Proc. of the 17th Int. Conf. on Intelligent Text Processing and Computational Linguistics (CICLing’16), Springer-Verlag, LNCS (9624), pp. 156-169, 2018