Corpus

Task 1. Author Profiling in Arabic Tweets

To develop your software, we provide you with a training corpus that consists of tweets in Arabic, labeled with age, gender and language variety.

For age, we consider three classes: Under 25, Between 25 and 34, and Above 35.

We will consider the following fifteen Arabic varieties: Algeria, Egypt, Iraq, Kuwait, Lebanon-Syria, Libya, Morocco, Oman, Palestine-Jordan, Qatar, Saudi Arabia, Sudan, Tunisia, UAE, Yemen.

Training set

The training set will be released over five consecutive days, with three varieties per day. The corpus is password protected. To obtain the password, send an email to apda.pan.fire (at) gmail (dot) com

Test set

The test set will be released over five consecutive days, with three varieties per day. The corpus is password protected. To obtain the password, send an email to apda.pan.fire (at) gmail (dot) com

Task 2. Deception Detection in Arabic Texts

To develop your software, we provide you with a training corpus that covers two genres: Twitter and news headlines.

The corpus is annotated with credible and non-credible labels.

Training set

The corpus is password protected. To obtain the password, send an email to apda.pan.fire (at) gmail (dot) com

Test set

The test corpus is password protected. To obtain the password, send an email to apda.pan.fire (at) gmail (dot) com