Task Description

PR-SOCO focuses on predicting personality traits from source code. For training, participants will receive Java source code written by computer science students, together with the students’ personality information. At test time, participants will be given source code from a few programmers and will have to predict their personality traits. The number of source code samples per programmer will be small, reflecting a realistic scenario such as a job interview, where the interviewer may want to gauge the interviewee’s degree of conscientiousness from just a couple of programming problems.

We suggest that participants look beyond standard n-gram-based features. For example, the way the code is commented, the naming of variables, or the indentation may also provide valuable information. To encourage the investigation of different kinds of features, five runs per participant are allowed.
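
As a rough illustration of such style-based features, the following Python sketch computes a comment ratio, indentation habits, and the mean identifier length for one Java file. The function name `style_features`, the feature names, and the partial keyword list are assumptions made for this example; they are not part of the task definition, and the comment and identifier detection is a line-based heuristic rather than a real Java parser.

```python
import re

# Partial list of Java keywords, for illustration only (an assumption of this sketch).
JAVA_KEYWORDS = {
    "public", "private", "protected", "static", "final", "class", "interface",
    "void", "int", "long", "double", "float", "boolean", "char", "byte", "short",
    "return", "if", "else", "for", "while", "do", "switch", "case", "break",
    "continue", "new", "import", "package", "try", "catch", "finally", "throw",
    "throws", "this", "super", "extends", "implements", "null", "true", "false",
}

IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_comment_line(line):
    """Heuristic: the line starts a // comment or looks like part of a /* */ block."""
    return line.strip().startswith(("//", "/*", "*"))


def style_features(java_source):
    """Return a few simple style features for one Java source file."""
    lines = [ln for ln in java_source.splitlines() if ln.strip()]
    total = max(len(lines), 1)  # avoid division by zero on empty input

    comment_count = sum(1 for ln in lines if is_comment_line(ln))
    code_lines = [ln for ln in lines if not is_comment_line(ln)]

    # Indentation habits: share of non-empty lines starting with a tab or a space.
    tab_count = sum(1 for ln in lines if ln.startswith("\t"))
    space_count = sum(1 for ln in lines if ln.startswith(" "))

    # Crude identifier extraction: every word-like token that is not a listed keyword.
    names = [
        tok
        for ln in code_lines
        for tok in IDENTIFIER.findall(ln)
        if tok not in JAVA_KEYWORDS
    ]
    mean_len = sum(len(n) for n in names) / len(names) if names else 0.0

    return {
        "comment_ratio": comment_count / total,
        "tab_indent_ratio": tab_count / total,
        "space_indent_ratio": space_count / total,
        "mean_identifier_length": mean_len,
    }
```

Features of this kind could be averaged over the few files available per programmer and combined with n-gram features in any regression model.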

Submission procedure

Your software must generate a file with a line for each document of the dataset with the following information separated by commas (the same format as the truth file provided in the training corpus):

Author id, emotional stability / neuroticism, extroversion, openness to experience, agreeableness, conscientiousness

For example, the following line:

5,74,38,46,42,46

corresponds to author 5, with 74 for emotional stability, 38 for extroversion, 46 for openness to experience, 42 for agreeableness and 46 for conscientiousness.
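
The output format described above can be produced with a few lines of Python. This is a minimal sketch, not an official tool: the function name `format_run`, the trait keys, and the in-memory `predictions` structure are assumptions made for the example. The exact number formatting (integers or decimals, spacing after commas) should be checked against the truth file in the training corpus.

```python
import csv
import io

# Column order as listed in the submission format; the key names are hypothetical.
TRAITS = [
    "neuroticism",
    "extroversion",
    "openness",
    "agreeableness",
    "conscientiousness",
]


def format_run(predictions):
    """Turn {author_id: {trait: score}} into comma-separated lines, one per author."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for author_id in sorted(predictions):
        scores = predictions[author_id]
        writer.writerow([author_id] + [scores[trait] for trait in TRAITS])
    return buffer.getvalue()


if __name__ == "__main__":
    run = {
        5: {
            "neuroticism": 74,
            "extroversion": 38,
            "openness": 46,
            "agreeableness": 42,
            "conscientiousness": 46,
        }
    }
    # Write the run to a file that can then be sent by e-mail.
    with open("run1.txt", "w", newline="") as handle:
        handle.write(format_run(run))
```

The file name `run1.txt` is also only a placeholder; the task description does not prescribe one.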

You can submit your runs by sending the generated files to pr-soco (at) autoritas (dot) es with your full name.

Your working notes must follow ACM SIGIR style: https://www.acm.org/publications/proceedings-template

Author profiling consists of predicting an author’s demographics (e.g. age, gender, personality) from her writing. In the PR-SOCO shared task we address the problem of predicting an author’s personality from her source code. Personality traits influence most, if not all, human activities, such as the way people write (Celli et al., 2014; Rangel et al., 2015), the way they interact with others, and the way they make decisions. For developers, this includes the criteria they consider when selecting a software project to participate in (Paruma-Parbón et al., 2016) and the way they write and structure their source code.