Overview

AVI Challenge 2025 consists of two tracks: the personality assessment track and the interview performance assessment track. Track 1 focuses on evaluating subjects' personality traits based on their responses to corresponding personality questions. Track 2 concentrates on using multimodal information from subjects' responses to all questions (i.e., including both generic and personality questions) to assess their job-related competencies and interview performance.

Dataset

Subjects

  • The dataset consists of video interview data from 644 subjects (307 men, 309 women, 28 non-binary). Subjects (n = 793) were recruited through the online platform Prolific. We excluded subjects (a) with incomplete responses (n = 40), (b) who did not consent to their data being shared (n = 10), (c) who did not pass the attention checks (n = 7), (d) whose variation in personality (HEXACO) items was either too large or too small (n = 6), (e) who self-reported that they did not take the study seriously (n = 12), (f) whose videos contained corrupted audio (n = 51), and (g) who were flagged by personality raters as non-compliant (n = 23). After these 149 exclusions, the final sample consisted of 644 subjects.

Procedure

  • Subjects applied to a fictitious management traineeship position. Part of the application procedure was to complete an AVI using a platform we developed for the purpose of the study. During the AVI, subjects responded to six interview questions. Two of them were generic questions, frequently asked in selection interviews. The other four questions (i.e., the personality questions) were related to the personality traits of Honesty-Humility, Extraversion, Agreeableness, and Conscientiousness, as described by the HEXACO model of personality. Subjects were instructed to answer each interview question within 1-2 minutes.

Interview question development

  • The interview followed a structured format, since previous literature suggests that structured (vs. non- or semi-structured) interviews have stronger reliability and validity. Subjects always started with the generic questions and proceeded to the personality questions. Table 1 shows the content, order and type of the six questions and their corresponding personality traits.
    • Generic questions. For the development of generic interview questions, we created an initial pool of 86 job interview questions taken from previous literature and a list of frequently asked interview questions provided by a Dutch consultancy company. As a first step, we screened those questions for eligibility. Questions were included if they (a) were open-ended, (b) conveyed personality information to some extent, and (c) could apply to multiple jobs. Questions were excluded if they described specific behaviors, specific jobs, or knowledge, values, and motives. This screening retained 61 questions. We then asked 17 professional recruiters from a Dutch consultancy company to rate how frequently they use each of those 61 questions in practice. Responses were given on a 3-point scale. The inter-rater agreement between the recruiters was ICC(2,17) = 0.88. Next, we asked four personality experts to rate each interview question on a 7-point scale using three criteria: whether the question applied to limited or multiple jobs, whether it activated one or more personality traits, and a general assessment of the question (ICC(2,4) = 0.66). Then, we calculated the average score per criterion (professional recruiters, personality experts) and excluded all questions that scored below the average (per criterion). This process returned 16 questions which (a) were frequently used by practitioners, (b) were not specific to a particular job, and (c) activated more than one personality trait. From those 16 questions, we selected the two that received the highest ratings from recruiters and personality experts and edited them slightly.
    • Personality questions. For the development of personality interview questions, we created an initial pool of 25 past-behavior interview questions for the personality traits of Honesty-Humility (n = 6), Extraversion (n = 8), Agreeableness (n = 5), and Conscientiousness (n = 6). The questions were developed to target the core facets of each personality trait. Questions were written in a past-behavior format (e.g., “Think of situations when…”), since previous research suggests that this format is better suited to eliciting personality-relevant information. Four personality experts independently selected one question per personality trait and then discussed any disagreements until they reached a consensus, retaining one question per personality trait. After some further editing, we ended up with four personality-related questions (Table 1).

Table 1. The content, order and type of the interview questions and their corresponding personality traits.

Order | Interview question | Type | Personality trait
1 | What would you consider among your greatest strengths and weaknesses as an employee? | Generic | N/A
2 | How would your best friend describe you? | Generic | N/A
3 | Think of situations when you made professional decisions that could affect your status or how much money you make. How do you usually behave in such situations? Why do you think that is? | Personality | Honesty-Humility
4 | Think of situations when you joined a new team of people. How do you usually behave when you enter a new team? Why do you think that is? | Personality | Extraversion
5 | Think of situations when someone annoyed you. How do you usually react in such situations? Why do you think that is? | Personality | Agreeableness
6 | Think of situations when your work or workspace were not very organized. How typical is that of you? Why do you think that is? | Personality | Conscientiousness

Annotations

  • Personality traits. Observer reports of personality were provided by a group of 12 raters, who completed 9 hours of training. Each subject received eight independent personality ratings. Responses were provided using a Behavioral Anchored Response Scale (BARS). The BARS contained four items per personality domain (one item per facet). Responses were given on a 5-point scale (1 = Very low; 5 = Very high) and could be recorded to one decimal place. Personality domain scores were calculated by averaging the four facet scores per domain across the eight ratings. The inter-rater agreement ranged from ICC(1,8) = 0.61 (Honesty-Humility) to ICC(1,8) = 0.83 (Extraversion).
  • Interview performance. The job-related competencies were annotated by a group of five professional recruiters, who completed 3 hours of training. Raters assessed four job competencies and one overall hireability score after watching a subject's responses to all six interview questions (raters manually rotated the order of questions to avoid ordering effects). The four job competencies were taken from the manual of a Dutch consultancy company. The definitions of the four competencies and of overall hireability are shown below:
      1. Integrity: The extent to which the candidate inspires trust, displays integrity in their interaction with others, treats others fairly, and adheres to high ethical standards.
      2. Collegiality: The extent to which the candidate is open to and shows an interest in others and is willing to adapt their own activities to help others in their work.
      3. Social versatility: The extent to which the candidate is able to adapt their own behavior in a wide range of social situations in order to function effectively in different types of companies.
      4. Development orientation: The extent to which the candidate is willing to exert themselves to broaden and deepen knowledge and skills and to gain new experiences, in order to grow professionally and increase the quality of their own work.
      5. Overall hireability: The extent to which the candidate would be able to fulfill the requirements of the management traineeship position.
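
To make the personality score aggregation described under "Personality traits" concrete, here is a minimal sketch. It averages the four facet scores within each rating and then averages across the ratings a subject received. The function name `domain_score` and the rating values are hypothetical and for illustration only; the organizers' own aggregation code is not part of this description.

```python
from statistics import mean

def domain_score(ratings):
    """Return one personality domain score for a single subject.

    `ratings` holds one entry per independent rating (eight per subject in
    the challenge data); each entry lists the four facet scores (1.0-5.0)
    that the rater gave for this domain.
    """
    per_rating = [mean(facets) for facets in ratings]  # average over facets
    return mean(per_rating)                            # average over ratings

# Made-up ratings for one subject and one domain (eight ratings, four facets).
example_ratings = [
    [3.0, 3.5, 4.0, 3.5],
    [3.5, 3.5, 3.5, 3.5],
    [4.0, 3.0, 3.5, 3.5],
    [3.0, 3.0, 4.0, 4.0],
    [3.5, 4.0, 3.5, 3.0],
    [3.5, 3.5, 4.0, 3.0],
    [4.0, 3.5, 3.0, 3.5],
    [3.5, 3.5, 3.5, 3.5],
]
print(round(domain_score(example_ratings), 2))
```

Because every rating contains the same number of facets, averaging within ratings first gives the same result as averaging all 32 values at once.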

Data description

  • Video index: The dataset contains videos of participants answering 2 generic questions and 4 personality questions. For each personality question, there is a corresponding personality trait rated by psychologists. Each video filename is organized as:

    "participant id_question index_question type"

    For example, the filename "5484821efdf99b07b28f2300_q1_generic.mp4" means:
    • participant id: 5484821efdf99b07b28f2300
    • question index: q1
    • question type: generic
    The question indices and their corresponding personality traits are listed in Table 1. For example, the question “q3” is designed to activate the personality trait of “Honesty-Humility”.
  • Ground truth labels: The train_data.csv and val_data.csv files contain the ground truth labels for training and validating recognition models, respectively. The csv files also include meta information about the interviewees. You may use this meta information as additional input for recognition in both tracks.
    • age
    • gender: 1 = Male; 2 = Female; 3 = Non-binary/third gender; 4 = Prefer not to say
    • education: 1 = Less than high school; 2 = High school graduate; 3 = Some college; 4 = Bachelor degree or equivalent; 5 = Master's degree or equivalent; 6 = Doctorate; 7 = Prefer not to say
    • work experience: How many years of work experience do you have?
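
The filename convention and the coded meta information above can be handled with a few lines of Python. The sketch below is illustrative only: the function name `parse_video_name` and the dictionary names are hypothetical, and it assumes that participant ids contain no underscore-separated suffix beyond the question index and question type shown in the example filename.

```python
from pathlib import Path

# Question index -> personality trait, following Table 1.
# Generic questions (q1, q2) have no associated trait.
QUESTION_TRAIT = {
    "q1": None,
    "q2": None,
    "q3": "Honesty-Humility",
    "q4": "Extraversion",
    "q5": "Agreeableness",
    "q6": "Conscientiousness",
}

# Codes for the meta information, as listed in the data description.
GENDER_CODES = {
    1: "Male",
    2: "Female",
    3: "Non-binary/third gender",
    4: "Prefer not to say",
}
EDUCATION_CODES = {
    1: "Less than high school",
    2: "High school graduate",
    3: "Some college",
    4: "Bachelor degree or equivalent",
    5: "Master's degree or equivalent",
    6: "Doctorate",
    7: "Prefer not to say",
}

def parse_video_name(filename):
    """Split '<participant id>_<question index>_<question type>.mp4'."""
    stem = Path(filename).stem
    # Split from the right so the last two fields are always the
    # question index and the question type.
    participant_id, question_index, question_type = stem.rsplit("_", 2)
    return {
        "participant_id": participant_id,
        "question_index": question_index,
        "question_type": question_type,
        "trait": QUESTION_TRAIT.get(question_index),
    }

info = parse_video_name("5484821efdf99b07b28f2300_q1_generic.mp4")
print(info)
```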

Dataset split

  • The dataset used for the challenge is split into training (70%, n = 450), validation (10%, n = 64), and testing (20%, n = 130) sets. The training and validation sets will be made available to participants for algorithm development. The dataset is divided at the subject level, ensuring that all videos from a single subject are assigned to exactly one of the training, validation, or testing sets. Although the input videos differ between the two tracks (i.e., track 1 uses only the videos answering personality questions, while track 2 uses the videos answering all questions), the split is the same for both tracks of the challenge. In splitting the dataset, we considered the distribution of gender, age, and work experience of the subjects. Specifically, we employed joint sampling to ensure that the three sets have similar distributions of these demographic and experiential variables. The gender, age, and work experience distributions of the training, validation, and testing sets are shown in Figure 1.

Figure 1. The gender, age, and work experience distributions of the training, validation, and testing sets.
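
Participants who want to create their own subject-level folds from the released data (for example, for cross-validation) could follow a similar idea. The sketch below is one simple way to do a stratified split at the subject level and is not the organizers' joint sampling procedure, which is not detailed here. The function name `stratified_subject_split`, the stratum keys, and the age and experience bins are all hypothetical.

```python
import random
from collections import defaultdict

def stratified_subject_split(subjects, fractions=(0.7, 0.1, 0.2), seed=0):
    """Split subjects into train/val/test within each stratum.

    `subjects` is a list of (subject_id, stratum) pairs, where a stratum is
    a tuple such as (gender_code, age_bin, experience_bin). Splitting whole
    subjects guarantees that no subject's videos appear in two sets.
    """
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for subject_id, stratum in subjects:
        by_stratum[stratum].append(subject_id)

    splits = {"train": [], "val": [], "test": []}
    for stratum in sorted(by_stratum):
        ids = by_stratum[stratum]
        rng.shuffle(ids)
        n_train = round(fractions[0] * len(ids))
        n_val = round(fractions[1] * len(ids))
        splits["train"] += ids[:n_train]
        splits["val"] += ids[n_train:n_train + n_val]
        splits["test"] += ids[n_train + n_val:]  # remainder goes to test
    return splits

# Made-up subjects: two strata of 50 subjects each.
demo_subjects = (
    [(f"a{i}", (1, "20-29", "0-4")) for i in range(50)]
    + [(f"b{i}", (2, "30-39", "5-9")) for i in range(50)]
)
demo_splits = stratified_subject_split(demo_subjects)
print({name: len(ids) for name, ids in demo_splits.items()})
```

With very small strata, rounding can leave the validation or test portion of a stratum empty, so coarse bins are advisable.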

Track 1: Personality assessment

In this track, you will develop models and algorithms to assess personality traits based on subjects' responses to the questions which are designed by psychologists to activate the corresponding personality traits. The task of this track is a single-input-single-label regression task. Below is an example for training models:

  • Input video: “5484821efdf99b07b28f2300_q3_personality.mp4”
  • Corresponding personality trait: q3->Honesty-Humility
  • Label (personality trait): 3.5
The range of the personality trait ratings is [1,5]. Please note that you may also use videos which contain information from other personality traits for recognition. However, it should be stated clearly in the report you submit.
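
Since the ratings lie in [1,5], one simple post-processing step is to clip model outputs to that range. When the true label lies inside the range, clipping can never increase the squared error of a prediction. The sketch below is only an illustration; the function name `clip_to_rating_range` is hypothetical and not part of any challenge toolkit.

```python
RATING_MIN, RATING_MAX = 1.0, 5.0  # rating range stated for the challenge

def clip_to_rating_range(prediction):
    """Clamp a raw model output to the valid rating range [1, 5]."""
    return max(RATING_MIN, min(RATING_MAX, prediction))

raw_outputs = [0.2, 3.5, 5.7]
clipped = [clip_to_rating_range(p) for p in raw_outputs]
print(clipped)
```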


Codalab link: https://codalab.lisn.upsaclay.fr/competitions/23100

Track 2: Interview performance assessment

In this track, you are tasked with developing models and algorithms to evaluate five job-related competencies (i.e., Integrity, Collegiality, Social versatility, Development orientation and Overall hireability) using videos in which subjects respond to both generic and personality questions. The task of this track is a multi-input-multi-label regression task. The range of the job-related competency ratings is also [1,5]. Below is an example for training models:

  • Input video: “60fccc84440f8e8c82ca0288_q1_generic.mp4”; “60fccc84440f8e8c82ca0288_q2_generic.mp4”; “60fccc84440f8e8c82ca0288_q3_personality.mp4”; “60fccc84440f8e8c82ca0288_q4_personality.mp4”; “60fccc84440f8e8c82ca0288_q5_personality.mp4”; “60fccc84440f8e8c82ca0288_q6_personality.mp4”.
  • Label: [Integrity, Collegiality, Social_versatility, Development_orientation, Hireability] = [4,3.9,4,4.3,4.5]
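
Because each Track 2 sample combines six videos from one subject, a data loader first has to group files by participant. The sketch below shows one way to do this using the filename convention from the data description. The function name `group_by_participant` is hypothetical, and dropping subjects with missing videos is an assumption made for the example rather than a challenge rule.

```python
from collections import defaultdict

EXPECTED_QUESTIONS = ("q1", "q2", "q3", "q4", "q5", "q6")

def group_by_participant(filenames):
    """Map each participant id to its six videos, ordered q1..q6."""
    grouped = defaultdict(dict)
    for name in filenames:
        stem = name.rsplit(".", 1)[0]  # drop the file extension
        participant_id, question_index, _question_type = stem.rsplit("_", 2)
        grouped[participant_id][question_index] = name

    samples = {}
    for participant_id, videos in grouped.items():
        # Keep only participants with a video for every question (assumption).
        if all(q in videos for q in EXPECTED_QUESTIONS):
            samples[participant_id] = [videos[q] for q in EXPECTED_QUESTIONS]
    return samples

pid = "60fccc84440f8e8c82ca0288"
files = [
    f"{pid}_q3_personality.mp4",
    f"{pid}_q1_generic.mp4",
    f"{pid}_q6_personality.mp4",
    f"{pid}_q2_generic.mp4",
    f"{pid}_q5_personality.mp4",
    f"{pid}_q4_personality.mp4",
    "5484821efdf99b07b28f2300_q1_generic.mp4",  # incomplete subject, dropped
]
track2_samples = group_by_participant(files)
print(track2_samples[pid])
```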


Codalab link: https://codalab.lisn.upsaclay.fr/competitions/23101

Evaluation metric

For the AVI Challenge 2025, the evaluation metric for model performance is Mean Squared Error (MSE). We will use the mean MSE of all personality traits and job-related competencies to evaluate the performance of the models submitted to the challenge.
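
For local validation, the metric can be approximated as follows. This is a sketch of one reading of the description above (an MSE per trait or competency, then the unweighted mean of those values); the official scoring script may differ in detail. The function names `mse` and `mean_mse` and the example values are hypothetical.

```python
def mse(y_true, y_pred):
    """Mean squared error between two equally long, non-empty sequences."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_mse(labels, predictions):
    """Average the per-label MSE over all traits or competencies.

    Both arguments map a label name to a list of values, one per subject,
    in the same subject order.
    """
    per_label = [mse(labels[name], predictions[name]) for name in labels]
    return sum(per_label) / len(per_label)

# Made-up values for two competencies and two subjects.
demo_labels = {"Integrity": [4.0, 3.0], "Hireability": [4.5, 2.5]}
demo_predictions = {"Integrity": [3.0, 3.0], "Hireability": [4.5, 3.5]}
print(mean_mse(demo_labels, demo_predictions))
```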

If you have any questions, please send your question to t.zhang@seu.edu.cn

Submission

Submission details will be announced after the testing phase.