Artificial Intelligence and Discrimination: A Vignette Experiment of Labour Market Discrimination in Large Language Models
Chapter
Publication Date:
2025
Abstract:
Large Language Models are increasingly used across various sectors,
including recruitment, where they assist in evaluating candidate profiles and
optimizing hiring processes. While Large Language Models offer significant
advantages in terms of cost and time efficiency, concerns regarding algorithmic
bias have emerged, particularly in relation to gender and ethnic discrimination.
This study employs a Factorial Survey Experiment to assess biases in six widely
used Large Language Models (Le Chat, ChatGPT, Gemini, DeepSeek, Grok, and MetaAI).
By systematically varying candidate attributes such as sex, ethnicity, education,
and age, we examine whether hiring recommendations are influenced by taste-based
or statistical discrimination. Our findings indicate that none of the six models
tested exhibits gender bias, while ChatGPT, Grok, and DeepSeek show signs of
ethnic discrimination, though to varying degrees. These results underscore the
need for greater transparency and stronger anti-bias measures in Large Language
Model development and training. We advocate for enhanced oversight of AI-driven
salary discrimination tools to mitigate discrimination risks and ensure fair and
equitable recruitment practices. Our study highlights the broader implications
of biased Artificial Intelligence models, emphasizing the potential risks of
productivity loss and workforce homogeneity if biases remain unaddressed.
Iris type:
14.b.1 Contribution in volume (Chapter or Essay)
Keywords:
Large Language Models · Bias in AI · Salary Discrimination ·
Factorial Survey Experiment
List of contributors:
Busetta, Giovanni; Campolo, Maria Gabriella; Ficarra, Giovanni Maria
Book title:
Statistics for Innovation II