Artificial Intelligence and Discrimination: A Vignette Experiment of Labour Market Discrimination in Large Language Models
Book chapter
Publication date:
2025
Abstract:
Large Language Models are increasingly used across various sectors,
including recruitment, where they assist in evaluating candidate profiles and
optimizing hiring processes. While Large Language Models offer significant
advantages in terms of cost and time efficiency, concerns regarding algorithmic bias
have emerged, particularly in relation to gender and ethnic discrimination. This study
employs a Factorial Survey Experiment to assess biases in six widely used Large
Language Models: Le Chat, ChatGPT, Gemini, DeepSeek, Grok, and MetaAI. By
systematically varying candidate attributes such as sex, ethnicity, education, and
age, we examine whether hiring recommendations are influenced by taste-based or
statistical discrimination. Our findings indicate that none of the six models tested
exhibits gender bias, while ChatGPT, Grok, and DeepSeek show signs of ethnic
discrimination, to varying degrees. These results underscore the need for greater
transparency and stronger anti-bias measures in Large Language Model
development and training. We advocate for enhanced oversight in
AI-driven salary discrimination tools to mitigate discrimination risks and ensure
fair and equitable recruitment practices. Our study highlights the broader
implications of biased Artificial Intelligence models, emphasizing the potential risks of
productivity loss and workforce homogeneity if biases remain unaddressed.
CRIS type:
14.b.1 Contribution in volume (Chapter or Essay)
Keywords:
Large Language Models · Bias in AI · Salary Discrimination ·
Factorial Survey Experiment
Author list:
Busetta, Giovanni; Campolo, Maria Gabriella; Ficarra, Giovanni Maria
Book title:
Statistics for Innovation II