[2508.06671] Do Biased Models Have Biased Thoughts?


Authors: Swati Rajwal and 3 other authors

Abstract: The impressive performance of language models is undeniable. However, the presence of biases based on gender, race, socio-economic status, physical appearance, and sexual orientation makes the deployment of language models challenging. This paper studies the effect of chain-of-thought prompting, a recent approach that studies the steps followed by the model before it responds, on fairness. More specifically, we ask the following question: Do biased models have biased thoughts? To answer our question, we conduct experiments on 5 popular large language models using fairness metrics to quantify 11 different biases in the model’s thoughts and output. Our results show that the bias in the thinking steps is not highly correlated with the output bias (less than 0.6 correlation with a p-value smaller than 0.001 in most cases). In other words, unlike human beings, the tested models with biased decisions do not always possess biased thoughts.
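
As a rough illustration of the kind of analysis the abstract describes, the sketch below correlates a per-example bias score for the thinking steps with a per-example bias score for the final output. It is a minimal sketch under stated assumptions, not the paper's method: the abstract does not say which correlation coefficient or significance test is used, so Pearson's r and a permutation test are assumptions here, the names `pearson_r` and `permutation_p_value` are hypothetical helpers, and the score lists are made-up placeholders rather than results from the paper.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def permutation_p_value(xs, ys, n_permutations=2000, seed=0):
    """Two-sided permutation p-value for the observed Pearson r."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_permutations):
        # Shuffling ys breaks any pairing with xs (the null hypothesis).
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_permutations + 1)


# Made-up per-example bias scores (hypothetical, NOT taken from the paper).
thought_bias = [0.10, 0.40, 0.35, 0.80, 0.20, 0.65, 0.50, 0.90]
output_bias = [0.30, 0.20, 0.60, 0.70, 0.10, 0.40, 0.80, 0.60]

r = pearson_r(thought_bias, output_bias)
p = permutation_p_value(thought_bias, output_bias)
print(f"r = {r:.3f}, permutation p = {p:.4f}")
```

Under this reading, a coefficient below 0.6 would mean that the thought-level bias score explains only part of the variation in the output-level score, which is how the abstract's "not highly correlated" claim can coexist with a small p-value.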

Submission history

From: Swati Rajwal
[v1] Fri, 8 Aug 2025 19:41:20 UTC (813 KB)
[v2] Tue, 12 Aug 2025 02:42:23 UTC (810 KB)
