Do Biased Models Have Biased Thoughts?, by Swati Rajwal and 3 other authors
Abstract: The impressive performance of language models is undeniable. However, the presence of biases based on gender, race, socio-economic status, physical appearance, and sexual orientation makes the deployment of language models challenging. This paper studies the effect of chain-of-thought prompting, a recent approach that studies the steps followed by the model before it responds, on fairness. More specifically, we ask the following question: $\textit{Do biased models have biased thoughts}$? To answer our question, we conduct experiments on $5$ popular large language models using fairness metrics to quantify $11$ different biases in the model’s thoughts and output. Our results show that the bias in the thinking steps is not highly correlated with the output bias (less than $0.6$ correlation with a $p$-value smaller than $0.001$ in most cases). In other words, unlike human beings, the tested models with biased decisions do not always possess biased thoughts.
Submission history
From: Swati Rajwal
[v1]
Fri, 8 Aug 2025 19:41:20 UTC (813 KB)
[v2]
Tue, 12 Aug 2025 02:42:23 UTC (810 KB)