A Study on the Impact of Cross-Corpus Training on Model Values and Biases

2025-10-15 by AiNEWS2025

[Submitted on 17 Aug 2025 (v1), last revised 14 Oct 2025 (this version, v2)]

Paper: The Cultural Gene of Large Language Models: A Study on the Impact of Cross-Corpus Training on Model Values and Biases, by Emanuel Z. Fenech-Borg and 3 other authors

Abstract: Large language models (LLMs) are deployed globally, yet their underlying cultural and ethical assumptions remain underexplored. We propose the notion of a “cultural gene”, a systematic value orientation that LLMs inherit from their training corpora, and introduce a Cultural Probe Dataset (CPD) of 200 prompts targeting two classic cross-cultural dimensions: Individualism-Collectivism (IDV) and Power Distance (PDI). Using standardized zero-shot prompts, we compare a Western-centric model (GPT-4) and an Eastern-centric model (ERNIE Bot). Human annotation shows significant and consistent divergence across both dimensions. GPT-4 exhibits individualistic and low-power-distance tendencies (IDV score ≈ 1.21; PDI score ≈ -1.05), while ERNIE Bot shows collectivistic and higher-power-distance tendencies (IDV ≈ -0.89; PDI ≈ 0.76); differences are statistically significant (p

Submission history

From: Kabir Khan
[v1] Sun, 17 Aug 2025 15:54:14 UTC (4,636 KB)
[v2] Tue, 14 Oct 2025 08:26:39 UTC (2,600 KB)
