Approximating Facts or Reproducing Bias: Evaluating LLMs in Social Research
Keywords:
- large language models (LLMs)
- holistic facts
- data bias
- alignment mechanisms
- Chain-of-Thought (CoT)
Abstract: The application of large language models (LLMs) in the social sciences is expanding rapidly, yet whether the data they generate accurately reflect real-world social phenomena remains contested. Using the 2021 Chinese General Social Survey (CGSS) as a benchmark, this study develops a multi-model comparative framework to systematically assess the representativeness and biases of "silicon-based samples" produced by various LLMs. The results show that mainstream models can reproduce statistical relationships among macro-level variables, but they exhibit representational biases, tending to reinforce dominant discourses while marginalizing alternative perspectives. By incorporating Chain-of-Thought (CoT) analysis, we find that the models generate standardized causal reasoning structures when explaining their responses, revealing implicit pathways of social cognition embedded in their outputs. Furthermore, prompt design and fine-tuning mechanisms may inadvertently shape how the models construe public issues. This paper highlights both the potential and the limitations of LLMs in social measurement, and recommends enhancing data diversity, improving model interpretability, and developing domain-specific models tailored for the social sciences.
