Approximating Facts or Reproducing Bias: Evaluating LLMs in Social Research
Keywords:
- large language models (LLMs)
- holistic facts
- data bias
- alignment mechanisms
- Chain-of-Thought (CoT)
Abstract: The application of large language models (LLMs) in the social sciences is expanding rapidly, yet whether the data they generate accurately reflect real-world social phenomena remains contested. Using the 2021 Chinese General Social Survey (CGSS) as a benchmark, this study develops a multi-model comparative framework to systematically assess the representativeness and biases of "silicon-based samples" produced by various LLMs. The results show that mainstream models can reproduce statistical relationships among macro-level variables, but they exhibit representational biases, tending to reinforce dominant discourses while marginalizing alternative perspectives. By incorporating Chain-of-Thought (CoT) analysis, we find that the models generate standardized causal reasoning structures when explaining their responses, revealing implicit pathways of social cognition embedded in their outputs. Furthermore, prompt design and fine-tuning mechanisms may inadvertently shape how the models construe public issues. This paper highlights both the potential and the limitations of LLMs in social measurement, and recommends enhancing data diversity, improving model interpretability, and developing domain-specific models tailored for the social sciences.
