Approximating Facts or Reproducing Bias: Evaluating LLMs in Social Research
Abstract: The application of large language models (LLMs) in the social sciences is expanding rapidly, yet whether the data they generate accurately reflect real-world social phenomena remains contested. Using the 2021 Chinese General Social Survey (CGSS) as a benchmark, this study develops a multi-model comparative framework to systematically assess the representativeness and biases of "silicon-based samples" produced by various LLMs. The results show that mainstream models can reproduce statistical relationships among macro-level variables, but they exhibit representational biases, tending to reinforce dominant discourses while marginalizing alternative perspectives. Incorporating Chain-of-Thought (CoT) analysis, we find that the models generate standardized causal reasoning structures when explaining their responses, revealing implicit pathways of social cognition embedded in their outputs. Furthermore, prompt design and fine-tuning mechanisms may inadvertently shape users' perceptions of public issues. This paper highlights both the potential and the limitations of LLMs in social measurement and recommends enhancing data diversity, improving model interpretability, and developing domain-specific models tailored to the social sciences.
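To make the benchmarking idea concrete, the following is a minimal sketch of the kind of multi-model comparison the abstract describes: "silicon-based samples" are drawn by repeatedly prompting each model with a survey-style item, and the resulting answer distribution is compared against a benchmark distribution via total variation distance. The `query_llm` stub, the model names, the item wording, and the benchmark proportions are illustrative placeholders, not the study's actual protocol or CGSS data.

```python
from collections import Counter
import random

# Hypothetical stand-in for a real model API call; in an actual pipeline each
# model (via its own SDK) would answer a CGSS-style survey item.
def query_llm(model: str, prompt: str) -> str:
    options = ["agree", "neutral", "disagree"]
    return random.choice(options)  # placeholder response

def silicon_sample(model: str, prompt: str, n: int) -> Counter:
    """Draw n simulated respondents ('silicon-based samples') from one model."""
    return Counter(query_llm(model, prompt) for _ in range(n))

def total_variation(p: dict, q: dict) -> float:
    """Total variation distance between two answer distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

# Illustrative benchmark proportions for one item (not real CGSS figures).
cgss_benchmark = {"agree": 0.46, "neutral": 0.22, "disagree": 0.32}

prompt = "You are a survey respondent in China. Do you agree that ...?"
for model in ["model_a", "model_b"]:  # placeholder model names
    counts = silicon_sample(model, prompt, n=500)
    total = sum(counts.values())
    dist = {k: v / total for k, v in counts.items()}
    print(model, round(total_variation(dist, cgss_benchmark), 3))
```

A lower distance indicates that a model's simulated respondents track the benchmark more closely; systematic distortions across many items would correspond to the representational biases the study reports.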