World Mental Health Report: Transforming Mental Health For All (World Health Organization, 2022).

Huang, Y. et al. Prevalence of mental disorders in China: a cross-sectional epidemiological study. Lancet Psychiatry 6, 211–224 (2019).

Mental Health Atlas 2020: Review of the Eastern Mediterranean Region (World Health Organization, 2022).

Chen, R., Zhang, W. & Wu, X. Mental health policy and implementation from 2009 to 2020 in China. SSM – Ment. Health 4, 100244 (2023).

Stein, D. J. et al. Psychiatric diagnosis and treatment in the 21st century: paradigm shifts versus incremental integration. World Psychiatry 21, 393–414 (2022).

Feuerriegel, S. et al. Using natural language processing to analyse text data in behavioural science. Nat. Rev. Psychol. 4, 96–111 (2025).

Obradovich, N. et al. Opportunities and risks of large language models in psychiatry. NPP Digit. Psychiatry Neurosci. 2, 8 (2024).

Mukherjee, S. S. et al. Natural language processing-based quantification of the mental state of psychiatric patients. Comput. Psychiatry 4, 76–106 (2020).

Jacob, K. Patient experience and psychiatric discourse. The Psychiatrist 36, 414–417 (2012).

Murad, M. H. et al. Measuring documentation burden in healthcare. J. Gen. Intern. Med. 39, 2837–2848 (2024).

Gaffney, A. et al. Medical documentation burden among US office-based physicians in 2019: a national study. JAMA Intern. Med. 182, 564–566 (2022).

Van Veen, D. et al. Adapted large language models can outperform medical experts in clinical text summarization. Nat. Med. 30, 1134–1142 (2024).

Li, J. et al. Integrated image-based deep learning and language models for primary diabetes care. Nat. Med. 30, 2886–2896 (2024).

Liu, X. et al. A generalist medical language model for disease diagnosis assistance. Nat. Med. 31, 932–942 (2025).

Hager, P. et al. Evaluation and mitigation of the limitations of large language models in clinical decision-making. Nat. Med. 30, 2613–2622 (2024).

Lamichhane, B. Evaluation of ChatGPT for NLP-based mental health applications. Preprint at https://arxiv.org/abs/2303.15727 (2023).

Amin, M., Cambria, E. & Schuller, B. Will affective computing emerge from foundation models and general AI? A first evaluation on ChatGPT. Preprint at http://arxiv.org/abs/2303.03186 (2023).

Wei, J. et al. Chain-of-thought prompting elicits reasoning in large language models. In Proc. 36th International Conference on Neural Information Processing Systems 24824–24837 (2022).

Tu, T. et al. Towards conversational diagnostic artificial intelligence. Nature 642, 442–450 (2025).

Sartori, G. & Orrù, G. Language models and psychological sciences. Front. Psychol. 14, 1279317 (2023).

Wang, N. et al. RoleLLM: benchmarking, eliciting, and enhancing role-playing abilities of large language models. In Findings of the Association for Computational Linguistics: ACL 2024 14743–14777 (Association for Computational Linguistics, 2024).

Yang, Q. et al. PsychoGAT: a novel psychological measurement paradigm through interactive fiction games with LLM agents. In Proc. 62nd Annual Meeting of the Association for Computational Linguistics Volume 1: Long Papers 14470–14505 (Association for Computational Linguistics, 2024).

Rathje, S. et al. GPT is an effective tool for multilingual psychological text analysis. Proc. Natl Acad. Sci. USA 121, e2308950121 (2024).

She, D., Zhang, C., Yao, X., Gao, Y. & Jin, Z. MindChat-R0: a large language model for emotionally supportive dialogue through reinforcement learning. In Companion of the 2025 ACM International Joint Conference on Pervasive and Ubiquitous Computing 1209–1216 (Association for Computing Machinery, 2025).

Team, E. EmoLLM: reinventing mental health support with large language models. Preprint at https://arxiv.org/abs/2406.16442 (2024).

Chen, Y. et al. Soulchat: improving LLMs’ empathy, listening, and comfort abilities through fine-tuning with multi-turn empathy conversations. In Findings of the Association for Computational Linguistics: EMNLP 2023 1170–1183 (Association for Computational Linguistics, 2023).

Hu, J. et al. Psycollm: enhancing LLM for psychological understanding and evaluation. IEEE Trans. Comput. Soc. Syst. 12, 539–551 (2024).

Hiemke, C. et al. Consensus guidelines for therapeutic drug monitoring in neuropsychopharmacology: update 2017. Pharmacopsychiatry 51, 9–62 (2018).

Wicha, S. G. et al. From therapeutic drug monitoring to model-informed precision dosing for antibiotics. Clin. Pharmacol. Ther. 109, 928–941 (2021).

Relling, M. & Klein, T. CPIC: clinical pharmacogenetics implementation consortium of the pharmacogenomics research network. Clin. Pharmacol. Ther. 89, 464–467 (2011).

Hicks, J. K. et al. Clinical Pharmacogenetics Implementation Consortium (CPIC) guideline for CYP2D6 and CYP2C19 genotypes and dosing of selective serotonin reuptake inhibitors. Clin. Pharmacol. Ther. 98, 127–134 (2015).

Liu, S. et al. PsychBench: a comprehensive and professional benchmark for evaluating the performance of LLM-assisted psychiatric clinical practice. Preprint at https://arxiv.org/abs/2503.01903 (2025).

Liu, J. et al. Benchmarking large language models on CMExam—a comprehensive Chinese medical exam dataset. In Proc. 37th International Conference on Neural Information Processing Systems 52430–52452 (2023).

Sun, Y. et al. Ernie 3.0: large-scale knowledge enhanced pre-training for language understanding and generation. Preprint at https://arxiv.org/abs/2107.02137 (2021).

Papineni, K., Roukos, S., Ward, T. & Zhu, W.-J. Bleu: a method for automatic evaluation of machine translation. In Proc. 40th Annual Meeting of the Association for Computational Linguistics 311–318 (Association for Computational Linguistics, 2002).

Lin, C.-Y. Rouge: a package for automatic evaluation of summaries. In Text Summarization Branches Out 74–81 (Association for Computational Linguistics, 2004).

Zhang, T., Kishore, V., Wu, F., Weinberger, K. Q. & Artzi, Y. BERTScore: evaluating text generation with BERT. In International Conference on Learning Representations (ICLR) https://openreview.net/pdf?id=SkeHuCVFDr (2020).

International Statistical Classification of Diseases and Related Health Problems: Alphabetical Index (World Health Organization, 2004).

Yang, A. et al. Qwen2.5-1M technical report. Preprint at https://arxiv.org/abs/2501.15383 (2025).

Achiam, J. et al. GPT-4 technical report. Preprint at https://arxiv.org/abs/2303.08774 (2023).

Guo, D. et al. DeepSeek-R1: incentivizing reasoning capability in LLMs via reinforcement learning. Preprint at https://arxiv.org/abs/2501.12948 (2025).

Zhang, T. et al. Prevalence of personality disorders using two diagnostic systems in psychiatric outpatients in Shanghai, China: a comparison of uni-axial and multi-axial formulation. Soc. Psychiatry Psychiatr. Epidemiol. 47, 1409–1417 (2012).

Demszky, D. et al. Using large language models in psychology. Nat. Rev. Psychol. 2, 688–701 (2023).


Singhal, K. et al. Toward expert-level medical question answering with large language models. Nat. Med. 31, 943–950 (2025).

Huang, J. & Chang, K. C.-C. Towards reasoning in large language models: a survey. In Findings of the Association for Computational Linguistics: ACL 2023 1049–1065 (Association for Computational Linguistics, 2023).

Thieme, A., Belgrave, D. & Doherty, G. Machine learning in mental health: a systematic review of the HCI literature to support the development of effective and implementable ML systems. ACM Trans. Comput. Hum. Interact. 27, 1–53 (2020).

Kaplan, J. et al. Scaling laws for neural language models. Preprint at https://arxiv.org/abs/2001.08361 (2020).

Shao, Z. et al. DeepSeekMath: pushing the limits of mathematical reasoning in open language models. Preprint at https://arxiv.org/abs/2402.03300 (2024).

Kwon, W. et al. Efficient memory management for large language model serving with PagedAttention. In Proc. 29th Symposium on Operating Systems Principles 611–626 (Association for Computing Machinery, 2023).

Wang, R. et al. PsychFound: PsychFound code and dataset. Zenodo https://doi.org/10.5281/zenodo.17768150 (2025).
