A domain-adapted large language model to support clinicians in psychiatric clinical practice

World Mental Health Report: Transforming Mental Health For All (World Health Organization, 2022).

Huang, Y. et al. Prevalence of mental disorders in China: a cross-sectional epidemiological study. Lancet Psychiatry 6, 211–224 (2019).

Mental Health Atlas 2020: Review of the Eastern Mediterranean Region (World Health Organization, 2022).

Chen, R., Zhang, W. & Wu, X. Mental health policy and implementation from 2009 to 2020 in China. SSM – Ment. Health 4, 100244 (2023).

Stein, D. J. et al. Psychiatric diagnosis and treatment in the 21st century: paradigm shifts versus incremental integration. World Psychiatry 21, 393–414 (2022).

Feuerriegel, S. et al. Using natural language processing to analyse text data in behavioural science. Nat. Rev. Psychol. 4, 96–111 (2025).

Obradovich, N. et al. Opportunities and risks of large language models in psychiatry. NPP Digit. Psychiatry Neurosci. 2, 8 (2024).

Mukherjee, S. S. et al. Natural language processing-based quantification of the mental state of psychiatric patients. Comput. Psychiatry 4, 76–106 (2020).

Jacob, K. Patient experience and psychiatric discourse. The Psychiatrist 36, 414–417 (2012).

Murad, M. H. et al. Measuring documentation burden in healthcare. J. Gen. Intern. Med. 39, 2837–2848 (2024).

Gaffney, A. et al. Medical documentation burden among US office-based physicians in 2019: a national study. JAMA Intern. Med. 182, 564–566 (2022).

Van Veen, D. et al. Adapted large language models can outperform medical experts in clinical text summarization. Nat. Med. 30, 1134–1142 (2024).

Li, J. et al. Integrated image-based deep learning and language models for primary diabetes care. Nat. Med. 30, 2886–2896 (2024).

Liu, X. et al. A generalist medical language model for disease diagnosis assistance. Nat. Med. 31, 932–942 (2025).

Hager, P. et al. Evaluation and mitigation of the limitations of large language models in clinical decision-making. Nat. Med. 30, 2613–2622 (2024).

Lamichhane, B. Evaluation of ChatGPT for NLP-based mental health applications. Preprint at https://arxiv.org/abs/2303.15727 (2023).

Amin, M., Cambria, E. & Schuller, B. Will affective computing emerge from foundation models and general AI? A first evaluation on ChatGPT. Preprint at http://arxiv.org/abs/2303.03186 (2023).

Wei, J. et al. Chain-of-thought prompting elicits reasoning in large language models. In Proc. 36th International Conference on Neural Information Processing Systems 24824–24837 (2022).

Tu, T. et al. Towards conversational diagnostic artificial intelligence. Nature 642, 442–450 (2025).

Sartori, G. & Orrù, G. Language models and psychological sciences. Front. Psychol. 14, 1279317 (2023).

Wang, N. et al. RoleLLM: benchmarking, eliciting, and enhancing role-playing abilities of large language models. In Findings of the Association for Computational Linguistics: ACL 2024 14743–14777 (Association for Computational Linguistics, 2024).

Yang, Q. et al. PsychoGAT: a novel psychological measurement paradigm through interactive fiction games with LLM agents. In Proc. 62nd Annual Meeting of the Association for Computational Linguistics Volume 1: Long Papers 14470–14505 (Association for Computational Linguistics, 2024).

Rathje, S. et al. GPT is an effective tool for multilingual psychological text analysis. Proc. Natl Acad. Sci. USA 121, e2308950121 (2024).

She, D., Zhang, C., Yao, X., Gao, Y. & Jin, Z. MindChat-R0: a large language model for emotionally supportive dialogue through reinforcement learning. In Companion of the 2025 ACM International Joint Conference on Pervasive and Ubiquitous Computing 1209–1216 (Association for Computing Machinery, 2025).

Team, E. EmoLLM: reinventing mental health support with large language models. Preprint at https://arxiv.org/abs/2406.16442 (2024).

Chen, Y. et al. SoulChat: improving LLMs’ empathy, listening, and comfort abilities through fine-tuning with multi-turn empathy conversations. In Findings of the Association for Computational Linguistics: EMNLP 2023 1170–1183 (Association for Computational Linguistics, 2023).

Hu, J. et al. PsycoLLM: enhancing LLM for psychological understanding and evaluation. IEEE Trans. Comput. Soc. Syst. 12, 539–551 (2024).

Hiemke, C. et al. Consensus guidelines for therapeutic drug monitoring in neuropsychopharmacology: update 2017. Pharmacopsychiatry 51, 9–62 (2018).

Wicha, S. G. et al. From therapeutic drug monitoring to model-informed precision dosing for antibiotics. Clin. Pharmacol. Ther. 109, 928–941 (2021).

Relling, M. & Klein, T. CPIC: clinical pharmacogenetics implementation consortium of the pharmacogenomics research network. Clin. Pharmacol. Ther. 89, 464–467 (2011).

Hicks, J. K. et al. Clinical Pharmacogenetics Implementation Consortium (CPIC) guideline for CYP2D6 and CYP2C19 genotypes and dosing of selective serotonin reuptake inhibitors. Clin. Pharmacol. Ther. 98, 127–134 (2015).

Liu, S. et al. PsychBench: a comprehensive and professional benchmark for evaluating the performance of LLM-assisted psychiatric clinical practice. Preprint at https://arxiv.org/abs/2503.01903 (2025).

Liu, J. et al. Benchmarking large language models on CMExam—a comprehensive Chinese medical exam dataset. In Proc. 37th International Conference on Neural Information Processing Systems 52430–52452 (2023).

Sun, Y. et al. ERNIE 3.0: large-scale knowledge enhanced pre-training for language understanding and generation. Preprint at https://arxiv.org/abs/2107.02137 (2021).

Papineni, K., Roukos, S., Ward, T. & Zhu, W.-J. BLEU: a method for automatic evaluation of machine translation. In Proc. 40th Annual Meeting of the Association for Computational Linguistics 311–318 (Association for Computational Linguistics, 2002).

Lin, C.-Y. ROUGE: a package for automatic evaluation of summaries. In Text Summarization Branches Out 74–81 (Association for Computational Linguistics, 2004).

Zhang, T., Kishore, V., Wu, F., Weinberger, K. Q. & Artzi, Y. BERTScore: evaluating text generation with BERT. In International Conference on Learning Representations (ICLR) https://openreview.net/pdf?id=SkeHuCVFDr (2020).

International Statistical Classification of Diseases and Related Health Problems: Alphabetical Index (World Health Organization, 2004).

Yang, A. et al. Qwen2.5-1M technical report. Preprint at https://arxiv.org/abs/2501.15383 (2025).

Achiam, J. et al. GPT-4 technical report. Preprint at https://arxiv.org/abs/2303.08774 (2023).

Guo, D. et al. DeepSeek-R1: incentivizing reasoning capability in LLMs via reinforcement learning. Preprint at https://arxiv.org/abs/2501.12948 (2025).

Zhang, T. et al. Prevalence of personality disorders using two diagnostic systems in psychiatric outpatients in Shanghai, China: a comparison of uni-axial and multi-axial formulation. Soc. Psychiatry Psychiatr. Epidemiol. 47, 1409–1417 (2012).

Demszky, D. et al. Using large language models in psychology. Nat. Rev. Psychol. 2, 688–701 (2023).

Singhal, K. et al. Toward expert-level medical question answering with large language models. Nat. Med. 31, 943–950 (2025).

Huang, J. & Chang, K. C.-C. Towards reasoning in large language models: a survey. In Findings of the Association for Computational Linguistics: ACL 2023 1049–1065 (Association for Computational Linguistics, 2023).

Thieme, A., Belgrave, D. & Doherty, G. Machine learning in mental health: a systematic review of the HCI literature to support the development of effective and implementable ML systems. ACM Trans. Comput. Hum. Interact. 27, 1–53 (2020).

Kaplan, J. et al. Scaling laws for neural language models. Preprint at https://arxiv.org/abs/2001.08361 (2020).

Shao, Z. et al. DeepSeekMath: pushing the limits of mathematical reasoning in open language models. Preprint at https://arxiv.org/abs/2402.03300 (2024).

Kwon, W. et al. Efficient memory management for large language model serving with PagedAttention. In Proc. 29th Symposium on Operating Systems Principles 611–626 (Association for Computing Machinery, 2023).

Wang, R. et al. PsychFound: PsychFound code and dataset. Zenodo https://doi.org/10.5281/zenodo.17768150 (2025).
