Expressing stigma and inappropriate responses prevents LLMs from safely replacing mental health providers.

Abstract

Should a large language model be used as a therapist? In this paper, we investigate the use of LLMs to replace mental health providers, a use case promoted in the tech startup and research space. We conduct a mapping review of therapy guides used by major medical institutions to identify crucial aspects of therapeutic relationships, such as the importance of a therapeutic alliance between therapist and client. We then assess the ability of LLMs to reproduce and adhere to these aspects of therapeutic relationships by conducting several experiments investigating the responses of current LLMs, such as GPT-4. Contrary to best practices in the medical community, LLMs 1) express stigma toward those with mental health conditions and 2) respond inappropriately to certain common (and critical) conditions in naturalistic therapy settings (e.g., LLMs encourage clients' delusional thinking, likely due to their sycophancy). This occurs even with larger and newer LLMs, indicating that current safety practices may not address these gaps. Furthermore, we note foundational and practical barriers to the adoption of LLMs as therapists; for example, a therapeutic alliance requires human characteristics (e.g., identity and stakes). For these reasons, we conclude that LLMs should not replace therapists, and we discuss alternative roles for LLMs in clinical therapy.

1 Introduction

Most people lack access to much-needed mental health care. In the U.S., only 48% of those in need of mental health care receive it, often due to financial barriers, stigma, and scarcity of services. In response, some have called for the use of LLMs to increase mental health care delivery. Some propose that LLMs help train clinicians by acting as "standardized patients", or that they assist clinicians with administrative tasks (clinical case note-taking, session summaries). In other cases, LLMs have been deployed in peer support settings, providing feedback to volunteers with a human in the loop. If successful, these use cases could enhance the effectiveness of existing human mental health resources.

However, other researchers and companies go further, focusing on LLMs acting (in some capacity) as care providers that engage in therapeutic dialogue directly with clients. In contrast to the roles above, these applications are designed to replace (at least aspects of) human therapists.

Using LLMs-as-therapists comes with concerning risks. In February 2024, a young teen, Sewell Setzer III, took his own life, arguably at the suggestion of an LLM-powered chatbot on Character.ai. At the same time, prominent executives of AI companies extol the potential for AI to "cure" mental health disorders. These applications of LLMs are unregulated in the U.S., whereas therapists and mental health care providers have strict training and clinical licensing requirements. Many such LLM-powered apps are publicly available and interact with millions of users.

Most worrying is that the field still lacks an interdisciplinary- (and technically-) informed evaluation framework of LLM-powered mental health tools. In contrast, the research community is uniquely qualified to transparently document what appropriate clinical practice entails and how LLMs fare.

Scope. In this paper, we focus on the following use case: fully autonomous, client-facing, LLM-powered chatbots deployed in mental health settings, meaning any setting in which a client might be (or soon become) at risk, such as being in crisis. We call this use case LLMs-as-therapists. We consider text-based interactions, although we note that multimodal (e.g., voice) LLMs are also available. This work applies to systems that are substantially similar to current (April 2025) LLMs, and is not meant to extend to an arbitrary class of future AI systems. We analyze only the specific situations in which LLMs act as clinicians providing psychotherapy, although LLMs could also provide social support in non-clinical contexts such as empathetic conversations.

We first set out to review what comprises "good therapy". We looked to a sample of ten standards documents from major medical institutions in the U.S. and the U.K. (we examined one therapy manual and one practice guide for each of five different conditions). These documents are used to guide and train mental health care providers. In Section 3, we conduct a mapping review of these documents, and, from a thematic analysis, we identify seventeen important, common features of effective care.

With such a review, we can then evaluate how well any artificial agent performs. For several common care features, we conduct experiments to assess whether LLMs can meet the standards, such as whether LLMs-as-therapists show stigma toward clients (users) (Section 4) and whether LLMs can respond appropriately and adapt to specific conditions (Section 5). Note that our experiments (Sections 4 and 5) are deliberately not meant to serve as a benchmark for LLMs-as-therapists; they merely test a portion of the desired behavior. A benchmark collapses the issue into a proxy; therapy is not a multiple-choice test. In both sets of experiments, we find that LLMs struggle: models express stigma and fail to respond appropriately to a variety of mental health conditions.

Finally, we analyze common features of care to assess whether LLMs face significant practical or foundational limitations in meeting them. For example, we discuss whether a therapeutic alliance, the relationship between provider and client, requires human characteristics. Weighing the existing evidence on LLMs' adherence to medical practice with the results of our experiments (Section 6), we argue against LLMs-as-therapists.

2 Background

2.2 LLMs in Mental Health

3 Mapping Review: What Makes Good Therapy?

4 Experiment 1: Do LLMs Show Stigma toward Mental Health Conditions?

4.1 Results

5 Experiment 2: Can LLMs Respond Appropriately to Common Mental Health Symptoms?

Example Delusion Stimulus and Response

5.1 Results

5.2 Experiment 2B: Can Commercially-Available Therapy Bots Respond Appropriately?

6 Discussion

6.1 Practical Barriers to LLMs-as-Therapists

6.2 Foundational Barriers to LLMs-as-Therapists

7 Future Work: LLMs in Mental Health

8 Conclusion

Ethical Considerations

Positionality Statement

Author Contributions

Appendix A.1 Stigma Experiment

A.2 Appropriate Therapeutic Responses Experiment

Overview

The study evaluates the risks of using LLMs as therapists, revealing that they express stigma and provide inappropriate responses, which could harm clients. It discusses alternative roles for LLMs in mental health care without replacing human therapists.

Key Points

1. LLMs fail to uphold crucial aspects of therapeutic relationships.
2. The paper highlights stigma expressed by LLMs toward mental health conditions.
3. Evidence shows LLMs provide inappropriate responses in therapy settings.
4. Human characteristics are essential for a therapeutic alliance, and LLMs lack them.
5. Alternative uses for LLMs in mental health should be explored without replacing human providers.

Details

Authors: Jared Moore, Declan Grabb, Stevie Chancellor, William Agnew, Desmond C. Ong, Kevin Klymant, Nick Haber
Category: Health and Medicine
