Teens are increasingly using AI-powered mental health apps, and there’s an opportunity for schools to leverage this technology to provide more support to their students.

But a new risk assessment of the popular technology urges caution for both students and educators.

The market for these apps is unregulated and the products available can be harmful to teens, according to the assessment by Common Sense Media, a nonprofit that researches and advocates for healthy technology use among youth, and Stanford University’s Brainstorm Lab.

However, not all AI mental health apps are the same. Apps designed for use in schools that keep humans in the loop performed much better on the risk assessment than did direct-to-consumer apps.

For districts that are struggling with shortages of school psychologists and counselors while trying to meet the mental health needs of their students, these apps could be a useful resource, said Robbie Torney, the head of AI and digital assessments for Common Sense Media’s Youth AI Safety Institute.

“These school-based mental health apps can be a helpful part of getting students the support that they need,” he said, “but they can’t be the only part of getting students the support that they need.”

While many people—including adolescents—turn to general purpose AI chatbots like ChatGPT for mental health support, purpose-built AI mental health apps often claim to be designed with clinical expertise and provide therapeutic-based frameworks, safety protocols, and sometimes human oversight.

Three in 10 teens have used an AI mental health app and even more have used a general purpose app like ChatGPT for mental health or emotional support, according to separate research by Common Sense Media.

How the assessment was conducted

Researchers with Common Sense Media and Stanford Brainstorm Lab started by assessing two “institutional” apps and three consumer apps for safety and helpfulness. To determine whether the apps were safe, researchers created test accounts to see if apps could recognize warning signs of a variety of conditions such as anxiety, ADHD, depression, and psychosis. They also tested whether the apps could assess the severity of a situation accurately, provide crisis resources and direct users to professional care when necessary, and avoid giving harmful advice that could worsen a user’s symptoms or delay proper treatment.

Even though the technology that supports all the apps tested is similar, the institutional apps, Alongside and Sonar, which are designed for schools and keep humans in the loop, scored significantly better on the risk assessment.

Sonar’s risk was rated as minimal and Alongside’s as low.

For Sonar, students text with well-being coaches, not an AI chatbot. AI is used to “provide context on past engagement, suggest responses, flag concerns, and assist with triage,” the report says.

Students who use Alongside can chat with an AI chatbot, but the chatbot is integrated into schools’ existing care systems rather than a standalone tool, the report noted. When chats with students broach high-risk topics, the app alerts school counselors and administrators. The chat feature is disabled if a student sends more than 60 messages in less than 3 hours.

The risk assessment still identified some weaknesses in these apps, said Torney. Alongside struggled to identify and flag signs of eating disorders, and automation bias could put Sonar’s human coaches at risk of overreliance on its chatbot, especially without proper training.

Although encouraged by the report’s “low” risk rating, Alongside is taking the recommendations seriously, said Elsa Friis, the company’s director of product and clinical care.

“We see that feedback as part of our responsibility to keep strengthening safety, accountability, and age-appropriate support,” she said. “We have already implemented the recommendations from the evaluation, including improving our eating disorder escalation pathway, and we are continuing work to make the experience easier to understand for younger students.”

Education Week reached out to Sonar for comment, but did not receive a response before publication.

In its summary, the report said that quickly getting a human being on the phone with a user in need is the standard every product should be held to.

AI mental health apps should also be meaningfully integrated into human care systems, the assessment said. Alongside and Sonar are both up front about the limitations of their AI features, and their apps are designed to route students to care rather than replace care, the report says.

Easily accessible consumer mental health apps show significant gaps in quality

Two of the three consumer apps that researchers tested disappeared from the app store during the assessment process without notice or transition support, the report says. The third consumer app, Wysa, was given a risk rating of “unacceptable.”

The risk assessment included these apps because they are popular and easily available to school-age kids to download on their own, said Torney.

A consistent issue with the direct-to-consumer products tested is that they failed to connect the dots, Torney said.

“If I share information over a series of exchanges in one conversation or a series of multiple conversations, a human therapist or a human counselor is going to be able to put those pieces of information together and have a sense of what’s going on for the young person in a holistic way,” he said. AI can identify obvious signs of mental distress, he said, but the technology often misses “bread crumbs.”

These apps also did not enforce the age limits they claimed to use, and they encouraged users to spend more time on the app.

In a statement, Wysa CEO Jo Aggarwal said the company welcomes scrutiny of its products. But, she said, the free adult version of the app was tested as part of the risk assessment, not the children and youth product it has specifically for schools and other similar settings.

“Wysa’s free consumer app is a bounded, evidence-based self-help tool for adults,” she said. “It is not a crisis service, diagnostic tool, replacement for therapy, or clinician-led pathway, and its safety protocols are designed for that context. We have addressed the genuine improvement area around safety plan retrieval identified in the report, and we are strengthening guardrails where helpful. But we strongly reject any characterization of Wysa as unsafe.”

Wysa’s youth-focused app is available only through entities like schools and counseling services that pay for the product, according to a Wysa spokesperson. Depending upon the service purchased, a school or counseling service can receive an alert if a user clicks through the app to call a crisis hotline or takes a similar action.

According to the risk assessment, some of the most egregious issues with Wysa’s generally available app documented by researchers include playing adult sexual games with 13-year-old test personas; mirroring users’ celebratory and enthusiastic language when they displayed signs of eating disorders, mania, and psychosis; and allowing teens to easily leave the suicide crisis pathway without any follow-up.

These kinds of AI-generated responses can delay critical treatment, the report notes.

Wysa did, however, disclose the limitations of its AI throughout conversations, even when not prompted, the risk assessment found.

An earlier risk assessment of general purpose AI chatbots for mental health—such as ChatGPT, Claude, and Gemini—by Common Sense Media found similar issues with the chatbots responding to teens’ queries safely and appropriately.
