A socially assistive robot to support mental wellbeing in LGBTQ+ young people at risk of self-harm: a randomized controlled trial

Trial design and ethical approval

This was a parallel, stratified, randomized controlled trial conducted with LGBTQ+ youth in the UK. All study procedures were approved by the ethics committee at King’s College London (RESCM-22/23-34570), and all participants provided written informed consent prior to participation. All trial procedures, including anticipated safeguarding concerns related to psychological distress or potential adverse events associated with participation, were prospectively registered before enrollment of the first participant (ClinicalTrials.gov: NCT06025942, available at https://bmjopen.bmj.com/content/bmjopen/14/1/e079801.full.pdf, published 8 December 2023). As we anticipated potential adverse events associated with participation, explicit plans for identifying, monitoring and addressing these concerns were included (‘Safeguarding’ section). This paper was developed in accordance with the CONSORT 2010 checklist56, and we ensured transparency by prospectively publishing the detailed trial design and the protocol57.

Participant recruitment

Participant recruitment occurred between January and September 2024, using diverse strategies including online platforms (Twitter/X, Instagram, paid research participation sites, university participant pools and MQ Mental Health Research (https://participate.mqmentalhealth.org/)), physical community outreach (schools and community centers) and snowball sampling methods (word of mouth and Volunteer Tutors Organization). This multichannel approach facilitated broad outreach among eligible youth. Recruitment was discontinued in September 2024 due to a substantial reduction in registration rates, which suggested saturation of available recruitment channels; final follow-ups were conducted in October 2024. The target minimum sample size had already been achieved by this point.

Eligible participants were individuals aged 16–25 years (inclusive). The lower bound was chosen to begin at the UK age of consent and the upper bound to align with the National Youth Agency’s definition of youth (11–25 years)58. This range captures the developmental period when self-harm is likely to reach its peak59,60. Additional inclusion criteria were residing in the UK throughout the study duration, identifying within the sexual orientation or gender identity minority spectrum (LGBTQ+) and reporting any experiences of self-harm ideation or behavior within the previous month (that is, any response greater than ‘none’ on any questions on the screener), with no upper severity limit. The 1-month timeframe for self-harm ideation was selected to capture current and recent ideation, acknowledging the typically fluctuating nature of these experiences61. A final inclusion criterion was proficiency in reading, writing and speaking English, which was essential for active engagement and accurate reporting within the study protocol. Individuals were excluded if they identified as cisgender and heterosexual, if they were above or below the age range (16–25 years inclusive), if they did not have recent experiences of self-harm ideation or behavior (in the last month) or if they lived outside of the UK.

It was not necessary for participants to indicate whether they were receiving any additional treatments or therapies for their mental health difficulties. This design decision was intentional to reflect real-world conditions in which individuals might use Purrble either alongside or independently of other supports or interventions.

Procedures

Randomization

Randomization was conducted after consent and baseline data collection (week 3) by the study coordinator, who was the only person with access to the randomization process. Participants were grouped into blocks based on similar enrollment timeframes. Within each block, participants were individually randomized in a 1:1 ratio to either the Purrble + SP condition or the SP-Only (plus waitlist) condition, using an online randomization generator. Randomization was stratified by gender identity (cisgender versus TGD) to ensure balance between conditions. Participants were coded as TGD if they reported that they were transgender, trans-masculine/feminine, non-binary, agender, gender queer, questioning their gender, genderfluid, demiboy/girl or any combination of the above. Participants were notified of their condition allocation via email after the closure of the week 3 data collection window. On the same day, Purrble devices were dispatched to participants in the Purrble + SP condition via express delivery to ensure arrival prior to the start of the week 4 survey.
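The allocation scheme described above can be illustrated with a minimal sketch of stratified 1:1 randomization within a single enrollment block. This is not the online randomization generator used in the trial; the function name, the seed argument and the example participant identifiers are hypothetical and serve only to make the logic concrete.

```python
import random
from collections import defaultdict

CONDITIONS = ("Purrble + SP", "SP-Only")


def randomize_block(participants, seed=None):
    """Allocate one enrollment block 1:1, separately within each stratum.

    participants: list of (participant_id, stratum) tuples, where stratum is
    'cisgender' or 'TGD'. Returns a dict {participant_id: condition}.
    """
    rng = random.Random(seed)

    # Group participant ids by stratum.
    by_stratum = defaultdict(list)
    for pid, stratum in participants:
        by_stratum[stratum].append(pid)

    allocation = {}
    for stratum in sorted(by_stratum):
        ids = by_stratum[stratum]
        rng.shuffle(ids)
        # Alternate conditions down the shuffled list. The starting condition
        # is itself random so that odd-sized strata do not favor one arm.
        offset = rng.randrange(2)
        for i, pid in enumerate(ids):
            allocation[pid] = CONDITIONS[(i + offset) % 2]
    return allocation


# Hypothetical block of six participants.
example_block = [
    ("p1", "TGD"), ("p2", "TGD"), ("p3", "TGD"), ("p4", "TGD"),
    ("p5", "cisgender"), ("p6", "cisgender"),
]
example_allocation = randomize_block(example_block, seed=1)
```

Within each stratum the two conditions differ in size by at most one participant, which is the balance property that stratification is intended to provide.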

Trial procedures

Participants were randomized to their condition group and enrolled into the trial after completing a synchronous compulsory safety briefing and planning session via Zoom. These intake calls were conducted in private office spaces by the ASIST-trained researcher, and participants were asked to take the call in a private, quiet location of their choice. The trial period lasted 13 weeks, with weekly online surveys hosted by Qualtrics. Surveys were sent to participants via email, with a next-day automatic reminder from Qualtrics and a personalized message the following day to those who had not completed their survey. Each survey remained open for 3 days.

The initial 3 weeks of the trial acted as baseline assessments, followed by randomization, with intervention participants receiving Purrble in the fourth week. The deployment period lasted for 10 weeks. An extended assessment was conducted at week 3 (final baseline timepoint), week 8 (fifth week of deployment) and week 13 (final week of the trial). Those in SP-Only received Purrble after week 13. All participants were able to keep their Purrbles after the trial was completed and were paid £5 for each completed survey.

Public and patient involvement

This project is supported by members of Sprouting Minds (https://digitalyouth.ac.uk/the-digital-youth-programme/about-sprouting-minds/), an advisory group comprising young individuals with lived experience of poor mental health, specifically involved in Digital Youth research initiatives. A subgroup of advisors from the wider Sprouting Minds network chose to engage more closely with this project due to shared identity characteristics and personal relevance to the study population, with approximately one-third of this subgroup self-identifying as LGBTQ+. Building on their foundational involvement in prior related research36, their input substantially shaped key decisions throughout the design of this trial.

Advisors from Sprouting Minds critically reviewed the trial’s methodology, focusing particularly on participant burden and the frequency and duration of survey assessments. The final design—weekly surveys lasting approximately 5−10 minutes over 13 weeks, with a reimbursement of £5 per survey—was deemed acceptable and fair by advisors. Additionally, advisors informed the selection of the control group, strongly advocating for a waitlist control condition. They noted that, although allocation to a waitlist could be initially discouraging for participants, it accurately reflected real-world experiences associated with waiting periods for mental health services. Thus, any dropout observed in this group would realistically represent typical treatment disengagement. As participants in this condition received only our standardized safeguarding procedure (that is, safety planning), it was classified as treatment-as-usual for the purposes of the trial.

Furthermore, Sprouting Minds advisors reviewed and refined participant recruitment materials to ensure clarity and readability for young audiences. Throughout the recruitment phase, advisors actively participated in strategic meetings, providing insights into effective recruitment strategies and the optimal use of various recruitment channels. Upon trial completion, advisors were presented with the quantitative analyses of primary and secondary outcomes, with thorough explanations of analytical decisions and processes.

Safeguarding

Comprehensive safeguarding measures were implemented, given that study eligibility criteria required participants to have active self-harm ideation. These safeguarding procedures were collaboratively developed with input from Sprouting Minds and drew upon established strategies previously used with youth who self-harm36,62,63,64,65,66. Prior to enrollment, participants received detailed information about safeguarding protocols through both the participant information sheet and introductory researcher emails. Participants were required to attend a compulsory 1:1 safety briefing that took place via Zoom, prior to randomization. During briefings, the study purpose, procedures and safeguarding practices were outlined, with participants being invited to ask questions. Participants also engaged in compulsory safety planning.

Compulsory safety planning

All participants completed an individualized safety plan37,38 before baseline measures and randomization. Safety briefings were conducted by an ASIST-certified researcher who holds a PhD in suicide research. At the start of each Zoom-based safety briefing, the researcher explained confidentiality, including the circumstances under which emergency services might be contacted (imminent risk of harm to the participant or others), and emphasized that any such action would always be discussed with the participant first. All participants were aged 16 years or older and provided their own informed consent, in line with UK regulations.

Participants were emailed a blank copy of the safety plan as a Microsoft Word document prior to the video call and were instructed that they could either complete the plan before the call and review it during the briefing or complete it collaboratively with the researcher. Safety planning followed the Stanley−Brown framework37, previously adapted and validated for use with LGBTQ+ youth who self-harm36,62,67. The template prompted participants to (1) identify internal warning signs (thoughts, feelings and bodily sensations) and external triggers that typically precede a crisis; (2) list social and environmental sources of support; and (3) document relevant professional and crisis services with contact details. Safety plans were sent to participants by email after the briefing, and they were encouraged to retain, update and use these safety plans throughout and beyond the study duration. Additionally, participants nominated a support contact (for example, a parent, a friend over 18 years of age or a general practitioner) who could be contacted if the research team was unable to reach the participant during periods of heightened risk. The nominated contacts received an email to inform them that the participant was taking part in a mental health study and that the research team would be in contact if it was unable to reach the participant after an indication of increased risk.

Passive safeguarding procedures

Each weekly survey assessment included a visual scale (from 1: ‘very distressed’ to 10: ‘extremely happy’) to evaluate participants’ mood immediately before and after survey completion66. Responses were reviewed within 24 hours of survey closure to provide timely feedback on any potential adverse effects of participation, forming one of the triggers for reactive safeguarding (‘Reactive safeguarding procedures’ section). A potential concern was identified when a participant scored lower on the visual scale after survey completion.

At the conclusion of each assessment, participants in both intervention and control groups received contact information about external support services (for example, Kooth, LGBT Foundation Helpline, Young Minds, Samaritans, Mermaids and Allsorts) within Qualtrics and were encouraged to seek professional support if experiencing distress. Such signposting is common practice in mental health research and was effectively used by this research team previously36,62.

Reactive safeguarding procedures

To prepare for reactive safeguarding, each participant nominated a trusted adult (≥18 years) during the safety planning as a support person who could be contacted by phone if serious concerns about safety arose; this did not need to be a parent or caregiver, given the potential for non-accepting family contexts. During the study, participants completed weekly assessments that included measures of self-harm, suicidal ideation and mood symptoms. These were used to identify heightened risk during the trial. Responses were reviewed within 24 hours of the survey window closing by an ASIST-trained researcher. A reactive wellbeing check was triggered when participants reported an episode of self-harm in the past week, reported thoughts of suicide in the past week and showed reduced mood after the survey, as identified by the visual scale. This check took place as a phone call between 13:00 and 16:00 the following day. This procedure aligns with established ecological momentary assessment research protocols62,64. Across all survey points, 21 reactive wellbeing checks were implemented, and there were no instances where the nominated support person had to be contacted.

Wellbeing calls, in which participants were made aware that researchers were not licensed clinicians, involved empathetic engagement, assessment of current wellbeing (for example, conversations around current mood, recent self-harm thoughts/behaviors and potential suicidal intentions) and explicit advice to seek professional support if needed. Safety plans were reviewed and updated with the young person if needed, to more accurately reflect their experiences and include any further professional services that had been suggested by the researcher. If necessary, the ASIST-trained researcher could contact emergency services with the young person’s consent, mirroring helpline responses67. During these wellbeing checks, participants were explicitly reminded that they could withdraw from the study at any time without any penalty and asked if they wished to continue in the study.

If participants were unreachable by phone, researchers sent a follow-up email to check on wellbeing, confirm continued participation and schedule another wellbeing call at a convenient time. If no response was received within 24 hours, another call was made the next day. This process was repeated up to three times. If 4 days passed without a response, the nominated support contact would be notified to ensure the participant’s safety.

Measures

Data were collected using Qualtrics surveys across 13 weeks, after a prescreening phase used for eligibility and consent. The 13 weeks were organized into three phases: baseline (weeks 1−3), intervention deployment (weeks 4−13) and follow-up (weeks 11−13, the final weeks of deployment). Each survey contained the primary and secondary measures at all timepoints, with three additional measures included at weeks 3, 8 and 13. This decision was made to balance assessment of exploratory outcomes with participant burden. Expectancy measures were not collected, as Purrble was not introduced or described as a therapeutic program or skills-based intervention, and standard expectancy instruments (for example, Credibility/Expectancy Questionnaire68) would not have been appropriate in this context. Participants were unfamiliar with Purrble at enrollment, were encouraged simply to explore the device and were not primed to expect specific benefits.

In accordance with the study protocol, outcome measures for the baseline (weeks 1−3) and follow-up (weeks 11−13) phases were calculated as the mean of scores collected during their respective time windows. As the intervention remains embedded within the participant’s environment and is not withdrawn after deployment, participants may continue to engage with it during the follow-up phase in a manner similar to their usage during the active intervention phase. Detailed information regarding the administration schedule for each outcome measure is provided below.
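As a minimal sketch of the phase scoring described above, the snippet below averages one participant's weekly scores within the baseline and follow-up windows. The handling of missed weeks (averaging whichever weeks in the window were completed) is an assumption made for illustration, as are the function name and the example scores.

```python
from statistics import mean

BASELINE_WEEKS = (1, 2, 3)
FOLLOW_UP_WEEKS = (11, 12, 13)


def phase_mean(weekly_scores, weeks):
    """Mean of the scores available in the given weeks.

    weekly_scores: dict mapping week number to a scale score (missing weeks
    are absent or None). Returns None if no survey in the window was completed.
    Averaging over available weeks is an assumption of this sketch.
    """
    observed = [
        weekly_scores[w] for w in weeks if weekly_scores.get(w) is not None
    ]
    return mean(observed) if observed else None


# Hypothetical participant who missed the week 12 survey.
scores = {1: 30, 2: 28, 3: 32, 11: 24, 13: 26}
baseline = phase_mean(scores, BASELINE_WEEKS)
follow_up = phase_mean(scores, FOLLOW_UP_WEEKS)
```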

Primary outcome: emotion regulation

The DERS-8 (ref. 69), an eight-item self-report measure, was used to assess difficulties in emotion regulation. Items were rated on a five-point scale ranging from 1 (almost never (0−10%)) to 5 (almost always (91−100%)). Higher scores indicate greater difficulties in emotion regulation. The DERS-8 was administered at all 13 timepoints (Cronbach’s α = 0.87). An example item was: ‘When I’m upset, I have difficulty getting work done’. Previous examinations of the scale have supported its reliability and validity70.
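For readers unfamiliar with the internal consistency statistic reported for each scale, the following sketch computes Cronbach's α from an item-by-participant response matrix using the standard formula α = k/(k − 1) × (1 − Σ item variances / variance of total scores). The function name and the small example matrix are hypothetical; the sketch does not indicate which timepoint or software routine produced the coefficients reported in this paper.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of per-participant item-score lists.

    Every inner list must hold the same number of items (k >= 2), and the
    total scores must not all be identical (otherwise alpha is undefined).
    """
    k = len(responses[0])
    # Sample variance of each item across participants.
    item_variances = [variance([r[i] for r in responses]) for i in range(k)]
    # Sample variance of participants' total scores.
    total_variance = variance([sum(r) for r in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses from four participants to a three-item scale.
example = [[1, 2, 1], [2, 2, 3], [3, 4, 3], [4, 4, 5]]
alpha = cronbach_alpha(example)
```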

Secondary outcomes

Self-harm

The SHQ71, comprising three screener items, was used to measure self-harm. As the data were ordinal with unequal intervals and could not be averaged appropriately, each item was examined individually. Items assess self-harm thoughts, suicidal ideation and self-harm behaviors on a frequency scale ranging from 1 (no) to 4 (yes, five or more times)—for example, ‘In the last week, have you thought about harming yourself on purpose, without wanting to die?’ Items were adapted from lifetime prevalence to capture experiences within the last week. Higher scores indicate more frequent occurrences. The SHQ was administered at all 13 timepoints, with self-harm history collected at baseline using the full SHQ71. A previous study indicated adequate reliability in adolescents72,73.

Depression symptoms

Depression symptoms were measured using the PHQ-9 (ref. 74), a nine-item measure corresponding to Diagnostic and Statistical Manual of Mental Disorders, 4th Edition (DSM-IV) criteria for depression. Participants rated the severity of their depressive symptoms over the past 2 weeks, with items rated on a four-point scale from 0 (not at all) to 3 (nearly every day). Higher scores reflect greater depressive severity. The PHQ-9 was assessed at all 13 timepoints (Cronbach’s α = 0.86). Previous studies have confirmed the PHQ-9’s reliability and validity75,76.

Anxiety symptoms

Anxiety symptoms were assessed using the GAD-7 (ref. 77), a seven-item measure. Participants rated the severity of their anxiety symptoms over the past week on a four-point scale from 0 (not at all) to 3 (nearly every day). Higher scores represent greater severity of anxiety, with scores of 5, 10 and 15 indicating mild, moderate and severe anxiety, respectively. The GAD-7 was administered at all 13 timepoints (Cronbach’s α = 0.87). Previous research has established reliability and validity75,76.
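The severity thresholds above can be expressed as a short scoring sketch. It assumes that the GAD-7 total is the sum of the seven item responses; the function names and the label used for totals below the mild threshold are hypothetical.

```python
def gad7_total(items):
    """Sum of seven item responses, each coded 0 to 3."""
    if len(items) != 7 or any(i not in (0, 1, 2, 3) for i in items):
        raise ValueError("expected seven responses coded 0-3")
    return sum(items)


def gad7_band(total):
    """Map a total score to the severity bands given in the text."""
    if total >= 15:
        return "severe"
    if total >= 10:
        return "moderate"
    if total >= 5:
        return "mild"
    return "below mild threshold"  # label chosen for this sketch


# Hypothetical response pattern.
band = gad7_band(gad7_total([2, 1, 2, 1, 2, 1, 2]))
```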

Mechanistic outcomes

Hope

Hopefulness was assessed using the State Hope Scale (SHS)78. This six-item scale measures goal-directed thinking using two subscales: agency (Cronbach’s α = 0.85) and pathways (Cronbach’s α = 0.77). Each item is scored on an eight-point scale from 1 (definitely false) to 8 (definitely true), with higher scores indicating greater state hope. The SHS was administered at weeks 3, 8 and 13. Previous research has established reliability79,80.

Loneliness

The UCLA Loneliness Scale (Version 3) (ref. 81) was used to measure subjective feelings of loneliness. This scale comprises three questions that measure three dimensions of loneliness—relational connectedness, social connectedness and self-perceived isolation—and was administered at weeks 3, 8 and 13 (Cronbach’s α = 0.77). Items are rated between 1 (hardly ever or never) and 3 (often), the total score of which can be categorized into lonely (≥6) or not lonely (≤5). Previous research has established reliability and validity82,83.

Attentional deployment

The full Process Model Emotion Regulation Questionnaire84 is composed of 45 questions that assess the use of 10 emotion regulation strategies. For the present trial, nine questions were used reflecting attentional deployment as an emotion regulation strategy. There are two subscales, focus elsewhere (Cronbach’s α = 0.84) and cognitively distract (Cronbach’s α = 0.86), reflecting an engagement and disengagement focus, respectively. To score each subscale, the average of all scale items was calculated. This was administered at weeks 3, 8 and 13. Previous research has established reliability and validity85.

Post hoc analyses

Perceived Purrble fit/engagement

Fit/engagement with Purrble was assessed for those in the intervention condition using the TWente Engagement with eHealth Technologies Scale (TWEETS)86. The TWEETS is a nine-item self-report measure that conceptualizes engagement across behavioral (items 1−3), cognitive (items 4−6) and affective (items 7−9) components. The TWEETS was administered in weeks 4 through 13, after participants had access to Purrble for at least one week. Items were framed as ‘Thinking about using Purrble in the last week…’ and rated on a five-point Likert scale from 0 (strongly disagree) to 4 (strongly agree). Item responses were averaged to yield a total engagement score, with higher scores indicating greater engagement. In this sample, internal consistency was excellent (Cronbach’s α = 0.91). The TWEETS has been used with LGBTQ+ youth36 and psychometric work found adequate reliability and validity86.
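The TWEETS scoring described above can be sketched as follows. The total score is the mean of the nine items, as stated in the text; the component means (behavioral, cognitive and affective) are included only as an illustrative assumption, and the function name and example responses are hypothetical.

```python
from statistics import mean


def score_tweets(items):
    """Score nine TWEETS responses (0-4), given in questionnaire order."""
    if len(items) != 9 or any(i not in (0, 1, 2, 3, 4) for i in items):
        raise ValueError("expected nine responses coded 0-4")
    return {
        "behavioral": mean(items[0:3]),  # items 1-3
        "cognitive": mean(items[3:6]),   # items 4-6
        "affective": mean(items[6:9]),   # items 7-9
        "total": mean(items),            # overall engagement score
    }


# Hypothetical weekly response.
weekly = score_tweets([4, 4, 4, 2, 2, 2, 0, 0, 0])
```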

Intervention conditions

Within information sheets and explicitly as part of the compulsory safety briefings, all participants were told that they would be randomly allocated to either intervention (Purrble + SP) or a safety planning plus waitlist condition (SP-Only) as part of the study and that allocation would occur after safety planning and baseline data collection. It was explained that SP-Only participants would receive a Purrble after week 13 of the trial.

Safety planning

During the compulsory safety briefing, participants in both conditions engaged in safety planning supported by an ASIST-trained researcher, using the Stanley−Brown framework37. Safety plans are a brief collaborative intervention that can help to prevent suicidal behavior64,87. The aim of safety plans is to provide participants with a crisis management tool, such that if they are in a period of distress that may lead to self-harm or suicide, they have a physical reminder of their self-harm triggers, potential coping techniques or supports that have previously been helpful and direct contact information for services88. Previous research has included safety planning as a standard self-harm safeguarding measure within LGBTQ+ youth research36,62,67, enabling participants to identify when they may need to withdraw from a study or seek additional support.

Purrble: design and emotion regulation theoretical model

The Purrble intervention consisted of an interactive plush toy (Fig. 1), designed to facilitate immediate soothing responses for participants experiencing emotional distress and to promote long-term improvements in emotion regulation. Full details regarding the toy’s development, design rationale and previous testing were published elsewhere23,89.

Participants received Purrble with no instruction of when or how to use the toy, allowing for a discovery process of engagement. They were invited to use Purrble as much or as little as they liked, in whatever capacity suited their needs. Embedded electronics enabled the toy to emit heartbeat-like vibrations, ranging from rapid to slow rhythms. When engaged in specific ways (for example, turning it upside down or touching its ear), the toy initially produces a rapid vibration accompanied by sounds (rapid heartbeat and growling) mimicking signs of anxiety, which slows down as sensors register calming interactions from the participant/user. Sustained soothing results in the toy emitting a steady purring vibration, signaling a calm and contented state. This calming transition typically occurs within 1 minute but could vary depending on participant interaction style.

The theoretical underpinning of the intervention was Gross’ Process Model of Emotion Regulation, which conceptualizes emotion regulation as processes through which individuals influence their emotions, including timing, experience and expression27. This theoretical framework informed the following three-level logic model for the intervention.

First, the intervention aimed to provide immediate emotional relief during distressing moments, such as urges to engage in self-harm. Specifically, it is designed to target two stages of emotion regulation: attentional deployment, by redirecting participant focus toward soothing interactions with the toy28,31,90,91, and response modulation, facilitating emotional downregulation through tactile engagement analogous to interactions observed in human−animal emotional regulation92,93,94,95,96,97. Second, the intervention sought to promote sustained engagement over time through its conceptual framing. Specifically, participants’ long-term engagement was encouraged by depicting the toy as a vulnerable creature needing care, fostering emotional investment and responsibility similar to long-term interactions with digital pets or social robots96,97. Third, repeated use of the intervention aimed to facilitate lasting shifts in participants’ implicit beliefs regarding emotional controllability, thereby promoting healthier emotion regulation practices44. Specifically, it was hypothesized to enhance participants’ confidence in their capacity to regulate emotions effectively and reduce reliance on maladaptive strategies such as rumination or suppression.

Power analysis

A sample size of 70 participants per condition at post-intervention (deployment weeks 11−13) was needed to detect an effect size of 0.4 (Cohen’s d) for the primary outcome with a statistical power of (1 − β) = 0.80 in a one-sided t-test (α = 0.05). A medium effect size of 0.4 was selected as a preliminary exploration based on previous intervention research with and without Purrble21,42,98,99,100,101, given the uncertainty of the intervention within this population. Therefore, a minimum of 140 participants were needed overall. Considering a predicted dropout rate of 20%, the trial aimed to recruit 168 participants. Because our preregistered analytic plan specified t-tests and we subsequently adopted an ANCOVA framework, we examined the effective power of the final sample under the ANCOVA specification. Using Shieh’s exact method for ANCOVA (ANCOVA_analytic in the Superpower package in R) with the observed post-intervention DERS means in the Purrble and waitlist conditions (mean = 25.26 and mean = 28.61, respectively), the within-condition pooled follow-up s.d. = 7.18, three covariates (baseline DERS, age and gender identity) and the empirically estimated covariate R² = 0.38, the achieved sample size yielded an estimated power of 91.7% to detect the observed emotion regulation effect at α = 0.05.
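The arithmetic behind these quantities can be illustrated with a short sketch. It reproduces the recruitment target from the stated dropout allowance and converts the reported means, pooled s.d. and covariate R² into standardized effect sizes. The covariate-adjusted effect size uses the common approximation that covariates shrink the residual s.d. by a factor of √(1 − R²); this is a simplification for intuition and is not Shieh’s exact method or the Superpower routine named above. Function names are hypothetical.

```python
import math


def cohens_d(mean_a, mean_b, sd):
    """Standardized mean difference between two conditions."""
    return (mean_a - mean_b) / sd


def residual_sd(pooled_sd, covariate_r2):
    """Approximate outcome s.d. left after covariates explain covariate_r2."""
    return pooled_sd * math.sqrt(1.0 - covariate_r2)


# Recruitment target: minimum sample inflated by the predicted 20% dropout.
recruitment_target = round(140 * (1 + 0.20))

# Values reported in the text (waitlist mean, Purrble mean, pooled s.d., R²).
d_unadjusted = cohens_d(28.61, 25.26, 7.18)
d_adjusted = cohens_d(28.61, 25.26, residual_sd(7.18, 0.38))
```

Under this approximation, the observed difference corresponds to a standardized effect of roughly 0.47 before covariate adjustment and roughly 0.59 after it, which helps to explain why the ANCOVA specification yields higher power than an unadjusted comparison of the same data.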

Data statistical analysis

Analyses were conducted using R statistical software. We first conducted preliminary analyses, including basic descriptive analysis with skewness and kurtosis to assess normality. All statistical tests were two-sided. We assessed equivalence of demographic (age only) and baseline variables across conditions using t-tests for continuous variables and χ² tests for categorical variables. We did not test equivalence for sexual identity or race/ethnicity due to the small sample size, the high number of categories in each demographic variable and low membership within most categories. Equivalence by gender identity was not tested, as randomization procedures already accounted for gender identity. Second, we performed multivariate outlier analyses to identify influential data points102. Third, we conducted attrition analyses103, with attrition operationalized as participants failing to fill in any follow-up questionnaires (weeks 11−13). A binary indicator was created to represent follow-up completion (1 = filled in at least one follow-up questionnaire; 0 = filled in none). Attrition rates were calculated overall, by condition and by gender identity, using χ² tests to determine whether attrition differed by condition or gender identity. Then, to assess potential attrition bias, we conducted two-way ANOVAs testing for condition × attrition status effects on each baseline outcome variable (except for self-harm; see below).
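The attrition indicator and the comparison of attrition across conditions can be sketched as follows. The analyses in this paper were run in R; this Python sketch is only an illustration, it computes a Pearson χ² statistic without continuity correction (whether a correction was applied in the actual analyses is not stated here), and the function names and cell counts are hypothetical.

```python
import math


def completed_follow_up(weekly_scores, follow_up_weeks=(11, 12, 13)):
    """1 if at least one follow-up questionnaire was filled in, else 0."""
    return int(
        any(weekly_scores.get(w) is not None for w in follow_up_weeks)
    )


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (df = 1) for the table [[a, b], [c, d]].

    Rows are conditions; columns are completed versus not completed.
    Returns (statistic, p_value). No continuity correction is applied.
    """
    n = a + b + c + d
    statistic = n * (a * d - b * c) ** 2 / (
        (a + b) * (c + d) * (a + c) * (b + d)
    )
    # For one degree of freedom the upper tail equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts: completers and non-completers in each condition.
stat, p = chi_square_2x2(20, 10, 10, 20)
```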

Deviations from preregistered analysis plan

The preregistered protocol specified one-tailed paired t-tests to evaluate intervention effects by comparing averaged baseline and follow-up scores. Prior to analysis, the team decided, instead, to use two-tailed ANCOVA models (adjusting for baseline scores, age and gender identity) as the primary approach. This change reflected (1) recognition of the possibility of bidirectional effects, including potential iatrogenic effects, and, therefore, the need for two-sided hypothesis testing, and (2) the greater statistical rigor of ANCOVA in controlling for baseline levels of the outcome, improving precision and accounting for any small group imbalances. The protocol indicated that covariates would be included but did not specify which ones or how they would be selected; we, therefore, prespecified age (to capture potential differences across the 16−25-year age range) and gender identity (cisgender versus TGD) as covariates.

The original protocol did not outline moderator analyses. After preregistration, and after consultation with an independent statistician, we expanded the analytic plan to test whether gender identity moderated intervention effects. This addition was motivated by concerns in the LGBTQ+ literature about treating TGD youth as homogeneous with sexual minority youth, thereby obscuring potentially distinct patterns of response.

The protocol stated that we would examine moderation by engagement for the three clinical outcomes (self-harm, depression and anxiety). These engagement analyses were not included in the present paper; instead, we plan to report them in a future paper focused specifically on engagement processes.

For self-harm outcomes, the protocol implied that the same analytic approach (t-tests or analogous methods) would be used for all outcomes, presumably averaging weeks 1−3 and weeks 11−13. In practice, because the self-harm variables were assessed on a four-point ordinal scale with unequal intervals, this was not possible. As such, we analyzed week 1 versus week 12 only and used ordinal logistic regression rather than t-tests or ANCOVA, based on discussions between the independent statistician and self-harm experts on the team regarding the most appropriate and interpretable approach.

Finally, although the original document suggested including baseline DERS-8 as a covariate in secondary analyses, we did not adjust for baseline DERS-8 in models predicting depression, anxiety or self-harm. This decision, informed by statistical advice and relevant literature104, reflected concerns that adjusting for emotion regulation, given its close conceptual and empirical relation to these clinical outcomes, could overcontrol and bias estimates toward the null.

Primary and secondary analyses

As outlined in the preregistration, participant scores were averaged across weeks 1−3 to compute their baseline scores and across weeks 11−13 to compute their follow-up scores. The analytic approach specified in the preregistration originally indicated that intervention effects would be evaluated using one-tailed paired t-tests. However, upon consultation with an independent statistician who joined the research team, this strategy was revised prior to data analysis. The primary analyses were modified to use two-tailed ANCOVA instead of one-tailed paired t-tests. This adjustment addressed methodological concerns regarding the potential for bidirectional intervention effects, as the theoretical framework did not explicitly exclude the possibility of iatrogenic effects. Additionally, ANCOVA allowed for statistical control of baseline outcome scores, enhancing precision in estimating intervention efficacy. This method also facilitated the inclusion of theoretically relevant covariates known to introduce variability within the target age range (16−25 years). First, age was included due to its developmental importance, given the considerable cognitive, emotional and social changes occurring during this period, along with related variability in support structures (for example, living situations and educational contexts). Second, gender identity was included as a covariate due to the heightened risk of adverse outcomes among TGD youth compared to sexual orientation-only youth40,105 alongside unique, additional experiences (for example, gender dysphoria, transphobia and difficulties with transitioning)106.

For all main effects analyses except self-harm, we examined individual-level change using reliable change indices (RCIs) for emotion regulation difficulties (DERS-8), anxiety (GAD-7) and depressive symptoms (PHQ-9)107. For each scale, we used the baseline standard deviation and its internal consistency (Cronbach’s α) as the reliability estimate to derive an RCI score for each participant, and we classified outcomes as reliable improvement, reliable deterioration, or no reliable change based on the ±1.96 criterion. We then compared conditions using odds ratios with 95% CIs, estimating (1) the odds of reliable improvement versus all other outcomes and (2) the odds of reliable deterioration versus all other outcomes for Purrble plus safety planning relative to safety planning alone.
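The RCI steps above can be sketched in code. This is a minimal illustration, assuming the Jacobson−Truax formulation of the RCI, a Woolf (log-based) confidence interval for the odds ratio and the convention that lower scores indicate improvement on all three scales; the function names and the example numbers are hypothetical and are not taken from the trial data.

```python
import math

def rci(baseline, follow_up, sd_baseline, alpha):
    """Reliable change index for one participant (Jacobson-Truax form, assumed)."""
    se_measurement = sd_baseline * math.sqrt(1.0 - alpha)  # standard error of measurement
    se_difference = math.sqrt(2.0) * se_measurement        # standard error of the difference
    return (follow_up - baseline) / se_difference

def classify(rci_value, threshold=1.96):
    """Classify change; assumes lower scores mean fewer difficulties or symptoms."""
    if rci_value < -threshold:
        return "reliable improvement"
    if rci_value > threshold:
        return "reliable deterioration"
    return "no reliable change"

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf 95% CI for a 2x2 table.

    a, b: intervention arm with / without the outcome
    c, d: comparison arm with / without the outcome
    All four cells must be non-zero for this simple form.
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper

# Made-up example: a 10-point drop on a scale with SD = 5 and alpha = 0.75
print(classify(rci(baseline=20, follow_up=10, sd_baseline=5, alpha=0.75)))
```

The same `odds_ratio_ci` helper would be called twice per scale, once with reliable improvement as the outcome and once with reliable deterioration, each time against all other outcomes.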

Exploratory analyses

In addition to the main ANCOVA models, we fitted parallel linear mixed models (LMMs) for each outcome except self-harm to examine changes across the weekly assessments.

Post hoc analyses

After registration, the analytic plan was expanded to explore potential moderation effects. Specifically, we introduced secondary post hoc interaction analyses examining whether gender identity (cisgender versus TGD) moderated intervention effectiveness. Given the lack of previous research, no a priori hypotheses were made for the gender identity moderation analyses.

Self-harm analyses

Each self-harm variable was measured on a four-point ordinal scale with unequal intervals. As such, two changes from the preregistered protocol had to be made. First, we did not average scores across weeks to create ‘baseline’ and ‘follow-up’ values, given that the unequal intervals would make averages difficult to interpret. Instead, we selected week 1 as our baseline week (the first week and the week with the largest sample size) and week 12 as our follow-up week (the middle of the follow-up period, at least 2 weeks past the intervention) for self-harm variables. Second, given the unequal intervals, the data did not meet the basic assumptions necessary for conducting t-tests or ANCOVA. Given the structure of the data, we considered two potential approaches: ordinal logistic regression, or odds ratios using collapsed binary versions of the variables to indicate presence versus absence of self-harm. In ordinal logistic regression models, the relation between predictors and the outcome is assumed to be homogeneous across the outcome’s ordinal levels (the proportional odds assumption). There are two ways in which this assumption could be evaluated. First, statistically: variables can be tested using Brant’s test to see whether they violate the proportional odds assumption. Second, theoretically: does the idea of equal slopes across outcome categories make sense given the nature of the content (for example, is moving from no self-harm at all to some level of self-harm equivalent to moving from some self-harm to higher levels of self-harm)? We considered that, from a theoretical perspective, there may be a clinical difference between a change from one active harm category to another (reduction or increase) and a change from a total absence of self-harm to the presence of harm. As such, we opted for the conservative approach of applying a recommended conservative threshold for the Brant test (P > 0.2) when assessing the proportional odds assumption108. If all variables included in the models (week 1 and week 12) passed the assumption, we would use proportional odds regression. Otherwise, we would collapse the outcomes into binary categories, specifically, ‘no self-harm’ coded as 0 versus ‘any self-harm’ coded as 1.
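The decision rule described above can be sketched as follows. This is an illustration only: the Brant P values are assumed to have been computed elsewhere, the function and variable names are hypothetical, the P values shown are made up, and the binary collapse assumes that the lowest category of the four-point scale is coded 0 and denotes no self-harm.

```python
def choose_self_harm_model(brant_p_values, threshold=0.2):
    """Apply the P > threshold rule to every variable entering the model.

    brant_p_values: dict mapping a variable name to its Brant test P value.
    Returns the chosen model family and the variables that failed the rule.
    """
    violations = [name for name, p in brant_p_values.items() if p <= threshold]
    if violations:
        # At least one variable fails the proportional odds check:
        # fall back to the binary (none versus any) coding.
        return "binary", violations
    return "ordinal", violations

def collapse_to_binary(ordinal_score):
    """Collapse a four-point score to 0 (no self-harm) or 1 (any self-harm).

    Assumes the lowest category is coded 0.
    """
    return 0 if ordinal_score == 0 else 1

# Made-up P values for illustration, not results from the trial
model, failed = choose_self_harm_model({"week1": 0.45, "week12": 0.31})
print(model, failed)
```

Requiring every variable to pass, rather than only the outcome, mirrors the rule that both the week 1 and week 12 variables must satisfy the assumption before the ordinal model is retained.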

The Brant tests showed that all week 1 and week 12 variables had P > 0.2. As such, we used ordinal logistic regression models without dichotomizing the outcome. We first conducted preliminary analyses, including attrition analyses and baseline equivalence tests (using the non-parametric Mann−Whitney U-test), and examined frequencies to understand the data distributions. We then used ordinal logistic regression to examine intervention outcomes, controlling for age and gender identity. Finally, we examined the interactions of condition with baseline self-harm and with gender identity on the outcome.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
