How We Build Safe AI for a Mental Health App: Ya Tut Case Study

On 2/10/2026

When UA Mental Help approached us to build "Ya Tut", a mobile mental health app with an AI assistant, we knew this wasn't going to be a typical chatbot project.

Mental health is a domain where AI can do tremendous good: providing 24/7 support, delivering evidence-based techniques, and reaching people who might never visit a therapist. But it's also a domain where AI can cause real harm - through misdiagnosis, inappropriate advice, or simply failing someone in crisis. Recent research from Brown University has shown that many AI chatbots systematically violate mental health ethics standards, including failing to refer users to appropriate resources during crisis situations.

This post shares the technical and design decisions the KeyToTech team made to build AI that genuinely helps without overstepping its boundaries. We'll cover our research process, the guardrails we implemented, how we used RAG architecture to keep responses grounded in clinical evidence, and the safety protocols that protect vulnerable users.

 

The Challenge: A mental health AI assistant that helps without harming

Before writing a single line of code, we spent weeks researching the mental health app landscape. We analyzed competitors like Woebot, Wysa, Headspace, and Calm. We interviewed clinical psychologists working with the UA Mental Help organization. We studied documented cases where mental health chatbots failed - and there were more of those than we expected.

The mental health app market has exploded in recent years. According to industry reports, there are over 20,000 mental health apps available for download. But quality varies wildly. Some are built by clinical teams with rigorous safety protocols. Others are built by developers with good intentions but a limited understanding of psychological harm. As Stanford researchers have warned, AI therapy tools present significant risks when not properly designed.

We approached this project with a fundamental question: How do we build AI that genuinely helps vulnerable people - without causing unintended harm?

What we found during research was concerning:

Some apps positioned AI as a "therapist in your pocket" - a dangerous framing that sets wrong expectations. Others had chatbots that would confidently dispense medical advice they had no business giving. Several had no protocols for detecting crisis situations.

We identified three core risks we needed to address:

Risk 1: AI Playing Doctor

LLMs are confident by nature. Without constraints, GPT will happily diagnose depression, recommend medication dosages, or interpret clinical questionnaire results. This isn't just unhelpful, it's potentially dangerous.

Risk 2: Hallucinated Therapeutic Advice

Language models can generate plausible-sounding but completely fabricated therapeutic techniques. A user in distress deserves evidence-based methods, not AI-generated wellness fiction.

Risk 3: Missing Crisis Signals

Someone expressing suicidal ideation needs immediate human intervention, not a chatbot response about breathing exercises. The AI must recognize these moments and respond appropriately.

 

Our Solution: A three-layer safety architecture for AI verification

The KeyToTech team designed the AI system around three interconnected safety layers:

Layer 1: Strict Role Definition (System Prompt Guardrails)

The foundation of safe AI behavior is a clear role definition. Our system prompt explicitly defined what the AI is and isn't:

A supportive companion who helps users navigate mental health resources. 

NOT:

- A therapist or counselor

- A medical professional

- Qualified to diagnose any condition

- Able to interpret clinical assessment results

ALWAYS:

- Recommend professional help for serious concerns

- Acknowledge limitations openly

- Provide warm, empathetic responses

- Direct users to appropriate app resources

But system prompts alone aren't enough. Users can sometimes "jailbreak" them through creative prompting. So we added additional layers.

Layer 2: Response Filtering and Classification

Every AI response passes through a classification layer before reaching the user. This layer checks for:

  • Diagnostic language ("You have depression", "This indicates anxiety")
  • Medical advice (medication mentions, treatment recommendations)
  • Dismissive responses (minimizing the user's feelings)
  • Inappropriate confidence (certainty about clinical matters)

Flagged responses are either modified or replaced with safe alternatives. In our testing, we ran over 500 adversarial prompts specifically designed to elicit unsafe responses. The filter caught 99.2% of problematic outputs.
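As a simplified illustration of this layer, the sketch below flags responses using a few regex patterns. The production classifier is a fine-tuned model, so the patterns, category names, and fallback wording here are hypothetical.

```python
import re

# Hypothetical patterns for the output categories described above; a real
# deployment would use a trained classifier, not hand-written regexes.
UNSAFE_PATTERNS = {
    "diagnostic": re.compile(
        r"\byou (have|are suffering from)\b.*\b(depression|anxiety|ptsd)\b", re.I),
    "medical": re.compile(
        r"\b(dosage|medication|antidepressant|prescri\w+)\b", re.I),
    "overconfident": re.compile(
        r"\b(definitely|certainly) (have|indicates)\b", re.I),
}

SAFE_FALLBACK = ("I'm not able to assess that, but a licensed professional "
                 "can. Would you like help finding support resources?")

def filter_response(text: str) -> tuple[str, list[str]]:
    """Return the response (or a safe replacement) plus any flags raised."""
    flags = [name for name, pat in UNSAFE_PATTERNS.items() if pat.search(text)]
    return (SAFE_FALLBACK if flags else text), flags

out, flags = filter_response("Based on what you said, you have depression.")
print(flags)  # ['diagnostic']
```

Safe responses pass through unchanged; anything flagged is replaced before the user ever sees it.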

Layer 3: Crisis Detection and Escalation

This is the most critical safety component. The system continuously monitors conversations for crisis indicators:

  • Direct statements about self-harm or suicide
  • Expressions of hopelessness or worthlessness
  • Mentions of specific plans or methods
  • Sudden shifts in emotional tone

When crisis signals are detected, the AI immediately shifts behavior:

  1. Acknowledges the user's pain without minimizing
  2. Provides crisis hotline contact (specific to Ukraine)
  3. Encourages immediate professional contact
  4. Offers to help find local emergency resources
  5. Does NOT attempt to "solve" the crisis through conversation

We tested this system extensively with clinical psychologists to ensure appropriate sensitivity: catching real crises without false alarms that would desensitize users.
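The explicit-signal half of this detection can be sketched with pattern matching. The patterns, response wording, and function names below are illustrative; as described above, the real system pairs pattern matching with an ML classifier for subtler indicators.

```python
import re

# Illustrative explicit-signal patterns only; indirect and culturally
# varied expressions of distress need a trained classifier, not regexes.
CRISIS_PATTERNS = [
    re.compile(r"\b(kill(ing)? myself|end(ing)? (it all|my life)|suicid\w+)\b", re.I),
    re.compile(r"\b(no reason to (live|go on)|better off without me)\b", re.I),
]

def detect_crisis(text: str) -> bool:
    """True if the message contains an explicit crisis signal."""
    return any(p.search(text) for p in CRISIS_PATTERNS)

def crisis_response() -> str:
    # Acknowledge, hand off to humans, and do not try to "solve" it in-chat.
    # The hotline itself is surfaced from app config, placeholder text here.
    return ("I'm really sorry you're in this much pain. You deserve support "
            "right now. Please reach out to the crisis hotline shown in the "
            "app, or contact local emergency services.")

print(detect_crisis("Some days I think about ending my life."))  # True
```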

RAG Architecture: Grounding AI in Clinical Evidence

One of the biggest risks with LLMs in healthcare is hallucination - generating plausible but incorrect information. In mental health, this isn't just annoying, it's dangerous. A fabricated coping technique might seem harmless, but for someone in genuine distress, ineffective advice erodes trust and delays real help.

Traditional chatbots solve this by using rigid decision trees. But decision trees feel robotic and can't handle the nuance of human emotional expression. We wanted the warmth and flexibility of LLMs without the hallucination risk.

Our solution was Retrieval-Augmented Generation (RAG). Instead of letting the AI generate therapeutic content from its training data, we force it to retrieve information from a curated knowledge base built by licensed psychologists.

How Our RAG System Works:

  1. Content Library Creation - Clinical psychologists from UA Mental Help created a library of evidence-based content: CBT exercises, breathing techniques, journaling prompts, psychoeducational materials, and coping strategies. Each piece was reviewed for clinical accuracy.
  2. Semantic Indexing - All content was indexed using embedding vectors, allowing semantic search. When a user describes feeling anxious before a presentation, the system finds relevant content about performance anxiety - even if they didn't use that exact term.
  3. Context-Aware Retrieval - The AI doesn't just retrieve content randomly. It considers:
    • User's current emotional state (from mood tracking)
    • Previous modules they've completed
    • Time of day and usage patterns
    • Conversation context
  4. Grounded Generation - The AI uses retrieved content as the basis for its responses. It can paraphrase, summarize, and adapt the tone - but it cannot invent new therapeutic techniques or advice.

Example in Practice:

User: "I can't stop worrying about everything. My mind races constantly."

Without RAG, GPT might generate generic advice or even fabricate a technique. With RAG:

  1. System retrieves relevant content about anxiety and rumination
  2. Finds a validated grounding exercise (5-4-3-2-1 technique)
  3. AI presents this technique conversationally, with the exact steps from the clinical library
  4. Response includes link to the full exercise in the app

The user gets evidence-based help, presented in a warm, accessible way, without any risk of hallucinated advice.
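The retrieval step above can be illustrated with a toy in-memory index. Real embeddings have hundreds of dimensions and live in a vector store (pgvector in our stack), so the library entries and vectors below are purely illustrative.

```python
import math

# Toy in-memory retrieval: rank library entries by cosine similarity to a
# query embedding. Entries and 3-dim vectors are made up for illustration.
LIBRARY = {
    "5-4-3-2-1 grounding technique": [0.9, 0.1, 0.2],
    "sleep hygiene basics": [0.1, 0.8, 0.3],
}

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def retrieve(query_vec: list[float], k: int = 1) -> list[str]:
    """Return the k library entries most similar to the query embedding."""
    ranked = sorted(LIBRARY, key=lambda t: cosine(query_vec, LIBRARY[t]),
                    reverse=True)
    return ranked[:k]

# A query embedding close to the anxiety/grounding entry:
print(retrieve([0.85, 0.15, 0.25]))  # ['5-4-3-2-1 grounding technique']
```

The generation step then receives only the retrieved text as source material, which is what keeps responses grounded in the clinical library.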

 

Technical Implementation: The stack behind safe AI in mental health

Building safe AI for mental health required careful technology choices. Every decision was evaluated not just for functionality, but for security, privacy, and long-term maintainability.

Core Stack:

  • Flutter - Cross-platform development for iOS and Android, allowing us to maintain a single codebase while delivering native performance on both platforms
  • PostgreSQL - Secure storage for user data and conversation history, with row-level encryption for sensitive content
  • DigitalOcean - Cloud infrastructure with European data residency, important for GDPR compliance and user trust
  • Firebase - Analytics and push notifications, configured to minimize data collection

AI Components:

  • OpenAI GPT API - Base language model with custom system prompts defining role and constraints
  • Custom RAG pipeline - Vector embeddings using sentence transformers + semantic search via pgvector
  • Response classifier - Fine-tuned model for safety filtering, trained on our adversarial prompt database
  • Crisis detection module - Combination of pattern matching for explicit signals and ML classification for subtle indicators

Integration Architecture:

The AI doesn't operate in isolation. It's deeply integrated with the app's content management system. When the RAG pipeline retrieves relevant content, it also pulls metadata about which app module contains the full exercise, allowing the AI to provide direct deep links. This creates a seamless experience where the AI conversation naturally guides users to structured therapeutic content.
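A minimal sketch of what such content metadata might look like, assuming a hypothetical URL scheme and field names (the app's actual deep-link format isn't public):

```python
from dataclasses import dataclass

# Hypothetical shape of a retrieved library entry: the content itself plus
# the module metadata needed to deep-link into the app.
@dataclass
class LibraryEntry:
    title: str
    body: str
    module_id: str  # which app module holds the full exercise

def deep_link(entry: LibraryEntry) -> str:
    # "yatut://" is a placeholder scheme; the real app registers its own.
    return f"yatut://modules/{entry.module_id}"

entry = LibraryEntry(
    title="5-4-3-2-1 grounding",
    body="Name five things you can see...",
    module_id="anxiety-grounding-01",
)
print(deep_link(entry))  # yatut://modules/anxiety-grounding-01
```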

Security Considerations:

Mental health data is extremely sensitive. Our security measures include:

  • End-to-end encryption for all user communications
  • Minimal data collection principle (we don't store what we don't need)
  • Anonymized analytics that never expose personal content
  • GDPR-compliant data handling
  • Regular security audits

 

What the AI Can and Cannot Do in a Mental Health App

Clear boundaries are essential. Here's exactly how we configured the AI to operate:

The AI DOES:

  • Help users find relevant self-help modules
  • Provide warm, empathetic acknowledgment of feelings
  • Guide users through evidence-based exercises
  • Explain concepts from the content library
  • Track mood patterns and highlight insights
  • Recommend professional consultation when appropriate

The AI DOES NOT:

  • Diagnose any mental health condition
  • Interpret PHQ-9, GAD-7, or other clinical assessment results
  • Recommend medications or treatments
  • Provide therapy or counseling
  • Replace professional mental health care
  • Make predictions about a user's mental health trajectory

We communicated these boundaries clearly to users through onboarding and in-app messaging. Transparency about AI limitations actually increases trust.

 

Clinical Integration: Where AI ends and humans begin

Ya Tut includes validated clinical screening tools - PHQ-9 for depression and GAD-7 for anxiety. These are widely used instruments that help identify people who might benefit from professional support.

But here's the critical design decision:

The AI never interprets these results.

When a user completes the PHQ-9, they see their score and a general explanation of what the ranges mean. But the app immediately recommends connecting with a licensed psychologist from UA Mental Help's network for proper interpretation and next steps.

Why? Because clinical questionnaires are screening tools, not diagnostic instruments. A high PHQ-9 score might indicate depression, or it might reflect temporary stress, physical illness, or medication side effects. Only a trained professional can make that determination.

This human-AI handoff is intentional and crucial. Technology opens the door, professionals guide the journey.
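As a sketch of this hands-off presentation, the function below maps a score to the standard published PHQ-9 severity band and appends a referral prompt; the function names and summary wording are ours, not the app's actual copy.

```python
# Standard published PHQ-9 severity bands (total score 0-27). The app shows
# only the band label plus a referral prompt, never an interpretation.
PHQ9_BANDS = [
    (4, "minimal"),
    (9, "mild"),
    (14, "moderate"),
    (19, "moderately severe"),
    (27, "severe"),
]

def phq9_band(score: int) -> str:
    """Return the severity band label for a PHQ-9 total score."""
    if not 0 <= score <= 27:
        raise ValueError("PHQ-9 scores range from 0 to 27")
    return next(label for upper, label in PHQ9_BANDS if score <= upper)

def screening_summary(score: int) -> str:
    # Deliberately no diagnosis: just the band and a handoff to a human.
    return (f"Your score of {score} falls in the '{phq9_band(score)}' range. "
            "For proper interpretation, we recommend talking with a licensed "
            "psychologist from UA Mental Help's network.")

print(phq9_band(12))  # moderate
```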

 

Lessons Learned: What we'd tell other teams looking to build an AI mental health assistant

After launching Ya Tut and gathering months of real-world feedback, here's what we'd share with teams building AI for sensitive domains:

  1. Define Boundaries Before Writing Code

Decide what your AI will and won't do before implementation. Document these boundaries clearly. Review them with domain experts (in our case, clinical psychologists). These boundaries should be treated as product requirements, not afterthoughts. KeyToTech's discovery process treats safety requirements with the same rigor as functional requirements.

  2. Test Adversarially - Then Test More

Don't just test happy paths. Actively try to break your safety measures. What happens when a user tries to get medical advice? What if they claim to be a doctor? What if they express suicidal thoughts in indirect language? Real users will always find edge cases you didn't anticipate. Our QA team developed specialized testing protocols for AI safety validation.

  3. RAG Over Pure Generation

For any domain where accuracy matters, RAG architecture is essential. The content library review process is time-consuming but worth it - you know exactly what your AI might say. Pure LLM generation is a black box; RAG gives you control and auditability.

  4. Build Crisis Protocols First

Crisis detection shouldn't be an afterthought. Design these protocols early and test them thoroughly. Work with mental health professionals who understand what crisis really looks like - the subtle signs, the indirect language, the cultural variations in expressing distress. This is the highest-stakes part of your system.

  5. Transparency Builds Trust

Users appreciate knowing what AI can and can't do. Clear disclaimers and honest communication about limitations don't reduce engagement, they increase trust. We found that users who understood the AI's boundaries actually engaged more deeply with the app.

  6. Human Oversight Always

AI in mental health should augment human care, not replace it. Build clear pathways from AI interactions to human professionals. Make those pathways easy and inviting. The goal is to lower barriers to professional help, not create a substitute for it.

 

Results: What Ya Tut Achieved

Ya Tut launched on both iOS and Android, extending UA Mental Help's mission into the digital space. The app provides:

  • 24/7 Access to evidence-based mental health resources
  • 80+ Specialists contributed content to the library
  • 2 Platforms (iOS and Android) with consistent experience
  • Zero Safety Incidents since launch
  • Thousands of sessions completed through the AI assistant

The launch was particularly meaningful given the context. Ukraine has faced unprecedented mental health challenges in recent years. Professional psychological support, while critical, has limited capacity. Ya Tut helps bridge this gap, not by replacing therapists, but by making the first step toward mental health support easier and less intimidating.

More importantly, the app creates a pathway to professional care. Many users who might never have contacted a psychologist directly found their way to UA Mental Help's services through the app's gentle guidance. The AI assistant serves as a warm introduction to mental health concepts, reducing stigma and building comfort with seeking help.

User feedback has highlighted specific features that resonate: the non-judgmental tone of the AI assistant, the practical nature of CBT exercises, and the privacy of being able to explore emotions without another person present. For many Ukrainians, this is their first experience with structured mental health support.

 

Conclusion: AI as a bridge, not a destination

Mental health technology is at an inflection point. LLMs make it possible to create genuinely helpful, accessible mental health tools. But with that power comes responsibility.

The question isn't whether AI can help people with mental health, it clearly can. The question is whether we're building it thoughtfully, with appropriate guardrails, grounded in clinical evidence, and integrated with human care.

At KeyToTech, we believe AI should be a bridge to help, not a destination that replaces human connection. Ya Tut embodies this philosophy: warm, accessible AI that knows its limits and guides users toward the professional support they deserve.

 

Building Something Similar in Mental Health or an Adjacent Industry?

Mental health and wellness apps require both technical excellence and deep sensitivity to user needs. Our team has experience navigating the unique challenges of HealthTech development - from clinical workflow integration to AI safety protocols.

Whether you're building a startup MVP or scaling an existing product, we can help you implement responsible AI that genuinely serves your users.
