A therapist told us this story. We’ve changed nothing about it except her name, which we never had.

New to private practice. A prospective client reached out through a directory. On the phone he was vague, then suddenly, aggressively explicit about his sexual history. She tried to steer it back, professionally and gently, and he finished, audibly, over the line. She hung up. Six minutes. She almost cried. The word she used was “used.”

She did everything right. She set a boundary, and when he ran straight through it, she ended the call. The software she was paying for did nothing — because there was nothing it could do. The call wasn’t on her practice platform at all. It was on a phone number a stranger pulled off a directory.

That gap is not an accident. It’s the predictable result of a decision almost every therapy platform has made: rent your video from a third party, and wrap a billing screen around it. This is the story of why we didn’t.

The wrapper problem

Open up most “telehealth” features in practice-management software and look closely at the video call. It’s Zoom. Or Doxy. Or Twilio, or a Daily embed, or a generic WebRTC vendor with the logo filed off. The practice software handles scheduling, the chart, the invoice — and at session time it hands you off to someone else’s video room and steps back.

That works, right up until you need the video itself to do something. And then you discover the ceiling:

You can’t inspect what’s on the stream, because you don’t run the media server.
You can’t act on what’s on the stream — mute a track, blur it, end a room on a signal — because the controls live in the vendor’s cloud, not your code.
You can’t build anything the vendor didn’t anticipate. Their roadmap is your roadmap.

So when a clinician is exposed to something they never consented to see, the wrapper has no answer. Zoom has no concept of “this is a therapy session and that should never be on screen.” Doxy doesn’t either. They aren’t built to. They’re general-purpose video, and general-purpose video treats every pixel the same.

We took the harder path on purpose. We run our own media stack — a self-hosted LiveKit SFU, our own signaling, our own server-side agents sitting inside the media path. It cost us more to build. It’s the reason the rest of this post is possible.

What owning the stack actually unlocks

When you own the video path, you can put a process inside the call — a server-side participant that sees the stream and can act on it, under rules you write. We already do this for one thing clinicians love: an ambient AI scribe that transcribes the session for the note, but only while consent is explicitly on, with audio that never touches disk.
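
To make the consent rule concrete, here is an illustrative sketch in Go. It is not our production code: ConsentGate, Transcriber, and HandleAudio are hypothetical names chosen for the example. It shows only the shape of the rule, which is that audio reaches the transcriber while consent is on and is dropped in memory otherwise, with nothing in the path writing to disk.

```go
package main

import (
	"fmt"
	"sync"
)

// Transcriber is a hypothetical interface for whatever turns audio into text.
type Transcriber interface {
	Feed(chunk []byte)
}

// ConsentGate forwards audio to the transcriber only while consent is on.
// It holds no buffer and opens no files, so dropped audio is simply gone.
type ConsentGate struct {
	mu      sync.Mutex
	consent bool
	next    Transcriber
}

// NewConsentGate starts with consent off, so nothing flows by default.
func NewConsentGate(next Transcriber) *ConsentGate {
	return &ConsentGate{next: next}
}

// SetConsent flips the gate; it can be toggled at any point in the session.
func (g *ConsentGate) SetConsent(on bool) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.consent = on
}

// HandleAudio reports whether the chunk was forwarded to the transcriber.
func (g *ConsentGate) HandleAudio(chunk []byte) bool {
	g.mu.Lock()
	defer g.mu.Unlock()
	if !g.consent {
		return false // dropped: not buffered, not persisted
	}
	g.next.Feed(chunk)
	return true
}

// countingTranscriber is a stand-in that only counts the chunks it receives.
type countingTranscriber struct{ chunks int }

func (c *countingTranscriber) Feed(chunk []byte) { c.chunks++ }

func main() {
	tr := &countingTranscriber{}
	gate := NewConsentGate(tr)

	gate.HandleAudio([]byte{1, 2, 3}) // consent off: dropped
	gate.SetConsent(true)
	gate.HandleAudio([]byte{4, 5, 6}) // consent on: forwarded
	gate.SetConsent(false)
	gate.HandleAudio([]byte{7, 8, 9}) // consent withdrawn: dropped

	fmt.Println("chunks transcribed:", tr.chunks)
}
```

Defaulting the gate to off is the design choice that matters here: the scribe has to be switched on explicitly, and withdrawing consent takes effect on the very next chunk.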

The same architecture — a trusted agent inside the room — is what lets us answer the story at the top of this post. We call the feature Exposure Guard.

Exposure Guard: the clinician draws the line, the software holds it

Here’s the principle we refuse to compromise on: the platform does not decide what’s appropriate. The clinician does.

That’s not a throwaway line. Nudity is clinically normal in plenty of telehealth — dermatology, lactation support, wound care, physical therapy. A platform that hardcodes “no skin on camera” is wrong for those clinicians and condescending to all of them. So we don’t hardcode a rule. We give the clinician a control surface and enforce their boundary.

For a psychotherapist, that boundary is usually simple: this should never happen on my screen. So Exposure Guard, when she turns it on:

Watches the incoming video server-side — a model running inside our own infrastructure, on our own servers. Frames are analyzed in-process and thrown away. No image is ever stored, and nothing is sent to any outside vision API. There’s nothing to leak because nothing is kept.
Hides it before she has to see it. On a confident detection the track is blurred and muted for everyone in the room — the server does this, not her browser, so it works no matter what device she’s on.
Hands her a humane next step, not a helpless one. A prompt appears: “Possible exposure detected — the video has been hidden.” There are two buttons. “End session and report” closes the room and writes an incident record she can take to her supervisor or licensing board. “This is clinically expected — resume” is for the clinician whose work involves the human body, and that override is logged too.
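
The three steps above can be sketched as a small Go state flow. Treat this as a simplified illustration under stated assumptions, not our actual implementation: Guard, Detector, RoomControl, the 0.9 threshold, and the fake detector and room used to exercise it are all hypothetical. The real model and the real room-admin calls sit behind the two interfaces.

```go
package main

import "fmt"

// Frame is a decoded video frame held in memory only.
type Frame struct{ Pixels []byte }

// Detector is a hypothetical in-process model returning a score in [0, 1].
type Detector interface{ Score(f Frame) float64 }

// RoomControl is a hypothetical wrapper over server-side room admin actions.
type RoomControl interface {
	HideTrack(trackID string) error   // blur and mute for everyone in the room
	ResumeTrack(trackID string) error // undo the hide
	EndRoom() error
}

// Choice is what the clinician picks from the prompt.
type Choice string

const (
	EndAndReport   Choice = "end_and_report"
	ResumeExpected Choice = "clinically_expected_resume"
)

// Guard enforces the clinician's boundary. Log holds image-free entries only.
type Guard struct {
	Enabled   bool    // the clinician turns the feature on or off
	Threshold float64 // assumed confidence cutoff
	Detector  Detector
	Room      RoomControl
	Log       []string
}

// Inspect scores one frame and hides the track on a confident detection.
// The frame is never stored: it goes out of scope when Inspect returns.
func (g *Guard) Inspect(trackID string, f Frame) (bool, error) {
	if !g.Enabled {
		return false, nil
	}
	score := g.Detector.Score(f)
	if score < g.Threshold {
		return false, nil
	}
	if err := g.Room.HideTrack(trackID); err != nil {
		return false, err
	}
	g.Log = append(g.Log, fmt.Sprintf("hidden track=%s score=%.2f", trackID, score))
	return true, nil
}

// Resolve applies the clinician's choice and logs it either way.
func (g *Guard) Resolve(trackID string, c Choice) error {
	switch c {
	case EndAndReport:
		g.Log = append(g.Log, "ended_and_reported")
		return g.Room.EndRoom()
	case ResumeExpected:
		g.Log = append(g.Log, "override_resume")
		return g.Room.ResumeTrack(trackID)
	default:
		return fmt.Errorf("unknown choice %q", c)
	}
}

// fixedDetector and fakeRoom are test doubles for the sketch.
type fixedDetector float64

func (d fixedDetector) Score(Frame) float64 { return float64(d) }

type fakeRoom struct {
	hidden map[string]bool
	ended  bool
}

func newFakeRoom() *fakeRoom                    { return &fakeRoom{hidden: map[string]bool{}} }
func (r *fakeRoom) HideTrack(id string) error   { r.hidden[id] = true; return nil }
func (r *fakeRoom) ResumeTrack(id string) error { r.hidden[id] = false; return nil }
func (r *fakeRoom) EndRoom() error              { r.ended = true; return nil }

func main() {
	room := newFakeRoom()
	guard := &Guard{Enabled: true, Threshold: 0.9, Detector: fixedDetector(0.97), Room: room}

	hidden, err := guard.Inspect("client-video", Frame{})
	fmt.Println("hidden:", hidden, "err:", err)

	// The clinician chooses to end the session and report.
	fmt.Println("resolve err:", guard.Resolve("client-video", EndAndReport))
	fmt.Println("room ended:", room.ended, "log:", guard.Log)
}
```

Note that the policy lives in the Guard's fields, not in the code: a clinician who leaves Enabled off, or who resolves with ResumeExpected, gets exactly the behavior she asked for.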

The therapist in the story hung up and then sat alone with it. With Exposure Guard, she’d never have been made to watch, and she wouldn’t have been left holding it by herself afterward — there’d be a record, a report, and a blocked contact, all in one motion.

A note on what we don’t keep: no user data, no client account, no stored video, no captured frames. The only thing that persists from a detection is a small, image-free record — a score, a timestamp, the action taken. Everything else is gone the instant the frame is analyzed. That’s not a limitation we worked around; it’s the design. Protection that requires hoarding evidence isn’t protection a therapist can trust.
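
As a rough illustration of how small that record is, here is a Go sketch. The type name, field names, and JSON shape are assumptions made for the example, not our stored schema. What it demonstrates is that the three fields named above are enough, and that the type has no place to put pixels.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// DetectionRecord is a hypothetical image-free record of one detection.
// It carries a score, a timestamp, and the action taken, and nothing else.
type DetectionRecord struct {
	Score     float64   `json:"score"`
	Timestamp time.Time `json:"timestamp"`
	Action    string    `json:"action"`
}

// NewDetectionRecord builds a record from a score, an action label, and a
// Unix timestamp in seconds. The time is normalized to UTC.
func NewDetectionRecord(score float64, action string, unixSeconds int64) DetectionRecord {
	return DetectionRecord{
		Score:     score,
		Timestamp: time.Unix(unixSeconds, 0).UTC(),
		Action:    action,
	}
}

// JSON serializes the record for storage or export.
func (r DetectionRecord) JSON() (string, error) {
	b, err := json.Marshal(r)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	rec := NewDetectionRecord(0.97, "end_and_report", 1700000000)
	s, err := rec.JSON()
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```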

Why a wrapper can’t ship this — and we can

Walk it back through the architecture and the reason is concrete, not rhetorical:

What Exposure Guard needs	Wrapper (Zoom / Doxy / embed)	Owning the stack (us)
A process inside the media path	No — the media lives in the vendor’s cloud	Yes — our agent is a room participant
Server-side action on a track (mute/blur/end)	No — controls are the vendor’s	Yes — our Go service holds room admin
Frames analyzed without leaving your trust boundary	No — pixels are already in someone else’s cloud	Yes — in-process, on our servers, then discarded
A feature the vendor never planned for	No — bound to their roadmap	Yes — it’s our code end to end

This is why “build vs. wrap” isn’t engineering vanity. A wrapped platform structurally cannot protect the clinician at the level the job demands, because the part that would do the protecting belongs to someone whose business is generic video conferencing. We built our own so that the hardest cases (the rare, awful, six-minute ones) have an answer.

The rare case is exactly the one that matters

Exposure Guard is for something that won’t happen in most practices most weeks. We built it anyway, and we’re putting it on our top tier, because the cost of not having it lands entirely on the clinician — disproportionately on new therapists, disproportionately on women, at the most vulnerable moment of their work: first contact with a stranger.

Most software optimizes the common path and shrugs at the tail. Safety lives in the tail. The whole reason to own your stack instead of renting it is so that when the rare, serious thing happens, your software is standing where it can help — not handing your clinician off to a video vendor and stepping back.

That’s why we built our own telehealth. Not because Zoom is bad at video calls. Because a video call was never the point.

Exposure Guard ships as part of our premium safety tier. If you’re a clinician with a story like this one, we’d genuinely like to hear it — it’s how we decide what to build next.
