Notes from inside Trust & Safety

Before I started working in Trust & Safety, I thought the line between harm and free speech would be obvious.
It isn’t.
In theory, free speech protects expression. Harm prevention protects people. Most people assume the boundary between the two is clear.
From inside moderation rooms, I can tell you it’s rarely that simple.
Every day, we review content that sits exactly on that line.
Speech That Offends vs Speech That Harms
Not all offensive speech is harmful. And not all harmful speech looks extreme.
People are allowed to express unpopular opinions. They are allowed to criticize governments, religions, ideologies, and corporations. If platforms removed everything that offended someone, there would be little left online.
But harm begins when speech targets individuals or groups in ways that dehumanize, threaten, or incite violence.
The shift from opinion to attack can be subtle.
A statement about a policy may be legitimate debate. A statement about a group’s inherent worth crosses into something else entirely.
Distinguishing the two requires context, not just keywords.
Intent vs Impact
One of the hardest lessons I’ve learned is that intent does not always match impact.
Some users claim they were “just joking.” Others argue they were “just being honest.”
But if speech leads to coordinated harassment, real-world threats, or systemic intimidation, the impact outweighs the claimed intent.
At the same time, overcorrecting can suppress satire, cultural expression, or political dissent.
The line is not drawn based on emotion. It’s drawn based on risk.
The Role of Power
Another factor people rarely consider is power dynamics.
Speech from a marginalized individual criticizing those in power is different from speech from the powerful targeting a vulnerable group.
The same words can have different consequences depending on who is speaking and who is being targeted.
Moderation policies try to account for this, but applying them consistently across cultures and political climates is challenging.
What feels like accountability in one context can feel like censorship in another.
Automation Isn’t Enough
AI can flag keywords. It can detect patterns. It can estimate risk.
But it cannot fully understand sarcasm, evolving slang, or cultural tension.
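To make that limitation concrete, here is a deliberately naive keyword flagger, a toy sketch rather than any platform's actual pipeline. The blocklist and both example sentences are hypothetical, chosen only to show the two failure modes: a quoted condemnation trips the filter, while a coded threat with no listed keyword passes.

```python
# A deliberately naive keyword flagger -- a hypothetical illustration of
# why keyword matching alone falls short, not a real moderation system.

BLOCKLIST = {"vermin", "exterminate"}  # hypothetical flagged terms


def flag(text: str) -> bool:
    """Flag text if it contains any blocklisted keyword."""
    # Strip surrounding punctuation so "'vermin'" still matches "vermin".
    words = {w.strip(".,!?'\"").lower() for w in text.split()}
    return bool(words & BLOCKLIST)


# False positive: someone condemning a slur gets flagged anyway.
print(flag("Calling refugees 'vermin' is unacceptable."))      # True
# False negative: a veiled threat with no listed keyword passes.
print(flag("You know exactly what needs to happen to them."))  # False
```

Both mistakes point the same direction: the filter sees words, not context, which is exactly the gap human reviewers are asked to close.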
Human reviewers step in where nuance matters most.
And that’s where the weight of the decision becomes real.
Remove the content, and someone may feel silenced.
Leave it up, and someone else may feel unsafe.
There is no option that pleases everyone.
So Where Is the Line?
From my experience, the line sits where expression begins to create credible risk of harm.
Not discomfort. Not disagreement.
Harm.
Threats. Incitement. Dehumanization. Coordinated abuse.
Free speech is essential to open societies. But absolute freedom without guardrails often empowers the loudest and most aggressive voices.
Moderation is not about choosing sides. It’s about protecting space for conversation without allowing that space to become dangerous.
The line is not fixed. It shifts with context, culture, and consequence.
And drawing it is one of the most difficult responsibilities in the digital age.