Speaker diarisation explained: how AI knows who said what

Behind the scenes of speaker recognition: how it works, when it fails, and how to get the best results.

Product updates · 8 February 2026 · 5 min read · Team ForgetLess

What is speaker diarisation?

Speaker diarisation is the process in which an AI model analyses audio and works out who is speaking, not just what is being said. In a transcript it looks like this:

Speaker A: Good morning, how are you?
Speaker B: I'm well, thank you. And you?

It sounds simple, but it is one of the hardest tasks in speech technology.

How does it work?

The model listens for voice characteristics such as:

  • Pitch
  • Speaking pace
  • Timbre (tone colour)
  • Breathing patterns

It then clusters audio into segments that appear to belong to the same voice. The model does not recognise names or identities; it only knows that "this voice is different from that voice".
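
The clustering idea can be illustrated with a toy sketch. It assumes each audio segment has already been reduced to a small vector of voice features (the numbers below are made up), and it uses a simple greedy clustering on cosine similarity. The names `cosine_similarity` and `cluster_segments` and the 0.9 threshold are hypothetical choices for illustration, not the model ForgetLess actually runs; a real system works with far richer features.

```python
import math
import string

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def cluster_segments(features, threshold=0.9):
    """Greedily assign each segment to the most similar known voice.

    A new voice is opened when no existing centroid is at least
    `threshold` similar. Returns one anonymous label per segment
    (this toy version supports at most 26 voices).
    """
    centroids = []  # running mean feature vector per voice
    counts = []     # number of segments assigned to each voice
    labels = []
    for vec in features:
        best, best_sim = None, threshold
        for idx, centroid in enumerate(centroids):
            sim = cosine_similarity(vec, centroid)
            if sim >= best_sim:
                best, best_sim = idx, sim
        if best is None:
            # No known voice is close enough: start a new one.
            centroids.append(list(vec))
            counts.append(1)
            best = len(centroids) - 1
        else:
            # Update the running mean of the matched voice.
            n = counts[best]
            centroids[best] = [
                (c * n + v) / (n + 1) for c, v in zip(centroids[best], vec)
            ]
            counts[best] = n + 1
        labels.append("Speaker " + string.ascii_uppercase[best])
    return labels

# Made-up features per segment: (pitch, pace, timbre), scaled to 0..1.
segments = [
    (0.90, 0.20, 0.30),  # first voice
    (0.20, 0.80, 0.70),  # a clearly different voice
    (0.88, 0.22, 0.31),  # close to the first voice again
    (0.22, 0.79, 0.68),  # close to the second voice again
]
labels = cluster_segments(segments)
print(labels)
```

Note that the labels carry no identity: "Speaker A" only means "the first distinct voice heard".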

When does it work well?

Under ideal conditions our diarisation reaches over 95% accuracy. The conditions:

  • 2 to 4 speakers
  • Clearly distinct voices (for example male and female)
  • Good recording quality without background noise
  • Speakers do not switch too quickly (no interruptions)
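
How an accuracy figure like this is computed is not specified here. One plausible approach, sketched below purely as an assumption, is to cut the audio into equal time slices, compare the model's label for each slice with a hand-made reference, and first find the best mapping between the model's anonymous labels and the reference speakers. The function name `best_mapping_accuracy` and the sample data are hypothetical.

```python
from itertools import permutations

def best_mapping_accuracy(reference, predicted):
    """Fraction of time slices labelled correctly under the best label mapping.

    `reference` and `predicted` are equal-length lists with one speaker
    label per time slice. Predicted labels are anonymous, so every
    assignment of predicted labels to reference speakers is tried.
    """
    ref_names = sorted(set(reference))
    pred_names = sorted(set(predicted))
    # Pad with None so surplus predicted labels can stay unmatched.
    targets = ref_names + [None] * max(0, len(pred_names) - len(ref_names))
    best = 0.0
    for perm in permutations(targets, len(pred_names)):
        mapping = dict(zip(pred_names, perm))
        correct = sum(
            1 for ref, pred in zip(reference, predicted) if mapping[pred] == ref
        )
        best = max(best, correct / len(reference))
    return best

# Made-up example: five time slices, one of them labelled wrongly.
reference = ["ref1", "ref1", "ref2", "ref2", "ref1"]
predicted = ["B", "B", "A", "A", "A"]
accuracy = best_mapping_accuracy(reference, predicted)
print(accuracy)
```

Trying every mapping is only practical for a handful of speakers, which matches the small groups described above.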

When does it fail?

Difficult situations include:

  • Two people talking over each other
  • Poor recording quality (distant microphone, noise)
  • Speakers with very similar voices
  • Many speakers (more than 5 already gets tricky)

In those cases the AI may swap speakers or split them incorrectly. That is exactly why we always provide an edit mode where you can correct speaker labels.
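
Conceptually, such a correction is a remapping of labels on transcript segments. The sketch below assumes a transcript is a list of (speaker, text) pairs; `relabel_speakers` is a hypothetical helper for illustration, not the ForgetLess edit mode itself.

```python
def relabel_speakers(transcript, mapping):
    """Return a new transcript with speaker labels replaced per `mapping`.

    Labels that do not appear in `mapping` are kept unchanged.
    """
    return [(mapping.get(speaker, speaker), text) for speaker, text in transcript]

transcript = [
    ("Speaker A", "Good morning, how are you?"),
    ("Speaker B", "I'm well, thank you. And you?"),
    ("Speaker C", "Fine, thanks."),  # same voice as Speaker A, split incorrectly
]

# Merge the wrongly split "Speaker C" back into "Speaker A".
fixed = relabel_speakers(transcript, {"Speaker C": "Speaker A"})

# Swapped labels are corrected the same way.
swapped = relabel_speakers(
    transcript, {"Speaker A": "Speaker B", "Speaker B": "Speaker A"}
)
```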

Tips for better results

  • Use a good microphone: preferably a lavalier or headset per speaker
  • Avoid cross-talk: ask people to let each other finish
  • For online meetings: use the "per-speaker recording" option in Zoom or Teams when available; we combine the separate tracks automatically
  • Test first: for important work, do a short test recording to check the quality
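
With per-speaker recordings, the speaker of each segment is known from the track it came from, so combining tracks is mostly a matter of ordering segments by time. A minimal sketch, assuming each track has already been transcribed into (start_seconds, text) pairs; the function name and data layout are hypothetical and do not describe the actual ForgetLess pipeline.

```python
import heapq

def merge_tracks(tracks):
    """Merge per-speaker segment lists into one transcript ordered by start time.

    `tracks` maps a speaker label to a list of (start_seconds, text) pairs,
    each list already sorted by start time.
    """
    tagged = [
        [(start, speaker, text) for start, text in segments]
        for speaker, segments in tracks.items()
    ]
    # heapq.merge interleaves the already-sorted lists by start time.
    return list(heapq.merge(*tagged))

tracks = {
    "Speaker A": [(0.0, "Good morning, how are you?"), (5.2, "Fine, thanks.")],
    "Speaker B": [(2.1, "I'm well, thank you. And you?")],
}
merged = merge_tracks(tracks)
for start, speaker, text in merged:
    print(f"[{start:5.1f}s] {speaker}: {text}")
```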

New in ForgetLess: speaker colour coding

Starting this week you will see coloured speaker labels in our transcript view. Each speaker gets their own colour, so you can grasp the structure of the conversation at a glance. Especially handy for focus groups or panel discussions.

Give it a try on your next transcript!
