Writing a subtitle quality quantification tool

Following an interesting conversation I had with someone from Netflix, I’ve been thinking for a while about one specific project that really appeals to me (interested in subtitles and accessibility as I am): a subtitle quality quantification tool.

Video quality has VMAF, PSNR and more, and audio has PEAQ, but there’s nothing that I know of that does the same thing for subtitles. In fact, what would such a tool look like? What would it take into consideration, and how would you make all of it automatic?

A few things are more or less clear; in fact the fan scene and different sites even have some rules that you have to follow:

That’s mostly very low-hanging fruit. What else can we do? I can think of many things, but these are the ones that I would say instantly break a good subtitle experience if they are not correct:

So what’s the plan here? My idea is to work on this in the context of Google Summer of Code 2022, which seems far away today but is really just around the corner when you consider all the preparation it takes.

Interested students, get in touch.

Notes: