We often get requests for samples from other developers and users. Our full collection is available to developers who need it (some of the samples were submitted by people who explicitly told us not to make them public, which we honor). However, we're starting to build a small public repository for everyone who wants to test.
The following link contains 10-minute recordings of over 30 US TV channels. They were made with an HDHomeRun on Dec 14, 2016. The provider is Comcast. Nothing really interesting in content - 10 minutes per channel, recorded in the West Coast morning, so just daytime TV, whatever was on.
The following link contains short recordings from some non-subscription UK channels. Some of them come from a multicast stream and some were recorded with an HDHomeRun. The content is whatever was on at the time (8 pm to 10 pm UK time). Be aware that the UK considers its citizens to be adults who can just switch channels if they don't like what they see.
The following link contains 15-minute recordings from the same UK channels as above. We have the original .ts files (that would be the input for any CCExtractor processing) and the same files with the DVB subtitles burned in with FFmpeg in .mp4, which is very convenient for checking timing. Also included are the .srt files generated with default CCExtractor settings (as of the 0.85 prerelease) and with -ignoreptsjumps.
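Since both .srt variants are provided, their timing can also be compared with a script. The following is a minimal sketch in Python, assuming both files are standard SubRip with `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing lines. The helper names `parse_srt_times` and `timing_diffs` are hypothetical and not part of CCExtractor.

```python
import re
from typing import List, Tuple

# Matches an SRT timing line such as "00:01:02,500 --> 00:01:04,000".
_TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def _to_ms(h: str, m: str, s: str, ms: str) -> int:
    """Convert hour, minute, second and millisecond fields to milliseconds."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def parse_srt_times(text: str) -> List[Tuple[int, int]]:
    """Return (start_ms, end_ms) for every timing line found in SRT text."""
    cues = []
    for line in text.splitlines():
        match = _TIMING.match(line.strip())
        if match:
            g = match.groups()
            cues.append((_to_ms(*g[:4]), _to_ms(*g[4:])))
    return cues


def timing_diffs(a: str, b: str) -> List[int]:
    """Start-time difference (b - a) in ms, pairing cues by position."""
    return [
        sb - sa
        for (sa, _), (sb, _) in zip(parse_srt_times(a), parse_srt_times(b))
    ]
```

Pairing cues by position assumes both runs produced the same number of subtitles; if the counts differ, the extra cues in the longer file are ignored and the remaining pairs may not line up.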
The following link contains some recordings from Scandinavian countries. Teletext.
The following link contains some Russian samples. Teletext.
And this one, more (unclassified) Russian samples. Some seem to have DVB.
Multiprogram transport streams. multiprogram_spanish.ts is quite interesting in that there's a mix of DVB and teletext plus TV and radio channels.
Korean samples. See this issue on GitHub for details.
The following link is a TV show with both regular closed captions and burned-in subtitles (in English, when the characters speak in Russian). This is the original unedited transport stream, with commercials. For development purposes only.
ccextractor_bugs_allcaps_29fps_leftjustify.m2ts DVB-sub captions containing multiple lines of text.
big_buck_bunny_eac3_4.m2ts DVB-sub captions which prior versions of CCExtractor failed to extract.
channel5-2018-02-12.ts A recording from Channel 5 (UK). Forget about the content itself (it's in the middle of two random programs). The important thing is that we do a terrible job with the OCR.
DJI_0019.MP4 A recording (.mp4) from a drone in which the telemetry data is saved as subtitles.