Command line usage
CCExtractor's main program is console based. There is a GUI for Windows, as well as provisions so other programs can easily interface with CCExtractor, but the heavy lifting is done by a command line program, which can be called from scripts so that integration with larger processes is straightforward.
Running CCExtractor without any parameter will display a help screen with all the options. As of version 0.88 the help screen is as follows:
CCExtractor 0.88, Carlos Fernandez Sanz, Volker Quetschke.
Teletext portions taken from Petr Kutalek's telxcc
--------------------------------------------------------------------------
Originally based on McPoodle's tools. Check his page for lots of information
on closed captions technical details.
(http://www.theneitherworld.com/mcpoodle/SCC_TOOLS/DOCS/SCC_TOOLS.HTML)

This tool home page:
http://www.ccextractor.org
Extracts closed captions and teletext subtitles from video streams.
(DVB, .TS, ReplayTV 4000 and 5000, dvr-ms, bttv, Tivo, Dish Network,
.mp4, HDHomeRun are known to work).

Syntax:
ccextractor [options] inputfile1 [inputfile2...] [-o outputfilename]

To see This Help Message: -h or --help

File name related options:
inputfile: file(s) to process
-o outputfilename: Use -o parameters to define output filename if you don't
    like the default ones (same as infile plus _1 or _2 when
    needed and file extension, e.g. .srt).
-stdout: Write output to stdout (console) instead of file. If
    stdout is used, then -o can't be used. Also
    -stdout will redirect all messages to stderr (error).
-pesheader: Dump the PES Header to stdout (console). This is
    used for debugging purposes to see the contents
    of each PES packet header.
-debugdvbsub: Write the DVB subtitle debug traces to console.
-ignoreptsjumps: Ignore PTS jumps (default).
-fixptsjumps: fix pts jumps. Use this parameter if you
    experience timeline resets/jumps in the output.
-stdin: Reads input from stdin (console) instead of file.
    Alternatively, - can be used instead of -stdin
You can pass as many input files as you need. They will be processed in order.
If a file name is suffixed by +, ccextractor will try to follow a numerical
sequence. For example, DVD001.VOB+ means DVD001.VOB, DVD002.VOB and so on
until there are no more files.
Output will be one single file (either raw or srt). Use this if you made your
recording in several cuts (to skip commercials for example) but you want one
subtitle file with contiguous timing.
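
The `+` suffix behaviour described above can be pictured with a short Python sketch. This is only an illustration of the idea and not CCExtractor's own code: the helper name `follow_sequence` is hypothetical, and it assumes the counter is the run of digits at the end of the file name (before the extension), keeps its zero padding, and stops at the first missing file. The helper takes the name without the trailing `+`.

```python
import re
from pathlib import Path
from typing import List


def follow_sequence(first: Path) -> List[Path]:
    """Return `first` and every consecutively numbered file that exists.

    Hypothetical helper: the trailing digits of the stem are treated as a
    zero-padded counter, which is an assumption made for illustration.
    """
    match = re.search(r"(\d+)$", first.stem)
    if match is None:
        # No counter in the name: there is nothing to follow.
        return [first] if first.exists() else []
    prefix = first.stem[:match.start()]
    width = len(match.group(1))
    number = int(match.group(1))
    found = []
    while True:
        candidate = first.with_name(f"{prefix}{number:0{width}d}{first.suffix}")
        if not candidate.exists():
            break  # the sequence ends at the first gap
        found.append(candidate)
        number += 1
    return found
```

With `DVD001.VOB`, `DVD002.VOB` and `DVD004.VOB` on disk, the sketch returns the first two files only, because `DVD003.VOB` is missing.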

Output file segmentation:
-outinterval x: Output in intervals of x seconds
--segmentonkeyonly -key: When segmenting files, do it only after an I frame
    trying to behave like FFmpeg

Network support:
-udp port: Read the input via UDP (listening in the specified port)
    instead of reading a file.

-udp [host:]port: Read the input via UDP (listening in the specified
    port) instead of reading a file. Host can be a
    hostname or IPv4 address. If host is not specified
    then listens on the local host.

-udp [src@host:]port: Read the input via UDP (listening in the specified
    port) instead of reading a file. Host and src can be a
    hostname or IPv4 address. If host is not specified
    then listens on the local host.

-sendto host[:port]: Sends data in BIN format to the server
    according to the CCExtractor's protocol over
    TCP. For IPv6 use [address]:port
-tcp port: Reads the input data in BIN format according to
    CCExtractor's protocol, listening on the specified port on the
    local host
-tcppassword password: Sets server password for new connections to
    tcp server
-tcpdesc description: Sends to the server a short description about
    captions e.g. channel name or file name
Options that affect what will be processed:
-1, -2, -12: Output Field 1 data, Field 2 data, or both
    (DEFAULT is -1)
Use --append to prevent overwriting of existing files. The output will be
appended instead.
-cc2: When in srt/sami mode, process captions in channel 2
    instead of channel 1. Alternatively, -CC2 can also be used.
-svc --service N1[cs1],N2[cs2]...:
    Enable CEA-708 (DTVCC) captions processing for the listed
    services. The parameter is a comma delimited list
    of service numbers, such as "1,2" to process the
    primary and secondary language services.
    Pass "all" to process all services found.

    If captions in a service are stored in 16-bit encoding,
    you can specify what charset or encoding was used. Pass
    its name after service number (e.g. "1[EUC-KR],3" or
    "all[EUC-KR]") and it will encode specified charset to
    UTF-8 using iconv. See iconv documentation to check if
    required encoding/charset is supported.

In general, if you want English subtitles you don't need to use these options
as they are broadcast in field 1, channel 1. If you want the second language
(usually Spanish) you may need to try -2, or -cc2, or both.
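
To make the `-svc` value syntax above concrete, here is a small Python sketch that splits a value such as `1[EUC-KR],3` into service and charset pairs. It is an illustration only: `parse_services` is a hypothetical name, and the grammar it accepts (a number or `all`, optionally followed by a bracketed charset) is inferred from the examples in the help text rather than taken from CCExtractor's parser.

```python
import re
from typing import List, Optional, Tuple

# One list entry: a service number or "all", optionally followed by [charset].
_ITEM = re.compile(r"^(all|\d+)(?:\[([^\]]+)\])?$")


def parse_services(spec: str) -> List[Tuple[str, Optional[str]]]:
    """Split a -svc value into (service, charset) pairs (hypothetical helper)."""
    services = []
    for item in spec.split(","):
        match = _ITEM.match(item.strip())
        if match is None:
            raise ValueError(f"unrecognised service entry: {item!r}")
        services.append((match.group(1), match.group(2)))
    return services
```

For example, `parse_services("1[EUC-KR],3")` yields one entry for service 1 with charset `EUC-KR` and one for service 3 with no charset.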

Input formats:
With the exception of McPoodle's raw format, which is just the closed
caption data with no other info, CCExtractor can usually detect the
input format correctly. To force a specific format:

-in=format

where format is one of these:
ts   -> For Transport Streams.
ps   -> For Program Streams.
es   -> For Elementary Streams.
asf  -> ASF container (such as DVR-MS).
wtv  -> Windows Television (WTV)
bin  -> CCExtractor's own binary format.
raw  -> For McPoodle's raw files.
mp4  -> MP4/MOV/M4V and similar.
m2ts -> BDAV MPEG-2 Transport Stream
mkv  -> Matroska container and WebM.
mxf  -> Material Exchange Format (MXF).
-ts, -ps, -es, -mp4, -wtv, -mkv and -asf/--dvr-ms can be used as shorts.

Output formats:

-out=format

where format is one of these:
srt         -> SubRip (default, so not actually needed).
ass/ssa     -> SubStation Alpha.
ccd         -> Scenarist Closed Caption Disassembly format
scc         -> Scenarist Closed Caption format
webvtt      -> WebVTT format
webvtt-full -> WebVTT format with styling
sami        -> MS Synchronized Accessible Media Interface.
bin         -> CC data in CCExtractor's own binary format.
raw         -> CC data in McPoodle's Broadcast format.
dvdraw      -> CC data in McPoodle's DVD format.
mcc         -> CC data compressed using MacCaption Format.
txt         -> Transcript (no time codes, no roll-up
               captions, just the plain transcription).
ttxt        -> Timed Transcript (transcription with time
               info)
g608        -> Grid 608 format.
smptett     -> SMPTE Timed Text (W3C TTML) format.
spupng      -> Set of .xml and .png files for use with
               dvdauthor's spumux.
               See "Notes on spupng output format"
null        -> Don't produce any file output
report      -> Prints to stdout information about captions
               in specified input. Don't produce any file
               output

-srt, -dvdraw, -sami, -webvtt, -txt, -ttxt and -null can be used as shorts.

Options that affect how input files will be processed.
-gt --goptime: Use GOP for timing instead of PTS. This only applies
    to Program or Transport Streams with MPEG2 data and
    overrides the default PTS timing.
    GOP timing is always used for Elementary Streams.
-nogt --nogoptime: Never use GOP timing (use PTS), even if ccextractor
    detects GOP timing is the reasonable choice.
-fp --fixpadding: Fix padding - some cards (or providers, or whatever)
    seem to send 0000 as CC padding instead of 8080. If you
    get bad timing, this might solve it.
-90090: Use 90090 (instead of 90000) as MPEG clock frequency.
    (reported to be needed at least by Panasonic DMR-ES15
    DVD Recorder)
-ve --videoedited: By default, ccextractor will process input files in
    sequence as if they were all one large file (i.e.
    split by a generic, non video-aware tool). If you
    are processing video that was split with an editing
    tool, use -ve so ccextractor doesn't try to rebuild
    the original timing.
-s --stream [secs]: Consider the file as a continuous stream that is
    growing as ccextractor processes it, so don't try
    to figure out its size and don't terminate processing
    when reaching the current end (i.e. wait for more
    data to arrive). If the optional parameter secs is
    present, it means the number of seconds without any
    new data after which ccextractor should exit. Use
    this parameter if you want to process a live stream
    but not kill ccextractor externally.
    Note: If -s is used then only one input file is
    allowed.
-poc --usepicorder: Use the pic_order_cnt_lsb in AVC/H.264 data streams
    to order the CC information. The default way is to
    use the PTS information. Use this switch only when
    needed.
-myth: Force MythTV code branch.
-nomyth: Disable MythTV code branch.
    The MythTV branch is needed for analog captures where
    the closed caption data is stored in the VBI, such as
    those with bttv cards (Hauppauge 250 for example). This
    is detected automatically so you don't need to worry
    about this unless autodetection doesn't work for you.
-wtvconvertfix: This switch works around a bug in Windows 7's built in
    software to convert *.wtv to *.dvr-ms. For analog NTSC
    recordings the CC information is marked as digital
    captions. Use this switch only when needed.
-wtvmpeg2: Read the captions from the MPEG2 video stream rather
    than the captions stream in WTV files
-pn --program-number: In TS mode, specifically select a program to process.
    Not needed if the TS only has one. If this parameter
    is not specified and CCExtractor detects more than one
    program in the input, it will list the programs found
    and terminate without doing anything, unless
    -autoprogram (see below) is used.
-autoprogram: If there's more than one program in the stream, just use
    the first one we find that contains a suitable stream.
-multiprogram: Uses multiple programs from the same input stream.
-datapid: Don't try to find out the stream for caption/teletext
    data, just use this one instead.
-datastreamtype: Instead of selecting the stream by its PID, select it
    by its type (pick the stream that has this type in
    the PMT)
-streamtype: Assume the data is of this type, don't autodetect. This
    parameter may be needed if -datapid or -datastreamtype
    is used and CCExtractor cannot determine how to process
    the stream. The value will usually be 2 (MPEG video) or
    6 (MPEG private data).
-haup --hauppauge: If the video was recorded using a Hauppauge card, it
    might need special processing. This parameter will
    force the special treatment.
-mp4vidtrack: In MP4 files the closed caption data can be embedded in
    the video track or in a dedicated CC track. If a
    dedicated track is detected it will be processed instead
    of the video track. If you need to force the video track
    to be processed instead use this option.
-noautotimeref: Some streams come with broadcast date information. When
    such data is available, CCExtractor will set its time
    reference to the received data. Use this parameter if
    you prefer your own reference. Note: Currently this only
    affects Teletext in timed transcript with -datets.
--noscte20: Ignore SCTE-20 data if present.
--webvtt-create-css: Create a separate file for CSS instead of inline.
-deblev: Enable debug so the calculated distance for each two
    strings is displayed. The output includes both strings,
    the calculated distance, the maximum allowed distance,
    and whether the strings are ultimately considered
    equivalent or not, i.e. the calculated distance is
    less than or equal to the max allowed.
-anvid --analyzevideo: Analyze the video stream even if it's not used for
    subtitles. This makes video information available.
--no-timestamp-map: Use this flag to disable the X-TIMESTAMP-MAP header for WebVTT
Levenshtein distance:

When processing teletext files CCExtractor tries to correct typos by
comparing consecutive lines. If line N+1 is almost identical to line N except
for minor changes (plus next characters) then it assumes that line N had a
typo that was corrected in N+1. This is currently implemented in teletext
because it's where sample files that could benefit from this were available.
You can adjust, or disable, the algorithm settings with the following
parameters.

-nolevdist: Don't attempt to correct typos with Levenshtein distance.
-levdistmincnt value: Minimum distance we always allow regardless
    of the length of the strings. Default 2.
    This means that if the calculated distance
    is 0, 1 or 2, we consider the strings to be equivalent.
-levdistmaxpct value: Maximum distance we allow, as a percentage of
    the shortest string length. Default 10%.
    For example, consider a comparison of one string of
    30 characters and one of 60 characters. We want to
    determine whether the first 30 characters of the longer
    string are more or less the same as the shortest string,
    i.e. whether the longest string is the shortest one
    plus new characters and maybe some corrections. Since
    the shortest string is 30 characters and the default
    percentage is 10%, we would allow a distance of up
    to 3 between the first 30 characters.
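
The comparison described above can be sketched in Python as follows. This is a minimal illustration, not CCExtractor's implementation: the function names are hypothetical, and the way the two thresholds are combined (taking the larger of `-levdistmincnt` and the `-levdistmaxpct` share of the shortest string) is an assumption based on the wording of the help text.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def looks_equivalent(line_n: str, line_n1: str,
                     min_cnt: int = 2, max_pct: float = 10.0) -> bool:
    """Decide whether the longer line is the shorter one plus corrections.

    Assumption: the allowed distance is the larger of min_cnt and
    max_pct percent of the shortest string length.
    """
    shortest, longest = sorted((line_n, line_n1), key=len)
    allowed = max(min_cnt, int(len(shortest) * max_pct / 100))
    # Only the first len(shortest) characters of the longer line are compared.
    distance = levenshtein(shortest, longest[:len(shortest)])
    return distance <= allowed
```

With the defaults, a 30-character line compared against a 60-character line is accepted when the first 30 characters differ by at most 3 edits, which matches the worked example in the help text.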

Options that affect what kind of output will be produced:
-chapters: (Experimental) Produces a chapter file from MP4 files.
    Note that this must only be used with MP4 files,
    for other files it will simply generate a subtitles file.
-bom: Append a BOM (Byte Order Mark) to output files.
    Note that most text processing tools in Linux will not
    like BOM.
    This is the default in Windows builds.
-nobom: Do not append a BOM (Byte Order Mark) to output
    files. Note that this may break files when using
    Windows. This is the default in non-Windows builds.
-unicode: Encode subtitles in Unicode instead of Latin-1.
-utf8: Encode subtitles in UTF-8 (no longer needed
    because UTF-8 is now the default).
-latin1: Encode subtitles in Latin-1
-nofc --nofontcolor: For .srt/.sami/.vtt, don't add font color tags.
--nohtmlescape: For .srt/.sami/.vtt, don't convert HTML unsafe characters
-nots --notypesetting: For .srt/.sami/.vtt, don't add typesetting tags.
-trim: Trim lines.
-dc --defaultcolor: Select a different default color (instead of
    white). This causes all output in .srt/.smi/.vtt
    files to have a font tag, which makes the files
    larger. Add the color you want in RGB, such as
    -dc #FF0000 for red.
-sc --sentencecap: Sentence capitalization. Use if you hate
    ALL CAPS in subtitles.
--capfile -caf file: Add the contents of 'file' to the list of words
    that must be capitalized. For example, if file
    is a plain text file that contains

    Tony
    Alan

    Whenever those words are found they will be written
    exactly as they appear in the file.
    Use one line per word. Lines starting with # are
    considered comments and discarded.

--kf: Censors profane words from subtitles.
--profanity-file <file>: Add the contents of <file> to the list of words that
    must be censored. The content of <file> follows the
    same syntax as for the capitalization file
-sbs --splitbysentence: Split output text so each frame contains a complete
    sentence. Timings are adjusted based on number of
    characters.
-unixts REF: For timed transcripts that have an absolute date
    (instead of a timestamp relative to the file start), use
    this time reference (UNIX timestamp). 0 => Use current
    system time.
    ccextractor will automatically switch to transport
    stream UTC timestamps when available.
-datets: In transcripts, write time as YYYYMMDDHHMMss,ms.
-sects: In transcripts, write time as ss,ms
-UCLA: Transcripts are generated with a specific format
    that is convenient for a specific project, feel
    free to play with it but be aware that this format
    is really live - don't rely on its output format
    not changing between versions.
-latrusmap: Map Latin symbols to Cyrillic ones in special cases
    of Russian Teletext files (issue #1086)
-xds: In timed transcripts, all XDS information will be saved
    to the output file.
-lf: Use LF (UNIX) instead of CRLF (DOS, Windows) as line
    terminator.
-df: For MCC Files, force dropframe frame count.
-autodash: Based on position on screen, attempt to determine
    the different speakers and add a dash (-) when each
    of them talks (.srt/.vtt only, -trim required).
-xmltv mode: produce an XMLTV file containing the EPG data from
    the source TS file. Mode: 1 = full output
    2 = live output. 3 = both
-xmltvliveinterval x: interval of x seconds between writing live mode xmltv output.
-xmltvoutputinterval x: interval of x seconds between writing full file xmltv output.
-xmltvonlycurrent: Only print current events for xmltv output.
-sem: Create a .sem file for each output file that is open
    and delete it on file close.
-dvblang: For DVB subtitles, select which language's caption
    stream will be processed. e.g. 'eng' for English.
    If there are multiple languages, only this specified
    language stream will be processed (default).
-ocrlang: Manually select the name of the Tesseract .traineddata
    file. Helpful if you want to OCR a caption stream of
    one language with the data of another language.
    e.g. '-dvblang chs -ocrlang chi_tra' will decode the
    Chinese (Simplified) caption stream but perform OCR
    using the Chinese (Traditional) trained data
    This option is also helpful when the traineddata file
    has non standard names that don't follow ISO specs
-quant mode: How to quantize the bitmap before passing it to tesseract
    for OCR'ing.
    0: Don't quantize at all.
    1: Use CCExtractor's internal function (default).
    2: Reduce distinct color count in image for faster results.
-oem: Select the OEM mode for Tesseract.
    Available modes :
    0: OEM_TESSERACT_ONLY - the fastest mode.
    1: OEM_LSTM_ONLY - use LSTM algorithm for recognition.
    2: OEM_TESSERACT_LSTM_COMBINED - both algorithms.
    Default value depends on the tesseract version linked :
    Tesseract v3 : default mode is 0,
    Tesseract v4 : default mode is 1.
-mkvlang: For MKV subtitles, select which language's caption
    stream will be processed. e.g. 'eng' for English.
    Language codes can be either the 3 letters bibliographic
    ISO-639-2 form (like "fre" for french) or a language
    code followed by a dash and a country code for specialities
    in languages (like "fre-ca" for Canadian French).
-nospupngocr: When processing DVB don't use the OCR to write the text as
    comments in the XML file.
-font: Specify the full path of the font that is to be used when
    generating SPUPNG files. If not specified, you need to
    have the default font installed (Helvetica for macOS, Calibri
    for Windows, and Noto for other operating systems at their
    default location)
-italics: Specify the full path of the italics font that is to be used when
    generating SPUPNG files. If not specified, you need to
    have the default font installed (Helvetica Oblique for macOS, Calibri Italic
    for Windows, and NotoSans Italic for other operating systems at their
    default location)

Options that affect how ccextractor reads and writes (buffering):
-bi --bufferinput: Forces input buffering.
-nobi -nobufferinput: Disables input buffering.
-bs --buffersize val: Specify a size for reading, in bytes (suffix with K
    or M for kilobytes and megabytes). Default is 16M.
-koc: keep-output-close. If used then CCExtractor will close
    the output file after writing each subtitle frame and
    attempt to create it again when needed.
-ff --forceflush: Flush the file buffer whenever content is written.

Options that affect the built-in 608 closed caption decoder:
-dru: Direct Roll-Up. When in roll-up mode, write character by
    character instead of line by line. Note that this
    produces (much) larger files.
-noru --norollup: If you hate the repeated lines caused by the roll-up
    emulation, you can have ccextractor write only one
    line at a time, getting rid of these repeated lines.
-ru1 / -ru2 / -ru3: roll-up captions can consist of 2, 3 or 4 visible
    lines at any time (the number of lines is part of
    the transmission). If having 3 or 4 lines annoys
    you, you can use -ru to force the decoder to always
    use 1, 2 or 3 lines. Note that 1 line is not
    a real roll-up mode, so CCExtractor does what
    it can.
    In -ru1 the start timestamp is actually the timestamp
    of the first character received which is possibly more
    accurate.

Options that affect timing:
-delay ms: For srt/sami/webvtt, add this number of milliseconds to
    all times. For example, -delay 400 makes subtitles
    appear 400ms late. You can also use negative numbers
    to make subs appear early.
Notes on times: -startat and -endat times are used first, then -delay.
So if you use -srt -startat 3:00 -endat 5:00 -delay 120000, ccextractor will
generate a .srt file, with only data from 3:00 to 5:00 in the input file(s)
and then add that (huge) delay, which would make the final file start at
5:00 and end at 7:00.
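
The arithmetic in the note above can be checked with a few lines of Python. This is only a sketch of the calculation: the function names are hypothetical, the accepted time formats (seconds, MM:SS or HH:MM:SS) are the ones the help text lists for -startat, and negative results are not handled.

```python
def parse_time(text: str) -> int:
    """Convert seconds, MM:SS or HH:MM:SS into a number of seconds."""
    seconds = 0
    for part in text.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def format_time(ms: int) -> str:
    """Format non-negative milliseconds as an SRT-style HH:MM:SS,mmm stamp."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def output_window(startat: str, endat: str, delay_ms: int):
    """Apply the cut first, then shift both ends by the delay."""
    start_ms = parse_time(startat) * 1000 + delay_ms
    end_ms = parse_time(endat) * 1000 + delay_ms
    return format_time(start_ms), format_time(end_ms)
```

For the example in the note, `output_window("3:00", "5:00", 120000)` gives a window from `00:05:00,000` to `00:07:00,000`.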

Options that affect what segment of the input file(s) to process:
-startat time: Only write caption information that starts after the
    given time.
    Time can be seconds, MM:SS or HH:MM:SS.
    For example, -startat 3:00 means 'start writing from
    minute 3'.
-endat time: Stop processing after the given time (same format as
    -startat).
    The -startat and -endat options are honored in all
    output formats. In all formats with timing information
    the times are unchanged.
-scr --screenfuls num: Write 'num' screenfuls and terminate processing.

Options that affect which codec is searched for in the input:
If no codec type is selected, the first elementary stream suitable for
subtitles is used. Note that -teletext and -noteletext override this
option.
-codec dvbsub: Select the DVB subtitles from all elementary streams.
    If no stream of DVB subtitle type is found,
    nothing is selected and no subtitles are generated.
-nocodec dvbsub: Ignore DVB subtitles and follow the default behaviour.
-codec teletext: Select the teletext subtitles from the elementary streams.
-nocodec teletext: Ignore teletext subtitles.
NOTE: Options given in the form -foo=bar, -foo = bar and --foo=bar are invalid;
the only valid form is -foo bar.
The -nocodec and -codec parameters must not be the same. If they are,
the -nocodec parameter is ignored. Each flag should be passed only
once; passing it more than once is not supported yet, and only the last
value is taken into consideration.
Adding start and end credits:
CCExtractor can _try_ to add a custom message (for credits for example) at
the start and end of the file, looking for a window where there are no
captions. If there is no such window, then no text will be added.
The start window must be between the times given and must have enough time
to display the message for at least the specified time.
--startcreditstext txt: Write this text as start credits. If there are
    several lines, separate them with the
    characters \n, for example Line1\nLine 2.
--startcreditsnotbefore time: Don't display the start credits before this
    time (S, or MM:SS). Default: 0
--startcreditsnotafter time: Don't display the start credits after this
    time (S, or MM:SS). Default: 5:00
--startcreditsforatleast time: Start credits need to be displayed for at least
    this time (S, or MM:SS). Default: 2
--startcreditsforatmost time: Start credits should be displayed for at most
    this time (S, or MM:SS). Default: 5
--endcreditstext txt: Write this text as end credits. If there are
    several lines, separate them with the
    characters \n, for example Line1\nLine 2.
--endcreditsforatleast time: End credits need to be displayed for at least
    this time (S, or MM:SS). Default: 2
--endcreditsforatmost time: End credits should be displayed for at most
    this time (S, or MM:SS). Default: 5

Options that affect debug data:
-debug: Show lots of debugging output.
-608: Print debug traces from the EIA-608 decoder.
    If you need to submit a bug report, please send
    the output from this option.
-708: Print debug information from the (currently
    in development) EIA-708 (DTV) decoder.
-goppts: Enable lots of time stamp output.
-xdsdebug: Enable XDS debug data (lots of it).
-vides: Print debug info about the analysed elementary
    video stream.
-cbraw: Print debug trace with the raw 608/708 data with
    time stamps.
-nosync: Disable the syncing code. Only useful for debugging
    purposes.
-fullbin: Disable the removal of trailing padding blocks
    when exporting to bin format. Only useful
    for debugging purposes.
-parsedebug: Print debug info about the parsed container
    file. (Only for TS/ASF files at the moment.)
-parsePAT: Print Program Association Table dump.
-parsePMT: Print Program Map Table dump.
-dumpdef: Hex-dump defective TS packets.
-investigate_packets: If no CC packets are detected based on the PMT, try
    to find data in all packets by scanning.

Teletext related options:
-tpage page: Use this page for subtitles (if this parameter
    is not used, try to autodetect). In Spain the
    page is always 888, may vary in other countries.
-tverbose: Enable verbose mode in the teletext decoder.

-teletext: Force teletext mode even if teletext is not detected.
    If used, you should also pass -datapid to specify
    the stream ID you want to process.
-noteletext: Disable teletext processing. This might be needed
    for video streams that have both teletext packets
    and CEA-608/708 packets (if teletext is processed
    then CEA-608/708 processing is disabled).

Transcript customizing options:
-customtxt format: Use the passed format to customize the (Timed) Transcript
    output. The format must be like this: 1100100 (7 digits).
    These indicate whether the following items should be
    displayed or not in the (timed) transcript. They
    represent (in order):
    - Display start time
    - Display end time
    - Display caption mode
    - Display caption channel
    - Use a relative timestamp (relative to the sample)
    - Display XDS info
    - Use colors
    Examples:
    0000101 is the default setting for transcripts
    1110101 is the default for timed transcripts
    1111001 is the default setting for -ucla
    Make sure you use this parameter after others that might
    affect these settings (-out, -ucla, -xds, -txt,
    -ttxt ...)
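
As a reading aid for the 7-digit -customtxt format above, the following Python sketch maps each digit to the setting it controls, in the order given in the help text. The function and field names are hypothetical labels chosen for this illustration.

```python
# Field order follows the list in the help text; the names are illustrative.
FIELDS = (
    "start_time",
    "end_time",
    "caption_mode",
    "caption_channel",
    "relative_timestamp",
    "xds_info",
    "colors",
)


def decode_customtxt(fmt: str) -> dict:
    """Turn a 7-digit string such as '1110101' into named on/off settings."""
    if len(fmt) != len(FIELDS) or set(fmt) - {"0", "1"}:
        raise ValueError("format must be exactly 7 digits, each 0 or 1")
    return {name: digit == "1" for name, digit in zip(FIELDS, fmt)}
```

Decoding `1110101` (listed above as the default for timed transcripts) turns on the start time, end time, caption mode, relative timestamp and colors, and leaves the caption channel and XDS info off.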

Communication with other programs and console output:
--gui_mode_reports: Report progress and interesting events to stderr
    in an easy to parse format. This is intended to be
    used by other programs. See docs directory for
    details.
--no_progress_bar: Suppress the output of the progress bar
-quiet: Don't write any message.

Notes on the CEA-708 decoder: While it is starting to be useful, it's
a work in progress. A number of things don't work yet in the decoder
itself, and many of the auxiliary tools (case conversion to name one)
won't do anything yet. Feel free to submit samples that cause problems
and feature requests.

Notes on spupng output format:
One .xml file is created per output field. A set of .png files are created in
a directory with the same base name as the corresponding .xml file(s), but with
a .d extension. Each .png file will contain an image representing one caption
and named subNNNN.png, starting with sub0000.png.
For example, the command:
    ccextractor -out=spupng input.mpg
will create the files:
    input.xml
    input.d/sub0000.png
    input.d/sub0001.png
    ...
The command:
    ccextractor -out=spupng -o /tmp/output -12 input.mpg
will create the files:
    /tmp/output_1.xml
    /tmp/output_1.d/sub0000.png
    /tmp/output_1.d/sub0001.png
    ...
    /tmp/output_2.xml
    /tmp/output_2.d/sub0000.png
    /tmp/output_2.d/sub0001.png
    ...
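
The naming scheme in the spupng notes above can be reproduced with a short Python sketch, which may help when a script needs to predict the output paths. It only mirrors the two examples given in the help text: `spupng_outputs` is a hypothetical helper, and the `_1`/`_2` suffixes are assumed to appear only when both fields are requested with -12.

```python
from typing import List


def spupng_outputs(base: str, both_fields: bool = False,
                   captions: int = 2) -> List[str]:
    """List the .xml file(s) and the first `captions` .png files per field."""
    stems = [f"{base}_1", f"{base}_2"] if both_fields else [base]
    paths = []
    for stem in stems:
        paths.append(f"{stem}.xml")
        # Images live in a directory named after the .xml file, with a .d extension.
        paths.extend(f"{stem}.d/sub{index:04d}.png" for index in range(captions))
    return paths
```

Calling `spupng_outputs("/tmp/output", both_fields=True)` lists the same six paths as the second example in the notes.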

Burned-in subtitle extraction:
-hardsubx : Enable the burned-in subtitle extraction subsystem.

NOTE: The following options will work only if -hardsubx is
specified before them:-

-tickertext : Search for burned-in ticker text at the bottom of
    the screen.

-ocr_mode : Set the OCR mode to either frame-wise, word-wise
    or letter wise.
    e.g. -ocr_mode frame (default), -ocr_mode word,
    -ocr_mode letter

-subcolor : Specify the color of the subtitles
    Possible values are in the set
    {white,yellow,green,cyan,blue,magenta,red}.
    Alternatively, a custom hue value between 1 and 360
    may also be specified.
    e.g. -subcolor white or -subcolor 270 (for violet).
    Refer to an HSV color chart for values.

-min_sub_duration : Specify the minimum duration that a subtitle line
    must exist on the screen.
    The value is specified in seconds.
    A lower value gives better results, but takes more
    processing time.
    The recommended value is 0.5 (default).
    e.g. -min_sub_duration 1.0 (for a duration of 1 second)

-detect_italics : Specify whether italics are to be detected from the
    OCR text.
    Italic detection automatically enforces the OCR mode
    to be word-wise

-conf_thresh : Specify the classifier confidence threshold between
    1 and 100.
    Try and use a threshold which works for you if you get
    a lot of garbage text.
    e.g. -conf_thresh 50

-whiteness_thresh : For white subtitles only, specify the luminance
    threshold between 1 and 100
    This threshold is content dependent, and adjusting
    values may give you better results
    Recommended values are in the range 80 to 100.
    The default value is 95

An example command for burned-in subtitle extraction is as follows:
    ccextractor video.mp4 -hardsubx -subcolor white -detect_italics
    -whiteness_thresh 90 -conf_thresh 60
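
When CCExtractor is driven from a script, the example command above can be assembled as an argument list. The Python sketch below does that and checks the value ranges stated in the help text; `hardsubx_command` is a hypothetical helper, and it only builds the list without running anything. Note that it places -hardsubx before the options that depend on it, as the note above requires.

```python
from typing import List, Optional, Union

NAMED_COLORS = {"white", "yellow", "green", "cyan", "blue", "magenta", "red"}


def hardsubx_command(video: str,
                     subcolor: Union[str, int] = "white",
                     detect_italics: bool = False,
                     whiteness_thresh: Optional[int] = None,
                     conf_thresh: Optional[int] = None) -> List[str]:
    """Build an argument list for burned-in subtitle extraction."""
    if isinstance(subcolor, int):
        if not 1 <= subcolor <= 360:
            raise ValueError("hue must be between 1 and 360")
    elif subcolor not in NAMED_COLORS:
        raise ValueError(f"unknown color name: {subcolor!r}")
    # -hardsubx comes first so the options after it are recognised.
    args = ["ccextractor", video, "-hardsubx", "-subcolor", str(subcolor)]
    if detect_italics:
        args.append("-detect_italics")
    for flag, value in (("-whiteness_thresh", whiteness_thresh),
                        ("-conf_thresh", conf_thresh)):
        if value is not None:
            if not 1 <= value <= 100:
                raise ValueError(f"{flag} must be between 1 and 100")
            args += [flag, str(value)]
    return args
```

The resulting list can be passed to `subprocess.run(args, check=True)` on a machine where `ccextractor` is installed and on the PATH.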


--version : Display current CCExtractor version and detailed information.
Error: (This help screen was shown because there were no input files)

Issues? Open a ticket here
https://github.com/CCExtractor/ccextractor/issues