Command line usage
CCExtractor's main program is console based. There's a GUI for Windows, as well as provisions so other programs can easily interface with CCExtractor, but the heavy lifting is done by a command line program that can be called by scripts, so integration with larger processes is straightforward.
Running CCExtractor without any parameter will display a help screen with all the options. As of version 0.88 the help screen is as follows:
CCExtractor 1.0, Carlos Fernandez Sanz, Volker Quetschke.
Teletext portions taken from Petr Kutalek's telxcc
--------------------------------------------------------------------------
Originally based on McPoodle's tools. Check his page for lots of information
on closed captions technical details.
(http://www.theneitherworld.com/mcpoodle/SCC_TOOLS/DOCS/SCC_TOOLS.HTML)

This tool home page:
http://www.ccextractor.org
Extracts closed captions and teletext subtitles from video streams.
(DVB, .TS, ReplayTV 4000 and 5000, dvr-ms, bttv, Tivo, Dish Network,
.mp4, HDHomeRun are known to work).

Syntax:
  ccextractor [options] inputfile1 [inputfile2...] [-o outputfilename]

Arguments:
  [inputfile]...
      file(s) to process

Options:
  -h, --help
      Print help (see a summary with '-h')

  -V, --version
      Print version

File name related options:
  -o
      Use -o parameters to define output filename if you don't
      like the default ones (same as infile plus _1 or _2 when
      needed and file extension, e.g. .srt).

  --stdout
      Write output to stdout (console) instead of file. If
      stdout is used, then -o can't be used. Also
      --stdout will redirect all messages to stderr (error).

  --pesheader
      Dump the PES Header to stdout (console). This is
      used for debugging purposes to see the contents
      of each PES packet header.

  --debugdvbsub
      Write the DVB subtitle debug traces to console.

  --ignoreptsjumps
      Ignore PTS jumps (default).

  --fixptsjumps
      Fix PTS jumps. Use this parameter if you
      experience timeline resets/jumps in the output.

  --stdin
      Reads input from stdin (console) instead of file.
      Alternatively, - can be used instead of --stdin.

Output File Segmentation:
  --outinterval

  --segmentonkeyonly
      When segmenting files, do it only after an I-frame,
      trying to behave like FFmpeg.

Network support:
  --udp <[[src@]host:]port>
      Read the input via UDP (listening on the specified port)
      instead of reading a file.
      Host and src can be a hostname or IPv4 address.
      If host is not specified then listens on the local host.

  --src
      Can be a hostname or IPv4 address.

  --sendto
      Sends data in BIN format to the server
      according to CCExtractor's protocol over
      TCP. For IPv6 use [address] instead.

  --sendto-port
      Specifies optional port for sendto.

  --tcp
      Reads the input data in BIN format according to
      CCExtractor's protocol, listening on the specified port
      on the local host.

  --tcp-password
      Sets server password for new connections to
      tcp server.

  --tcp-description
      Sends to the server a short description about
      captions, e.g. channel name or file name.

Options that affect what will be processed:
  --output-field
      Values: 1 = Output Field 1
              2 = Output Field 2
              both = Both Output Field 1 and 2
      Defaults to 1

  --append
      Use --append to prevent overwriting of existing files. The output will be
      appended instead.

  --cc2
      When in srt/sami mode, process captions in channel 2
      instead of channel 1.

  --service
      Enable CEA-708 (DTVCC) captions processing for the listed
      services. The parameter is a comma delimited list
      of service numbers, such as "1,2" to process the
      primary and secondary language services.
      Pass "all" to process all services found.
      If captions in a service are stored in 16-bit encoding,
      you can specify what charset or encoding was used. Pass
      its name after the service number (e.g. "1[EUC-KR],3" or
      "all[EUC-KR]") and it will be converted from the specified
      charset to UTF-8 using iconv. See the iconv documentation to
      check if the required encoding/charset is supported.

Input Formats:
  --input
      With the exception of McPoodle's raw format, which is just the closed
      caption data with no other info, CCExtractor can usually detect the
      input format correctly. Use this parameter to override the detected
      format.

      Possible values:
      - ts: For Transport Streams
      - ps: For Program Streams
      - es: For Elementary Streams
      - asf: ASF container (such as DVR-MS)
      - wtv: Windows Television (WTV)
      - bin: CCExtractor's own binary format
      - raw: For McPoodle's raw files
      - mp4: MP4/MOV/M4V and similar
      - m2ts: BDAV MPEG-2 Transport Stream
      - mkv: Matroska container and WebM
      - mxf: Material Exchange Format (MXF)

Output Formats:
  --out
      Possible values:
      - srt: SubRip (default, so not actually needed)
      - ass: SubStation Alpha
      - ssa: SubStation Alpha
      - ccd: Scenarist Closed Caption Disassembly format
      - scc: Scenarist Closed Caption format
      - webvtt: WebVTT format
      - webvtt-full: WebVTT format with styling
      - sami: MS Synchronized Accessible Media Interface
      - bin: CC data in CCExtractor's own binary format
      - raw: CC data in McPoodle's Broadcast format
      - dvdraw: CC data in McPoodle's DVD format
      - mcc: CC data compressed using MacCaption Format
      - txt: Transcript (no time codes, no roll-up captions, just the plain transcription)
      - ttxt: Timed Transcript (transcription with time info)
      - g608: Grid 608 format
      - smptett: SMPTE Timed Text (W3C TTML) format
      - spupng: Set of .xml and .png files for use with dvdauthor's spumux. See "Notes on spupng output format"
      - null: Don't produce any file output
      - report: Prints to stdout information about captions in the specified input. Doesn't produce any file output
      - simple-xml

Options that affect how input files will be processed:
  --goptime
      Use GOP for timing instead of PTS. This only applies
      to Program or Transport Streams with MPEG2 data and
      overrides the default PTS timing.
      GOP timing is always used for Elementary Streams.

  --no-goptime
      Never use GOP timing (use PTS), even if ccextractor
      detects GOP timing is the reasonable choice.

  --fixpadding
      Fix padding - some cards (or providers, or whatever)
      seem to send 0000 as CC padding instead of 8080. If you
      get bad timing, this might solve it.

  --90090
      Use 90090 (instead of 90000) as MPEG clock frequency.
      (reported to be needed at least by Panasonic DMR-ES15
      DVD Recorder)

  --videoedited
      By default, ccextractor will process input files in
      sequence as if they were all one large file (i.e.
      split by a generic, non video-aware tool). If you
      are processing video that was split with an editing
      tool, use --videoedited so ccextractor doesn't try to rebuild
      the original timing.

  -s, --stream
      Consider the file as a continuous stream that is
      growing as ccextractor processes it, so don't try
      to figure out its size and don't terminate processing
      when reaching the current end (i.e. wait for more
      data to arrive). If the optional parameter secs is
      present, it means the number of seconds without any
      new data after which ccextractor should exit. Use
      this parameter if you want to process a live stream
      but not kill ccextractor externally.
      Note: If -s is used then only one input file is
      allowed.

  --usepicorder
      Use the pic_order_cnt_lsb in AVC/H.264 data streams
      to order the CC information. The default way is to
      use the PTS information. Use this switch only when
      needed.

  --myth
      Force MythTV code branch.

  --no-myth
      Disable MythTV code branch.
      The MythTV branch is needed for analog captures where
      the closed caption data is stored in the VBI, such as
      those with bttv cards (Hauppauge 250 for example). This
      is detected automatically so you don't need to worry
      about this unless autodetection doesn't work for you.

  --wtvconvertfix
      This switch works around a bug in Windows 7's built in
      software to convert *.wtv to *.dvr-ms. For analog NTSC
      recordings the CC information is marked as digital
      captions. Use this switch only when needed.

  --wtvmpeg2
      Read the captions from the MPEG2 video stream rather
      than the captions stream in WTV files.

  --program-number
      In TS mode, specifically select a program to process.
      Not needed if the TS only has one. If this parameter
      is not specified and CCExtractor detects more than one
      program in the input, it will list the programs found
      and terminate without doing anything, unless
      --autoprogram (see below) is used.

  --autoprogram
      If there's more than one program in the stream, just use
      the first one we find that contains a suitable stream.

  --multiprogram
      Uses multiple programs from the same input stream.

  -L, --list-tracks
      List all tracks found in the input file and exit without
      processing. Useful for exploring media files before extraction.

  --datapid
      Don't try to find out the stream for caption/teletext
      data, just use this one instead.

  --datastreamtype
      Instead of selecting the stream by its PID, select it
      by its type (pick the stream that has this type in
      the PMT).

  --streamtype
      Assume the data is of this type, don't autodetect. This
      parameter may be needed if --datapid or --datastreamtype
      is used and CCExtractor cannot determine how to process
      the stream. The value will usually be 2 (MPEG video) or
      6 (MPEG private data).

  --hauppauge
      If the video was recorded using a Hauppauge card, it
      might need special processing. This parameter will
      force the special treatment.

  --mp4vidtrack
      In MP4 files the closed caption data can be embedded in
      the video track or in a dedicated CC track. If a
      dedicated track is detected it will be processed instead
      of the video track. If you need to force the video track
      to be processed instead use this option.

  --no-autotimeref
      Some streams come with broadcast date information. When
      such data is available, CCExtractor will set its time
      reference to the received data. Use this parameter if
      you prefer your own reference. Note: Currently this only
      affects Teletext in timed transcript with --datets.

  --no-scte20
      Ignore SCTE-20 data if present.

  --webvtt-create-css
      Create a separate file for CSS instead of inline.

  --deblev
      Enable debug so the calculated distance for each two
      strings is displayed. The output includes both strings,
      the calculated distance, the maximum allowed distance,
      and whether the strings are ultimately considered
      equivalent or not, i.e. the calculated distance is
      less than or equal to the max allowed.

  --analyzevideo
      Analyze the video stream even if it's not used for
      subtitles. This allows video information to be provided.

  --timestamp-map
      Enable the X-TIMESTAMP-MAP header for WebVTT (HLS).

Levenshtein distance:
  --no-levdist
      Don't attempt to correct typos with Levenshtein distance.

  --levdistmincnt
      Minimum distance we always allow regardless
      of the length of the strings. Default 2.
      This means that if the calculated distance
      is 0, 1 or 2, we consider the strings to be equivalent.

  --levdistmaxpct
      Maximum distance we allow, as a percentage of
      the shortest string length. Default 10%.
      For example, consider a comparison of one string of
      30 characters and one of 60 characters. We want to
      determine whether the first 30 characters of the longer
      string are more or less the same as the shortest string,
      i.e. whether the longest string is the shortest one
      plus new characters and maybe some corrections. Since
      the shortest string is 30 characters and the default
      percentage is 10%, we would allow a distance of up
      to 3 between the first 30 characters.

Options that affect what kind of output will be produced:
  --chapters
      (Experimental) Produces a chapter file from MP4 files.
      Note that this must only be used with MP4 files,
      for other files it will simply generate a subtitles file.

  --bom
      Append a BOM (Byte Order Mark) to output files.
      Note that most text processing tools in Linux will not
      like BOM.

  --no-bom
      Do not append a BOM (Byte Order Mark) to output
      files. Note that this may break files when using
      Windows. This is the default in non-Windows builds.

  --unicode
      Encode subtitles in Unicode instead of Latin-1.

  --utf8
      Encode subtitles in UTF-8 (no longer needed
      because UTF-8 is now the default).

  --latin1
      Encode subtitles in Latin-1.

  --no-fontcolor
      For .srt/.sami/.vtt, don't add font color tags.

  --no-htmlescape
      For .srt/.sami/.vtt, don't convert HTML-unsafe characters.

  --no-typesetting
      For .srt/.sami/.vtt, don't add typesetting tags.

  --trim
      Trim lines.

  --defaultcolor
      Select a different default color (instead of
      white). This causes all output in .srt/.smi/.vtt
      files to have a font tag, which makes the files
      larger. Add the color you want in RGB, such as
      --defaultcolor #FF0000 for red.

  --sentencecap
      Sentence capitalization. Use if you hate
      ALL CAPS in subtitles.

  --capfile
      Add the contents of 'file' to the list of words
      that must be capitalized. For example, if file
      is a plain text file that contains

      Tony
      Alan

      Whenever those words are found they will be written
      exactly as they appear in the file.
      Use one line per word. Lines starting with # are
      considered comments and discarded.

  --kf
      Censors profane words from subtitles.

  --profanity-file
      Add the contents of <file> to the list of words that
      must be censored. The content of <file> follows the
      same syntax as for the capitalization file.

  --splitbysentence
      Split output text so each frame contains a complete
      sentence. Timings are adjusted based on number of
      characters.

  --unixts
      For timed transcripts that have an absolute date
      (instead of a timestamp relative to the file start), use
      this time reference (UNIX timestamp). 0 => Use current
      system time.
      ccextractor will automatically switch to transport
      stream UTC timestamps when available.

  --datets
      In transcripts, write time as YYYYMMDDHHMMss,ms.

  --sects
      In transcripts, write time as ss,ms.

  --ucla
      Transcripts are generated with a specific format
      that is convenient for a specific project. Feel
      free to play with it but be aware that this format
      is still changing; don't rely on its output format
      not changing between versions.

  --latrusmap
      Map Latin symbols to Cyrillic ones in special cases
      of Russian Teletext files (issue #1086).

  --ttxtforcelatin
      Force Latin G0 charset for Teletext, ignoring any Cyrillic
      designation in the stream. Use when broadcasts incorrectly
      signal Cyrillic but content is Latin (issue #1395).

  --xds
      In timed transcripts, all XDS information will be saved
      to the output file.

  --lf
      Use LF (UNIX) instead of CRLF (DOS, Windows) as line
      terminator.

  --df
      For MCC Files, force dropframe frame count.

  --autodash
      Based on position on screen, attempt to determine
      the different speakers and add a dash (-) when each
      of them talks (.srt/.vtt only, --trim required).

  --xmltv
      Produce an XMLTV file containing the EPG data from
      the source TS file. Mode: 1 = full output,
      2 = live output, 3 = both.

  --xmltvliveinterval
      Interval of x seconds between writing live mode xmltv output.

  --xmltvoutputinterval
      Interval of x seconds between writing full file xmltv output.

  --xmltvonlycurrent
      Only print current events for xmltv output.

  --sem
      Create a .sem file for each output file that is open
      and delete it on file close.

  --dvblang
      For DVB subtitles, select which language's caption
      stream will be processed, e.g. 'eng' for English.
      If there are multiple languages, only this specified
      language stream will be processed (default).

  --ocrlang
      Manually select the name of the Tesseract .traineddata
      file. Helpful if you want to OCR a caption stream of
      one language with the data of another language.
      e.g. '--dvblang chs --ocrlang chi_tra' will decode the
      Chinese (Simplified) caption stream but perform OCR
      using the Chinese (Traditional) trained data.
      This option is also helpful when the traineddata file
      has non standard names that don't follow ISO specs.

  --quant
      How to quantize the bitmap before passing it to tesseract
      for OCR'ing.
      0: Don't quantize at all.
      1: Use CCExtractor's internal function (default).
      2: Reduce distinct color count in image for faster results.

  --oem
      Select the OEM mode for Tesseract.
      Available modes:
      0: OEM_TESSERACT_ONLY - the fastest mode.
      1: OEM_LSTM_ONLY - use LSTM algorithm for recognition.
      2: OEM_TESSERACT_LSTM_COMBINED - both algorithms.
      Default value depends on the tesseract version linked:
      Tesseract v3: default mode is 0,
      Tesseract v4: default mode is 1.

  --psm
      Select the PSM mode for Tesseract.
      Available Page segmentation modes:
       0  Orientation and script detection (OSD) only.
       1  Automatic page segmentation with OSD.
       2  Automatic page segmentation, but no OSD, or OCR.
       3  Fully automatic page segmentation, but no OSD. (Default)
       4  Assume a single column of text of variable sizes.
       5  Assume a single uniform block of vertically aligned text.
       6  Assume a single uniform block of text.
       7  Treat the image as a single text line.
       8  Treat the image as a single word.
       9  Treat the image as a single word in a circle.
      10  Treat the image as a single character.
      11  Sparse text. Find as much text as possible in no particular order.
      12  Sparse text with OSD.
      13  Raw line. Treat the image as a single text line,
          bypassing hacks that are Tesseract-specific.

  --mkvlang
      For MKV subtitles, select which language's caption
      stream will be processed, e.g. 'eng' for English.
      Language codes can be either the 3-letter bibliographic
      ISO-639-2 form (like "fre" for French) or a language
      code followed by a dash and a country code for specialities
      in languages (like "fre-ca" for Canadian French).

  --no-spupngocr
      When processing DVB don't use the OCR to write the text as
      comments in the XML file.

  --font
      Specify the full path of the font that is to be used when
      generating SPUPNG files. If not specified, you need to
      have the default font installed (Helvetica for macOS, Calibri
      for Windows, and Noto for other operating systems at their
      default location).

  --italics
      Specify the full path of the italics font that is to be used when
      generating SPUPNG files. If not specified, you need to
      have the default font installed (Helvetica Oblique for macOS, Calibri Italic
      for Windows, and NotoSans Italic for other operating systems at their
      default location).

Options that affect how ccextractor reads and writes (buffering):
  --bufferinput
      Forces input buffering.

  --no-bufferinput
      Disables input buffering.

  --buffersize
      Specify a size for reading, in bytes (suffix with K or
      M for kilobytes and megabytes). Default is 16M.

  --koc
      keep-output-close. If used then CCExtractor will close
      the output file after writing each subtitle frame and
      attempt to create it again when needed.

  --forceflush
      Flush the file buffer whenever content is written.

  --dru
      Direct Roll-Up. When in roll-up mode, write character by
      character instead of line by line. Note that this
      produces (much) larger files.

  --no-rollup
      If you hate the repeated lines caused by the roll-up
      emulation, you can have ccextractor write only one
      line at a time, getting rid of these repeated lines.

  --ru1
      Roll-up captions can consist of 2, 3 or 4 visible
      lines at any time (the number of lines is part of
      the transmission). If having 3 or 4 lines annoys
      you, you can use --ru1, --ru2 or --ru3 to force the
      decoder to always use 1, 2 or 3 lines. Note that 1
      line is not a real roll-up mode, so CCExtractor does
      what it can.
      In --ru1 the start timestamp is actually the timestamp
      of the first character received, which is possibly more
      accurate.

  --ru2

  --ru3

Options that affect timing:
  --delay
      For srt/sami/webvtt, add this number of milliseconds to
      all times. For example, --delay 400 makes subtitles
      appear 400ms late. You can also use negative numbers
      to make subs appear early.

Options that affect what segment of the input file(s) to process:
  --startat
      Only write caption information that starts after the
      given time.
      Time can be seconds, MM:SS or HH:MM:SS.
      For example, --startat 3:00 means 'start writing from
      minute 3'.

  --endat
      Stop processing after the given time (same format as
      --startat).
      The --startat and --endat options are honored in all
      output formats. In all formats with timing information
      the times are unchanged.

  --screenfuls
      Write 'num' screenfuls and terminate processing.

Options that select which codec to search for in the input:
  --codec
      --codec dvbsub
          select the DVB subtitle from all elementary streams;
          if no stream of DVB subtitle type is found then
          nothing is selected and no subtitle is generated
      --codec teletext
          select the teletext subtitle from the elementary stream

      [possible values: dvbsub, teletext]

  --no-codec
      --no-codec dvbsub
          ignore DVB subtitle and follow default behaviour
      --no-codec teletext
          ignore teletext subtitle

      [possible values: dvbsub, teletext]

Adding start and end credits:
  --startcreditstext
      Write this text as start credits. If there are
      several lines, separate them with the
      characters \n, for example Line1\nLine 2.

  --startcreditsnotbefore
      Don't display the start credits before this
      time (S, or MM:SS). Default: 0

  --startcreditsnotafter
      Don't display the start credits after this
      time (S, or MM:SS). Default: 5:00

  --startcreditsforatleast
      Start credits need to be displayed for at least
      this time (S, or MM:SS). Default: 2

  --startcreditsforatmost
      Start credits should be displayed for at most
      this time (S, or MM:SS). Default: 5

  --endcreditstext
      Write this text as end credits. If there are
      several lines, separate them with the
      characters \n, for example Line1\nLine 2.

  --endcreditsforatleast
      End credits need to be displayed for at least
      this time (S, or MM:SS). Default: 2

  --endcreditsforatmost
      End credits should be displayed for at most
      this time (S, or MM:SS). Default: 5

Options that affect debug data:
  --debug
      Show lots of debugging output.

  --608
      Print debug traces from the EIA-608 decoder.
      If you need to submit a bug report, please send
      the output from this option.

  --708
      Print debug information from the (currently
      in development) EIA-708 (DTV) decoder.

  --goppts
      Enable lots of time stamp output.

  --xdsdebug
      Enable XDS debug data (lots of it).

  --vides
      Print debug info about the analysed elementary
      video stream.

  --cbraw
      Print debug trace with the raw 608/708 data with
      time stamps.

  --no-sync
      Disable the syncing code. Only useful for debugging
      purposes.

  --fullbin
      Disable the removal of trailing padding blocks
      when exporting to bin format. Only useful for
      debugging purposes.

  --parsedebug
      Print debug info about the parsed container
      file. (Only for TS/ASF files at the moment.)

  --parsePAT
      Print Program Association Table dump.

  --parsePMT
      Print Program Map Table dump.

  --dumpdef
      Hex-dump defective TS packets.

  --investigate-packets
      If no CC packets are detected based on the PMT, try
      to find data in all packets by scanning.

Teletext related options:
  --tpage
      Use this page for subtitles (if this parameter
      is not used, try to autodetect). In Spain the
      page is always 888, may vary in other countries.

  --tverbose
      Enable verbose mode in the teletext decoder.

  --teletext
      Force teletext mode even if teletext is not detected.
      If used, you should also pass --datapid to specify
      the stream ID you want to process.

  --no-teletext
      Disable teletext processing. This might be needed
      for video streams that have both teletext packets
      and CEA-608/708 packets (if teletext is processed
      then CEA-608/708 processing is disabled).

Transcript customizing options:
  --customtxt
      Use the passed format to customize the (Timed) Transcript
      output. The format must be like this: 1100100 (7 digits).
      The digits indicate whether each of the following items
      should be displayed in the (timed) transcript. They
      represent (in order):
      - Display start time
      - Display end time
      - Display caption mode
      - Display caption channel
      - Use a relative timestamp (relative to the sample)
      - Display XDS info
      - Use colors
      Examples:
      0000101 is the default setting for transcripts
      1110101 is the default for timed transcripts
      1111001 is the default setting for --ucla
      Make sure you use this parameter after others that might
      affect these settings (--out, --ucla, --xds, --txt,
      --ttxt ...)
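
As a rough illustration of how the seven-digit --customtxt mask reads, the following Python sketch turns a mask into one on/off flag per item, in the order listed above. It is not CCExtractor's code; the function name decode_customtxt and the field names are hypothetical labels chosen for this example.

```python
# Hypothetical field names, in the order given by the --customtxt help text.
CUSTOMTXT_FIELDS = (
    "start_time",
    "end_time",
    "caption_mode",
    "caption_channel",
    "relative_timestamp",
    "xds_info",
    "colors",
)


def decode_customtxt(mask):
    """Map a 7-digit mask such as '1110101' to a dict of booleans."""
    if len(mask) != len(CUSTOMTXT_FIELDS) or set(mask) - {"0", "1"}:
        raise ValueError("expected exactly 7 digits, each 0 or 1")
    return {name: digit == "1" for name, digit in zip(CUSTOMTXT_FIELDS, mask)}


if __name__ == "__main__":
    # Default for timed transcripts, according to the help text.
    for name, enabled in decode_customtxt("1110101").items():
        print(f"{name}: {'on' if enabled else 'off'}")
```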

Communication with other programs and console output:
  --gui-mode-reports
      Report progress and interesting events to stderr
      in an easy-to-parse format. This is intended to be
      used by other programs. See the docs directory for
      details.

  --no-progress-bar
      Suppress the output of the progress bar.

  --quiet
      Don't write any message.

Burned-in subtitle extraction:
  --hardsubx
      Enable the burned-in subtitle extraction subsystem.

      NOTE: This is needed to use the burned-in
      subtitle extractor options below.

  --tickertext
      Search for burned-in ticker text at the bottom of
      the screen.

  --ocr-mode
      Set the OCR mode to either frame-wise, word-wise
      or letter-wise.
      e.g. --ocr-mode frame (default), --ocr-mode word,
      --ocr-mode letter

  --subcolor
      Specify the color of the subtitles.
      Possible values are in the set
      {white,yellow,green,cyan,blue,magenta,red}.
      Alternatively, a custom hue value between 1 and 360
      may also be specified.
      e.g. --subcolor white or --subcolor 270 (for violet).
      Refer to an HSV color chart for values.

  --min-sub-duration
      Specify the minimum duration that a subtitle line
      must exist on the screen.
      The value is specified in seconds.
      A lower value gives better results, but takes more
      processing time.
      The recommended value is 0.5 (default).
      e.g. --min-sub-duration 1.0 (for a duration of 1 second)

  --detect-italics
      Specify whether italics are to be detected from the
      OCR text.
      Italic detection automatically enforces the OCR mode
      to be word-wise.

  --conf-thresh
      Specify the classifier confidence threshold between
      1 and 100.
      Try and use a threshold which works for you if you get
      a lot of garbage text.
      e.g. --conf-thresh 50

  --whiteness-thresh
      For white subtitles only, specify the luminance
      threshold between 1 and 100.
      This threshold is content dependent, and adjusting
      values may give you better results.
      Recommended values are in the range 80 to 100.
      The default value is 95.

  --hcc
      This option will be used if the file should have both
      closed captions and burned in subtitles.

An example command for burned-in subtitle extraction is as follows:
ccextractor video.mp4 --hardsubx --subcolor white --detect-italics --whiteness-thresh 90 --conf-thresh 60

Notes on File name related options:
You can pass as many input files as you need. They will be processed in order.
If a file name is suffixed by +, ccextractor will try to follow a numerical
sequence. For example, DVD001.VOB+ means DVD001.VOB, DVD002.VOB and so on
until there are no more files.
Output will be one single file (either raw or srt). Use this if you made your
recording in several cuts (to skip commercials for example) but you want one
subtitle file with contiguous timing.
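
As a rough illustration of the + suffix, the following Python sketch lists the files that a name such as DVD001.VOB+ could expand to. It is not CCExtractor's actual implementation: the function name expand_sequence is hypothetical, and the rule used here (increment the last run of digits before the file extension and stop at the first missing file) is an assumption made for the example.

```python
import os
import re


def expand_sequence(name, exists=os.path.exists):
    """Return the files a trailing '+' could stand for (hypothetical helper).

    Assumption: the counter is the last run of digits before the extension,
    and the sequence ends at the first file that does not exist.
    """
    if not name.endswith("+"):
        return [name]
    current = name[:-1]
    files = []
    while exists(current):
        files.append(current)
        stem, ext = os.path.splitext(current)
        matches = list(re.finditer(r"\d+", stem))
        if not matches:
            break  # nothing to increment, so there is no sequence to follow
        last = matches[-1]
        # Keep the zero padding: "001" becomes "002", not "2".
        bumped = str(int(last.group()) + 1).zfill(len(last.group()))
        current = stem[:last.start()] + bumped + stem[last.end():] + ext
    return files


if __name__ == "__main__":
    # Simulate a directory listing instead of touching the file system.
    present = {"DVD001.VOB", "DVD002.VOB", "DVD003.VOB"}
    print(expand_sequence("DVD001.VOB+", exists=present.__contains__))
```

The exists parameter is only there so the sketch can be tried against a simulated file list.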

Notes on Options that affect what will be processed:
In general, if you want English subtitles you don't need to use these options
as they are broadcast in field 1, channel 1. If you want the second language
(usually Spanish) you may need to try --output-field 2, or --cc2, or both.

Notes on Levenshtein distance:
When processing teletext files CCExtractor tries to correct typos by
comparing consecutive lines. If line N+1 is almost identical to line N except
for minor changes (plus next characters) then it assumes that line N had a
typo that was corrected in N+1. This is currently implemented in teletext
because it's where sample files that could benefit from this were available.
You can adjust, or disable, the algorithm settings with the --no-levdist,
--levdistmincnt and --levdistmaxpct parameters.
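
The comparison can be sketched in Python as follows. This is only an illustration under stated assumptions, not CCExtractor's code: the function names are hypothetical, and the way the two limits are combined (the larger of --levdistmincnt and the --levdistmaxpct share of the shorter line) is an assumption drawn from the option descriptions. The defaults of 2 and 10% come from the help text.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def probably_same_line(line_n, line_n1, min_cnt=2, max_pct=10):
    """Guess whether line_n1 is line_n with corrections plus new characters."""
    short, long_ = sorted((line_n, line_n1), key=len)
    # Only the first len(short) characters of the longer line are compared.
    distance = levenshtein(short, long_[:len(short)])
    # Assumption: the allowed distance is the larger of the two limits.
    allowed = max(min_cnt, len(short) * max_pct // 100)
    return distance <= allowed


if __name__ == "__main__":
    print(probably_same_line("Helo world", "Hello world, how are you"))
    print(probably_same_line("abcdefghij", "zyxwvutsrq"))
```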

Notes on times:
--startat and --endat times are used first, then --delay.
So if you use --out=srt --startat 3:00 --endat 5:00 --delay 120000, ccextractor will
generate a .srt file, with only data from 3:00 to 5:00 in the input file(s)
and then add that (huge) delay, which would make the final file start at
5:00 and end at 7:00.
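
The arithmetic of that example can be checked with a short Python sketch. The helper names are hypothetical, and representing captions as (start_ms, end_ms, text) tuples is an assumption made for illustration; only the order of operations (window first, delay second) comes from the note above.

```python
def parse_time_ms(text):
    """Convert 'S', 'MM:SS' or 'HH:MM:SS' to milliseconds."""
    seconds = 0
    for part in text.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds * 1000


def format_ms(ms):
    """Format milliseconds as M:SS for display."""
    total_seconds = ms // 1000
    return f"{total_seconds // 60}:{total_seconds % 60:02d}"


def window_then_delay(captions, startat_ms, endat_ms, delay_ms):
    """Keep captions that start inside the window, then shift their times."""
    kept = [c for c in captions if startat_ms <= c[0] <= endat_ms]
    return [(start + delay_ms, end + delay_ms, text) for start, end, text in kept]


if __name__ == "__main__":
    startat = parse_time_ms("3:00")
    endat = parse_time_ms("5:00")
    delay = 120000  # milliseconds, i.e. two minutes
    print(format_ms(startat + delay), format_ms(endat + delay))  # 5:00 7:00
```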

Notes on codec options:
If no codec type is selected, the first elementary stream suitable for
subtitles is used. Note that --teletext and --no-teletext override this
option.
The --no-codec and --codec parameters must not be the same; if they are,
the --no-codec parameter is ignored. Each flag should be passed only
once: passing more than one is not supported yet, and only the last one
would be taken into consideration.

Notes on adding credits:
CCExtractor can _try_ to add a custom message (for credits for example) at
the start and end of the file, looking for a window where there are no
captions. If there is no such window, then no text will be added.
The start window must be between the times given and must have enough time
to display the message for at least the specified time.
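
A minimal Python sketch of that window search for start credits follows. It is an illustration under assumptions, not CCExtractor's algorithm: the function name is hypothetical, captions are assumed to be sorted (start, end) pairs in seconds, and the whole credit is assumed to have to fit between the "not before" and "not after" times. The default values mirror those listed for the --startcredits* options.

```python
def find_start_credits_slot(captions, not_before=0, not_after=300,
                            at_least=2, at_most=5):
    """Return (start, end) in seconds for the credits, or None if no gap fits."""
    cursor = not_before
    # A zero-length sentinel at not_after closes the last gap.
    for cap_start, cap_end in list(captions) + [(not_after, not_after)]:
        gap_end = min(cap_start, not_after)
        if gap_end - cursor >= at_least:
            # Show the credits for at most `at_most` seconds within the gap.
            return cursor, min(cursor + at_most, gap_end)
        cursor = max(cursor, cap_end)
        if cursor >= not_after:
            break
    return None


if __name__ == "__main__":
    captions = [(0, 10), (11, 20), (24, 30)]
    print(find_start_credits_slot(captions))
```

With the sample captions, the first gap of at least two seconds lies between 20 and 24 seconds, so that is where the credits would go in this sketch.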

Notes on the CEA-708 decoder:
By default, ccextractor now extracts both CEA-608 and CEA-708 subtitles
if they are present in the input. This results in two output files: one
for CEA-608 and one for CEA-708.
To extract only CEA-608 subtitles, use --output-field (1, 2 or both).
To extract only CEA-708 subtitles, use --service.
To extract both CEA-608 and CEA-708 subtitles, use both --output-field and --service.
While the CEA-708 decoder is starting to be useful, it's
a work in progress. A number of things don't work yet in the decoder
itself, and many of the auxiliary tools (case conversion to name one)
won't do anything yet. Feel free to submit samples that cause problems
and feature requests.

Notes on spupng output format:
One .xml file is created per output field. A set of .png files are created in
a directory with the same base name as the corresponding .xml file(s), but with
a .d extension. Each .png file will contain an image representing one caption
and named subNNNN.png, starting with sub0000.png.
For example, the command:
    ccextractor --out=spupng input.mpg
will create the files:
    input.xml
    input.d/sub0000.png
    input.d/sub0001.png
    ...
The command:
    ccextractor --out=spupng -o /tmp/output --output-field both input.mpg
will create the files:
    /tmp/output_1.xml
    /tmp/output_1.d/sub0000.png
    /tmp/output_1.d/sub0001.png
    ...
    /tmp/output_2.xml
    /tmp/output_2.d/sub0000.png
    /tmp/output_2.d/sub0001.png
    ...
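
The naming pattern in these two examples can be summarized with a small Python sketch. The function spupng_paths is a hypothetical helper written for this page, not part of CCExtractor; it only reproduces the file names shown above for a given base name, set of fields and number of captions.

```python
def spupng_paths(base, fields=(None,), captions=2):
    """List the .xml and .png names expected for a spupng run.

    base     -- output name without extension, e.g. "input" or "/tmp/output"
    fields   -- (None,) for a single unnumbered output, or e.g. (1, 2)
    captions -- how many caption images to list per field
    """
    paths = []
    for field in fields:
        stem = base if field is None else f"{base}_{field}"
        paths.append(f"{stem}.xml")
        paths.extend(f"{stem}.d/sub{n:04d}.png" for n in range(captions))
    return paths


if __name__ == "__main__":
    for path in spupng_paths("/tmp/output", fields=(1, 2)):
        print(path)
```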