Command line usage
CCExtractor's main program is console based. There's a GUI for Windows, as well as provisions so other programs can easily interface with CCExtractor, but the heavy lifting is done by a command-line program that can be called from scripts, so integration with larger processes is straightforward.
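Because the work happens in a console program, a script can drive it with an ordinary process call. The following is a minimal Python sketch, assuming `ccextractor` is on `PATH`. The helper names `build_ccextractor_cmd` and `run_ccextractor` are hypothetical. The only flags it adds itself (`--out=...` and `-o`) appear in the help screen, and anything else is passed through `extra`.

```python
import shutil
import subprocess
from typing import List, Optional, Sequence


def build_ccextractor_cmd(
    inputs: Sequence[str],
    output: Optional[str] = None,
    out_format: str = "srt",
    extra: Sequence[str] = (),
    binary: str = "ccextractor",
) -> List[str]:
    """Build argv following: ccextractor [options] inputfile1 [...] [-o outputfilename]."""
    if not inputs:
        raise ValueError("at least one input file is required")
    cmd = [binary, f"--out={out_format}", *extra, *inputs]
    if output is not None:
        cmd += ["-o", output]
    return cmd


def run_ccextractor(cmd: List[str]) -> subprocess.CompletedProcess:
    """Run the command without a shell, so file names need no quoting."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found on PATH")
    return subprocess.run(cmd, capture_output=True, text=True)


if __name__ == "__main__":
    print(build_ccextractor_cmd(["show.ts"], "show.srt", extra=["--no-progress-bar"]))
```

Building the argument list separately from running it makes the command easy to log and test. The caller can then inspect `returncode` and `stderr` on the returned `CompletedProcess`.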
Running CCExtractor without any parameter will display a help screen with all the options. As of version 0.96.5 the help screen is as follows:
CCExtractor 0.96.5, Carlos Fernandez Sanz, Volker Quetschke.
Teletext portions taken from Petr Kutalek's telxcc
--------------------------------------------------------------------------
Originally based on McPoodle's tools. Check his page for lots of information
on closed captions technical details.
(http://www.theneitherworld.com/mcpoodle/SCC_TOOLS/DOCS/SCC_TOOLS.HTML)

This tool's home page:
http://www.ccextractor.org
Extracts closed captions and teletext subtitles from video streams.
(DVB, .TS, ReplayTV 4000 and 5000, dvr-ms, bttv, Tivo, Dish Network,
.mp4, HDHomeRun are known to work).

Syntax:
ccextractor [options] inputfile1 [inputfile2...] [-o outputfilename]

Arguments:
[inputfile]...
file(s) to process

Options:
-h, --help
Print help (see a summary with '-h')

-V, --version
Print version

File name related options:
-o
Use -o to define the output filename if you don't
like the default one (the input file name, plus _1 or _2
when needed, and a file extension, e.g. .srt).

--stdout
Write output to stdout (console) instead of a file. If
--stdout is used, then -o can't be used. --stdout also
redirects all messages to stderr.

--pesheader
Dump the PES header to stdout (console). This is
used for debugging purposes to see the contents
of each PES packet header.

--debugdvbsub
Write the DVB subtitle debug traces to console.

--ignoreptsjumps
Ignore PTS jumps (default)

--fixptsjumps
Fix PTS jumps. Use this parameter if you
experience timeline resets/jumps in the output.

--stdin
Read input from stdin (console) instead of a file.
Alternatively, - can be used instead of --stdin.

Output File Segmentation:
--outinterval

--segmentonkeyonly
When segmenting files, do it only after an I-frame,
trying to behave like FFmpeg.

Network support:
--udp <[[src@]host:]port>
Read the input via UDP (listening on the specified port)
instead of reading a file.
Host and src can be a hostname or IPv4 address.
If host is not specified, CCExtractor listens on the local host.
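The `--udp` argument packs up to three values into one string. Here is a minimal Python sketch of how `[[src@]host:]port` breaks down. `parse_udp_spec` and `UdpSpec` are hypothetical names, and this is not CCExtractor's own parser. It only handles the hostname/IPv4 forms mentioned above.

```python
from typing import NamedTuple, Optional


class UdpSpec(NamedTuple):
    src: Optional[str]   # optional source address
    host: Optional[str]  # None means "listen on the local host"
    port: int


def parse_udp_spec(spec: str) -> UdpSpec:
    """Split a value of the form [[src@]host:]port into its parts."""
    src: Optional[str] = None
    host: Optional[str] = None
    if "@" in spec:
        # src@ is only meaningful together with host:port
        src, spec = spec.split("@", 1)
        if ":" not in spec:
            raise ValueError("src@ requires host:port")
    if ":" in spec:
        host, port_text = spec.rsplit(":", 1)
    else:
        port_text = spec
    port = int(port_text)
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return UdpSpec(src, host or None, port)
```

For example, `parse_udp_spec("10.0.0.5@239.0.0.1:5000")` separates the source filter, the listening address and the port.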

--src
Can be a hostname or IPv4 address.

--sendto
Sends data in BIN format to the server
according to CCExtractor's protocol over
TCP. For IPv6 use [address] instead.

--sendto-port
Specifies an optional port for --sendto.

--tcp
Reads the input data in BIN format according to
CCExtractor's protocol, listening on the specified port
on the local host.

--tcp-password
Sets the server password for new connections to
the TCP server.

--tcp-description
Sends the server a short description of the
captions, e.g. channel name or file name.

Options that affect what will be processed:
--output-field
Values: 1 = Output Field 1
2 = Output Field 2
both = Both Output Field 1 and 2
Defaults to 1

--append
Use --append to prevent overwriting of existing files. The output will be
appended instead.

--cc2
When in srt/sami mode, process captions in channel 2
instead of channel 1.

--service
Enable CEA-708 (DTVCC) captions processing for the listed
services. The parameter is a comma-delimited list
of service numbers, such as "1,2" to process the
primary and secondary language services.
Pass "all" to process all services found.
If captions in a service are stored in 16-bit encoding,
you can specify what charset or encoding was used. Pass
its name after the service number (e.g. "1[EUC-KR],3" or
"all[EUC-KR]") and it will convert the specified charset to
UTF-8 using iconv. See the iconv documentation to check if
the required encoding/charset is supported.
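The `--service` value combines service numbers, the keyword `all`, and optional bracketed charset names. The following Python sketch shows one way to split such a value. `parse_service_list` is a hypothetical helper, not CCExtractor's parser. It also does not check whether iconv actually supports the named charset.

```python
import re
from typing import Dict, Optional, Union

# One list item: a service number or "all", optionally followed by [CHARSET].
_ITEM = re.compile(r"^(all|\d+)(?:\[([^\]]+)\])?$")


def parse_service_list(value: str) -> Dict[Union[int, str], Optional[str]]:
    """Map each requested service (an int, or "all") to its charset, or None."""
    services: Dict[Union[int, str], Optional[str]] = {}
    for item in value.split(","):
        match = _ITEM.match(item.strip())
        if match is None:
            raise ValueError(f"bad service entry: {item!r}")
        name, charset = match.groups()
        key: Union[int, str] = name if name == "all" else int(name)
        services[key] = charset  # None when there is no [...] suffix
    return services
```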

Input Formats:
--input
With the exception of McPoodle's raw format, which is just the closed
caption data with no other info, CCExtractor can usually detect the
input format correctly. Use this parameter to override the detected
input format.
Possible values:
- ts: For Transport Streams
- ps: For Program Streams
- es: For Elementary Streams
- asf: ASF container (such as DVR-MS)
- wtv: Windows Television (WTV)
- bin: CCExtractor's own binary format
- raw: For McPoodle's raw files
- mp4: MP4/MOV/M4V and similar
- m2ts: BDAV MPEG-2 Transport Stream
- mkv: Matroska container and WebM
- mxf: Material Exchange Format (MXF)
- scc: Scenarist Closed Caption (SCC)

Output Formats:
--out
Possible values:
- srt: SubRip (default, so not actually needed)
- ass: SubStation Alpha
- ssa: SubStation Alpha
- ccd: Scenarist Closed Caption Disassembly format
- scc: Scenarist Closed Caption format
- webvtt: WebVTT format
- webvtt-full: WebVTT format with styling
- sami: MS Synchronized Accessible Media Interface
- bin: CC data in CCExtractor's own binary format
- raw: CC data in McPoodle's Broadcast format
- dvdraw: CC data in McPoodle's DVD format
- mcc: CC data compressed using MacCaption Format
- txt: Transcript (no time codes, no roll-up captions, just the plain transcription)
- ttxt: Timed Transcript (transcription with time info)
- g608: Grid 608 format
- smptett: SMPTE Timed Text (W3C TTML) format
- spupng: Set of .xml and .png files for use with dvdauthor's spumux. See "Notes on spupng output format"
- null: Don't produce any file output
- report: Prints to stdout information about captions in the specified input. Doesn't produce any file output
- simple-xml

Options that affect how input files will be processed:
--goptime
Use GOP for timing instead of PTS. This only applies
to Program or Transport Streams with MPEG2 data and
overrides the default PTS timing.
GOP timing is always used for Elementary Streams.

--no-goptime
Never use GOP timing (use PTS), even if ccextractor
detects GOP timing is the reasonable choice.

--fixpadding
Fix padding - some cards (or providers, or whatever)
seem to send 0000 as CC padding instead of 8080. If you
get bad timing, this might solve it.

--90090
Use 90090 (instead of 90000) as MPEG clock frequency.
(reported to be needed at least by Panasonic DMR-ES15
DVD Recorder)
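The effect of --90090 is easy to quantify. Timestamps are tick counts divided by the clock frequency. Reading the same tick count at 90090 Hz instead of 90000 Hz therefore divides every time by 1.001. That is the same factor that relates 30 fps to NTSC's 30000/1001 fps. The following is a small Python worked example; the constant and function names are illustrative.

```python
MPEG_CLOCK_HZ = 90_000  # standard 90 kHz MPEG clock
ALT_CLOCK_HZ = 90_090   # the frequency selected by --90090


def ticks_to_seconds(ticks: int, clock_hz: int = MPEG_CLOCK_HZ) -> float:
    """Convert a clock tick count to seconds at the given frequency."""
    return ticks / clock_hz


# One hour of content as counted by a 90 kHz clock...
ticks = 3600 * MPEG_CLOCK_HZ
standard = ticks_to_seconds(ticks)
alternate = ticks_to_seconds(ticks, ALT_CLOCK_HZ)
# ...comes out about 3.6 s shorter when read at 90090 Hz.
drift = standard - alternate
print(f"{standard:.3f} s vs {alternate:.3f} s (drift {drift:.3f} s)")
# prints "3600.000 s vs 3596.404 s (drift 3.596 s)"
```

A drift of this size across a recording is the kind of slowly growing timing error the wrong clock frequency would produce.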

--scc-framerate
Set the frame rate for SCC (Scenarist Closed Caption) input files.
Valid values: 29.97 (default), 24, 25, 30
Example: --scc-framerate 25

--scc-accurate-timing
Enable bandwidth-aware timing for SCC output (issue #1120).
When enabled, captions are pre-loaded ahead of their display time
based on the EIA-608 transmission bandwidth (2 bytes/frame).
This ensures YouTube and broadcast compliance by preventing
caption collisions. Use this for professional SCC output.
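Here is a rough Python illustration of the arithmetic behind --scc-accurate-timing. At 2 bytes per frame, a caption of N bytes needs ceil(N / 2) frames to transmit. It must therefore start loading that long before its display time. The sketch assumes the byte count already includes control codes, and it ignores any other scheduling details. `preload_lead_seconds` and `preload_start` are hypothetical names, not CCExtractor's implementation.

```python
import math

# From the help text: EIA-608 transmission bandwidth is 2 bytes per frame.
BYTES_PER_FRAME = 2


def preload_lead_seconds(caption_bytes: int, fps: float = 29.97) -> float:
    """Time needed to send caption_bytes when only 2 bytes fit in each frame."""
    frames = math.ceil(caption_bytes / BYTES_PER_FRAME)
    return frames / fps


def preload_start(display_time: float, caption_bytes: int, fps: float = 29.97) -> float:
    """Latest transmission start so the caption is fully loaded by display_time."""
    return max(0.0, display_time - preload_lead_seconds(caption_bytes, fps))
```

At 25 fps, for instance, a 50-byte caption takes 25 frames (1 s), so to appear at 5.0 s it must start loading at 4.0 s.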

--videoedited
By default, ccextractor will process input files in
sequence as if they were all one large file (i.e.
split by a generic, non video-aware tool). If you
are processing video that was split with an editing
tool, use --videoedited so ccextractor doesn't try to rebuild
the original timing.

-s, --stream
Consider the file as a continuous stream that is
growing as ccextractor processes it, so don't try
to figure out its size and don't terminate processing
when reaching the current end (i.e. wait for more
data to arrive). If the optional parameter secs is
present, it means the number of seconds without any
new data after which ccextractor should exit. Use
this parameter if you want to process a live stream
but not kill ccextractor externally.
Note: If -s is used then only one input file is
allowed.

--usepicorder
Use the pic_order_cnt_lsb in AVC/H.264 data streams
to order the CC information. The default way is to
use the PTS information. Use this switch only when
needed.

--myth
Force MythTV code branch.

--no-myth
Disable MythTV code branch.
The MythTV branch is needed for analog captures where
the closed caption data is stored in the VBI, such as
those with bttv cards (Hauppauge 250 for example). This
is detected automatically so you don't need to worry
about this unless autodetection doesn't work for you.

--wtvconvertfix
This switch works around a bug in Windows 7's built-in
software to convert *.wtv to *.dvr-ms. For analog NTSC
recordings the CC information is marked as digital
captions. Use this switch only when needed.

--wtvmpeg2
Read the captions from the MPEG2 video stream rather
than the captions stream in WTV files.

--program-number
In TS mode, specifically select a program to process.
Not needed if the TS only has one. If this parameter
is not specified and CCExtractor detects more than one
program in the input, it will list the programs found
and terminate without doing anything, unless
--autoprogram (see below) is used.

--autoprogram
If there's more than one program in the stream, just use
the first one we find that contains a suitable stream.

--multiprogram
Uses multiple programs from the same input stream.

-L, --list-tracks
List all tracks found in the input file and exit without
processing. Useful for exploring media files before extraction.

--datapid
Don't try to find out the stream for caption/teletext
data, just use this one instead.

--datastreamtype
Instead of selecting the stream by its PID, select it
by its type (pick the stream that has this type in
the PMT).

--streamtype
Assume the data is of this type, don't autodetect. This
parameter may be needed if --datapid or --datastreamtype
is used and CCExtractor cannot determine how to process
the stream. The value will usually be 2 (MPEG video) or
6 (MPEG private data).

--hauppauge
If the video was recorded using a Hauppauge card, it
might need special processing. This parameter will
force the special treatment.

--mp4vidtrack
In MP4 files the closed caption data can be embedded in
the video track or in a dedicated CC track. If a
dedicated track is detected it will be processed instead
of the video track. If you need to force the video track
to be processed instead, use this option.

--no-autotimeref
Some streams come with broadcast date information. When
such data is available, CCExtractor will set its time
reference to the received data. Use this parameter if
you prefer your own reference. Note: Currently this only
affects Teletext in timed transcript with --datets.

--no-scte20
Ignore SCTE-20 data if present.

--webvtt-create-css
Create a separate file for CSS instead of inline.

--deblev
Enable debug so the calculated distance for each two
strings is displayed. The output includes both strings,
the calculated distance, the maximum allowed distance,
and whether the strings are ultimately considered
equivalent or not, i.e. the calculated distance is
less than or equal to the max allowed.

--analyzevideo
Analyze the video stream even if it's not used for
subtitles. This allows video information to be provided.

--timestamp-map
Enable the X-TIMESTAMP-MAP header for WebVTT (HLS)

Levenshtein distance:
--no-levdist
Don't attempt to correct typos with Levenshtein distance.

--levdistmincnt
Minimum distance we always allow regardless
of the length of the strings. Default 2.
This means that if the calculated distance
is 0, 1 or 2, we consider the strings to be equivalent.

--levdistmaxpct
Maximum distance we allow, as a percentage of
the shorter string's length. Default 10%.
For example, consider a comparison of one string of
30 characters and one of 60 characters. We want to
determine whether the first 30 characters of the longer
string are more or less the same as the shorter string,
i.e. whether the longer string is the shorter one
plus new characters and maybe some corrections. Since
the shorter string is 30 characters and the default
percentage is 10%, we would allow a distance of up
to 3 between the first 30 characters.
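Here is a minimal Python sketch of the equivalence test these two options describe. It makes two assumptions. First, the thresholds combine as "distance allowed = the larger of --levdistmincnt and the percentage of the shorter line's length". Second, only the same-length prefix of the longer line is compared, as in the 30/60-character example. `levenshtein` and `considered_equivalent` are hypothetical names, not CCExtractor's code.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute = 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def considered_equivalent(line_n: str, line_next: str,
                          min_cnt: int = 2, max_pct: int = 10) -> bool:
    """Compare the shorter line with the same-length prefix of the longer one."""
    short, long_ = sorted((line_n, line_next), key=len)
    prefix = long_[: len(short)]
    allowed = max(min_cnt, len(short) * max_pct // 100)
    return levenshtein(short, prefix) <= allowed
```

With the defaults, a 30-character line tolerates 30 * 10 // 100 = 3 edits, which matches the example above. A 10-character line falls back to the minimum of 2.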

Options that affect what kind of output will be produced:
--chapters
(Experimental) Produces a chapter file from MP4 files.
Note that this must only be used with MP4 files;
for other files it will simply generate a subtitles file.

--bom
Append a BOM (Byte Order Mark) to output files.
Note that most text processing tools in Linux will not
like BOM.

--no-bom
Do not append a BOM (Byte Order Mark) to output
files. Note that this may break files when using
Windows. This is the default in non-Windows builds.

--unicode
Encode subtitles in Unicode instead of Latin-1.

--utf8
Encode subtitles in UTF-8 (no longer needed,
because UTF-8 is now the default).

--latin1
Encode subtitles in Latin-1.

--no-fontcolor
For .srt/.sami/.vtt, don't add font color tags.

--no-htmlescape
For .srt/.sami/.vtt, don't convert HTML-unsafe characters.

--no-typesetting
For .srt/.sami/.vtt, don't add typesetting tags.

--trim
Trim lines.

--defaultcolor
Select a different default color (instead of
white). This causes all output in .srt/.smi/.vtt
files to have a font tag, which makes the files
larger. Add the color you want in RGB, such as
--defaultcolor #FF0000 for red.

--sentencecap
Sentence capitalization. Use if you hate
ALL CAPS in subtitles.

--capfile
Add the contents of 'file' to the list of words
that must be capitalized. For example, if file
is a plain text file that contains

Tony
Alan

Whenever those words are found they will be written
exactly as they appear in the file.
Use one line per word. Lines starting with # are
considered comments and discarded.
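Here is a minimal Python sketch of the --capfile format described above: one word per line, with lines starting with # ignored. It also rewrites matching words using the file's spelling. Matching whole words case-insensitively is an assumption about the behaviour. `load_word_list` and `apply_capitalization` are hypothetical names. The help text says --profanity-file uses the same file syntax, so the same loader would apply there.

```python
import re
from typing import Dict, Iterable


def load_word_list(lines: Iterable[str]) -> Dict[str, str]:
    """Parse one word per line, skipping blank lines and '#' comments.

    Returns a map from the lowercase word to its spelling in the file."""
    words: Dict[str, str] = {}
    for raw in lines:
        word = raw.strip()
        if not word or word.startswith("#"):
            continue
        words[word.lower()] = word
    return words


def apply_capitalization(text: str, words: Dict[str, str]) -> str:
    """Replace each whole word found in the list with the listed spelling."""
    def fix(match):
        token = match.group(0)
        return words.get(token.lower(), token)
    return re.sub(r"\w+", fix, text)
```

With the example file, `apply_capitalization("TONY AND ALAN ARRIVED", words)` produces `"Tony AND Alan ARRIVED"`. The remaining all-caps words would be a job for --sentencecap.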

--kf
Censors profane words from subtitles.

--profanity-file
Add the contents of the given file to the list of words
that must be censored. The file follows the
same syntax as the capitalization file.

--splitbysentence
Split output text so each frame contains a complete
sentence. Timings are adjusted based on number of
characters.

--unixts
For timed transcripts that have an absolute date
(instead of a timestamp relative to the file start), use
this time reference (UNIX timestamp). 0 => Use current
system time.
ccextractor will automatically switch to transport
stream UTC timestamps when available.

--datets
In transcripts, write time as YYYYMMDDHHMMss,ms.

--sects
In transcripts, write time as ss,ms.

--ucla
Transcripts are generated with a specific format
that is convenient for a specific project. Feel
free to play with it, but be aware that this format
is still evolving: don't rely on its output format
not changing between versions.

--latrusmap
Map Latin symbols to Cyrillic ones in special cases
of Russian Teletext files (issue #1086)

--ttxtforcelatin
Force Latin G0 charset for Teletext, ignoring any Cyrillic
designation in the stream. Use when broadcasts incorrectly
signal Cyrillic but content is Latin (issue #1395)

--xds
In timed transcripts, all XDS information will be saved
to the output file.

--lf
Use LF (UNIX) instead of CRLF (DOS, Windows) as line
terminator.

--df
For MCC files, force dropframe frame count.

--autodash
Based on position on screen, attempt to determine
the different speakers and add a dash (-) when each
of them talks (.srt/.vtt only, --trim required).

--xmltv
Produce an XMLTV file containing the EPG data from
the source TS file. Mode: 1 = full output,
2 = live output, 3 = both.

--xmltvliveinterval
Interval of x seconds between writing live mode xmltv output.

--xmltvoutputinterval
Interval of x seconds between writing full file xmltv output.

--xmltvonlycurrent
Only print current events for xmltv output.

--sem
Create a .sem file for each output file that is open
and delete it on file close.

--dvblang
For DVB subtitles, select which language's caption
stream will be processed, e.g. 'eng' for English.
If there are multiple languages, only this specified
language stream will be processed (default).

--ocrlang
Manually select the name of the Tesseract .traineddata
file. Helpful if you want to OCR a caption stream of
one language with the data of another language.
e.g. '--dvblang chs --ocrlang chi_tra' will decode the
Chinese (Simplified) caption stream but perform OCR
using the Chinese (Traditional) trained data.
This option is also helpful when the traineddata file
has non-standard names that don't follow ISO specs.

--quant
How to quantize the bitmap before passing it to Tesseract
for OCR'ing.
0: Don't quantize at all.
1: Use CCExtractor's internal function (default).
2: Reduce distinct color count in image for faster results.

--oem
Select the OEM mode for Tesseract.
Available modes:
0: OEM_TESSERACT_ONLY - the fastest mode.
1: OEM_LSTM_ONLY - use LSTM algorithm for recognition.
2: OEM_TESSERACT_LSTM_COMBINED - both algorithms.
Default value depends on the Tesseract version linked:
Tesseract v3: default mode is 0,
Tesseract v4: default mode is 1.

--psm
Select the PSM mode for Tesseract.
Available Page segmentation modes:
0 Orientation and script detection (OSD) only.
1 Automatic page segmentation with OSD.
2 Automatic page segmentation, but no OSD, or OCR.
3 Fully automatic page segmentation, but no OSD. (Default)
4 Assume a single column of text of variable sizes.
5 Assume a single uniform block of vertically aligned text.
6 Assume a single uniform block of text.
7 Treat the image as a single text line.
8 Treat the image as a single word.
9 Treat the image as a single word in a circle.
10 Treat the image as a single character.
11 Sparse text. Find as much text as possible in no particular order.
12 Sparse text with OSD.
13 Raw line. Treat the image as a single text line,
bypassing hacks that are Tesseract-specific.

--ocr-line-split
Split subtitle images into lines before OCR.
Uses PSM 7 (single text line mode) for each line,
which can improve accuracy for multi-line bitmap subtitles
(VOBSUB, DVD, DVB).

--no-ocr-blacklist
Disable the OCR character blacklist.
By default, CCExtractor blacklists characters like |, \, `, _
that are commonly misrecognized (e.g. 'I' as '|').
Use this flag to disable the blacklist.

--mkvlang
For MKV subtitles, select which language's caption
stream will be processed, e.g. 'eng' for English.
Language codes can be either the 3-letter bibliographic
ISO-639-2 form (like "fre" for French) or a language
code followed by a dash and a country code for regional
variants (like "fre-ca" for Canadian French).

--no-spupngocr
When processing DVB, don't use OCR to write the text as
comments in the XML file.

--font
Specify the full path of the font that is to be used when
generating SPUPNG files. If not specified, you need to
have the default font installed (Helvetica for macOS, Calibri
for Windows, and Noto for other operating systems at their
default location).

--italics
Specify the full path of the italics font that is to be used when
generating SPUPNG files. If not specified, you need to
have the default font installed (Helvetica Oblique for macOS, Calibri Italic
for Windows, and NotoSans Italic for other operating systems at their
default location).

Options that affect how ccextractor reads and writes (buffering):
--bufferinput
Forces input buffering.

--no-bufferinput
Disables input buffering.

--buffersize
Specify a size for reading, in bytes (suffix with K or
M for kilobytes and megabytes). Default is 16M.
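--buffersize accepts a number with an optional K or M suffix. Here is a minimal Python sketch of that conversion. It assumes, without confirmation from the help text, that the suffixes are 1024-based. `parse_buffer_size` is a hypothetical name.

```python
import re

# Assumption: K and M are 1024-based; the help text only says
# "kilobytes and megabytes" without naming the base.
_MULTIPLIERS = {"": 1, "K": 1024, "M": 1024 * 1024}
_SIZE = re.compile(r"(\d+)([KM]?)")


def parse_buffer_size(value: str) -> int:
    """Convert a --buffersize style value such as '16M' or '512K' to bytes."""
    match = _SIZE.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"invalid size: {value!r}")
    number, suffix = match.groups()
    return int(number) * _MULTIPLIERS[suffix]
```

Under this assumption the default of 16M is 16 * 1024 * 1024 = 16777216 bytes.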

--koc
keep-output-close. If used then CCExtractor will close
the output file after writing each subtitle frame and
attempt to create it again when needed.

--forceflush
Flush the file buffer whenever content is written.

--dru
Direct Roll-Up. When in roll-up mode, write character by
character instead of line by line. Note that this
produces (much) larger files.

--no-rollup
If you hate the repeated lines caused by the roll-up
emulation, you can have ccextractor write only one
line at a time, getting rid of these repeated lines.

--ru1
Roll-up captions can consist of 2, 3 or 4 visible
lines at any time (the number of lines is part of
the transmission). If having 3 or 4 lines annoys
you, you can use --ru1, --ru2 or --ru3 to force the decoder
to always use 1, 2 or 3 lines. Note that 1 line is not
a real roll-up mode, so CCExtractor does what
it can.
In --ru1 the start timestamp is actually the timestamp
of the first character received, which is possibly more
accurate.

--ru2

--ru3
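The following deliberately simplified Python model shows why roll-up emulation produces the repeated lines mentioned under --no-rollup, and what --ru1/--ru2/--ru3 change. It uses a fixed-height window in which each new row pushes the oldest one off the top. It ignores timing, cursor positioning and everything else a real EIA-608 decoder does. `roll_up` is a hypothetical name.

```python
from collections import deque
from typing import Iterable, Iterator, List


def roll_up(lines: Iterable[str], visible: int) -> Iterator[List[str]]:
    """Yield the visible screen after each new line rolls up.

    visible is the number of rows shown at once: 2-4 as transmitted,
    or 1-3 when forced (which is what --ru1/--ru2/--ru3 select)."""
    if visible < 1:
        raise ValueError("visible must be >= 1")
    window: deque = deque(maxlen=visible)  # the oldest row drops off the top
    for line in lines:
        window.append(line)
        yield list(window)
```

Each yielded screen repeats rows already shown on the previous one. Writing every screen as a subtitle frame produces the repeated lines, and a smaller `visible` value shortens each frame.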

Options that affect timing:
--delay
For srt/sami/webvtt, add this number of milliseconds to
all times. For example, --delay 400 makes subtitles
appear 400ms late. You can also use negative numbers
to make subs appear early.

Options that affect what segment of the input file(s) to process:
--startat
Only write caption information that starts after the
given time.
Time can be seconds, MM:SS or HH:MM:SS.
For example, --startat 3:00 means 'start writing from
minute 3'.

--endat
Stop processing after the given time (same format as
--startat).
The --startat and --endat options are honored in all
output formats. In all formats with timing information
the times are unchanged.
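Here is a Python sketch that converts the S / MM:SS / HH:MM:SS format accepted by --startat and --endat into seconds. It accepts whole seconds only, because the help text does not mention fractions. `parse_time_spec` is a hypothetical name.

```python
def parse_time_spec(value: str) -> int:
    """Convert 'S', 'MM:SS' or 'HH:MM:SS' into a number of seconds."""
    parts = value.strip().split(":")
    if not 1 <= len(parts) <= 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected S, MM:SS or HH:MM:SS, got {value!r}")
    seconds = 0
    for part in parts:  # each field is worth 60 of the next one down
        seconds = seconds * 60 + int(part)
    return seconds
```

So `--startat 3:00` corresponds to 180 seconds and `1:02:03` to 3723 seconds.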

--screenfuls
Write 'num' screenfuls and terminate processing.

Options that affect which codec is searched for in the input:
--codec
--codec dvbsub
Select the DVB subtitle stream from all elementary streams;
if no stream of DVB subtitle type is found then
nothing is selected and no subtitles are generated.
--codec teletext
Select the teletext subtitle stream from the elementary streams.

[possible values: dvbsub, teletext]

--no-codec
--no-codec dvbsub
Ignore DVB subtitles and follow the default behaviour.
--no-codec teletext
Ignore teletext subtitles.

[possible values: dvbsub, teletext]

Adding start and end credits:
--startcreditstext
Write this text as start credits. If there are
several lines, separate them with the
characters \n, for example Line1\nLine 2.

--startcreditsnotbefore
Don't display the start credits before this
time (S, or MM:SS). Default: 0

--startcreditsnotafter
Don't display the start credits after this
time (S, or MM:SS). Default: 5:00

--startcreditsforatleast
Start credits need to be displayed for at least
this time (S, or MM:SS). Default: 2

--startcreditsforatmost
Start credits should be displayed for at most
this time (S, or MM:SS). Default: 5

--endcreditstext
Write this text as end credits. If there are
several lines, separate them with the
characters \n, for example Line1\nLine 2.

--endcreditsforatleast
End credits need to be displayed for at least
this time (S, or MM:SS). Default: 2

--endcreditsforatmost
End credits should be displayed for at most
this time (S, or MM:SS). Default: 5

Options that affect debug data:
--debug
Show lots of debugging output.

--608
Print debug traces from the EIA-608 decoder.
If you need to submit a bug report, please send
the output from this option.

--708
Print debug information from the (currently
in development) EIA-708 (DTV) decoder.

--goppts
Enable lots of time stamp output.

--xdsdebug
Enable XDS debug data (lots of it).

--vides
Print debug info about the analysed elementary
video stream.

--cbraw
Print debug trace with the raw 608/708 data with
time stamps.

--no-sync
Disable the syncing code. Only useful for debugging
purposes.

--fullbin
Disable the removal of trailing padding blocks
when exporting to bin format. Only useful
for debugging purposes.

--parsedebug
Print debug info about the parsed container
file. (Only for TS/ASF files at the moment.)

--parsePAT
Print Program Association Table dump.

--parsePMT
Print Program Map Table dump.

--dumpdef
Hex-dump defective TS packets.

--investigate-packets
If no CC packets are detected based on the PMT, try
to find data in all packets by scanning.

Teletext related options:
--tpage
Use this page for subtitles (if this parameter
is not used, try to autodetect). In Spain the
page is always 888; it may vary in other countries.
You can specify multiple pages by using --tpage
multiple times (e.g., --tpage 891 --tpage 892).
Each page will be output to a separate file with
suffix _pNNN (e.g., output_p891.srt, output_p892.srt).

--tpages-all
Extract all teletext subtitle pages found in the stream.
Each page will be output to a separate file with
suffix _pNNN (e.g., output_p891.srt, output_p892.srt).

--tverbose
Enable verbose mode in the teletext decoder.

--teletext
Force teletext mode even if teletext is not detected.
If used, you should also pass --datapid to specify
the stream ID you want to process.

--no-teletext
Disable teletext processing. This might be needed
for video streams that have both teletext packets
and CEA-608/708 packets (if teletext is processed
then CEA-608/708 processing is disabled).

Transcript customizing options:
--customtxt
Use the passed format to customize the (Timed) Transcript
output. The format must be like this: 1100100 (7 digits).
These indicate whether the following items should be
displayed or not in the (timed) transcript. They
represent (in order):
- Display start time
- Display end time
- Display caption mode
- Display caption channel
- Use a relative timestamp (relative to the sample)
- Display XDS info
- Use colors
Examples:
0000101 is the default setting for transcripts
1110101 is the default for timed transcripts
1111001 is the default setting for --ucla
Make sure you use this parameter after others that might
affect these settings (--out, --ucla, --xds, --txt,
--ttxt ...)
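The seven --customtxt digits are positional on/off flags. Here is a Python sketch that names them in the order listed above. The field names in `CUSTOMTXT_FIELDS` are hypothetical labels, not identifiers from CCExtractor.

```python
from typing import Dict

# One digit per item, in the order the help text lists them.
CUSTOMTXT_FIELDS = (
    "start_time",
    "end_time",
    "caption_mode",
    "caption_channel",
    "relative_timestamp",
    "xds_info",
    "colors",
)


def decode_customtxt(value: str) -> Dict[str, bool]:
    """Turn a 7-digit string such as '1110101' into named on/off flags."""
    if len(value) != len(CUSTOMTXT_FIELDS) or set(value) - {"0", "1"}:
        raise ValueError("expected exactly 7 digits, each 0 or 1")
    return {name: digit == "1" for name, digit in zip(CUSTOMTXT_FIELDS, value)}
```

Decoding the timed-transcript default `1110101` this way shows start time, end time, caption mode, relative timestamps and colors switched on, and caption channel and XDS info switched off.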

Communication with other programs and console output:
--gui-mode-reports
Report progress and interesting events to stderr
in an easy to parse format. This is intended to be
used by other programs. See the docs directory for
details.

--no-progress-bar
Suppress the output of the progress bar.

--quiet
Don't write any message.

Burned-in subtitle extraction:
--hardsubx
Enable the burned-in subtitle extraction subsystem.

NOTE: This is needed to use the below burned-in
subtitle extractor options

--tickertext
Search for burned-in ticker text at the bottom of
the screen.

--ocr-mode
Set the OCR mode to either frame-wise, word-wise
or letter-wise.
e.g. --ocr-mode frame (default), --ocr-mode word,
--ocr-mode letter

--subcolor
Specify the color of the subtitles.
Possible values are in the set
{white,yellow,green,cyan,blue,magenta,red}.
Alternatively, a custom hue value between 1 and 360
may also be specified.
e.g. --subcolor white or --subcolor 270 (for violet).
Refer to an HSV color chart for values.

--min-sub-duration
Specify the minimum duration that a subtitle line
must exist on the screen.
The value is specified in seconds.
A lower value gives better results, but takes more
processing time.
The recommended value is 0.5 (default).
e.g. --min-sub-duration 1.0 (for a duration of 1 second)

--detect-italics
Specify whether italics are to be detected from the
OCR text.
Italic detection automatically enforces the OCR mode
to be word-wise.

--conf-thresh
Specify the classifier confidence threshold between
1 and 100.
Try and use a threshold which works for you if you get
a lot of garbage text.
e.g. --conf-thresh 50

--whiteness-thresh
For white subtitles only, specify the luminance
threshold between 1 and 100.
This threshold is content dependent, and adjusting
values may give you better results.
Recommended values are in the range 80 to 100.
The default value is 95.

--hcc
Use this option if the file has both
closed captions and burned-in subtitles.

An example command for burned-in subtitle extraction is as follows:
ccextractor video.mp4 --hardsubx --subcolor white --detect-italics --whiteness-thresh 90 --conf-thresh 60

Notes on File name related options:
You can pass as many input files as you need. They will be processed in order.
If a file name is suffixed by +, ccextractor will try to follow a numerical
sequence. For example, DVD001.VOB+ means DVD001.VOB, DVD002.VOB and so on
until there are no more files.
Output will be one single file (either raw or srt). Use this if you made your
recording in several cuts (to skip commercials for example) but you want one
subtitle file with contiguous timing.
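Here is a Python sketch of how a `+`-suffixed name such as DVD001.VOB+ can be expanded. It assumes the rule is: increment the last run of digits before the extension, keep its zero padding, and stop when a file is missing. CCExtractor's own rule may differ in edge cases. `next_in_sequence` and `expand_plus` are hypothetical names.

```python
import os
import re
from typing import Callable, List, Optional


def next_in_sequence(name: str) -> Optional[str]:
    """Bump the last run of digits before the extension, keeping zero padding."""
    stem, ext = os.path.splitext(name)
    matches = list(re.finditer(r"\d+", stem))
    if not matches:
        return None  # nothing numeric to follow
    last = matches[-1]
    digits = last.group(0)
    bumped = str(int(digits) + 1).zfill(len(digits))
    return stem[: last.start()] + bumped + stem[last.end():] + ext


def expand_plus(name: str, exists: Callable[[str], bool] = os.path.exists) -> List[str]:
    """Expand 'DVD001.VOB+' into DVD001.VOB, DVD002.VOB, ... while files exist."""
    if not name.endswith("+"):
        return [name]
    current: Optional[str] = name[:-1]
    files: List[str] = []
    while current is not None and exists(current):
        files.append(current)
        current = next_in_sequence(current)
    return files
```

Passing `exists` as a parameter keeps the expansion logic testable without touching the file system. It defaults to `os.path.exists` for real use.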

Notes on Options that affect what will be processed:
In general, if you want English subtitles you don't need to use these options
as they are broadcast in field 1, channel 1. If you want the second language
(usually Spanish) you may need to try --output-field 2, --cc2, or both.

Notes on Levenshtein distance:
When processing teletext files CCExtractor tries to correct typos by
comparing consecutive lines. If line N+1 is almost identical to line N except
for minor changes (plus new characters), then it assumes that line N had a
typo that was corrected in N+1. This is currently implemented in teletext
because that's where sample files that could benefit from this were available.
You can adjust, or disable, the algorithm settings with the Levenshtein
distance parameters (--no-levdist, --levdistmincnt, --levdistmaxpct).

Notes on times:
--startat and --endat times are used first, then --delay.
So if you use --out=srt --startat 3:00 --endat 5:00 --delay 120000, ccextractor will
generate a .srt file with only data from 3:00 to 5:00 in the input file(s)
and then add that (huge) delay, which would make the final file start at
5:00 and end at 7:00.
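The order of operations in this example can be checked with a small Python model: filter by start time first, then shift by the delay. Captions are assumed to be (start, end, text) tuples in seconds, and both window bounds are treated as inclusive, which is an assumption. `select_then_delay` is a hypothetical name. Note that 3:00 and 5:00 are 180 and 300 seconds.

```python
from typing import List, Tuple

Caption = Tuple[float, float, str]  # (start seconds, end seconds, text)


def select_then_delay(captions: List[Caption], start_at: float,
                      end_at: float, delay_ms: int) -> List[Caption]:
    """Keep captions that start inside [start_at, end_at], then shift them."""
    shift = delay_ms / 1000.0
    kept = [c for c in captions if start_at <= c[0] <= end_at]
    return [(s + shift, e + shift, text) for s, e, text in kept]
```

With `--delay 120000` (120 s), a caption kept at 3:00 moves to 5:00 and one near 5:00 moves to about 7:00, matching the example.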

Notes on codec options:
If no codec type is selected, the first elementary stream suitable for
subtitles is selected; note that --teletext and --no-teletext override this
option.
The --codec and --no-codec values must not be the same; if they are,
the --no-codec value is ignored. Each flag should be passed only
once: passing it more than once is not supported yet, and only the last
value will be taken into consideration.

Notes on adding credits:
CCExtractor can _try_ to add a custom message (for credits for example) at
the start and end of the file, looking for a window where there are no
captions. If there is no such window, then no text will be added.
The start window must be between the times given and must have enough time
to display the message for at least the specified time.
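One way to model the credits rule is to scan the caption timeline for the first caption-free gap inside the allowed window. This Python sketch makes three assumptions. Captions are (start, end) pairs in seconds. The whole credits slot must end by the "not after" time. The defaults are those of the --startcredits* options (0, 5:00, 2 and 5 seconds). `find_credits_slot` is hypothetical and is not CCExtractor's placement logic.

```python
from typing import List, Optional, Tuple

Interval = Tuple[float, float]  # (start, end) in seconds


def find_credits_slot(captions: List[Interval], not_before: float = 0.0,
                      not_after: float = 300.0, at_least: float = 2.0,
                      at_most: float = 5.0) -> Optional[Interval]:
    """Return the first caption-free slot inside [not_before, not_after]
    lasting at least `at_least` seconds, trimmed to `at_most`; else None."""
    cursor = not_before  # earliest time not covered by a caption so far
    # A zero-length sentinel at not_after closes the final gap.
    for start, end in sorted(captions) + [(not_after, not_after)]:
        gap_end = min(start, not_after)
        if gap_end - cursor >= at_least:
            return (cursor, cursor + min(at_most, gap_end - cursor))
        cursor = max(cursor, end)
        if cursor >= not_after:
            break
    return None
```

Returning `None` corresponds to the "no such window, no text added" case described above.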

Notes on the CEA-708 decoder:
By default, ccextractor now extracts both CEA-608 and CEA-708 subtitles
if they are present in the input. This results in two output files: one
for CEA-608 and one for CEA-708.
To extract only CEA-608 subtitles, use --output-field 1, 2, or both.
To extract only CEA-708 subtitles, use --service.
To extract both CEA-608 and CEA-708 subtitles, use both --output-field and --service.
While the CEA-708 decoder is starting to be useful, it's
a work in progress. A number of things don't work yet in the decoder
itself, and many of the auxiliary tools (case conversion, to name one)
won't do anything yet. Feel free to submit samples that cause problems
and feature requests.

Notes on spupng output format:
One .xml file is created per output field. A set of .png files is created in
a directory with the same base name as the corresponding .xml file(s), but with
a .d extension. Each .png file contains an image representing one caption
and is named subNNNN.png, starting with sub0000.png.
For example, the command:
ccextractor --out=spupng input.mpg
will create the files:
input.xml
input.d/sub0000.png
input.d/sub0001.png
...
The command:
ccextractor --out=spupng -o /tmp/output --output-field both input.mpg
will create the files:
/tmp/output_1.xml
/tmp/output_1.d/sub0000.png
/tmp/output_1.d/sub0001.png
...
/tmp/output_2.xml
/tmp/output_2.d/sub0000.png
/tmp/output_2.d/sub0001.png
...
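Here is a Python sketch that reproduces the naming pattern in the two examples above. It adds `_1`/`_2` suffixes when both fields are written, creates one .xml file per field, and uses a .d directory of zero-padded subNNNN.png images. The helpers `field_bases` and `spupng_paths` are hypothetical. They only mirror the examples and do not create any files.

```python
from typing import List


def field_bases(output_base: str, both_fields: bool) -> List[str]:
    """A single field uses the base as-is; 'both' appends _1 and _2."""
    if both_fields:
        return [f"{output_base}_1", f"{output_base}_2"]
    return [output_base]


def spupng_paths(base: str, count: int) -> List[str]:
    """List the .xml file and the first `count` PNGs for one output field."""
    pngs = [f"{base}.d/sub{i:04d}.png" for i in range(count)]
    return [f"{base}.xml", *pngs]
```

Calling `spupng_paths(b, 2)` for each `b` in `field_bases("/tmp/output", True)` lists the same paths as the second example.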