Command line usage

CCExtractor's main program is console based. There's a GUI for Windows, as well as provisions that let other programs interface with CCExtractor easily, but the heavy lifting is done by a command line program. Because scripts can call that program directly, integrating it into larger processes is straightforward.
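As a rough illustration of calling the command line program from a script, the Python sketch below assembles an argument list that follows the documented syntax (`ccextractor [options] inputfile1 [inputfile2...] [-o outputfilename]`) and can optionally run it. The helper names `build_command` and `extract`, the sample file names, and the assumption that a `ccextractor` binary is on the PATH are illustrative only; they are not part of CCExtractor. Only the `--out` and `-o` options are used.

```python
import shutil
import subprocess


def build_command(input_files, output_file=None, out_format=None, binary="ccextractor"):
    """Return an argument list shaped like:
    ccextractor [options] inputfile1 [inputfile2...] [-o outputfilename]
    """
    if not input_files:
        raise ValueError("at least one input file is required")
    cmd = [binary]
    if out_format is not None:
        # --out selects the output format, e.g. "srt" or "webvtt".
        # Passing the value as a separate argument is an assumption of this sketch.
        cmd += ["--out", out_format]
    cmd += list(input_files)
    if output_file is not None:
        # -o overrides the default output file name.
        cmd += ["-o", output_file]
    return cmd


def extract(input_files, output_file=None, out_format=None):
    """Run the command and return its exit code (hypothetical helper)."""
    cmd = build_command(input_files, output_file, out_format)
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} was not found on the PATH")
    return subprocess.run(cmd, check=False).returncode
```

For instance, `build_command(["recording.ts"], "recording.srt", "srt")` yields the argument list for `ccextractor --out srt recording.ts -o recording.srt`. Building the command as a list, rather than a single shell string, avoids quoting problems with file names that contain spaces.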

Running CCExtractor without any parameters displays a help screen with all the options. As of version 0.96.5, the help screen is as follows:


CCExtractor 0.96.5, Carlos Fernandez Sanz, Volker Quetschke..
Teletext portions taken from Petr Kutalek's telxcc
--------------------------------------------------------------------------
Originally based on McPoodle's tools. Check his page for lots of information
on closed captions technical details.
(http://www.theneitherworld.com/mcpoodle/SCC_TOOLS/DOCS/SCC_TOOLS.HTML)

This tool home page:
http://www.ccextractor.org
Extracts closed captions and teletext subtitles from video streams.
(DVB, .TS, ReplayTV 4000 and 5000, dvr-ms, bttv, Tivo, Dish Network,
.mp4, HDHomeRun are known to work).

Syntax:
ccextractor [options] inputfile1 [inputfile2...] [-o outputfilename]

Arguments:
  [inputfile]...
          file(s) to process
        
Options:
  -h, --help
          Print help (see a summary with '-h')

  -V, --version
          Print version
        
File name related options:
  -o
          Use -o parameters to define output filename if you don't
          like the default ones (same as infile plus _1 or _2 when
          needed and file extension, e.g. .srt).

  --stdout
          Write output to stdout (console) instead of file. If
          stdout is used, then -o can't be used. Also
          --stdout will redirect all messages to stderr (error).

  --pesheader
          Dump the PES Header to stdout (console). This is
          used for debugging purposes to see the contents
          of each PES packet header.

  --debugdvbsub
          Write the DVB subtitle debug traces to console

  --ignoreptsjumps
          Ignore PTS jumps (default)

  --fixptsjumps
          Fix PTS jumps. Use this parameter if you
          experience timeline resets/jumps in the output.

  --stdin
          Reads input from stdin (console) instead of file.
          Alternatively, - can be used instead of --stdin
        
Output File Segmentation:
  --outinterval

  --segmentonkeyonly
          When segmenting files, do it only after an I frame,
          trying to behave like FFmpeg
        
Network support:
  --udp <[[src@]host:]port>
          Read the input via UDP (listening on the specified port)
          instead of reading a file.
          Host and src can be a hostname or IPv4 address.
          If host is not specified then listens on the local host.

  --src
          Can be a hostname or IPv4 address.

  --sendto
          Sends data in BIN format to the server
          according to CCExtractor's protocol over
          TCP. For IPv6 use [address] instead

  --sendto-port
          Specifies optional port for sendto

  --tcp
          Reads the input data in BIN format according to
          CCExtractor's protocol, listening on the specified port
          on the local host

  --tcp-password
          Sets server password for new connections to
          tcp server

  --tcp-description
          Sends to the server short description about
          captions e.g. channel name or file name
        
Options that affect what will be processed:
  --output-field
          Values: 1 = Output Field 1
                  2 = Output Field 2
                  both = Both Output Field 1 and 2
          Defaults to 1

  --append
          Use --append to prevent overwriting of existing files. The output will be
          appended instead.

  --cc2
          When in srt/sami mode, process captions in channel 2
          instead of channel 1.

  --service
          Enable CEA-708 (DTVCC) captions processing for the listed
          services. The parameter is a comma delimited list
          of service numbers, such as "1,2" to process the
          primary and secondary language services.
          Pass "all" to process all services found.
          If captions in a service are stored in 16-bit encoding,
          you can specify what charset or encoding was used. Pass
          its name after service number (e.g. "1[EUC-KR],3" or
          "all[EUC-KR]") and it will convert the specified charset to
          UTF-8 using iconv. See iconv documentation to check if
          required encoding/charset is supported.
        
Input Formats:
  --input
          With the exception of McPoodle's raw format, which is just the closed
          caption data with no other info, CCExtractor can usually detect the
          input format correctly. Use this parameter to override the detected

          Possible values:
          - ts:   For Transport Streams
          - ps:   For Program Streams
          - es:   For Elementary Streams
          - asf:  ASF container (such as DVR-MS)
          - wtv:  Windows Television (WTV)
          - bin:  CCExtractor's own binary format
          - raw:  For McPoodle's raw files
          - mp4:  MP4/MOV/M4V and similar
          - m2ts: BDAV MPEG-2 Transport Stream
          - mkv:  Matroska container and WebM
          - mxf:  Material Exchange Format (MXF)
          - scc:  Scenarist Closed Caption (SCC)
        
Output Formats:
  --out
          Possible values:
          - srt:         SubRip (default, so not actually needed)
          - ass:         SubStation Alpha
          - ssa:         SubStation Alpha
          - ccd:         Scenarist Closed Caption Disassembly format
          - scc:         Scenarist Closed Caption format
          - webvtt:      WebVTT format
          - webvtt-full: WebVTT format with styling
          - sami:        MS Synchronized Accessible Media Interchange
          - bin:         CC data in CCExtractor's own binary format
          - raw:         CC data in McPoodle's Broadcast format
          - dvdraw:      CC data in McPoodle's DVD format
          - mcc:         CC data compressed using MacCaption Format
          - txt:         Transcript (no time codes, no roll-up captions, just the plain transcription)
          - ttxt:        Timed Transcript (transcription with time info)
          - g608:        Grid 608 format
          - smptett:     SMPTE Timed Text (W3C TTML) format
          - spupng:      Set of .xml and .png files for use with dvdauthor's spumux. See "Notes on spupng output format"
          - null:        Don't produce any file output
          - report:      Prints to stdout information about captions in specified input. Doesn't produce any file output
          - simple-xml
        
Options that affect how input files will be processed:
  --goptime
          Use GOP for timing instead of PTS. This only applies
          to Program or Transport Streams with MPEG2 data and
          overrides the default PTS timing.
          GOP timing is always used for Elementary Streams.

  --no-goptime
          Never use GOP timing (use PTS), even if ccextractor
          detects GOP timing is the reasonable choice.

  --fixpadding
          Fix padding - some cards (or providers, or whatever)
          seem to send 0000 as CC padding instead of 8080. If you
          get bad timing, this might solve it.

  --90090
          Use 90090 (instead of 90000) as MPEG clock frequency.
          (reported to be needed at least by Panasonic DMR-ES15
          DVD Recorder)

  --scc-framerate
          Set the frame rate for SCC (Scenarist Closed Caption) input files.
          Valid values: 29.97 (default), 24, 25, 30
          Example: --scc-framerate 25

  --scc-accurate-timing
          Enable bandwidth-aware timing for SCC output (issue #1120).
          When enabled, captions are pre-loaded ahead of their display time
          based on the EIA-608 transmission bandwidth (2 bytes/frame).
          This ensures YouTube and broadcast compliance by preventing
          caption collisions. Use this for professional SCC output.
        
  --videoedited
          By default, ccextractor will process input files in
          sequence as if they were all one large file (i.e.
          split by a generic, non video-aware tool). If you
          are processing video that was split with an editing
          tool, use --videoedited so ccextractor doesn't try to rebuild
          the original timing.

  -s, --stream
          Consider the file as a continuous stream that is
          growing as ccextractor processes it, so don't try
          to figure out its size and don't terminate processing
          when reaching the current end (i.e. wait for more
          data to arrive). If the optional parameter secs is
          present, it means the number of seconds without any
          new data after which ccextractor should exit. Use
          this parameter if you want to process a live stream
          but not kill ccextractor externally.
          Note: If -s is used then only one input file is
          allowed.

  --usepicorder
          Use the pic_order_cnt_lsb in AVC/H.264 data streams
          to order the CC information. The default way is to
          use the PTS information. Use this switch only when
          needed.
        
  --myth
          Force MythTV code branch.

  --no-myth
          Disable MythTV code branch.
          The MythTV branch is needed for analog captures where
          the closed caption data is stored in the VBI, such as
          those with bttv cards (Hauppauge 250 for example). This
          is detected automatically so you don't need to worry
          about this unless autodetection doesn't work for you.

  --wtvconvertfix
          This switch works around a bug in Windows 7's built in
          software to convert *.wtv to *.dvr-ms. For analog NTSC
          recordings the CC information is marked as digital
          captions. Use this switch only when needed.

  --wtvmpeg2
          Read the captions from the MPEG2 video stream rather
          than the captions stream in WTV files

  --program-number
          In TS mode, specifically select a program to process.
          Not needed if the TS only has one. If this parameter
          is not specified and CCExtractor detects more than one
          program in the input, it will list the programs found
          and terminate without doing anything, unless
          --autoprogram (see below) is used.

  --autoprogram
          If there's more than one program in the stream, just use
          the first one we find that contains a suitable stream.
        
  --multiprogram
          Uses multiple programs from the same input stream.

  -L, --list-tracks
          List all tracks found in the input file and exit without
          processing. Useful for exploring media files before extraction.

  --datapid
          Don't try to find out the stream for caption/teletext
          data, just use this one instead.

  --datastreamtype
          Instead of selecting the stream by its PID, select it
          by its type (pick the stream that has this type in
          the PMT)

  --streamtype
          Assume the data is of this type, don't autodetect. This
          parameter may be needed if --datapid or --datastreamtype
          is used and CCExtractor cannot determine how to process
          the stream. The value will usually be 2 (MPEG video) or
          6 (MPEG private data).

  --hauppauge
          If the video was recorded using a Hauppauge card, it
          might need special processing. This parameter will
          force the special treatment.

  --mp4vidtrack
          In MP4 files the closed caption data can be embedded in
          the video track or in a dedicated CC track. If a
          dedicated track is detected it will be processed instead
          of the video track. If you need to force the video track
          to be processed instead, use this option.
        
  --no-autotimeref
          Some streams come with broadcast date information. When
          such data is available, CCExtractor will set its time
          reference to the received data. Use this parameter if
          you prefer your own reference. Note: Currently this only
          affects Teletext in timed transcript with --datets.

  --no-scte20
          Ignore SCTE-20 data if present.

  --webvtt-create-css
          Create a separate file for CSS instead of inline.

  --deblev
          Enable debug so the calculated distance for each pair of
          strings is displayed. The output includes both strings,
          the calculated distance, the maximum allowed distance,
          and whether the strings are ultimately considered
          equivalent or not, i.e. the calculated distance is
          less than or equal to the max allowed.

  --analyzevideo
          Analyze the video stream even if it's not used for
          subtitles. This allows video information to be provided.

  --timestamp-map
          Enable the X-TIMESTAMP-MAP header for WebVTT (HLS)
        
Levenshtein distance:
  --no-levdist
          Don't attempt to correct typos with Levenshtein distance.

  --levdistmincnt
          Minimum distance we always allow regardless
          of the length of the strings. Default 2.
          This means that if the calculated distance
          is 0, 1 or 2, we consider the strings to be equivalent.

  --levdistmaxpct
          Maximum distance we allow, as a percentage of
          the shortest string length. Default 10%.
          For example, consider a comparison of one string of
          30 characters and one of 60 characters. We want to
          determine whether the first 30 characters of the longer
          string are more or less the same as the shortest string,
          i.e. whether the longest string is the shortest one
          plus new characters and maybe some corrections. Since
          the shortest string is 30 characters and the default
          percentage is 10%, we would allow a distance of up
          to 3 between the first 30 characters.
        
Options that affect what kind of output will be produced:
  --chapters
          (Experimental) Produces a chapter file from MP4 files.
          Note that this must only be used with MP4 files,
          for other files it will simply generate a subtitles file.

  --bom
          Append a BOM (Byte Order Mark) to output files.
          Note that most text processing tools in Linux will not
          like BOM.

  --no-bom
          Do not append a BOM (Byte Order Mark) to output
          files. Note that this may break files when using
          Windows. This is the default in non-Windows builds.

  --unicode
          Encode subtitles in Unicode instead of Latin-1.

  --utf8
          Encode subtitles in UTF-8 (no longer needed
          because UTF-8 is now the default).

  --latin1
          Encode subtitles in Latin-1

  --no-fontcolor
          For .srt/.sami/.vtt, don't add font color tags.

  --no-htmlescape
          For .srt/.sami/.vtt, don't convert HTML unsafe characters

  --no-typesetting
          For .srt/.sami/.vtt, don't add typesetting tags.

  --trim
          Trim lines.

  --defaultcolor
          Select a different default color (instead of
          white). This causes all output in .srt/.smi/.vtt
          files to have a font tag, which makes the files
          larger. Add the color you want in RGB, such as
          --defaultcolor #FF0000 for red.
        
  --sentencecap
          Sentence capitalization. Use if you hate
          ALL CAPS in subtitles.

  --capfile
          Add the contents of 'file' to the list of words
          that must be capitalized. For example, if file
          is a plain text file that contains

          Tony
          Alan

          Whenever those words are found they will be written
          exactly as they appear in the file.
          Use one line per word. Lines starting with # are
          considered comments and discarded.

  --kf
          Censors profane words from subtitles.

  --profanity-file
          Add the contents of the file to the list of words that
          must be censored. The content of the file follows the
          same syntax as for the capitalization file

  --splitbysentence
          Split output text so each frame contains a complete
          sentence. Timings are adjusted based on number of
          characters

  --unixts
          For timed transcripts that have an absolute date
          (instead of a timestamp relative to the file start), use
          this time reference (UNIX timestamp). 0 => Use current
          system time.
          ccextractor will automatically switch to transport
          stream UTC timestamps when available.
        
  --datets
          In transcripts, write time as YYYYMMDDHHMMss,ms.

  --sects
          In transcripts, write time as ss,ms

  --ucla
          Transcripts are generated with a specific format
          that is convenient for a specific project, feel
          free to play with it but be aware that this format
          is really live - don't rely on its output format
          not changing between versions.

  --latrusmap
          Map Latin symbols to Cyrillic ones in special cases
          of Russian Teletext files (issue #1086)

  --ttxtforcelatin
          Force Latin G0 charset for Teletext, ignoring any Cyrillic
          designation in the stream. Use when broadcasts incorrectly
          signal Cyrillic but content is Latin (issue #1395)

  --xds
          In timed transcripts, all XDS information will be saved
          to the output file.

  --lf
          Use LF (UNIX) instead of CRLF (DOS, Windows) as line
          terminator.

  --df
          For MCC Files, force dropframe frame count.

  --autodash
          Based on position on screen, attempt to determine
          the different speakers and add a dash (-) when each
          of them talks (.srt/.vtt only, --trim required).

  --xmltv
          Produce an XMLTV file containing the EPG data from
          the source TS file. Mode: 1 = full output,
          2 = live output, 3 = both

  --xmltvliveinterval
          Interval of x seconds between writing live mode xmltv output.

  --xmltvoutputinterval
          Interval of x seconds between writing full file xmltv output.

  --xmltvonlycurrent
          Only print current events for xmltv output.

  --sem
          Create a .sem file for each output file that is open
          and delete it on file close.
        
  --dvblang
          For DVB subtitles, select which language's caption
          stream will be processed. e.g. 'eng' for English.
          If there are multiple languages, only this specified
          language stream will be processed (default).

  --ocrlang
          Manually select the name of the Tesseract .traineddata
          file. Helpful if you want to OCR a caption stream of
          one language with the data of another language.
          e.g. '--dvblang chs --ocrlang chi_tra' will decode the
          Chinese (Simplified) caption stream but perform OCR
          using the Chinese (Traditional) trained data
          This option is also helpful when the traineddata file
          has non standard names that don't follow ISO specs

  --quant
          How to quantize the bitmap before passing it to tesseract
          for OCR'ing.
          0: Don't quantize at all.
          1: Use CCExtractor's internal function (default).
          2: Reduce distinct color count in image for faster results.

  --oem
          Select the OEM mode for Tesseract.
          Available modes:
          0: OEM_TESSERACT_ONLY - the fastest mode.
          1: OEM_LSTM_ONLY - use LSTM algorithm for recognition.
          2: OEM_TESSERACT_LSTM_COMBINED - both algorithms.
          Default value depends on the tesseract version linked:
          Tesseract v3: default mode is 0,
          Tesseract v4: default mode is 1.

  --psm
          Select the PSM mode for Tesseract.
          Available Page segmentation modes:
           0    Orientation and script detection (OSD) only.
           1    Automatic page segmentation with OSD.
           2    Automatic page segmentation, but no OSD, or OCR.
           3    Fully automatic page segmentation, but no OSD. (Default)
           4    Assume a single column of text of variable sizes.
           5    Assume a single uniform block of vertically aligned text.
           6    Assume a single uniform block of text.
           7    Treat the image as a single text line.
           8    Treat the image as a single word.
           9    Treat the image as a single word in a circle.
          10    Treat the image as a single character.
          11    Sparse text. Find as much text as possible in no particular order.
          12    Sparse text with OSD.
          13    Raw line. Treat the image as a single text line,
                bypassing hacks that are Tesseract-specific.
        
          --ocr-line-split
          Split subtitle images into lines before OCR.
          Uses PSM 7 (single text line mode) for each line,
          which can improve accuracy for multi-line bitmap subtitles
          (VOBSUB, DVD, DVB).

          --no-ocr-blacklist
          Disable the OCR character blacklist.
          By default, CCExtractor blacklists characters like |, \, `, _
          that are commonly misrecognized (e.g. 'I' as '|').
          Use this flag to disable the blacklist.
        
          --mkvlang
          For MKV subtitles, select which language's caption
          stream will be processed, e.g. 'eng' for English.
          Language codes can be either the three-letter bibliographic
          ISO-639-2 form (like "fre" for French) or a language
          code followed by a dash and a country code for regional
          variants (like "fre-ca" for Canadian French).
        
          --no-spupngocr
          When processing DVB don't use the OCR to write the text as
          comments in the XML file.

          --font
          Specify the full path of the font that is to be used when
          generating SPUPNG files. If not specified, you need to
          have the default font installed (Helvetica for macOS, Calibri
          for Windows, and Noto for other operating systems at their
          default location)

          --italics
          Specify the full path of the italics font that is to be used when
          generating SPUPNG files. If not specified, you need to
          have the default font installed (Helvetica Oblique for macOS, Calibri Italic
          for Windows, and NotoSans Italic for other operating systems at their
          default location)
        
          Options that affect how ccextractor reads and writes (buffering):
          --bufferinput
          Forces input buffering.

          --no-bufferinput
          Disables input buffering.

          --buffersize
          Specify a size for reading, in bytes (suffix with K or
          M for kilobytes and megabytes). Default is 16M.
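As an illustration of the size syntax described for --buffersize, here is a minimal Python sketch that turns a value such as 16M into a byte count. The helper name parse_buffer_size is hypothetical, and the assumption that K and M mean 1024 and 1024*1024 bytes is mine; the help text does not say which convention CCExtractor uses.

```python
def parse_buffer_size(value: str) -> int:
    """Convert a size such as '16M', '512K' or '4096' into bytes.

    Assumption (not stated in the help text): K = 1024 bytes, M = 1024 * 1024 bytes.
    """
    multipliers = {"K": 1024, "M": 1024 * 1024}
    text = value.strip().upper()
    if not text:
        raise ValueError("empty size")
    suffix = text[-1]
    if suffix in multipliers:
        # Everything before the suffix must be a plain integer.
        return int(text[:-1]) * multipliers[suffix]
    # No suffix: the value is already a byte count.
    return int(text)
```

For example, parse_buffer_size("16M") gives the documented default expressed in bytes under that assumption.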
        
          --koc
          keep-output-close. If used then CCExtractor will close
          the output file after writing each subtitle frame and
          attempt to create it again when needed.

          --forceflush
          Flush the file buffer whenever content is written.

          --dru
          Direct Roll-Up. When in roll-up mode, write character by
          character instead of line by line. Note that this
          produces (much) larger files.

          --no-rollup
          If you hate the repeated lines caused by the roll-up
          emulation, you can have ccextractor write only one
          line at a time, getting rid of these repeated lines.

          --ru1
          Roll-up captions can consist of 2, 3 or 4 visible
          lines at any time (the number of lines is part of
          the transmission). If having 3 or 4 lines annoys
          you, you can use --ru1, --ru2 or --ru3 to force the
          decoder to always use 1, 2 or 3 lines. Note that 1 line
          is not a real roll-up mode, so CCExtractor does what
          it can.
          In --ru1 the start timestamp is actually the timestamp
          of the first character received, which is possibly more
          accurate.

          --ru2

          --ru3
        
          Options that affect timing:
          --delay
          For srt/sami/webvtt, add this number of milliseconds to
          all times. For example, --delay 400 makes subtitles
          appear 400ms late. You can also use negative numbers
          to make subs appear early.
        
          Options that affect what segment of the input file(s) to process:
          --startat
          Only write caption information that starts after the
          given time.
          Time can be seconds, MM:SS or HH:MM:SS.
          For example, --startat 3:00 means 'start writing from
          minute 3'.
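The three accepted time forms (seconds, MM:SS, HH:MM:SS) can all be reduced to a number of seconds. The following Python sketch shows one way to do that; parse_time is a hypothetical helper written for illustration, not CCExtractor's own parser.

```python
def parse_time(value: str) -> int:
    """Convert 'SS', 'MM:SS' or 'HH:MM:SS' into a number of seconds."""
    parts = value.strip().split(":")
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"unsupported time format: {value!r}")
    seconds = 0
    for part in parts:
        # Each additional field shifts the running total by a factor of 60.
        seconds = seconds * 60 + int(part)
    return seconds
```

With this helper, the 3:00 from the example above becomes 180 seconds.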
        
          --endat
          Stop processing after the given time (same format as
          --startat).
          The --startat and --endat options are honored in all
          output formats.  In all formats with timing information
          the times are unchanged.

          --screenfuls
          Write 'num' screenfuls and terminate processing.
        
          Options that select which codec to search for in the input:
          --codec
          --codec dvbsub
          Select the DVB subtitle from all elementary streams.
          If no stream of DVB subtitle type is found,
          nothing is selected and no subtitle is generated.
          --codec teletext
          Select the teletext subtitle from the elementary stream.

          [possible values: dvbsub, teletext]

          --no-codec
          --no-codec dvbsub
          Ignore DVB subtitles and follow the default behaviour.
          --no-codec teletext
          Ignore teletext subtitles.

          [possible values: dvbsub, teletext]
        
          Adding start and end credits:
          --startcreditstext
          Write this text as start credits. If there are
          several lines, separate them with the
          characters \n, for example Line1\nLine 2.

          --startcreditsnotbefore
          Don't display the start credits before this
          time (S, or MM:SS). Default: 0

          --startcreditsnotafter
          Don't display the start credits after this
          time (S, or MM:SS). Default: 5:00

          --startcreditsforatleast
          Start credits need to be displayed for at least
          this time (S, or MM:SS). Default: 2

          --startcreditsforatmost
          Start credits should be displayed for at most
          this time (S, or MM:SS). Default: 5

          --endcreditstext
          Write this text as end credits. If there are
          several lines, separate them with the
          characters \n, for example Line1\nLine 2.

          --endcreditsforatleast
          End credits need to be displayed for at least
          this time (S, or MM:SS). Default: 2

          --endcreditsforatmost
          End credits should be displayed for at most
          this time (S, or MM:SS). Default: 5
        
          Options that affect debug data:
          --debug
          Show lots of debugging output.

          --608
          Print debug traces from the EIA-608 decoder.
          If you need to submit a bug report, please send
          the output from this option.

          --708
          Print debug information from the (currently
          in development) EIA-708 (DTV) decoder.

          --goppts
          Enable lots of time stamp output.

          --xdsdebug
          Enable XDS debug data (lots of it).

          --vides
          Print debug info about the analysed elementary
          video stream.

          --cbraw
          Print debug trace with the raw 608/708 data with
          time stamps.

          --no-sync
          Disable the syncing code.  Only useful for debugging
          purposes.

          --fullbin
          Disable the removal of trailing padding blocks
          when exporting to bin format.  Only useful for
          debugging purposes.

          --parsedebug
          Print debug info about the parsed container
          file. (Only for TS/ASF files at the moment.)

          --parsePAT
          Print Program Association Table dump.

          --parsePMT
          Print Program Map Table dump.

          --dumpdef
          Hex-dump defective TS packets.

          --investigate-packets
          If no CC packets are detected based on the PMT, try
          to find data in all packets by scanning.
        
          Teletext related options:
          --tpage
          Use this page for subtitles (if this parameter
          is not used, try to autodetect). In Spain the
          page is always 888; it may vary in other countries.
          You can specify multiple pages by using --tpage
          multiple times (e.g., --tpage 891 --tpage 892).
          Each page will be output to a separate file with
          suffix _pNNN (e.g., output_p891.srt, output_p892.srt).

          --tpages-all
          Extract all teletext subtitle pages found in the stream.
          Each page will be output to a separate file with
          suffix _pNNN (e.g., output_p891.srt, output_p892.srt).
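To make the _pNNN naming concrete, the sketch below derives a per-page file name from a base output name, following the pattern in the examples above (output_p891.srt). The function page_output_name is hypothetical, and placing the suffix just before the extension is inferred from those examples only.

```python
from pathlib import PurePosixPath


def page_output_name(output: str, page: int) -> str:
    """Insert the teletext page suffix before the file extension.

    Inferred from the help text examples; not taken from CCExtractor's source.
    """
    path = PurePosixPath(output)
    return str(path.with_name(f"{path.stem}_p{page}{path.suffix}"))
```

A script that post-processes several pages could use this to predict which files to expect for each requested page.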
        
          --tverbose
          Enable verbose mode in the teletext decoder.

          --teletext
          Force teletext mode even if teletext is not detected.
          If used, you should also pass --datapid to specify
          the stream ID you want to process.

          --no-teletext
          Disable teletext processing. This might be needed
          for video streams that have both teletext packets
          and CEA-608/708 packets (if teletext is processed
          then CEA-608/708 processing is disabled).
        
          Transcript customizing options:
          --customtxt
          Use the passed format to customize the (Timed) Transcript
          output. The format must be like this: 1100100 (7 digits).
          These indicate whether each of the following items should
          be displayed or not in the (timed) transcript. They
          represent (in order):
          - Display start time
          - Display end time
          - Display caption mode
          - Display caption channel
          - Use a relative timestamp (relative to the sample)
          - Display XDS info
          - Use colors
          Examples:
          0000101 is the default setting for transcripts
          1110101 is the default for timed transcripts
          1111001 is the default setting for --ucla
          Make sure you use this parameter after others that might
          affect these settings (--out, --ucla, --xds, --txt,
          --ttxt ...)
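The 7-digit --customtxt format can be read as seven on/off switches in the order listed above. This Python sketch decodes such a string into named flags; the field names and the decode_customtxt helper are hypothetical labels chosen for illustration.

```python
# Field names are illustrative labels for the seven positions listed in the help text.
FIELDS = (
    "start_time",
    "end_time",
    "caption_mode",
    "caption_channel",
    "relative_timestamp",
    "xds_info",
    "colors",
)


def decode_customtxt(fmt: str) -> dict:
    """Map each digit of a 7-digit format string to a named boolean flag."""
    if len(fmt) != len(FIELDS) or set(fmt) - {"0", "1"}:
        raise ValueError("format must be exactly 7 digits, each 0 or 1")
    return {name: digit == "1" for name, digit in zip(FIELDS, fmt)}
```

Decoding 1110101, the timed-transcript default, turns on start time, end time, caption mode, relative timestamps and colors.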
        
          Communication with other programs and console output:
          --gui-mode-reports
          Report progress and interesting events to stderr
          in an easy-to-parse format. This is intended to be
          used by other programs. See the docs directory for
          details.

          --no-progress-bar
          Suppress the output of the progress bar

          --quiet
          Don't write any message.
        
          Burned-in subtitle extraction:
          --hardsubx
          Enable the burned-in subtitle extraction subsystem.

          NOTE: This is needed to use the below burned-in
          subtitle extractor options

          --tickertext
          Search for burned-in ticker text at the bottom of
          the screen.

          --ocr-mode
          Set the OCR mode to either frame-wise, word-wise
          or letter-wise.
          e.g. --ocr-mode frame (default), --ocr-mode word,
          --ocr-mode letter

          --subcolor
          Specify the color of the subtitles.
          Possible values are in the set
          {white,yellow,green,cyan,blue,magenta,red}.
          Alternatively, a custom hue value between 1 and 360
          may also be specified.
          e.g. --subcolor white or --subcolor 270 (for violet).
          Refer to an HSV color chart for values.
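If no HSV chart is at hand, a hue can be previewed as an RGB triple with Python's standard colorsys module. This is only a sketch for picking a --subcolor value: hue_to_rgb is a hypothetical helper, and full saturation and brightness are assumed, which is not necessarily how CCExtractor interprets the hue internally.

```python
import colorsys


def hue_to_rgb(hue: int) -> tuple:
    """Return the (R, G, B) 0-255 triple for a hue in degrees (1 to 360).

    Assumes full saturation and value, purely for previewing a color.
    """
    if not 1 <= hue <= 360:
        raise ValueError("hue must be between 1 and 360")
    r, g, b = colorsys.hsv_to_rgb(hue / 360, 1.0, 1.0)
    # Scale each 0.0-1.0 channel to 0-255, rounding half up.
    return tuple(int(channel * 255 + 0.5) for channel in (r, g, b))
```

A hue of 270 yields a violet with strong blue and about half red, matching the example in the option text.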
        
          --min-sub-duration
          Specify the minimum duration that a subtitle line
          must exist on the screen.
          The value is specified in seconds.
          A lower value gives better results, but takes more
          processing time.
          The recommended value is 0.5 (default).
          e.g. --min-sub-duration 1.0 (for a duration of 1 second)

          --detect-italics
          Specify whether italics are to be detected from the
          OCR text.
          Italic detection automatically enforces the OCR mode
          to be word-wise.

          --conf-thresh
          Specify the classifier confidence threshold between
          1 and 100.
          Try and use a threshold which works for you if you get
          a lot of garbage text.
          e.g. --conf-thresh 50

          --whiteness-thresh
          For white subtitles only, specify the luminance
          threshold between 1 and 100.
          This threshold is content dependent, and adjusting
          values may give you better results.
          Recommended values are in the range 80 to 100.
          The default value is 95.

          --hcc
          Use this option if the file has both
          closed captions and burned-in subtitles.

          An example command for burned-in subtitle extraction is as follows:
          ccextractor video.mp4 --hardsubx --subcolor white --detect-italics --whiteness-thresh 90 --conf-thresh 60
        
          Notes on File name related options:
          You can pass as many input files as you need. They will be processed in order.
          If a file name is suffixed by +, ccextractor will try to follow a numerical
          sequence. For example, DVD001.VOB+ means DVD001.VOB, DVD002.VOB and so on
          until there are no more files.
          Output will be one single file (either raw or srt). Use this if you made your
          recording in several cuts (to skip commercials for example) but you want one
          subtitle file with contiguous timing.
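The + suffix behaviour can be pictured as counting upward from the first file name until a file is missing. The Python sketch below mimics that idea; follow_sequence is a hypothetical helper, and the rule it uses (increment the trailing digits of the name, keeping zero padding) is an assumption based on the DVD001.VOB example rather than on CCExtractor's source.

```python
import os
import re


def follow_sequence(first, exists=os.path.exists):
    """List 'first' and its numbered successors until one does not exist.

    'exists' is injectable so the logic can be tried without real files.
    """
    stem, ext = os.path.splitext(first)
    match = re.search(r"\d+$", stem)
    names = [first]
    if match is None:
        return names  # no trailing number, nothing to follow
    prefix, digits = stem[: match.start()], match.group()
    number = int(digits)
    while True:
        number += 1
        # Keep the same zero-padded width as the first file name.
        candidate = f"{prefix}{number:0{len(digits)}d}{ext}"
        if not exists(candidate):
            return names
        names.append(candidate)
```

Passing a set's membership test as exists makes it easy to check the behaviour, including the stop at the first gap in the numbering.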
        
          Notes on Options that affect what will be processed:
          In general, if you want English subtitles you don't need to use these options
          as they are broadcast in field 1, channel 1. If you want the second language
          (usually Spanish) you may need to try -2, or -cc2, or both.
        
          Notes on Levenshtein distance:
          When processing teletext files CCExtractor tries to correct typos by
          comparing consecutive lines. If line N+1 is almost identical to line N except
          for minor changes (plus additional characters) then it assumes that line N had
          a typo that was corrected in N+1. This is currently implemented in teletext
          because it's where sample files that could benefit from this were available.
          You can adjust, or disable, the algorithm settings with the following
          parameters.
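The comparison described above rests on the Levenshtein (edit) distance between two lines. The sketch below implements the standard dynamic-programming distance and a simple "does line N+1 look like a corrected line N" test. The helper looks_like_correction and its max_distance threshold are hypothetical; the actual thresholds and comparison rules used by CCExtractor are not given in this text.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: insertions, deletions and substitutions cost 1."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def looks_like_correction(line_n: str, line_n1: str, max_distance: int = 2) -> bool:
    """True if line_n1 starts with something very close to line_n.

    max_distance is an illustrative threshold, not CCExtractor's setting.
    """
    if len(line_n1) < len(line_n):
        return False
    # Compare line N against the equally long beginning of line N+1,
    # so that extra characters appended in N+1 are not counted as errors.
    return levenshtein(line_n, line_n1[: len(line_n)]) <= max_distance
```

Under these assumptions, "Teh weather" followed by "The weather today" would be treated as a corrected typo, while two unrelated lines would not.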
        
          Notes on times:
          --startat and --endat times are used first, then --delay.
          So if you use --srt --startat 3:00 --endat 5:00 --delay 120000, ccextractor will
          generate a .srt file, with only data from 3:00 to 5:00 in the input file(s)
          and then add that (huge) delay, which would make the final file start at
          5:00 and end at 7:00.
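The order of operations (window first, delay second) can be checked with a few lines of arithmetic. This Python sketch models captions as (start_ms, end_ms, text) tuples; the function window_then_delay and the exact inclusion rule at the window edges are assumptions made for illustration.

```python
def window_then_delay(cues, startat_s, endat_s, delay_ms):
    """Keep cues starting inside [startat, endat], then shift them by delay_ms.

    cues: iterable of (start_ms, end_ms, text) tuples.
    """
    start_ms, end_ms = startat_s * 1000, endat_s * 1000
    # Step 1: select by the original (undelayed) start time.
    kept = [cue for cue in cues if start_ms <= cue[0] <= end_ms]
    # Step 2: apply the delay to what was kept.
    return [(start + delay_ms, end + delay_ms, text) for start, end, text in kept]
```

With a window of 180 to 300 seconds and a delay of 120000 ms, a caption that starts at 3:00 in the input ends up at 5:00, and one ending at 5:00 ends up at 7:00, as in the example above.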
        
          Notes on codec options:
          If no codec type is selected, the first elementary stream suitable for
          subtitles is selected. Note that --teletext and --no-teletext override
          this option.
          The no-codec and codec parameters must not be the same. If they are,
          the no-codec parameter is ignored. This flag should be passed only
          once; more than one is not supported yet, and only the last one would
          be taken into consideration.
        
          Notes on adding credits:
          CCExtractor can _try_ to add a custom message (for credits for example) at
          the start and end of the file, looking for a window where there are no
          captions. If there is no such window, then no text will be added.
          The start window must be between the times given and must have enough time
          to display the message for at least the specified time.
        
          Notes on the CEA-708 decoder:
          By default, ccextractor now extracts both CEA-608 and CEA-708 subtitles
          if they are present in the input. This results in two output files: one
          for CEA-608 and one for CEA-708.
          To extract only CEA-608 subtitles, use -1, -2, or -12.
          To extract only CEA-708 subtitles, use -svc.
          To extract both CEA-608 and CEA-708 subtitles, use both -1/-2/-12 and -svc.
          While the CEA-708 decoder is starting to be useful, it's
          a work in progress. A number of things don't work yet in the decoder
          itself, and many of the auxiliary tools (case conversion to name one)
          won't do anything yet. Feel free to submit samples that cause problems
          and feature requests.
        
          Notes on spupng output format:
          One .xml file is created per output field. A set of .png files is created in
          a directory with the same base name as the corresponding .xml file(s), but with
          a .d extension. Each .png file will contain an image representing one caption
          and is named subNNNN.png, starting with sub0000.png.
          For example, the command:
          ccextractor --out=spupng input.mpg
          will create the files:
          input.xml
          input.d/sub0000.png
          input.d/sub0001.png
          ...
          The command:
          ccextractor --out=spupng -o /tmp/output --output-field both input.mpg
          will create the files:
          /tmp/output_1.xml
          /tmp/output_1.d/sub0000.png
          /tmp/output_1.d/sub0001.png
          ...
          /tmp/output_2.xml
          /tmp/output_2.d/sub0000.png
          /tmp/output_2.d/sub0001.png
          ...
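The naming scheme in these examples is regular enough to predict from a script. The Python sketch below lists the files expected for one output base name and a given number of captions; spupng_paths is a hypothetical helper that only reproduces the pattern shown above (base.xml plus base.d/subNNNN.png) and does not query CCExtractor.

```python
def spupng_paths(base: str, caption_count: int) -> list:
    """List the .xml file and the .png files expected for one spupng output.

    'base' is the output name without extension, e.g. 'input' or '/tmp/output_1'.
    """
    paths = [f"{base}.xml"]
    # PNG files live in '<base>.d' and are numbered from sub0000.png.
    paths += [f"{base}.d/sub{index:04d}.png" for index in range(caption_count)]
    return paths
```

For the second command above, calling it once with /tmp/output_1 and once with /tmp/output_2 would cover both fields.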
        
...