Revisions: 2018/02/12 23:44 by cfsmp3; current revision 2019/12/05 00:07 by aadibajpai ("Correct file name").
==== Getting started with CCExtractor's source code ====

This page is currently being written (actively, a bit every day) so that new developers who want to join us don't have to learn the basics from scratch.
  
We often get questions about how to get started with our code. The most important advice: don't try to read and understand every file, because it's pointless and there's no need. While it's definitely not the Linux kernel, CCExtractor's code is not trivial, and it has been written by a number of people over a long time. Often, those people were learning as they went, and it shows in parts of the code.
  
This page tries to explain the most important concepts and introduces the important files in the core CCExtractor tool. Note that we have additional tools, such as our regression test platform and the real-time subtitle database. Those will be explained on their own pages.

CCExtractor is written in C. If you are a C++ developer, that will have pretty much zero impact on your ability to contribute, because the really important differences are abstracted away in functions anyway. Sure, we don't have classes and our I/O is different, but that's really not a big deal here: you will need to understand file formats and how to read specification documents either way, and none of that depends on the language of choice.
  
CCExtractor reads binary streams (a stream may be a file, but it can also be data coming from the network, so don't assume it's a file) and writes subtitle files.

The usual audio/video streams come in a number of variants. You know how files come as .avi, .mkv, .mp4, .mpeg and so on? Those are container formats, because they "contain" the parts of the media: video, audio and subtitles. Each of them has some limitations, but in general, the container format doesn't specify how each part of the media is encoded. You can have a .mkv (Matroska) file that contains the video encoded as MPEG-2 or H.264, the audio as MP3 or AAC, and so on.
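As mentioned above, the input is not necessarily a regular file. With pipes and network sockets in particular, a single read() call may return fewer bytes than requested, so code that needs a fixed number of bytes (for example, one whole packet) has to keep reading. Below is a minimal sketch, assuming a POSIX file descriptor. `read_fully` is a hypothetical helper for illustration, not CCExtractor's actual input code.

<code c>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: read exactly `len` bytes from `fd` unless EOF or an
 * error happens first. It behaves the same for files, pipes and sockets,
 * because it never assumes one read() returns everything that was asked for.
 * Returns the number of bytes actually read, or -1 on error. */
ssize_t read_fully(int fd, unsigned char *buf, size_t len)
{
	size_t total = 0;
	while (total < len)
	{
		ssize_t n = read(fd, buf + total, len - total);
		if (n == 0)
			break; /* EOF: the caller gets a short count */
		if (n < 0)
		{
			if (errno == EINTR)
				continue; /* interrupted by a signal, just retry */
			return -1;
		}
		total += (size_t)n;
	}
	return (ssize_t)total;
}
</code>

A short count that is not an error means the stream ended, which the caller has to handle explicitly instead of assuming a full buffer.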
  
In TV broadcasting, the typical container is the Transport Stream (.ts). A Transport Stream can carry more than one TV program (for example, BBC One, BBC Two and BBC News), each of them with its own video, audio, and subtitles (and, for each, maybe more than one language).
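All those streams travel in fixed-size 188-byte packets, each tagged with a 13-bit packet identifier (PID). The PAT (always on PID 0) and the PMTs tell a demuxer which PIDs belong to which program. A minimal sketch of decoding the fixed 4-byte packet header as laid out in ISO/IEC 13818-1; the struct and function names are hypothetical and are not the ones used in ts_functions.c.

<code c>
#include <stddef.h>
#include <stdint.h>

#define TS_PACKET_SIZE 188
#define TS_SYNC_BYTE 0x47

/* Hypothetical struct holding the fields of the 4-byte TS packet header. */
struct ts_header
{
	int payload_unit_start; /* 1 = a new PES packet or table section starts here */
	uint16_t pid;           /* 13-bit packet identifier */
	int adaptation_field;   /* 2-bit adaptation_field_control */
	int continuity_counter; /* 4-bit counter, incremented per PID */
};

/* Parses the header of one packet. Returns 0 on success, -1 if the buffer
 * is too short or the packet does not start with the sync byte. */
int parse_ts_header(const uint8_t *pkt, size_t len, struct ts_header *h)
{
	if (len < TS_PACKET_SIZE || pkt[0] != TS_SYNC_BYTE)
		return -1;
	h->payload_unit_start = (pkt[1] >> 6) & 0x01;
	h->pid = (uint16_t)(((pkt[1] & 0x1F) << 8) | pkt[2]);
	h->adaptation_field = (pkt[3] >> 4) & 0x03;
	h->continuity_counter = pkt[3] & 0x0F;
	return 0;
}
</code>

For example, a packet starting with 0x47 0x41 0x00 0x17 has PID 0x100, starts a new payload unit and has continuity counter 7.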
  
Streaming services such as iTunes use .mp4.
=== Subtitles ===
  
Our input streams contain subtitles. These subtitles can be encoded in different ways depending on the country they come from or the technology used to make the recording. Focusing on recordings made from a TV broadcast, we have:
  
**CEA-608**, which is the "old" format used in North America. It comes from the analog days of NTSC, but while the transmission was analog, in the end you have 2 bytes (that's digital) of subtitle data in each frame, and that's the one thing to keep in mind. You don't need to bother understanding the analog part of the transmission, because what we process is just those two bytes.
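Each of those two bytes carries 7 data bits plus an odd-parity bit in the most significant position, so the first thing a decoder does with a pair is check and strip the parity. A minimal sketch of that single step; the function names are hypothetical, and a real CEA-608 decoder (including CCExtractor's) does much more with the result (control codes, caption channels, display modes).

<code c>
#include <stdint.h>

/* Returns 1 if `b` has odd parity (an odd number of 1 bits, counting the
 * parity bit), which every CEA-608 byte must have, and 0 otherwise. */
int has_odd_parity(uint8_t b)
{
	int ones = 0;
	for (int i = 0; i < 8; i++)
		ones += (b >> i) & 1;
	return ones & 1;
}

/* Hypothetical helper: validates one CEA-608 byte pair and writes the two
 * 7-bit values with the parity bit removed. Returns -1 on a parity error. */
int cea608_strip_parity(uint8_t b1, uint8_t b2, uint8_t *c1, uint8_t *c2)
{
	if (!has_odd_parity(b1) || !has_odd_parity(b2))
		return -1;
	*c1 = b1 & 0x7F;
	*c2 = b2 & 0x7F;
	return 0;
}
</code>

For example, the pair 0xC8 0x49 becomes 0x48 0x49 ("HI"), and 0x80 0x80 becomes 0x00 0x00, the padding pair sent when there is nothing to show. After stripping, a first byte between 0x10 and 0x1F signals a command or special character rather than plain text.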
  
By the way, in North America those subtitles that you can turn on and off are called **closed captions**.
</code>
  
which is in the file stream_functions.c.
  
That function (please check the code) sets the stream type in the **context** (more on contexts later). It is a best guess: identifying the format without fault is a lot harder than you'd think, but that's not important for an introduction.

Once we know what type of stream we're processing, we know which demuxer to use to read it.
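To give an idea of what a best-guess detection looks like, here is a minimal sketch that only checks well-known signatures at the start of the data: the 0x47 sync byte repeated every 188 bytes for Transport Streams, an `ftyp` box for MP4, the EBML magic for Matroska and the pack start code for MPEG Program Streams. The enum and function are hypothetical; the real detection in stream_functions.c covers many more formats, and some valid MP4 files do not even begin with an `ftyp` box.

<code c>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum guessed_container { GUESS_UNKNOWN, GUESS_TS, GUESS_MP4, GUESS_MKV, GUESS_PS };

/* Hypothetical best-guess container detection from the first bytes. */
enum guessed_container guess_container(const uint8_t *buf, size_t len)
{
	/* Transport Stream: sync byte at the start of three consecutive packets */
	if (len >= 3 * 188 && buf[0] == 0x47 && buf[188] == 0x47 && buf[376] == 0x47)
		return GUESS_TS;
	/* MP4 (ISO BMFF): box size in bytes 0..3, box type 'ftyp' in bytes 4..7 */
	if (len >= 8 && memcmp(buf + 4, "ftyp", 4) == 0)
		return GUESS_MP4;
	/* Matroska: EBML header magic 1A 45 DF A3 */
	if (len >= 4 && buf[0] == 0x1A && buf[1] == 0x45 && buf[2] == 0xDF && buf[3] == 0xA3)
		return GUESS_MKV;
	/* MPEG Program Stream: pack header start code 00 00 01 BA */
	if (len >= 4 && buf[0] == 0x00 && buf[1] == 0x00 && buf[2] == 0x01 && buf[3] == 0xBA)
		return GUESS_PS;
	return GUESS_UNKNOWN;
}
</code>

Checking three sync bytes instead of one matters: 0x47 is just the letter "G", so a single match is easily a coincidence.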
  
We have demuxers for Transport Streams (in ts_functions.c), mp4 (in mp4.c) and more. The block that decides what to do once the container type is known is in the main file, ccextractor.c:
  
<code>
...
		}
</code>
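Conceptually, that block is a dispatch on the detected stream type. A minimal sketch of the idea; the enum values and demuxer functions below are hypothetical stand-ins, not the real names used in ccextractor.c, ts_functions.c or mp4.c.

<code c>
#include <stdio.h>

/* Hypothetical stream types; CCExtractor defines its own enum. */
enum stream_type { STREAM_UNKNOWN, STREAM_TS, STREAM_MP4 };

/* Hypothetical demuxer entry points. In a real tool each one would loop over
 * the input, extract the subtitle data and pass it on to the decoders. */
static int demux_ts(const char *input)
{
	printf("TS demuxer: %s\n", input);
	return 0;
}

static int demux_mp4(const char *input)
{
	printf("MP4 demuxer: %s\n", input);
	return 0;
}

/* Once the type is known, choosing the demuxer is a single switch. */
int process_input(enum stream_type type, const char *input)
{
	switch (type)
	{
		case STREAM_TS:
			return demux_ts(input);
		case STREAM_MP4:
			return demux_mp4(input);
		default:
			fprintf(stderr, "Unsupported or unknown stream type\n");
			return -1;
	}
}
</code>

Adding support for a new container then means adding a detection rule and one more case here.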
=== User options, contexts, and in general where stuff is saved ===
When adding a new user option (one that can be selected via a command-line argument), the steps are always the same:

1) Add the corresponding variable to the structure
<code>
struct ccx_s_options // Options from user parameters
{
	int extract; // Extract 1st, 2nd or both fields
	int no_rollup;
	int noscte20;
	int webvtt_create_css;
...
}
</code>
which is defined in src/lib_ccx/ccx_common_option.h

2) Initialize it to the correct default value in this function:
<code>
void init_options (struct ccx_s_options *options)
{
#ifdef _WIN32
	options->buffer_input = 1; // In Windows buffering seems to help
#else
	options->buffer_input = 0; // In linux, not so much.
#endif
	options->nofontcolor=0; // 1 = don't put <font color> tags
	options->notypesetting=0; // 1 = Don't put <i>, <u>, etc typesetting tags
...
</code>

which is defined in src/lib_ccx/ccx_common_option.c

3) Add the corresponding parsing code in the function

<code>
int parse_parameters (struct ccx_s_options *opt, int argc, char *argv[])
{
	// Parse parameters
	for (int i=1; i<argc; i++)
	{
		if (!strcmp (argv[i],"--help") || !strcmp(argv[i], "-h"))
...
}
</code>
which is defined in src/lib_ccx/params.c

4) Add usage instructions in the function

<code>
void print_usage (void)
{
	mprint ("Originally based on McPoodle's tools. Check his page for lots of information\n");
	mprint ("on closed captions technical details.\n");
...
}
</code>

which is also defined in src/lib_ccx/params.c

5) Depending on which part of the code is actually going to use that parameter, you will need to copy it to the right place. The "place", in general, is a context. A context is a structure that contains status values relevant to a group of functions, such as a decoder or an encoder. For example, if the new parameter applied to a decoder, we could copy it in the function that builds the decoder settings from the user options:

<code>
static struct ccx_decoders_common_settings_t *init_decoder_setting(
	struct ccx_s_options *opt)
{
	struct ccx_decoders_common_settings_t *setting;

	setting = malloc(sizeof(struct ccx_decoders_common_settings_t));
...
</code>

Note that the function receives the same structure used in steps 2 and 3 (struct ccx_s_options) and returns a struct ccx_decoders_common_settings_t *.

Finally, there's the function that really initializes the decoder context from the decoder settings:

<code>
struct lib_cc_decode* init_cc_decode (struct ccx_decoders_common_settings_t *setting)
{
	struct lib_cc_decode *ctx = NULL;
...
</code>

That struct lib_cc_decode * is what the decoders work with: it holds all the options plus all the values they need to store as processing progresses.

6) Use the variable.

A [[https://github.com/CCExtractor/ccextractor/commit/150d2e7404843491baaf94b33ca7416279d55bb8|sample commit]] that goes through all the steps and adds a new option.
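The six steps above can be condensed into one small, self-contained sketch of the pattern. The option `--example-level` and every struct and function name here (`example_options`, `example_decoder_settings`, `example_decoder_ctx` and so on) are hypothetical; the real code spreads these pieces across ccx_common_option.h, ccx_common_option.c, params.c and the decoder initialization functions.

<code c>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Step 1: a field in the (hypothetical) user options structure. */
struct example_options
{
	int example_level; /* set with --example-level N */
};

/* Step 2: give it a default value. */
void example_init_options(struct example_options *opt)
{
	opt->example_level = 1;
}

/* Step 3: parse it. An option that takes a value must check that the value
 * exists before reading argv[i + 1]. Returns 0 on success, -1 on error. */
int example_parse_parameters(struct example_options *opt, int argc, char *argv[])
{
	for (int i = 1; i < argc; i++)
	{
		if (!strcmp(argv[i], "--example-level"))
		{
			if (i + 1 >= argc)
			{
				fprintf(stderr, "--example-level needs a value\n");
				return -1;
			}
			const char *arg = argv[++i];
			char *end;
			long v = strtol(arg, &end, 10);
			if (end == arg || *end != '\0' || v < 0 || v > 3)
			{
				fprintf(stderr, "Invalid value for --example-level: %s\n", arg);
				return -1;
			}
			opt->example_level = (int)v;
		}
		/* ...other options would be handled in the same loop... */
	}
	return 0;
}

/* Step 4: document it in the usage text. */
void example_print_usage(void)
{
	printf("  --example-level N: hypothetical option, 0 to 3 (default: 1)\n");
}

/* Step 5a: copy the option into the decoder settings... */
struct example_decoder_settings
{
	int example_level;
};

struct example_decoder_settings *example_init_decoder_setting(const struct example_options *opt)
{
	struct example_decoder_settings *setting = malloc(sizeof *setting);
	if (!setting)
		return NULL;
	setting->example_level = opt->example_level;
	return setting;
}

/* Step 5b: ...and from the settings into the context the decoder works with,
 * which also holds state that changes while the input is processed. */
struct example_decoder_ctx
{
	int example_level;
	long frames_seen;
};

struct example_decoder_ctx *example_init_decode(const struct example_decoder_settings *setting)
{
	struct example_decoder_ctx *ctx = calloc(1, sizeof *ctx); /* state starts at zero */
	if (!ctx)
		return NULL;
	ctx->example_level = setting->example_level;
	return ctx;
}
</code>

Step 6 is then just reading `ctx->example_level` wherever the decoder's behavior depends on it. Validating the value while parsing keeps the rest of the code from having to deal with out-of-range values.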
  
  