Python extension module for CCExtractor

This is the main documentation for the Python extension module for CCExtractor.

CCExtractor Library

Refactoring the codebase into a library

Earlier versions of CCExtractor were compiled only as a binary and could not be used as a library. The entire codebase was executed through a single main function defined in ccextractor.c, and this architecture was not suitable for turning the CCExtractor source code into a library. Hence, many modifications were made to ccextractor.c so that the conversion to a library could be done. The major modifications were:

Apart from these changes, the header file ccextractor.h was added to the codebase to declare many global variables as well as the functions defined in ccextractor.c. The major changes can be seen in this PR (merged). However, after the later stages of development that followed that PR, the final structure can be found in ccextractor.c and ccextractor.h.

Definitions made in ccextractor.h

CCExtractor Python Extension Module

Extension module dependencies


2. Python-dev package

Overall architecture

Generating the Python extension module

Workflow of Python extension module

The following section describes in detail the entire workflow of the Python extension module and the role of each function in the code flow. An example usage has been done in


Function declaration- struct ccx_s_options* api_init_options()


Function declaration- void check_configuration_file(struct ccx_s_options api_options)

This function checks the configuration file. It takes the struct ccx_s_options instance returned by api_init_options().


Function declaration- void api_add_param(struct ccx_s_options* api_options, char* arg)


Function declaration- (depends on whether CCExtractor is compiled as a binary or as the extension module)


Function declaration- int compile_params(struct ccx_s_options *api_options, int argc)


Function declaration- void call_from_python_api(struct ccx_s_options *api_options)


Function declaration- int api_start(struct ccx_s_options api_options)

Note that the code flow discussed up to this point is common to both the CCExtractor binary and CCExtractor's Python extension module. From this point onwards, the code flow described is specific to the Python extension module: how it receives the caption frames through a callback function, and how those frames are then processed in Python to generate the output subtitle file (.srt).

From the pass_cc_buffer_to_python function, a call is made to the extractor function, which in turn calls the callback function provided earlier through the my_pythonapi function. The arguments passed to the callback function carry the information content of the caption frame processed by CCExtractor. The Python SRT generator scripts use this information to process the caption frames and write the result to the output subtitle file. The following sections describe each of these steps in detail, in order:

Python Encoder for CCExtractor


Function declaration- int pass_cc_buffer_to_python(struct eia608_screen *data, struct encoder_ctx *context)

Extractors for bindings


Function declaration- void python_extract_g608_grid(unsigned h1, unsigned m1, unsigned s1, unsigned ms1, unsigned h2, unsigned m2, unsigned s2, unsigned ms2, char* buffer, int identifier, int srt_counter, int encoding)
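The timing arguments in this declaration map naturally onto an SRT cue. As a rough illustration, assuming (h1, m1, s1, ms1) is the start time, (h2, m2, s2, ms2) is the end time, and srt_counter is the cue number, a cue could be assembled in Python as follows. The helper names srt_timestamp and srt_cue are hypothetical and are not part of the extension module.

```python
def srt_timestamp(h, m, s, ms):
    """Format a time as HH:MM:SS,mmm, the timestamp layout used by SubRip (.srt)."""
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def srt_cue(srt_counter, start, end, text):
    """Build one SRT cue: counter line, timing line, text, then a blank line."""
    timing = f"{srt_timestamp(*start)} --> {srt_timestamp(*end)}"
    return f"{srt_counter}\n{timing}\n{text}\n\n"


# Hypothetical values standing in for (h1, m1, s1, ms1) and (h2, m2, s2, ms2).
cue = srt_cue(1, (0, 0, 1, 500), (0, 0, 3, 20), "HELLO WORLD")
print(cue, end="")
```

Zero-padding matters here: SRT expects two digits for hours, minutes, and seconds and three for milliseconds, so 20 ms is written as 020.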

Callback Function architecture


Function declaration- void run(PyObject * reporter, char * line, int encoding)

This is how the callback mechanism works for passing the lines from C to Python in real time.
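To make the roles of the three arguments concrete, here is a small Python stand-in for run. It is a sketch under stated assumptions, not the C implementation: the numeric values in CODECS are hypothetical, and whether the real function decodes the line before calling the reporter is not stated here. The Collector class is likewise only an example of a callable that could be passed as reporter.

```python
# Hypothetical mapping from the integer `encoding` argument to Python codecs.
# The numeric values are assumptions for this sketch, not taken from the module.
CODECS = {1: "latin-1", 2: "utf-8"}


def run(reporter, line, encoding):
    """Python stand-in for the C run(): decode `line`, then call `reporter` with it."""
    codec = CODECS.get(encoding, "utf-8")
    reporter(line.decode(codec, errors="replace"))


class Collector:
    """A callable object that stores every line it is given."""

    def __init__(self):
        self.lines = []

    def __call__(self, text):
        self.lines.append(text)


collector = Collector()
run(collector, b"caf\xe9", 1)      # Latin-1 encoded bytes
run(collector, b"caf\xc3\xa9", 2)  # UTF-8 encoded bytes
print(collector.lines)
```

Using a callable object rather than a bare function lets the receiver keep state between calls, which is convenient when lines arrive one at a time.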

Processing output in Python

Support for CEA-608 captions only

For an understanding of the CEA-608 caption format, the user is advised to refer to this documentation on CEA-608.

This is how the entire CEA-608 caption data is transmitted to Python, and the user needs to follow this nomenclature in order to get the caption frames in Python.

Wrappers for the extension module

Test Script

Silent API

Work status

Future Work

Identifying the input format and raising errors if unsupported

Callback class mechanism

Completion of comparing_text_font_grids function

Adding more wrapper functions

Extending the module to support other caption formats