Differences

This shows you the differences between two versions of the page.

public:gsoc:python_extension_module_technical_documentation_gsoc_17 [2017/09/04 17:34]
skrill [Callback Function architecture]
public:gsoc:python_extension_module_technical_documentation_gsoc_17 [2017/09/04 17:42] (current)
skrill [Wrappers for the extension module]
Line 139: Line 139:
  
 ==== Wrappers for the extension module ==== ==== Wrappers for the extension module ====
-In case of using an API, it is highly desired to set the parameters desired by the user not via command line but as call to built-in functions. This gave rise to the necessity of wrapper functions which can be called to set certain parameters for directing the functioning of the bindings. +  * When using an API, users generally want to set parameters through function calls rather than on the command line. This created the need for wrapper functions that can be called to set parameters controlling how the bindings behave.
-The wrappers have been defined in the wrapper.c file in api/wrappers/ directory. The user can use just call the wrappers to set some parameters. More wrappers can be defined according to the architecture followed in wrapper.c. +  * The wrappers are defined in the [[https://github.com/CCExtractor/ccextractor/blob/master/api/wrappers/wrapper.c|wrapper.c]] file in the api/wrappers/ directory. The user can simply call the wrappers to set parameters. More wrappers can be defined following the architecture used in wrapper.c.
-The user needs to note that the wrappers can be called anytime in between adding parameters to CCExtractor instance (as done in api_testing.py) and before calling the compile_params function from the CCExtractor module. +  * Note that the wrappers can be called at any point after parameters have been added to the CCExtractor instance (as done in api_testing.py) and before the compile_params function from the CCExtractor module is called.
-Another thing to note about the wrapper is that, the my_pythonapi wrapper function is a very important wrapper function. It tells CCExtractor that the call has been made using the Python module and thus the functioning of CCExtractor is altered. Hence, if the user intends to use the Python module the user is always advised to call this wrapper function with its first argument to be the object returned by api_init function from CCExtractor module and second argument being the callback function which would be called by the CCExtractor to pass the extracted caption lines back to Python. +  * The my_pythonapi wrapper function is especially important. It tells CCExtractor that the call was made from the Python module, which alters how CCExtractor behaves. Hence, a user of the Python module should always call this wrapper, passing as its first argument the object returned by the api_init function from the CCExtractor module and as its second argument the callback function that CCExtractor calls to pass the extracted caption lines back to Python.
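
The call order described in the bullets above (add parameters, call the wrappers, then compile) can be sketched as follows. This is only an illustration: StubCCExtractor is a hypothetical stand-in written for this example, not the real ccextractor module, and its method names and signatures are assumptions that merely model the ordering rule. In the real module the functions take the object returned by api_init as their first argument.

```python
class StubCCExtractor:
    """Hypothetical stand-in for the ccextractor module; it only models call order."""

    def __init__(self):
        self.params = []
        self.callback = None
        self.compiled = False

    def _require_not_compiled(self, name):
        # Wrappers and parameters are only accepted before compile_params.
        if self.compiled:
            raise RuntimeError(f"{name} must be called before compile_params")

    def add_param(self, param):
        self._require_not_compiled("add_param")
        self.params.append(param)

    def my_pythonapi(self, callback):
        # Models the wrapper that registers the Python callback.
        self._require_not_compiled("my_pythonapi")
        if not callable(callback):
            raise TypeError("callback must be callable")
        self.callback = callback

    def compile_params(self):
        if self.callback is None:
            raise RuntimeError("my_pythonapi must be called before compile_params")
        self.compiled = True
        return 0


captured_lines = []                       # caption lines would be collected here
stub = StubCCExtractor()
stub.add_param("sample.ts")               # 1. add parameters (file name is made up)
stub.my_pythonapi(captured_lines.append)  # 2. call the wrappers
status = stub.compile_params()            # 3. compile the parameters
```

Calling a wrapper after compile_params raises an error in this stub, which mirrors the ordering constraint described above; how the real module reacts to a late call is not covered here.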
  
- Test Script +==== Test Script ====
 +  * Once the Python module is generated, the user can use it by importing the ccextractor module in Python.
 +  * A test script, [[https://github.com/CCExtractor/ccextractor/blob/master/api/api_testing.py|api_testing.py]], is provided for testing the output of the bindings. Note that at this stage the module only supports generating a subtitle file from CE-608 standard samples.
 +  * Another testing feature is [[https://github.com/CCExtractor/ccextractor/blob/master/api/recursive_tester.py|recursive_tester.py]], which generates the subtitle files for all the samples in a directory. The only parameter needed to run this script is the location of the samples.
  
-Once the Python module are generated then the user can use them by importing ccextractor module in Python. +==== Silent API ====
-For testing the output of the bindings a test script, api_testing.py. But to mention, the module at this stage only supports generating a subtitle file from the CE-608 standard samples only. +  * The Python bindings are designed so that the API is silent, both in itself and in terms of output generation. Silent in itself means that the API writes nothing to STDOUT and that all CCExtractor output is silenced while the module extracts caption frames. This is achieved by passing the -pythonapi parameter internally, which api_testing.py does by calling my_pythonapi() from the ccextractor module. The -pythonapi parameter makes CCExtractor silence all output that it would otherwise have generated.
-Another testing feature, that has been added is that the user can use recursive_tester.py to generate the subtitle files for all the samples from a directory. The only parameter needed to run this script is the location of all the samples. +  * A user who wants to print from within CCExtractor could use the C printf function. Note that the mprint function, which CCExtractor’s C code normally uses for STDOUT output, cannot be used for this from the extension module, because its output is silenced via -pythonapi.
  
-Silent API +==== Work status ====
 +  * My proposal for this project had multi-threading as a major component, so that CCExtractor’s Python bindings could run the extraction process in multiple threads.
 +  * However, the end goal was modified during the GSOC 2017 coding period. After the Second Phase Evaluation, the main aim was to create a Python extension module for CCExtractor that could process CE-608 video samples, extract the caption information present in them, and pass this information to Python for further processing. The module was expected to be silent, with output generation from the caption information done in Python.
 +  * At present, the extension module can extract caption information from CE-608 standard video samples and pass it to Python. Further work has also been done to process this caption information into an output subtitle (.srt) file (see the ‘Completion of comparing_text_font_grids function’ sub-section under Future Work).
  
-The Python bindings have been designed in such a way that the API is silent in itself as well as in the form of output generation. Silent in itself means that the API doesn’t write out any output to the STDOUT and the entire output of CCExtractor is silenced when the module is used for extraction of caption frames. This feature has been made possible by passing a parameter -pythonapi internally in api_testing.py using the function my_pythonapi() from the ccextractor module. The -pythonapi internally makes CCExtractor to silence all the outputs that could have been generated otherwise. +==== Future Work ====
-If the user wants to add some print functionality from the CCExtractor, then may be defining the prints using printf C function could be an option. Note that the user cannot use the mprint function to get prints from the extension module from inside the CCExtractor C code part as used in CCExtractor to get the desired STDOUT prints as these are silenced via -pythonapi. +=== Identifying the input format and raising errors if unsupported ===
 +  * CCExtractor does not process non-video files, and neither does the extension module. However, since the API is designed to be silent, the module does not output any error log stating that the input is a non-video file and cannot be processed.
 +  * This is a much-desired feature that the present version of the CCExtractor extension module lacks. I intend to work on it after GSOC 2017, but if it has not been added by the time a new contributor starts working on the extension module, their work on it would be highly appreciated.
 +  * To add this feature, the extension module must be extended to process the return value from CCExtractor, as done in the [[https://github.com/CCExtractor/ccextractor/blob/master/src/ccextractor.c#L71|api_start function]]. When a non-video sample is processed by CCExtractor’s binary, processing stops with an ‘Invalid option to CCExtractor Library’ error. Since the extension module is designed to be silent, this error message is suppressed. Hence, the test scripts should be extended to process the return value of the api_start function in the Python extension module according to the constants defined in [[https://github.com/CCExtractor/ccextractor/blob/master/src/lib_ccx/ccx_common_common.h|ccx_common_common.h]].
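
One way a test script might turn a silent return value into a visible Python error is sketched below. The constant names and numeric values here are placeholders invented for this example; the real ones are defined in ccx_common_common.h and should be taken from there. The return code is passed in as a plain integer, so the sketch does not depend on how api_start is actually invoked.

```python
# Placeholder constants: NOT copied from ccx_common_common.h.
EXIT_OK = 0
EXIT_UNSUPPORTED_INPUT = 1  # hypothetical value standing in for a real error code

# Hypothetical mapping from return values to readable messages.
MESSAGES = {
    EXIT_UNSUPPORTED_INPUT: "input file is not a supported video sample",
}


class ExtractionError(RuntimeError):
    """Raised when the extraction step reports a non-zero return value."""


def check_return_value(code, messages=MESSAGES):
    """Raise ExtractionError for any non-zero return value, else return None."""
    if code == EXIT_OK:
        return None
    text = messages.get(code, f"unknown CCExtractor return value {code}")
    raise ExtractionError(text)
```

A test script could call check_return_value on whatever the start function returns, so that a non-video input surfaces as an exception even though the C side stays silent.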
  
-Work status +=== Callback class mechanism ===
-The proposal made by me for this project had a major component of multi-threading to let CCExtractor’s Python bindings run the CCExtractor’s extraction process in multi-threads. +  * The present architecture uses a callback mechanism to pass the caption lines extracted from the caption frames of CE-608 captions to Python for further processing. A callback function is supplied to CCExtractor in C via the my_pythonapi function, which stores it as a PyObject* in the global variable array. According to the Python C-API documentation, however, everything in Python is a PyObject, be it a function, a tuple or a class.
-However, the end goal was modified while the GSOC 2017 coding period and after Second Phase Evaluation, the main aim was to create a Python extension module for CCExtractor which could process CE-608 video samples, extract the caption information present in them and pass this information to Python for further processing. The module was expected to be silent and the output generation from the caption information present in the video sample has to be done via Python. +  * The idea, therefore, is to replace the present callback function with a class whose methods the user can use for different use cases.
-The present status of the extension module is that the module can extract caption information from CE-608 standard video samples and pass the caption information to Python. Further work has also been done to process this caption information to generate an output subtitle(srt) file (the user is advised to check completion of comparing_text_font_grids function sub-section under the future work section). +  * An example of such an implementation can be found [[https://github.com/Diptanshu8/ccextractor/blob/callback_class/api/api_testing.py#L27|here]]. Note that accessing the Python class from C requires some modifications to the run function defined in ccextractor.c; an example of calling a class method named ‘callback’ can be found [[https://github.com/Diptanshu8/ccextractor/blob/callback_class/src/ccextractor.c#L553|here]].
 +  * Note also that the callback method’s name must be passed to the run function in C so that the corresponding method of the class passed via my_pythonapi can be called from C. As an example, the callback method’s name is provided [[https://github.com/Diptanshu8/ccextractor/blob/callback_class/src/ccextractor.c#L562|here]].
 +  * To understand the exact implementation of this approach, I recommend studying the Python C-API, whose documentation covers these use cases extensively.
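
The idea of calling a method of a callback class by its name can be illustrated in pure Python. The class CaptionHandler and the helper call_method_by_name are hypothetical names made up for this sketch. On the C side, the Python C-API offers PyObject_CallMethod for calling a method by name; whether the linked branch uses exactly that call is not confirmed here.

```python
class CaptionHandler:
    """Hypothetical callback class; each method could serve a different use case."""

    def __init__(self):
        self.lines = []

    def callback(self, line):
        # Method that receives one extracted caption line.
        self.lines.append(line)


def call_method_by_name(obj, method_name, *args):
    """Look up a method on obj by name and call it, as C code would do by name."""
    method = getattr(obj, method_name, None)
    if not callable(method):
        raise AttributeError(f"{type(obj).__name__} has no callable '{method_name}'")
    return method(*args)


handler = CaptionHandler()
call_method_by_name(handler, "callback", "first caption line")
```

Because the object passed in is just a PyObject from the C point of view, the same lookup works whether the user supplies a class instance with several methods or a simpler object with one.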
  
-Future Work +=== Completion of comparing_text_font_grids function ===
-Identifying the input format and raising errors if unsupported +  * The Python extension module for CCExtractor can pass the lines of the caption frames for the different grids of CE-608 captions. To generate the subtitle file from the caption grids, however, the text grid must be modified according to the color grid and the font grid. In CCExtractor, this is done by the [[https://github.com/Diptanshu8/ccextractor/blob/callback_class/src/lib_ccx/ccx_encoders_helpers.c#L234|get_decoder_line_encoded]] function.
-CCExtractor does not process any non-video files. Similarly, the processing of non-video files is not supported by extension module. However, since the API has been designed to be silent, the module doesn’t output any error log stating that the input file is a non-video file and it cannot be processed. +  * To generate subtitle (.srt) files from Python, an equivalent of get_decoder_line_encoded has been implemented in Python as [[https://github.com/CCExtractor/ccextractor/blob/master/api/python_srt_generator.py#L56|comparing_text_font_grids]] in python_srt_generator.py.
-This is a much desired feature and the present version of CCExtractor extension module lacks this feature. I would be working on this feature post GSOC 2017 but if any user finds that this feature has not been added until they start contribution to CCExtractor’s extension module, then their work on this feature would be highly appreciated. +  * However, this function is not a complete implementation of get_decoder_line_encoded; completing it is a matter of future work.
-For adding this feature to extension module, the extension module must be extended to process the return value from CCExtractor as done in the api_start function. When the sample (non-video) ​is processed via CCExtractor’s binary, then the processing is stopped by raising an ‘Invalid option to CCExtractor Library’ error. However, since the extension module has been designed to be silent, this error message is suppressed. Hence, the user should extend the test scripts to process the return value of api_start function in python extension module according to the constants defined in ccx_common_common.h +
-Callback class mechanism +
-The present architecture uses a callback mechanism to pass the extracted caption lines from the caption frames of CE-608 captions to Python for further processing. In the callback mechanism, a callback function is supplied to CCExtractor in C via the my_pythonapi function which stores the callback function as a PyObject* in the global variable array. However, according to Python documentation on C-API, everything in Python is a PyObject; be it a function, a tuple or a class. +
-So, the ideology is to replace the present callback function by a class which can have many methods that the user can use for different use cases. +
-An example of such an implementation has been done here. The user needs to note that for accessing the Python class in C, some modifications need to be done to the run function defined in ccextractor.c and a sample example for calling a class method named ‘callback’ could be found here. +
-Also, an important point to be noted in this case is that the user needs to pass the callback function’s name to run function in C so that the corresponding callback method of the class passed via my_pythonapi could be called via C. As an example, the callback method’s name has been provided here. +
-For understanding the exact implementation ​of this approach, I would recommend the user to understand C-API for Python as the documentation is quite extensive to every use case.+
  
-Completion of comparing_text_font_grids function +=== Adding more wrapper functions === 
-The Python extension module for CCExtractor is able to pass lines of the caption frames for different grids of CE-608 captions. However, for generating the subtitle file from the caption grids, the text grid needs to be modified according to the color grid as well as font grid. In CCExtractor, this job is done at the function, get_decoder_line_encoded. +  * As described in the ‘Wrappers for the extension module’ section, more wrapper functions need to be declared in the [[https://github.com/CCExtractor/ccextractor/blob/master/api/wrappers/wrapper.c|wrapper.c]] file. A few wrappers have already been defined as examples; more can be defined in a similar manner.
-For generation of subtitle files (.srt files) from Python, an equivalent version of get_decoder_line_encoded has been implemented in Python and has been defined as comparing_text_font_grids in python_srt_generator.py +
-However, as the user can note that this function is not a complete implementation of get_decoder_line_encoded function, completion of this function’s definition is a matter of future work. +
  
-Adding more wrapper functions +=== Extending the module to support other caption formats ​=== 
-As described in the ‘Wrappers for the extension module’ section, more wrapper functions are needed to be declared in the wrapper.c file. For example, few wrappers have been defined. More wrapper functions can be defined in a similar manner. +  ​* ​In this version, CCExtractor’s extension module supports processing of video samples having CE-608 standard captions in them and writing these captions to output subtitle (.srt) files. 
-Extending the module to support other caption formats +  ​* ​However, CCExtractor in itself has support for other caption standards like DVB, 708 etc. So, extension of module to extract of caption information from samples containing the caption information in these formats is a future task. 
-In this version, CCExtractor’s extension module supports processing of video samples having CE-608 standard captions in them and writing these captions to output subtitle (.srt) files. +  ​* ​The user should note that the information passed from CE-608 to Python is in raw form as lines which are then used to form the 608 grids. Similarly, the extension to other formats must consider passing the raw information of caption in respective format and then processing the information extracted by CCExtractor in Python. 
-However, CCExtractor in itself has support for other caption standards like DVB, 708 etc. So, extension of module to extract of caption information from samples containing the caption information in these formats is a future task. +  ​* ​While extending, the architecture to be followed for ccx_encoders_python should be consistent to other encoders in the codebase to maintain uniformity. Thus for DVB samples, a function name pass_cc_bitmap_to_python and for 708 samples pass_cc_subtitle_to_python need to be declared in ccx_encoders_python.c.
-The user should note that the information passed from CE-608 to Python is in raw form as lines which are then used to form the 608 grids. Similarly, the extension to other formats must consider passing the raw information of caption in respective format and then processing the information extracted by CCExtractor in Python. +
-While extending, the architecture to be followed for ccx_encoders_python should be consistent to other encoders in the codebase to maintain uniformity. Thus for DVB samples, a function name pass_cc_bitmap_to_python and for 708 samples pass_cc_subtitle_to_python need to be declared in ccx_encoders_python.c.+
  
  • public/gsoc/python_extension_module_technical_documentation_gsoc_17.txt
  • Last modified: 2017/09/04 17:42
  • by skrill