Differences

This shows you the differences between two versions of the page.

public:gsoc:ocr [2018/02/14 23:32]
abhinav95
public:gsoc:ocr [2020/02/29 17:35] (current)
thealphadollar refactor
Line 25: Line 25:
  
 We will provide all the samples and access to a high speed server that has them so the student can work on it (optional) if a fast internet connection is not available to them.
- 
-__**Qualification tasks**__\\ 
-[[https://github.com/CCExtractor/ccextractor/issues/929|Terrible OCR results with Channel 5 (UK)]]\\
-This task is ideal to get started, because you only need to deal with one function in one file: [[https://github.com/CCExtractor/ccextractor/blob/930ca716ca0bdae629ddd170abbcc2ad75472422/src/lib_ccx/ocr.c|quantize_map]]() in src/lib_ccx/ocr.c
- 
-In addition to the samples that we already have, we would also like the creation of a dataset of a few hardsubbed (videos with burned-in subtitles) videos with the accurate timed transcripts so that we can evaluate the performance of our code on a wide variety of these real world samples. For the qualification task, this does not have to be huge. A good representative set will do fine. 
  
 __**Related GitHub Issues**__\\
Line 45: Line 39:
 Abhinav Shukla (@abhinav95 on slack), who is the former Summer of Code student who worked on it last year and did an incredible job.
  
 +**Qualification tasks**\\
 +[[https://github.com/CCExtractor/ccextractor/issues/929|Terrible OCR results with Channel 5 (UK)]]\\
 +This task is ideal to get started, because you only need to deal with one function in one file: [[https://github.com/CCExtractor/ccextractor/blob/930ca716ca0bdae629ddd170abbcc2ad75472422/src/lib_ccx/ocr.c|quantize_map]]() in src/lib_ccx/ocr.c
 +
 +In addition to the samples that we already have, we would also like the creation of a dataset of a few hardsubbed (videos with burned-in subtitles) videos with the accurate timed transcripts so that we can evaluate the performance of our code on a wide variety of these real world samples. For the qualification task, this does not have to be huge. A good representative set will do fine.
 +
 +Take a look at [[https://ccextractor.org/public:gsoc:takehome|this page]] for more issues.
  
  • Last modified: 2018/02/14 23:32
  • by abhinav95