Differences

This shows you the differences between two versions of the page.

public:gsoc:sampleplatform [2019/01/26 14:26]
willem
public:gsoc:sampleplatform [2020/02/12 21:50] (current)
willem
Line 1: Line 1:
 ~~META:
-title = Google Summer of Code 2019 - The sample platform / Continuous integration
+title = Google Summer of Code 2020 - The sample platform / Continuous integration revisited
 ~~
 ====== The sample platform / Continuous integration ======
  
-The [[https://sampleplatform.ccextractor.org/|sample platform]] was developed during GSoC '15 and overhauled during GSoC '16. In GSoC '17 another student added support for the Windows part, as well as some bugfixes. The student continued his work during GSoC '18, and will mentor this year.\\
+The [[https://sampleplatform.ccextractor.org/|sample platform]] was developed during GSoC '15 and overhauled during GSoC '16. In GSoC '17 another student added support for the Windows part, as well as some bugfixes. The student continued his work during GSoC '18, and will mentor this year. Last year a new student made some improvements and bugfixes.\\
  
-The platform was mostly finalized during last year's GSoC, and we also got some nice additions from Code-in participants! However, one of the major pain points at this moment is the comparison of two subtitles that can slightly differ. The current method flags a file as regressed as soon as a single byte changes. We'd like to see a smarter implementation that can, for example, allow small time shifts (e.g. 40ms) to be let through. This obviously means that the formats need to be understood.\\
+During this GCI edition we came to the conclusion that, for new contributors, the current system has a number of drawbacks that make it no longer viable to keep running the platform in its current form.\\
  
-===== Getting started / Requirements =====
+The two main issues are:\\
 +- Long runtime when a lot of commits/PRs are opened, because only one instance per OS is available.\\
 +- It is unclear what the test results are being compared against. We should be able to keep multiple approved versions and tell the user when a result deviates from those known ones.\\
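
The second issue (multiple approved versions per sample) could be approached roughly as follows. This is only a minimal sketch, not how the platform currently works: the `ApprovedOutputs` class, its method names and the returned messages are all hypothetical, and it assumes that an exact byte-for-byte match against any approved output counts as a pass.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, Set


def digest(output: bytes) -> str:
    """Return the SHA-256 hex digest of one test output."""
    return hashlib.sha256(output).hexdigest()


@dataclass
class ApprovedOutputs:
    """Hypothetical registry: sample name -> digests of every approved output."""
    by_sample: Dict[str, Set[str]] = field(default_factory=dict)

    def approve(self, sample: str, output: bytes) -> None:
        """Record one more output as acceptable for this sample."""
        self.by_sample.setdefault(sample, set()).add(digest(output))

    def check(self, sample: str, output: bytes) -> str:
        """Tell the user how a new result relates to the approved ones."""
        approved = self.by_sample.get(sample)
        if not approved:
            return "no approved output on record"
        if digest(output) in approved:
            return "matches an approved output"
        return f"deviates from all {len(approved)} approved outputs"
```

Storing digests instead of full files keeps the registry small; the full outputs would still be needed to show the user an actual diff.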
  
-The Sample Platform is written in Python, so we expect good knowledge of Python. Basic HTML, Javascript & CSS knowledge is also required.\\
+With a lot of different cloud offerings available and the launch of GitHub Actions, we want to iterate on the design of the Sample Platform, moving the infrastructure from a single dedicated server to a scalable service that can cope with the variations in load.
  
-We make use of quite some libraries (you can find the full list here: [[https://sampleplatform.ccextractor.org/about|About the sample platform]]), and we expect you to read up on the documentation of these platforms so you know how they work in general.\\
+This will need an upfront survey of the existing functionality, followed by discussions with the mentor on how to implement this.
 +
 +Features that will certainly need to be implemented are:\\
 +- A coordinating platform that receives the call for actions, triggers the machines, displays results, ...\\
 +- Scalable Linux/Mac/Windows machines that can execute the regression tests (currently 180GB of samples!)\\
 +- Deep integration with the GitHub Actions that should be run first (creating Linux, Windows, Mac builds), so that no time is wasted if there are compiler errors or no code changes.\\
 +- Watch [[https://www.youtube.com/watch?v=407nwX6__70|this video]]. Disregard that it's about the Rust community; the CI/CD part of it is what matters to us. That's what we want.
 + 
 + 
 + 
 +===== Getting started / Requirements ===== 
 + 
 +The Sample Platform is written in Python, so we expect good knowledge of Python. The new project does not necessarily have to be Python-based, but the choice should be made based on maintainability (unit testing) and the availability of third-party APIs and libraries.\\
  
 ==== Qualification ====
 
 If you are interested in taking up this project during GSoC, you will need to satisfy these requirements (in order of importance, not all are a necessity): \\
-- A well researched, well written project proposal.\\
+- A well researched, well written project proposal. This should include a monthly cost prediction based on expected runtimes, disk storage used, ... A comparison between multiple providers (e.g. Azure, GCP, AWS, Packet) must be included.\\
-- Proof you've set up the Sample Platform locally.\\
-- Fixed a bug, improved installation documentation, ... (contributed something to the project). There are some issues in the tracker labeled [[https://github.com/CCExtractor/sample-platform/issues?q=is%3Aissue+is%3Aopen+label%3AGSoC-proposal-task|GSoC-proposal-task]] for this purpose.\\
 - Have chatted with the mentor(s) at least once.\\
 +- Fixed a bug, improved installation documentation, ... (contributed something to the project). There are some issues in the tracker labeled [[https://github.com/CCExtractor/sample-platform/issues?q=is%3Aissue+is%3Aopen+label%3AGSoC-proposal-task|GSoC-proposal-task]] for this purpose.\\
 +- Proof you've set up the Sample Platform locally.\\
 +
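
The monthly cost prediction asked for in the proposal could be laid out along these lines. This is a sketch under loud assumptions: the `ProviderRates` class and `monthly_cost` function are hypothetical, and every rate, run count and runtime in the usage note is a made-up placeholder, not a real price. Only the 180GB sample size comes from this page. Real proposals should take rates from each provider's published pricing.

```python
from dataclasses import dataclass


@dataclass
class ProviderRates:
    """Hypothetical per-provider prices; fill in from the provider's price list."""
    name: str
    per_vm_hour: float   # price of one test-runner machine per hour
    per_gb_month: float  # price of storing one GB for one month


def monthly_cost(rates: ProviderRates, runs_per_month: int,
                 hours_per_run: float, storage_gb: float) -> float:
    """Estimate one month of compute plus storage for a single provider."""
    compute = runs_per_month * hours_per_run * rates.per_vm_hour
    storage = storage_gb * rates.per_gb_month
    return round(compute + storage, 2)
```

For example, with placeholder rates of 0.10 per VM hour and 0.02 per GB-month, 200 runs of half an hour each and 180GB of samples, `monthly_cost(ProviderRates("Provider A", 0.10, 0.02), 200, 0.5, 180)` gives 13.6. Running the same function over several `ProviderRates` entries yields the provider comparison.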
 +===== Additional information not necessarily well organized :-) ===== 
 +- For each sample we currently have one "good" output. That's not really correct. Changes in code might produce minor changes in the output (in the order of a few milliseconds). For each sample we'll need to have a set of correct outputs (possibly with a "correctness score").\\
 +- When a pull request is checked, our system now reports the number of "broken" samples, meaning how many samples produce an incorrect output according to our "good output" list. This, however, does not help much in determining how the output changes for this specific PR. Instead, the system needs to report the difference between the output before and after the changes in the PR, which is much more useful.\\
 +- We'll also need a way for end users to test their own files against the current version, so they don't need us to release a new CCExtractor version that _could_ fix something that is broken for them.\\
 +- It should be possible for users to get a binary compiled by the new system, particularly for Windows (on Linux we don't have this problem, since the typical way to install CCExtractor is just to build from source). Note that this build is already happening, so don't worry much if you have zero interest in Windows :-) You can use what we already have in order to build.\\
 +- We currently run all the tests for each PR. This is overkill. Instead, we should have different sets of tests, for example "only MP4 files", or only "teletext", and so on, and the developer should be able to decide which tests to run on their PR (or maybe none, for example if they just edited the help screen).\\
 +- One of the reasons we're "going cloud" (as opposed to continuing to run on one server) is the ability to scale and parallelize. It must be possible to check several PRs at the same time, on different platforms, and so on.\\
 +- We need to add a "regression finder" feature that works like (and possibly uses) git bisect: given a specific sample, find which specific commit changed the output.\\
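
The "regression finder" idea can be illustrated with a plain binary search over an ordered commit list. This is a sketch only: `first_changed_commit` and the `output_of` callback are hypothetical names, the callback stands in for "build this commit and run the sample", and, like git bisect, the search assumes the output changed once within the range and stayed changed.

```python
from typing import Callable, Sequence


def first_changed_commit(commits: Sequence[str],
                         output_of: Callable[[str], bytes]) -> str:
    """Find the first commit whose output differs from the oldest commit's.

    `commits` is ordered oldest first. `output_of` is a hypothetical callback
    that builds the given commit, runs the sample, and returns the output.
    """
    baseline = output_of(commits[0])
    if output_of(commits[-1]) == baseline:
        raise ValueError("the output did not change within this commit range")

    # Invariant: commits[lo] reproduces the baseline, commits[hi] does not.
    lo, hi = 0, len(commits) - 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if output_of(commits[mid]) == baseline:
            lo = mid
        else:
            hi = mid
    return commits[hi]
```

Each step halves the range, so a history of N commits needs roughly log2(N) builds and test runs instead of N.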
 +
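
Accepting outputs that differ only by a few milliseconds requires understanding the subtitle format instead of comparing bytes. A minimal sketch for SubRip (.srt) output follows, assuming cues must match one-to-one with identical text and that a 40ms tolerance is acceptable. The function names are hypothetical, and other output formats would each need their own parser.

```python
import re
from typing import List, Tuple

Cue = Tuple[int, int, str]  # (start_ms, end_ms, text)

# SubRip timing line, e.g. "00:00:01,000 --> 00:00:02,000"
_TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2}),(\d{3}) --> (\d+):(\d{2}):(\d{2}),(\d{3})"
)


def _to_ms(h: str, m: str, s: str, ms: str) -> int:
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def parse_srt(text: str) -> List[Cue]:
    """Extract (start, end, text) for every cue; cue numbers are ignored."""
    cues: List[Cue] = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = _TIMING.search(line)
            if match:
                g = match.groups()
                body = "\n".join(lines[i + 1:]).strip()
                cues.append((_to_ms(*g[:4]), _to_ms(*g[4:]), body))
                break
    return cues


def equivalent(expected: str, actual: str, tolerance_ms: int = 40) -> bool:
    """True if both files have the same cue texts and timings within tolerance."""
    a, b = parse_srt(expected), parse_srt(actual)
    if len(a) != len(b):
        return False
    return all(
        ta == tb and abs(sa - sb) <= tolerance_ms and abs(ea - eb) <= tolerance_ms
        for (sa, ea, ta), (sb, eb, tb) in zip(a, b)
    )
```

A "correctness score" could be derived from the same parsed cues, for instance from the largest timing deviation found, but how to score is left open here.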
  
 ===== Mentor(s) =====
-- Willem Van Iseghem (@canihavesomecoffee on Slack) is a former GSoC student (2014, 2015, 2016) and mentor (2017). He started the project and is the official maintainer.\\
+- Willem Van Iseghem (@canihavesomecoffee on Slack) is a former GSoC student (2014, 2015, 2016) and mentor (2017, 2018, 2019). He started the project and is the official maintainer.\\
-- Satyam Mittal (@Satyam on Slack) is a former GSoC student (2017, 2018). He finalized quite a lot of the features in the past two years.\\
  
  
  
  • public/gsoc/sampleplatform.1548512797.txt.gz
  • Last modified: 2019/01/26 14:26
  • by willem