How to execute the 2mwt script?
- Status: proposed
- Deciders: Artur Wojnar, Andres Lamont
- Date: 2023-04-18
Context and Problem Statement
Results of the 2mwt tests are calculated by a Python algorithm, similarly to the TUG experiments.
Decision Drivers
- Scalability
- How fast we can run the script and get results
- Complexity of the solution
- Reliability
Considered Options
- Branching subprocesses
- Branching one subprocess at a time
  - The router collects all experiment data and sends the START when it gets the last message
  - The service handles only one experiment at a time
- Delegating a K8s Job
Decision Outcome
Option 2 - branching one subprocess at a time
Positive Consequences
The solution will be built on top of the TUG Service, and it should protect the service from getting bogged down with too many active consumers.
Negative Consequences
It is not a ready-to-use solution, but on the other hand it appears to be much more reliable.
[option 1]
Pros
- Works already in the TUG Service, so we can copy-paste it
- It's very simple - when a new experiment has to be calculated, a new Python subprocess is executed
Cons
- Branching subprocesses is not quite the Docker way
- Our containers are small and we can easily reach the point where we run out of memory or CPU; this situation has to be handled (waiting for processing, or throwing an error and waiting for a rerun)
- Performance tests have shown that the service gets bogged down with too many active consumers, so we need a queuing mechanism
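The per-experiment branching of option 1 can be sketched as follows. This is a minimal illustration, not the TUG Service's actual code: the inline `-c` command stands in for the real 2mwt calculation script, whose name and arguments are not given in this document.

```python
import subprocess
import sys

def run_analysis(experiment_id: str) -> subprocess.Popen:
    # Each incoming experiment immediately branches its own Python
    # subprocess. The inline command is a placeholder for the real
    # 2mwt script (an assumption for this sketch).
    return subprocess.Popen(
        [sys.executable, "-c", f"print('processed {experiment_id}')"],
        stdout=subprocess.PIPE,
        text=True,
    )

# With many concurrent consumers, every message spawns a process at
# once -- this is where a small container can run out of memory/CPU.
procs = [run_analysis(f"exp-{i}") for i in range(3)]
results = [p.communicate()[0].strip() for p in procs]
```

Because nothing limits how many `Popen` calls are in flight, the number of live subprocesses equals the number of active consumers, which is exactly the resource problem listed above.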
[option 2]
Pros
- We are not afraid of running out of resources within one container
- We significantly reduce the number of simultaneous consumers
- It can be built on top of option 1
Cons
- There's no easy way to automatically scale the service out. The only way seems to be a custom metric.
- Regarding the above point: this solution needs performance tests to be sure that our assumptions are correct
- We can't use this solution for the BC
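A minimal sketch of option 2's single-consumer queue, assuming a placeholder command in place of the real 2mwt script: one worker thread drains an in-process queue, so only one subprocess is alive at a time.

```python
import queue
import subprocess
import sys
import threading

# A single worker drains the queue, so at most one Python subprocess
# runs at a time and one container cannot exhaust its memory/CPU.
jobs: queue.Queue = queue.Queue()
results: list = []

def worker() -> None:
    while True:
        experiment_id = jobs.get()
        if experiment_id is None:  # sentinel: shut the worker down
            break
        # Placeholder for the real 2mwt calculation script; the inline
        # command is an assumption made for this sketch.
        out = subprocess.run(
            [sys.executable, "-c", f"print('done {experiment_id}')"],
            capture_output=True, text=True, check=True,
        )
        results.append(out.stdout.strip())

worker_thread = threading.Thread(target=worker)
worker_thread.start()
for i in range(3):
    jobs.put(f"exp-{i}")
jobs.put(None)  # stop signal
worker_thread.join()
```

The queue is also what makes autoscaling awkward: CPU-based metrics won't see the backlog, so scaling out would need a custom metric such as queue depth.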
[option 3]
Pros
- Works also for bigger scripts, like Brainclinics
- Generic solution
Cons
- Slow (K8s has to schedule a new Job each time)
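Option 3 could look roughly like the Job manifest below, built here as a plain dict for illustration; the image name, script path, and Job naming scheme are assumptions, not values from this document. Each experiment would submit one such manifest, and the per-Job scheduling is where the latency cost comes from.

```python
def make_2mwt_job(experiment_id: str) -> dict:
    # Minimal Kubernetes Job manifest for one experiment run.
    # Image and command are hypothetical placeholders.
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": f"2mwt-{experiment_id}"},
        "spec": {
            "template": {
                "spec": {
                    "containers": [{
                        "name": "2mwt",
                        "image": "registry.example.com/2mwt:latest",
                        "command": ["python", "/app/calculate_2mwt.py",
                                    experiment_id],
                    }],
                    # A calculation Job should run to completion once,
                    # not be restarted in place.
                    "restartPolicy": "Never",
                }
            },
            "backoffLimit": 2,
        },
    }

job = make_2mwt_job("exp-42")
```

Submitting this via the Kubernetes API means the scheduler must place a new pod for every experiment, which is the "slow" con noted above, but it also isolates each run, which is why it scales to bigger scripts.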