How to execute the 2mwt script?
- Status: proposed
- Deciders: Artur Wojnar, Andres Lamont
- Date: 2023-04-18
Context and Problem Statement
Results of the 2mwt tests are calculated by a Python algorithm, similarly to the TUG experiments.
Decision Drivers
- Scalability
- How fast we can run the script and get results
- Complexity of the solution
- Reliability
Considered Options
- Branching subprocesses
- Branching one subprocess at a time
  - The router collects all experiment data and sends the START when it gets the last message
  - The service handles only one experiment at a time
- Delegating a K8s Job
Decision Outcome
Option 2 - branching one subprocess at a time
Positive Consequences
The solution will be built on top of the TUG Service, and it should protect the service from getting bogged down with too many active consumers.
Negative Consequences
It is not a ready-to-use solution, but on the other hand it appears to be much more reliable.
[option 1]
Pros
- Works already in the TUG Service, so we can copy-paste it
- It's very simple - when a new experiment has to be calculated, a new Python subprocess is executed
Cons
- Branching subprocesses is not quite the Docker way
- Our containers are small and we can easily reach the point where we run out of memory or CPU; this situation has to be handled (waiting for processing, or throwing an error and waiting for a rerun)
- Performance tests have shown that the service gets bogged down with too many active consumers, so we need a queuing mechanism
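The per-experiment branching of option 1 can be sketched as follows. This is a minimal illustration, not the TUG Service's actual code: the inline `-c` command stands in for the real 2mwt calculation script, whose name and arguments are not given in this document.

```python
import subprocess
import sys

def run_analysis(experiment_id: str) -> subprocess.Popen:
    # Each incoming experiment immediately branches its own Python
    # subprocess. The inline command is a placeholder for the real
    # 2mwt script (an assumption for this sketch).
    return subprocess.Popen(
        [sys.executable, "-c", f"print('processed {experiment_id}')"],
        stdout=subprocess.PIPE,
        text=True,
    )

# With many concurrent consumers, every message spawns a process at
# once -- this is where a small container can run out of memory/CPU.
procs = [run_analysis(f"exp-{i}") for i in range(3)]
results = [p.communicate()[0].strip() for p in procs]
```

Because nothing limits how many `Popen` calls are in flight, the number of live subprocesses equals the number of active consumers, which is exactly the resource problem listed above.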
[option 2]
Pros
- We are not afraid of running out of resources within one container
- We significantly reduce the number of simultaneous consumers
- It can be built on top of option 1
Cons
- There's no easy way to automatically scale the service out. The only way seems to be a custom metric.
- Regarding the above point: this solution needs performance tests to be sure that our assumptions are correct
- We can't use this solution for the BC
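A minimal sketch of option 2's single-consumer queue, assuming a placeholder command in place of the real 2mwt script: one worker thread drains an in-process queue, so only one subprocess is alive at a time.

```python
import queue
import subprocess
import sys
import threading

# A single worker drains the queue, so at most one Python subprocess
# runs at a time and one container cannot exhaust its memory/CPU.
jobs: queue.Queue = queue.Queue()
results: list = []

def worker() -> None:
    while True:
        experiment_id = jobs.get()
        if experiment_id is None:  # sentinel: shut the worker down
            break
        # Placeholder for the real 2mwt calculation script; the inline
        # command is an assumption made for this sketch.
        out = subprocess.run(
            [sys.executable, "-c", f"print('done {experiment_id}')"],
            capture_output=True, text=True, check=True,
        )
        results.append(out.stdout.strip())

worker_thread = threading.Thread(target=worker)
worker_thread.start()
for i in range(3):
    jobs.put(f"exp-{i}")
jobs.put(None)  # stop signal
worker_thread.join()
```

The queue is also what makes autoscaling awkward: CPU-based metrics won't see the backlog, so scaling out would need a custom metric such as queue depth.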
[option 3]
Pros
- Works also for bigger scripts, like Brainclinics
- Generic solution
Cons
- Slow (K8s has to schedule a new Job each time)
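Option 3 could look roughly like the Job manifest below, built here as a plain dict for illustration; the image name, script path, and Job naming scheme are assumptions, not values from this document. Each experiment would submit one such manifest, and the per-Job scheduling is where the latency cost comes from.

```python
def make_2mwt_job(experiment_id: str) -> dict:
    # Minimal Kubernetes Job manifest for one experiment run.
    # Image and command are hypothetical placeholders.
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": f"2mwt-{experiment_id}"},
        "spec": {
            "template": {
                "spec": {
                    "containers": [{
                        "name": "2mwt",
                        "image": "registry.example.com/2mwt:latest",
                        "command": ["python", "/app/calculate_2mwt.py",
                                    experiment_id],
                    }],
                    # A calculation Job should run to completion once,
                    # not be restarted in place.
                    "restartPolicy": "Never",
                }
            },
            "backoffLimit": 2,
        },
    }

job = make_2mwt_job("exp-42")
```

Submitting this via the Kubernetes API means the scheduler must place a new pod for every experiment, which is the "slow" con noted above, but it also isolates each run, which is why it scales to bigger scripts.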