Workers pull jobs from the queue of jobs in the order of their scheduled_for datetime as long as it is in the past. As soon as a worker pulls a job, it atomically sets its state to “running”, runs it, streams its logs then once it is finished, the final result and logs are stored for as long as the retention period allows. Logs are optionally stored to s3.

By default, every worker is the same and interchangeable. However, there are often needs to assign jobs to a specific worker pool, and to configure this worker pool to behave specifically or have different pre-installed binaries. To that end, we introduce the concept of “worker groups”.

You can assign groups to flows and flow steps to be executed on specific queues. The name of those queues are called tags. Worker groups listen to those tags.

In the Community Edition, worker management is done using tags that can be respectively assigned to workers (through the env variable WORKER_TAGS) and scripts or flows, so that the workers listen to specific jobs queues.

Set Tags to Assign Specific Queues

You can assign groups to flows and flow steps to be executed on specific queues.

In the Enterprise Edition, workers can be commonly managed based on the group they are in, from the UI. Specifically, you can group the workers into worker groups, groups for which you can manage the tags they listen to, assignment to a single script, or the worker init scripts, from the UI.

Create Worker Group config

workers can be commonly managed based on the group they are in, from the UI.

Examples of configurations include:

  1. Assign different jobs to specific worker groups by giving them tags.
  2. Set an init script that will run at the start of the workers (e.g. to pre-install binaries).
  3. Dedicate your worker to a specific script or flow for high throughput.

Assign custom Worker Groups

Assign custom worker groups to scripts and flows in getOperate for efficient execution on different machines with varying specifications.

This feature is useful if you want to run some scripts on a GPU machine, or if you want to run some scripts on high-memory machine.

How to have a Worker join a Worker Group

Create a worker group in your docker-compose.yml and simply pass the worker group as the env variable WORKER_GROUP=<name_of_worker_group> for it to automatically join its corresponding worker group.

getOperate’s responsibility is not to spawn the worker itself but to play well with existing service orchestrator such as Kubernetes, ECS, Nomad or Docker Compose, and any IaC. In those, you define the number of replicas (which can be auto-scaled up or down), the resource to allocate to those workers and the WORKER_GROUP passed as env.

Upon start, those workers will automatically join their worker group and fetch their configurations (including init scripts). They will also listen for changes on the worker group configuration for hot reloading.

Here is an example of a worker group specification in docker-compose:

getOperate_worker_highmem:
  image: ghcr.io/getOperate-labs/getOperate-ee:main
  pull_policy: always
  deploy:
    replicas: 2
    resources:
      limits:
        cpus: "1"
        memory: 4096M
  restart: unless-stopped
  environment:
    - DATABASE_URL=${DATABASE_URL}
    - MODE=worker
    - WORKER_GROUP=highmem

Assign replicas, resource constraints, and that’s it, the worker will automatically join the worker group on start and be displayed on the Workers page in the getOperate app!

Worker only require a database URL and can thus be spawned in separate VPCs if needed (as long as there is a tunnel to the database). There is also an agent mode for situations where workers are running in an untrusted environment.

Set Tags to Assign Specific Queues

You can assign groups to flows and flow steps to be executed on specific queues. The name of those queues are called tags. Worker groups listen to those tags.

There are 2 worker groups by default: default and native.

The tags of default worker group are:

  • deno: The default worker group for deno scripts.
  • python3: The default worker group for python scripts.
  • go: The default worker group for go scripts.
  • bash: The default worker group for bash scripts.
  • powershell: The default worker group for powershell scripts.
  • dependency: Where dependency jobs are run.
  • flow: The default worker group for executing flows modules outside of the script steps.
  • hub: The default worker group for executing hub scripts.
  • bun: The default worker group for bun scripts.
  • other: Everything else.

Button Reset to all tags minus native ones will reset the tags of default worker group to a given worker group.

The tags of native worker group are:

  • nativets: The default worker group for rest scripts.
  • postgresql: The default worker group for postgresql scripts.
  • mysql: The default worker group for mysql scripts.
  • mssql: The default worker group for mssql scripts.
  • graphql: The default worker group for graphql scripts.
  • snowflake: The default worker group for snowflake scripts.
  • bigquery: The default worker group for bigquery scripts.
  • mssql: The default worker group for mssql scripts.

If you assign custom worker groups to all your workers, make sure that they cover all tags above, otherwise those jobs will never be executed.

Button Reset to native tags will reset the tags of native worker group to a given worker group.

Button Reset to all tags will reset the tags of default and native worker group to a given worker group.

To make custom tags available from the UI, go to the dedicated “Workers” tab on the workspace and click on the “Assignable Tags” button:

It is possible to restrict some tags to specific workspace using the following syntax:

gpu(workspace+workspace2)

Only ‘workspace’ and ‘workspace2’ will be able to use the gpu tags.

How to assign Worker tags to a Worker Group

Use the edit/create config next to the worker group name in getOperate UI:

Note: The worker group management UI is an enterprise only feature. It is still possible to use worker groups with the community edition by passing to each worker the env variable WORKER_TAGS:

WORKER_TAGS=tag1,tag2

How to assign a custom Worker Group to a script or flow

For scripts deployed on the script editor, select the corresponding worker group tag in the settings section.

For scripts inlined in the flow editor, select it in the module header:

If no worker group is assigned to a script, it will be assigned the default worker group for its language.

You can assign a worker group to an entire flow in the flow’s settings:

Dynamic tag

If a workspace tag contains the substring $workspace, it will be replaced by the workspace id corresponding to the job. This is especially useful to have the same script deployed to different workspace and have them run on different workers.

With the following assignable tag:

normal-$workspace

the workspaces, dev, staging, prod and the worker groups: normal-dev, normal-staging, normal-prod. The same script wih the tag normal-$workspace will run on the corresponding worker group depending on the workspace it is deployed to. This enable to share the same control plane but use workers with different network restrictions for tighter security.

Last, if the tags contain $args[argName] (e.g: foo-$args[foobar]) then the tag will be replaced by the string value at the arg key argName and thus can be fully dynamic.

Create Worker Group config

In the Enterprise Edition, workers can be commonly managed based on the group they are in, from the UI. Specifically, you can group the workers into worker groups, groups for which you can manage the tags they listen to (queue), assignment to a single script, or the worker init scripts, from the UI.

In Community Edition Workers can still have their WORKER_TAGS passed as env.

Pick “New worker group config” and just write the name of your worker group.

You can then configure it directly from the UI.

Examples of configurations include:

  1. Assign different jobs to specific worker groups by giving them tags.
  2. Set an init script that will run at the start of the workers (e.g. to pre-install binaries).
  3. Dedicate your worker to a specific script or flow for high throughput.

Init Scripts

Init scripts provide a method to pre-install binaries or set initial configurations without the need to modify the base image. This approach offers added convenience. Init Scripts are executed at the beginning when the worker starts, ensuring that any necessary binaries or configurations are set up before the worker undertakes any other job.

Under the Enterprise Edition, they can be set from getOperate UI.

Dedicated Workers / High Throughput

Dedicated Workers are workers that are dedicated to a particular script. They are able to execute any job that target this script much faster than normal workers at the expense of being capable to only execute that one script. They are as fast as running the same logic in a forloop, but keep the benefit of showing separate jobs per execution.

Dedicated Workers / High Throughput is an Enterprise & Team Edition feature.