Quick Start
Tork is designed to let you define jobs consisting of multiple tasks, each running inside its own Docker container. You can run Tork on a single machine (standalone mode) or set it up in a distributed environment with multiple workers.
Requirements
- Make sure you have a fairly recent version of Docker installed on your system. You can download Docker from the official Docker website.
- Download the Tork binary for your system from the releases page.
Set up PostgreSQL
Start a PostgreSQL container:

docker run -d \
  --name tork-postgres \
  -p 5432:5432 \
  -e POSTGRES_PASSWORD=tork \
  -e POSTGRES_USER=tork \
  -e PGDATA=/var/lib/postgresql/data/pgdata \
  -e POSTGRES_DB=tork postgres:15.3

Note: For production you may want to consider using a managed PostgreSQL service for better reliability and maintenance.
Run a migration to create the database schema:
TORK_DATASTORE_TYPE=postgres ./tork migration
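A freshly started PostgreSQL container can take a few seconds before it accepts connections, so a migration run immediately after `docker run` may fail. The sketch below retries a readiness check first. The `wait_until` helper, the `WAIT_INTERVAL` variable, and the 30-attempt limit are illustrative assumptions and not part of Tork; `pg_isready` is the standard PostgreSQL client utility included in the postgres image.

```shell
#!/usr/bin/env bash
# wait_until ATTEMPTS CMD...: run CMD repeatedly until it succeeds or
# ATTEMPTS tries have been used, pausing WAIT_INTERVAL seconds (default 1)
# between tries. Hypothetical helper, not part of Tork.
wait_until() {
  local attempts=$1; shift
  local i
  for ((i = 1; i <= attempts; i++)); do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    sleep "${WAIT_INTERVAL:-1}"
  done
  return 1
}

# Usage (assumes the tork-postgres container started above):
#   wait_until 30 docker exec tork-postgres pg_isready -h localhost -U tork \
#     && TORK_DATASTORE_TYPE=postgres ./tork migration
```

The helper returns a non-zero status if the database never becomes ready, so the migration is skipped rather than run against an unavailable server.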
Hello World
Start Tork in standalone mode:
./tork run standalone
Create a file called hello.yaml with the following contents:

---
name: hello job
tasks:
  - name: say hello
    image: ubuntu:mantic #docker image
    run: |
      echo -n hello world
  - name: say goodbye
    image: alpine:latest
    run: |
      echo -n bye world
Submit the job in another terminal window:
JOB_ID=$(curl -s -X POST --data-binary @hello.yaml \
  -H "Content-type: text/yaml" http://localhost:8000/jobs | jq -r .id)
Query for the status of the job:
curl -s http://localhost:8000/jobs/$JOB_ID
{
  "id": "ed0dba93d262492b8cf26e6c1c4f1c98",
  "state": "COMPLETED",
  ...
}
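A job may still be running when you first query it, so a single request does not always show the final state. The sketch below polls the same endpoint until the job finishes. The function names, `TORK_URL`, `POLL_INTERVAL`, and the default of 60 polls are hypothetical. Only the COMPLETED state appears in the output above; treating FAILED and CANCELLED as terminal states is an assumption.

```shell
#!/usr/bin/env bash
TORK_URL=${TORK_URL:-http://localhost:8000}

# Print the current state of a job, using the same endpoint as above.
fetch_state() {
  curl -s "$TORK_URL/jobs/$1" | jq -r .state
}

# wait_for_job JOB_ID [MAX_POLLS]: poll until the job reaches a terminal
# state. Prints the final state; returns 0 on COMPLETED, 1 on FAILED or
# CANCELLED (assumed state names), 2 if MAX_POLLS is exhausted.
wait_for_job() {
  local job_id=$1 max_polls=${2:-60} state i
  for ((i = 1; i <= max_polls; i++)); do
    state=$(fetch_state "$job_id")
    case "$state" in
      COMPLETED) echo "$state"; return 0 ;;
      FAILED|CANCELLED) echo "$state"; return 1 ;;
    esac
    sleep "${POLL_INTERVAL:-1}"
  done
  echo "TIMEOUT"
  return 2
}

# Usage: wait_for_job "$JOB_ID"
```

Keeping `fetch_state` separate from the loop makes it easy to change how the state is read without touching the polling logic.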
What Happened Behind the Scenes?
- Tork received your job and read the two tasks from hello.yaml.
- Task 1 (“say hello”) ran in a container based on ubuntu:mantic.
- Task 2 (“say goodbye”) ran in a container based on alpine:latest.
- When both tasks finished, Tork reported the job state as COMPLETED.
Running in distributed mode
Running Tork in distributed mode lets you split the roles of Coordinator (overseeing tasks) and Worker (executing tasks) across separate machines or processes.
For distributed operation, Tork uses a message broker to move tasks between the coordinator and workers. A common broker implementation is RabbitMQ.
Launch RabbitMQ with the following command:
docker run \
  -d -p 5672:5672 -p 15672:15672 \
  --name=tork-rabbitmq \
  rabbitmq:4.1-management

This command starts RabbitMQ in detached mode. You can access the RabbitMQ management interface by navigating to http://localhost:15672 in your web browser. The default username and password are both guest.

Note: For production you may want to consider using a dedicated RabbitMQ service for better reliability and maintenance.
Open a new terminal and run the coordinator:
TORK_DATASTORE_TYPE=postgres TORK_BROKER_TYPE=rabbitmq ./tork run coordinator
Open another terminal and start a worker (you can repeat this step to simulate multiple workers):
TORK_BROKER_TYPE=rabbitmq ./tork run worker
Let’s submit the same job from another terminal window:
JOB_ID=$(curl -s -X POST --data-binary @hello.yaml \
  -H "Content-type: text/yaml" http://localhost:8000/jobs | jq -r .id)
Query for the status of the job:
curl -s http://localhost:8000/jobs/$JOB_ID | jq .state
COMPLETED
What’s different in distributed mode?
- Coordinator receives the job and breaks it into tasks.
- Broker (RabbitMQ) manages these tasks as messages in a queue.
- Worker takes a task from the queue, runs the specified Docker command, and reports completion back to the coordinator.
- Coordinator sends the next task to the queue, until all tasks are done.
By separating these roles, you can scale Tork horizontally. Multiple workers can share the workload, each picking up tasks from the queue.
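As a rough illustration of sharing the workload on a single machine, the sketch below starts several workers as background processes using the worker command shown earlier. The `start_workers` helper is hypothetical and not part of Tork.

```shell
#!/usr/bin/env bash
# start_workers COUNT CMD...: launch COUNT background copies of CMD and
# print one PID per line. Hypothetical helper, not part of Tork.
start_workers() {
  local count=$1; shift
  local i
  for ((i = 1; i <= count; i++)); do
    "$@" &
    echo "$!"
  done
}

# Usage: start three workers that consume from the same RabbitMQ broker.
#   start_workers 3 env TORK_BROKER_TYPE=rabbitmq ./tork run worker
# Stop them later by passing the printed PIDs to kill.
```

All workers write their logs to the same terminal in this setup, so separate terminals remain the simpler choice when you want to read each worker's output.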
Adding external storage
By design, Tork tasks are ephemeral: each task runs independently in a Docker container, which disappears as soon as the task completes. Any data written to the container’s filesystem is lost after the task finishes. If you want to share data between tasks (or persist it beyond task execution), you need an external data store.
Set up MinIO
MinIO is an S3-compatible object store that you can run locally via Docker.
Let’s start a MinIO container:
docker run --name=tork-minio \
  -d -p 9000:9000 -p 9001:9001 \
  -e MINIO_ROOT_USER=minioadmin \
  -e MINIO_ROOT_PASSWORD=minioadmin \
  minio/minio server /data \
  --console-address ":9001"
Creating a Job with External State
Below is an example job file (stateful.yaml) with two tasks:
- Writes data to MinIO (creating a bucket, then uploading a file).
- Reads the data back from MinIO and prints it.
name: stateful example
inputs:
  minio_endpoint: http://host.docker.internal:9000
secrets:
  minio_user: minioadmin
  minio_password: minioadmin
tasks:
  - name: write data to object store
    image: amazon/aws-cli:latest
    env:
      AWS_ACCESS_KEY_ID: "{{ secrets.minio_user }}"
      AWS_SECRET_ACCESS_KEY: "{{ secrets.minio_password }}"
      AWS_ENDPOINT_URL: "{{ inputs.minio_endpoint }}"
      AWS_DEFAULT_REGION: us-east-1
    run: |
      echo "Hello from Tork!" > /tmp/data.txt
      aws s3 mb s3://mybucket
      aws s3 cp /tmp/data.txt s3://mybucket/data.txt

  - name: read data from object store
    image: amazon/aws-cli:latest
    env:
      AWS_ACCESS_KEY_ID: "{{ secrets.minio_user }}"
      AWS_SECRET_ACCESS_KEY: "{{ secrets.minio_password }}"
      AWS_ENDPOINT_URL: "{{ inputs.minio_endpoint }}"
      AWS_DEFAULT_REGION: us-east-1
    run: |
      aws s3 cp s3://mybucket/data.txt /tmp/retrieved.txt
      echo "Contents of retrieved file:"
      cat /tmp/retrieved.txt
Key Points:
- image: amazon/aws-cli:latest: We use the AWS CLI Docker image to interact with MinIO via S3 commands.
- env: We set credentials (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, etc.) so the AWS CLI can authenticate against MinIO.