Self-Hosted Apps: Shared Tenancy Scheduled Tasks in Docker

I host a few small web projects using CapRover, a simple web-based container orchestration framework built on top of Docker services. For non-critical hobby apps, it is perfect. I simply run CapRover on a DigitalOcean virtual private server (VPS) and containerize all my applications, forgoing any systemd setup and dependency installation on the host.

Most managed application platforms (AWS, Google Cloud, Render, etc.) provide a way to schedule tasks to run on a regular, repeated basis. Unfortunately, CapRover does not (it is a simple Docker service manager). If I want to schedule tasks, I have to use my VPS’s crontab. I didn’t want to do this. Why? For the sake of consistency, I want all my applications running inside Docker containers. I don’t want any host-specific configuration, dependency installation, command running, or troubleshooting. This agnosticism will also make it easier to migrate to another managed container platform down the line if a hobby project makes it big and I want better reliability.

Luckily, containers can run cron themselves. So, first step: I create a Dockerfile that installs a crontab into the container and runs crond as the container’s main process:

FROM alpine:3.15
COPY crontab /etc/crontabs/root
# Run crond in the foreground so it stays alive as the container's main process.
CMD ["crond", "-f"]

I could now just edit the crontab file as needed and add it to every project repository that needs scheduled tasks. If I wanted my crontabs scattered across all my app repos, this would be fine. But what I really wanted was one centralized crontab enabling a ‘multi-tenant’ scheduling system. That would be easier to grasp and troubleshoot, especially for projects I rarely update.

One solution would be moving the code for the different jobs from their respective app repos into a new cronservice repo, but that would be kind of gross. I don’t want to separate app-specific job code from the rest of its codebase. Then I realized that all my apps already expose REST endpoints, and I could leverage that somehow. So I did.

I moved the job code behind an endpoint. Of course, this would allow anyone to trigger the job, which is simply not acceptable. I could have added a full API key and authentication system, but I didn’t want to spend too much time on this. These were just hobby apps, after all. This is what I settled on:

  1. The cronservice container runs a REST API with a single endpoint: ‘/confirm’. It accepts a job key parameter and responds with whether the key corresponds to a genuine request.
  2. When the crontab runs a task, it first generates a job key, inserts it into a SQLite database, and sends an HTTP request to the app’s endpoint with the key.
  3. When the app receives the request, it first queries the cronservice ‘/confirm’ endpoint with the job key to check whether this is a genuine job request (sketched below).
  4. If so, it runs the task and responds with a success status. Otherwise, it refuses to run the task.
  5. The crontab task then removes the entry from the SQLite database and returns 0.
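
To make this concrete, here is a minimal sketch of what the app side (steps 3 and 4) could look like. The framework, the endpoint name, and the cronservice URL are assumptions for illustration; each app can implement this however it likes.

# Hypothetical app-side handler: verify the job key with the cronservice before doing any work.
import os

import requests
from fastapi import FastAPI, HTTPException

app = FastAPI()

# Assumed: the cronservice is reachable on the internal Docker network at this URL.
CONFIRM_URL = os.environ.get("CRONSERVICE_CONFIRM_URL", "http://cronservice/confirm")

@app.post("/jobEndpoint")
def run_scheduled_job(job_key: str):
    # Step 3: ask the cronservice whether this key was really issued by the crontab.
    resp = requests.get(CONFIRM_URL, params={"job_key": job_key}, timeout=10)
    if resp.status_code != 200 or not resp.json().get("valid"):
        # Step 4, negative case: refuse to run the task for unknown keys.
        raise HTTPException(status_code=403, detail="unrecognized job key")

    do_the_actual_job()  # the app-specific work stays in the app's own repo
    return {"status": "success"}

def do_the_actual_job():
    ...  # placeholder for the real job code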

I wrapped this up in three Python files:

  • pending_job_queue.py: defines a simple API around a SQLite table of pending jobs.
  • job_token_confirmation_api.py: a FastAPI app that checks the queue to see whether a key is valid.
  • init_remote_job.py: a helper script that simply ‘pings’ an HTTP endpoint with a freshly generated job key. It also prevents duplicate job runs. (Sketches of these follow below.)
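
Roughly, the first two boil down to something like the sketch below. The table layout, file paths, and function names here are assumptions rather than the exact code, but the idea is the same.

# pending_job_queue.py (sketch): a thin wrapper around a SQLite table of pending jobs.
import sqlite3
import uuid

DB_PATH = "pending_jobs.db"  # assumed path; not persisted across container restarts

def _connect():
    conn = sqlite3.connect(DB_PATH)
    conn.execute("CREATE TABLE IF NOT EXISTS pending_jobs (job_key TEXT PRIMARY KEY, endpoint TEXT)")
    return conn

def add_job(endpoint: str) -> str:
    # Generate a fresh key and remember which endpoint it was issued for.
    job_key = uuid.uuid4().hex
    with _connect() as conn:
        conn.execute("INSERT INTO pending_jobs VALUES (?, ?)", (job_key, endpoint))
    return job_key

def has_job(job_key: str) -> bool:
    with _connect() as conn:
        return conn.execute("SELECT 1 FROM pending_jobs WHERE job_key = ?", (job_key,)).fetchone() is not None

def has_pending_job_for(endpoint: str) -> bool:
    # Used to skip a run if the previous one for the same endpoint never finished.
    with _connect() as conn:
        return conn.execute("SELECT 1 FROM pending_jobs WHERE endpoint = ?", (endpoint,)).fetchone() is not None

def remove_job(job_key: str) -> None:
    with _connect() as conn:
        conn.execute("DELETE FROM pending_jobs WHERE job_key = ?", (job_key,))


# job_token_confirmation_api.py (sketch): the single '/confirm' endpoint.
from fastapi import FastAPI

import pending_job_queue

app = FastAPI()

@app.get("/confirm")
def confirm(job_key: str):
    # A genuine request carries a key that the crontab inserted moments earlier.
    return {"valid": pending_job_queue.has_job(job_key)}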

And usage in the crontab looks like:

0 2 * * * ./init_remote_job.py https://sampleurl.com/jobEndpoint
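
And for completeness, here is a sketch of what init_remote_job.py amounts to, reusing the hypothetical queue helpers from above; the argument handling and the timeout value are assumptions.

#!/usr/bin/env python3
# init_remote_job.py (sketch): issue a job key, ping the app endpoint, then clean up.
import sys

import requests

import pending_job_queue

def main() -> int:
    endpoint = sys.argv[1]

    # Don't start a new run if a previous job for this endpoint is still pending.
    if pending_job_queue.has_pending_job_for(endpoint):
        return 1

    # Generate a key, record it, and call the app with it; the call blocks until the app finishes.
    job_key = pending_job_queue.add_job(endpoint)
    requests.post(endpoint, params={"job_key": job_key}, timeout=600)

    # Clear the entry so the next scheduled run can proceed.
    pending_job_queue.remove_job(job_key)
    return 0

if __name__ == "__main__":
    sys.exit(main())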

I like what I came up with. It is easy to reason about and understand. It leaves the job-specific code with its app while centralizing all the scheduling in one file (within the cronservice repo). The database also prevents running a job if the previous run hasn’t completed yet.

I am not running critical apps, so I didn’t bother adding persistence for the SQLite database. Of course, if you were scaling this and needed to guarantee that jobs fully complete, or that the API can be restarted at will, you would need to mount a persistent volume or use an external database. Additionally, while the above solution is sound from a security standpoint, it would probably be better to put the jobs behind an unexposed, internal-only endpoint. In this case, I was aiming to minimize overhead.
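
For reference, with plain Docker that would just mean mounting a named volume over the directory holding the SQLite file; the volume, image, and path names below are made up:

docker run -d -v cronservice_data:/data cronservice-image
# ...with the queue pointed at /data/pending_jobs.db rather than a path baked into the container.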
