Page MenuHomeElementl

Run each DagsterDaemon subclass in its own thread
ClosedPublic

Authored by dgibson on Feb 1 2021, 11:27 PM.

Details

Summary

Many tests will break, but before I fix them just wanted to see if there are any reservations about this approach? This will allow us to have a dagster-cloud API daemon that doesn't get stuck waiting for slow schedule/sensor iterations to finish.

Each daemon starts up its own DagsterInstance now, since DagsterInstance is not multi-threaded.

The big risk is that a single thread will die and leave the daemon in a weird state. Heartbeats and surfacing error messages helps a bit with this. We could also have the controller monitor the subthreads for premature death, and kill the whole process if any individual subthread dies or stops heartbeating?

Test Plan

Integration

Diff Detail

Repository
R1 dagster
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

dgibson published this revision for review.Feb 1 2021, 11:27 PM

did you consider process-level isolation?

@schrockn that could also work, yeah - but reasons I was leaning against it:

  • No risk of dangling daemon processes (there are ways to mitigate this, e.g. having the daemon processes see if their parents are still alive and shutting down if not)
  • The whole dagster-daemon process crashing when an individual daemon process crashes seems like a feature not a bug? Since k8s/docker/etc will then know to restart it

But I don't feel strongly. Doing each in its own process would certainly make the 'separate DagsterInstance per daemon' part less weird - if we run into weird multithreading issues i'll most likely go down that route by necessity.

seems fine to me

python_modules/dagster/dagster/daemon/cli/__init__.py
34–35

there should be a better way to do this than waking up and going back to sleep every half second

python_modules/dagster/dagster/daemon/controller.py
31–32

?

dgibson retitled this revision from [RFC] Run each DagsterDaemon subclass in its own thread to Run each DagsterDaemon subclass in its own thread.
dgibson edited the test plan for this revision. (Show Details)

up, run integration tests

revert unintentional changes

This revision is now accepted and ready to land.Feb 5 2021, 3:54 PM