
Fixes for errors when a daemon thread goes down

Authored by dgibson on Apr 2 2021, 7:43 PM.



There are currently two checks for whether a daemon has gone bad. The first looks for dead threads, and the second looks for missing heartbeats. The heartbeat check has to wait much longer before it can conclude anything, but when a thread dies we can shut down the daemon process much sooner. This diff runs the two checks on different intervals.
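As a rough illustration of running the two checks on separate intervals (not the code in this diff), here is a minimal sketch. The class name `DaemonHealthMonitor`, the `heartbeat_is_healthy` callable, and both interval values are hypothetical and chosen only for the example.

```python
import threading

# Hypothetical intervals, for illustration only.
THREAD_CHECK_INTERVAL = 5.0      # seconds between dead-thread checks
HEARTBEAT_CHECK_INTERVAL = 60.0  # seconds between missing-heartbeat checks


class DaemonHealthMonitor:
    """Runs a cheap dead-thread check often and a heartbeat check rarely."""

    def __init__(
        self,
        threads,
        heartbeat_is_healthy,
        thread_interval=THREAD_CHECK_INTERVAL,
        heartbeat_interval=HEARTBEAT_CHECK_INTERVAL,
    ):
        self._threads = threads  # dict: name -> threading.Thread
        self._heartbeat_is_healthy = heartbeat_is_healthy  # callable() -> bool
        self._thread_interval = thread_interval
        self._heartbeat_interval = heartbeat_interval
        self._last_thread_check = None
        self._last_heartbeat_check = None

    @staticmethod
    def _is_due(last, now, interval):
        return last is None or now - last >= interval

    def poll(self, now):
        """Run whichever checks are due at `now`; return a failure reason or None."""
        if self._is_due(self._last_thread_check, now, self._thread_interval):
            self._last_thread_check = now
            dead = [name for name, t in self._threads.items() if not t.is_alive()]
            if dead:
                return "dead threads: " + ", ".join(sorted(dead))

        if self._is_due(self._last_heartbeat_check, now, self._heartbeat_interval):
            self._last_heartbeat_check = now
            if not self._heartbeat_is_healthy():
                return "missing heartbeats"

        return None
```

A caller would invoke `poll(time.time())` in its main loop and shut the process down when a reason is returned. Passing `now` in explicitly keeps the sketch easy to test without sleeping.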

Also add a guard around the heartbeat add function. Previously, a transient failure to add a heartbeat would bring down the whole thread; now we log an error instead. (The process will still shut down eventually if heartbeats are permanently failing; it will just take longer.)
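The guard described above could look roughly like the following. This is a sketch under assumptions, not the actual change: `add_heartbeat_guarded` and the `add_heartbeat` callable are hypothetical names, and the real code may log differently.

```python
import logging

logger = logging.getLogger("daemon_sketch")  # hypothetical logger name


def add_heartbeat_guarded(add_heartbeat, log=logger):
    """Call add_heartbeat(); on failure, log the error instead of raising.

    Returns True if the heartbeat was added, False if the attempt failed.
    A permanently failing heartbeat is still caught later by the separate
    missing-heartbeat check, so swallowing the exception here is safe.
    """
    try:
        add_heartbeat()
        return True
    except Exception:
        # log.exception records the traceback along with the message.
        log.exception("Failed to add heartbeat; will retry on the next iteration")
        return False
```

The daemon thread keeps running after a failed attempt, so a single transient storage error no longer takes the thread down.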

Reviewers: johann, alangenfeld

Test Plan


Diff Detail

R1 dagster
Lint Not Applicable
Tests Not Applicable