
Fixes for errors when a daemon thread goes down
Closed, Public

Authored by dgibson on Apr 2 2021, 7:43 PM.

Details

Summary

Right now there are two checks for whether a daemon has gone bad. The first looks for dead threads, and the second looks for missing heartbeats. The heartbeat check has to wait much longer before it can conclude anything, whereas a dead thread can be detected, and the daemon process shut down, much sooner. This diff runs the two checks on different intervals.
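A minimal sketch of running the two checks on separate intervals. Everything here is hypothetical and for illustration only, not the actual Dagster code: the class `DaemonHealthMonitor`, the constants `THREAD_CHECK_INTERVAL` and `HEARTBEAT_CHECK_INTERVAL` and their values, and the assumption that heartbeat timestamps come from the same clock the monitor uses.

```python
import threading
import time
from typing import Callable, Dict, Optional

# Hypothetical interval values, chosen only for illustration.
THREAD_CHECK_INTERVAL = 5.0       # seconds between dead-thread checks (cheap, fast signal)
HEARTBEAT_CHECK_INTERVAL = 120.0  # seconds between heartbeat checks (slow signal)


class DaemonHealthMonitor:
    """Hypothetical monitor that runs each health check on its own interval."""

    def __init__(
        self,
        threads: Dict[str, threading.Thread],
        last_heartbeat: Callable[[str], Optional[float]],
        heartbeat_tolerance: float,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._threads = threads
        self._last_heartbeat = last_heartbeat  # assumed to use the same clock as `clock`
        self._tolerance = heartbeat_tolerance
        self._clock = clock
        start = clock()
        self._last_thread_check = start
        self._last_heartbeat_check = start

    def poll(self) -> Optional[str]:
        """Run whichever checks are due; return a failure reason, or None if healthy."""
        now = self._clock()

        # Dead-thread check: runs often, so a crashed thread is noticed quickly.
        if now - self._last_thread_check >= THREAD_CHECK_INTERVAL:
            self._last_thread_check = now
            for name, thread in self._threads.items():
                if not thread.is_alive():
                    return f"thread {name} died"

        # Heartbeat check: runs rarely, since missing heartbeats take longer to confirm.
        if now - self._last_heartbeat_check >= HEARTBEAT_CHECK_INTERVAL:
            self._last_heartbeat_check = now
            for name in self._threads:
                last = self._last_heartbeat(name)
                if last is None or now - last > self._tolerance:
                    return f"daemon {name} missed heartbeats"

        return None
```

With this split, a thread that dies is reported within roughly `THREAD_CHECK_INTERVAL` seconds, while stale heartbeats are only evaluated every `HEARTBEAT_CHECK_INTERVAL` seconds.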

This diff also adds a guard around the heartbeat add function. Previously, a transient failure while adding a heartbeat would bring down the whole thread; now the error is logged instead. If the heartbeat is permanently down, the process will still shut down eventually, it will just take longer.
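A minimal sketch of the guard, assuming the heartbeat writer is a zero-argument callable. The names `add_heartbeat_guarded` and `daemon_loop` are hypothetical and do not reflect the actual Dagster implementation.

```python
import logging
from typing import Callable

logger = logging.getLogger("daemon")


def add_heartbeat_guarded(add_heartbeat: Callable[[], None]) -> bool:
    """Call a (hypothetical) heartbeat writer, logging instead of raising on failure.

    Returns True if the heartbeat was recorded, False if the attempt failed.
    """
    try:
        add_heartbeat()
        return True
    except Exception:
        # A transient failure no longer kills the daemon thread. If failures
        # persist, the heartbeat-staleness check will eventually shut the
        # process down.
        logger.exception("Failed to add daemon heartbeat")
        return False


def daemon_loop(
    iterations: int,
    do_work: Callable[[], None],
    add_heartbeat: Callable[[], None],
) -> int:
    """Run a fixed number of iterations; return how many heartbeats succeeded."""
    successes = 0
    for _ in range(iterations):
        do_work()
        if add_heartbeat_guarded(add_heartbeat):
            successes += 1
    return successes
```

The trade-off is the one described above: a single failed heartbeat write costs one log line instead of the thread, at the price of a slower shutdown when the heartbeat store is permanently unavailable.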

Reviewers: johann, alangenfeld

Test Plan

BK

Diff Detail

Repository
R1 dagster
Lint
Lint Not Applicable
Unit
Tests Not Applicable