

Description

Make daemon heartbeat failure tolerance configurable and less aggressive by default

Summary:
At least some k8s run launches seem to take long enough to trigger our 90-second heartbeat failure threshold. This diff bumps that limit up to 5 minutes and makes it configurable, including on the Helm chart.
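To make the threshold concrete, here is a minimal sketch of a heartbeat staleness check. It assumes a daemon is judged unhealthy once its most recent heartbeat is older than a configurable tolerance window. The names `DaemonHeartbeat`, `is_daemon_healthy`, and both constants are hypothetical illustrations, not Dagster's actual API. Only the 90-second and 5-minute values come from this summary.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical constants: 300 s mirrors the new 5-minute default described in
# this diff; 90 s was the previous, more aggressive threshold.
DEFAULT_HEARTBEAT_TOLERANCE_SECONDS = 300
OLD_HEARTBEAT_TOLERANCE_SECONDS = 90


@dataclass
class DaemonHeartbeat:
    daemon_type: str
    timestamp: float  # seconds since the epoch when the daemon last checked in


def is_daemon_healthy(
    heartbeat: Optional[DaemonHeartbeat],
    now: float,
    tolerance_seconds: float = DEFAULT_HEARTBEAT_TOLERANCE_SECONDS,
) -> bool:
    """Return True if the most recent heartbeat falls within the tolerance window."""
    if heartbeat is None:
        # A daemon that has never heartbeated is treated as unhealthy.
        return False
    return now - heartbeat.timestamp <= tolerance_seconds


# A slow k8s run launch that blocks heartbeats for 2 minutes:
hb = DaemonHeartbeat("SCHEDULER", timestamp=1000.0)
now = 1000.0 + 120
print(is_daemon_healthy(hb, now, tolerance_seconds=OLD_HEARTBEAT_TOLERANCE_SECONDS))  # False
print(is_daemon_healthy(hb, now))  # True
```

Making the tolerance a parameter, rather than a hard-coded constant, is what lets a deployment (for example via the Helm chart) loosen it further if its run launches are slower still.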

It also adds a couple of yields in the scheduler and sensor daemon. I don't think they are the root cause of the reported issue, but they give us more opportunities to heartbeat for schedules that create multiple runs and aren't using the run queue daemon.
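Why extra yields help can be shown with a simplified model, assuming the daemon's work is a generator and the controller may only write a heartbeat when the generator yields control back. This is a sketch, not Dagster's actual daemon code: `launch_scheduled_runs`, `run_daemon_iteration`, the fake clock, and the 30-second heartbeat interval are all hypothetical.

```python
from typing import Callable, Iterator, List


def launch_scheduled_runs(
    run_keys: List[str], launch: Callable[[str], None]
) -> Iterator[None]:
    """Hypothetical scheduler iteration that yields after every run launch."""
    for key in run_keys:
        launch(key)
        yield  # hand control back so the caller gets a chance to heartbeat


def run_daemon_iteration(
    iteration: Iterator[None],
    clock: Callable[[], float],
    write_heartbeat: Callable[[float], None],
    heartbeat_interval_seconds: float = 30.0,  # hypothetical interval
) -> None:
    """Drive the generator, heartbeating at yields once the interval has elapsed."""
    last = clock()
    write_heartbeat(last)
    for _ in iteration:
        now = clock()
        if now - last >= heartbeat_interval_seconds:
            write_heartbeat(now)
            last = now


class FakeClock:
    """Deterministic clock so the example does not actually sleep."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


clock = FakeClock()
heartbeats: List[float] = []


def slow_launch(run_key: str) -> None:
    clock.now += 40.0  # pretend each run launch takes 40 seconds


run_daemon_iteration(
    launch_scheduled_runs(["a", "b", "c"], slow_launch),
    clock,
    heartbeats.append,
)
print(heartbeats)  # [0.0, 40.0, 80.0, 120.0]
```

If the generator yielded only once after launching all three runs, the heartbeats in this model would be at 0 and 120 seconds. That 120-second gap would exceed the old 90-second threshold even though the daemon was healthy throughout.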

Test Plan: Integration

Reviewers: johann, prha, rexledesma

Reviewed By: prha

Differential Revision: https://dagster.phacility.com/D7678

Details

Provenance
Authored by dgibson on Apr 30 2021, 3:05 PM
Reviewer: prha
Differential Revision: D7678: Make daemon heartbeat failure tolerance configurable and less aggressive by default
Parents: R1:dcc0b327f25a: Automation: versioned docs for 0.11.7
Branches: Unknown
Tags: Unknown
References: refs/pull/3633/head