Page MenuHomeElementl

Cli for daemon heartbeat check
ClosedPublic

Authored by johann on Nov 30 2020, 4:06 PM.

Details

Summary

This can be used in k8s and wherever else a liveness check is required.

This deviates a bit from our other liveliness checks (and I think the standard) in that instead of actually looking at the process inside the container, it asks the DB. I think I prefer this over adding some port on which the daemon health can be checked. This keeps us with a single source of truth for daemon liveliness. If the db goes down then this container will be considered unhealthy, but the container would be crashing anyway. I might be overlooking some pitfall though, so I'd appreciate feedback here.

Test Plan

Integration

manually checked return code with and without the daemon running

manually did a helm install and checked that probe was working

Diff Detail

Repository
R1 dagster
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

johann edited the test plan for this revision. (Show Details)
johann added reviewers: dgibson, alangenfeld.

the non-k8s specific parts of this look great to me. not sure if there's any best practices wrt liveness checks I'm not aware of though?

looks good! i think timeoutSeconds: 3 would be a reasonable default given that users are bumping into issues with lower timeouts. do we need to make changes to the helm schema?

helm/dagster/values.yaml
493

can we add timeoutSeconds: 3 or greater? see: https://github.com/dagster-io/dagster/pull/3303/files

505

ditto

This revision now requires changes to proceed.Dec 1 2020, 6:29 AM

Haven't added a schema for the daemon yet, created a tracking issue to do that once the fields are more settled

This revision is now accepted and ready to land.Dec 1 2020, 7:59 PM
johann edited the test plan for this revision. (Show Details)

rebase

This revision was automatically updated to reflect the committed changes.