Page MenuHomePhabricator

Add grpc-health-probe

Authored by johann on Oct 14 2020, 2:56 PM.



Uses grpcio-health-check to add the health checking protocol to the grpc server. Then adds a client that can be invoked in the dagster cli.


Test Plan

TODO: add a failing test

Integrations, manual helm install and check that liveness probe is active


dagster api grpc --port 3030 &

dagster api grpc-health-check --port 3030
# $? == 0

dagster api grpc-health-check --port 3031
# $? == 1

Diff Detail

R1 dagster
grpc-health-check (branched from master)
Lint OK
No Unit Test Coverage

Event Timeline

Harbormaster returned this revision to the author for changes because remote builds failed.Oct 14 2020, 3:19 PM
Harbormaster failed remote builds in B19492: Diff 23681!
Harbormaster failed remote builds in B19494: Diff 23683!
johann edited the test plan for this revision. (Show Details)
johann added reviewers: nate, catherinewu, alangenfeld.
johann added inline comments.
91 ↗(On Diff #23700)

The only alternative I know is to have the probe be in the main container. The advantage of this approach is that users don't need to remember to include the probe binary in their images

94 ↗(On Diff #23700)

if this livenessProbe fails, will it restart the user code deployment container or only the grpc-health-probe container?

94 ↗(On Diff #23700)

Good point! I thought the containers in a pod were supposed to live and die together, but only the probe pod restarts.

94 ↗(On Diff #23700)

I think restartPolicy: Never would force the deployment to create a new pod? Otherwise I think we should just bake it in to the existing container.

Is there anything we can do to make it easier for users to include/remember to include the binary in their user deployment images? It could just be bundled in dagster and invoked via a dagster cli but I don't think that's great, especially given that the binary is only available for linux. We could build our own client for it?

94 ↗(On Diff #23700)

Actually restartPolicy isn't possible either, guess it has to be packaged with the container.

to your queue

94 ↗(On Diff #23700)

I think it makes sense to have a dagster cli entry to run the health check, we should be able to spin up the grpc client to make the health check request pretty easy in python

This revision now requires changes to proceed.Oct 15 2020, 8:50 PM

add client to dagster cli

rebase and up dependency pin

Should be ready for re-review

alangenfeld added inline comments.

why the fixed pin?


It depends on grpcio, seems best to have them pinned together?


grpcio is a >= though


Good point!

write a test too?

This revision now requires changes to proceed.Oct 21 2020, 9:41 PM

lol updated with test right when i req-ed changes

9–26 ↗(On Diff #24221)

i think this is legit but double check with @dgibson on making sure we dont leave accidental sub processes around

This revision is now accepted and ready to land.Oct 21 2020, 9:42 PM
19–21 ↗(On Diff #24221)

looks fine - you could add a server.wait() after thew with() block ends, but it should still shut down on its own without it

This revision was automatically updated to reflect the committed changes.