Page MenuHomePhabricator

(dagster-reload 1/n) Add server ID stream endpoint to gRPC server + watch for server disconnects
AbandonedPublic

Authored by sashank on Wed, Nov 4, 9:58 PM.

Details

Summary

This diff sets up the scaffolding to watch for grpc server disconnects, attempt to reconnect upon disconnect, and reload the handle when reconnection is successful. The next diff will implement the signal to Dagit that it needs to refresh it's client side state.

Test Plan
  • unit
  • load workspace pointed at remote grpc server, kill server, make sure it reconnects

Diff Detail

Repository
R1 dagster
Branch
grpc-demo
Lint
Lint OK
Unit
No Unit Test Coverage

Event Timeline

sashank retitled this revision from (dagster-reload 1/n) Add unique server ID and stream endpoint to gRPC server to (dagster-reload 1/n) Add server ID stream endpoint to gRPC server + watch for server disconnects.Wed, Nov 4, 10:05 PM
sashank edited the summary of this revision. (Show Details)
sashank added a reviewer: alangenfeld.
Harbormaster returned this revision to the author for changes because remote builds failed.Wed, Nov 4, 11:13 PM
Harbormaster failed remote builds in B20803: Diff 25216!
sashank edited the summary of this revision. (Show Details)

up

Harbormaster returned this revision to the author for changes because remote builds failed.Thu, Nov 5, 12:52 AM
Harbormaster failed remote builds in B20804: Diff 25219!
alangenfeld added a subscriber: dgibson.
alangenfeld added inline comments.
python_modules/dagster-graphql/dagster_graphql/implementation/context.py
21

@dgibson will know better - but should the handle own these threads instead?

python_modules/dagster/dagster/grpc/protos/api.proto
33–39

if we switch to poll instead of stream can maybe just repurpose Ping to also return server_id

dgibson requested changes to this revision.Thu, Nov 5, 5:55 PM
dgibson added inline comments.
python_modules/dagster-graphql/dagster_graphql/implementation/context.py
21

yeah this feels like something that should be on GrpcServerRepositoryLocationHandle (could be a method on that handle if we don't want to do it in, for example, a CLI call). doesn't seem like something you'd expect to find in dagster-graphql though

i suspect the reason its here though is b/c it needs to call reload_repository_location on disconnect - but maybe that should be an event or something on the handle that you can subscribe to from the context?

35–36

are we sure this isn't also useful for managed grpc handles? they can go down too, alex and I were just talking about that this morning: https://github.com/dagster-io/dagster/issues/3189

This revision now requires changes to proceed.Thu, Nov 5, 5:55 PM
python_modules/dagster-graphql/dagster_graphql/implementation/context.py
16–17

moving the thread logic to the handle will have the bonus of providing a place to clean up / join the threads during shutdown, since handles have a cleanup method

python_modules/dagster-graphql/dagster_graphql/implementation/context.py
21

Yup that's exactly why. I originally had this on the handle itself, but I needed to thread through an object all the way to the handle to let us reload the handle on the workspace. I can put that version up again and we can talk through it.

35–36

I didn't realize that those could go down so easily – good call.

python_modules/dagster-graphql/dagster_graphql/implementation/context.py
35–36

The logic would be a bit different though, since we can bring back managed grpc servers, but we expect unmanaged grpc servers to come back up themselves.