This diff sets up the scaffolding to watch for grpc server disconnects, attempt to reconnect upon disconnect, and reload the handle when reconnection is successful. The next diff will implement the signal to Dagit that it needs to refresh it's client side state.
@dgibson will know better - but should the handle own these threads instead?
if we switch to poll instead of stream can maybe just repurpose Ping to also return server_id
yeah this feels like something that should be on GrpcServerRepositoryLocationHandle (could be a method on that handle if we don't want to do it in, for example, a CLI call). doesn't seem like something you'd expect to find in dagster-graphql though
i suspect the reason its here though is b/c it needs to call reload_repository_location on disconnect - but maybe that should be an event or something on the handle that you can subscribe to from the context?
are we sure this isn't also useful for managed grpc handles? they can go down too, alex and I were just talking about that this morning: https://github.com/dagster-io/dagster/issues/3189
moving the thread logic to the handle will have the bonus of providing a place to clean up / join the threads during shutdown, since handles have a cleanup method
Yup that's exactly why. I originally had this on the handle itself, but I needed to thread through an object all the way to the handle to let us reload the handle on the workspace. I can put that version up again and we can talk through it.
I didn't realize that those could go down so easily – good call.
The logic would be a bit different though, since we can bring back managed grpc servers, but we expect unmanaged grpc servers to come back up themselves.