This moves execute_run to use GRPC. A follow-on diff will implement the (fairly complex) soft termination scheme.
over to you for discussion
should we just modify CliApiRunLauncher to have the ability to launch grpc instead of the cli directly? gated by config, or by poking at the ExternalPipeline once we add a load_from variant / config that opts in to grpc? A bit wary of all this duplicated code.
if we stick with a process - does this code need to account for the intermediate process dying but not the grpc server process? it just seems like we shouldn't have a straight copy-paste of the cli api run launcher, given this works a bit differently.
What isolation is already provided by the grpc server? If we have a crashy pipeline, what actually happens? Do we even need to worry about doing this cleanup, or will the grpc client in the intermediate process raise when the stream closes unexpectedly, eliminating the need for all this zombie-cleanup checking?
why a process instead of a thread?
did you look into doing a bidirectional stream in GRPC and sending the termination request through that way?
have you run this on windows yet? keep an eye on the azure pipeline if not
what happens at  in this test? could we just try/catch there and drop the zombie checker thread?
thanks for this, moving to threads will allow us to drop the zombie checking machinery completely.
sure, we could do that. then the cutover is obscured / out of user control vs. requiring people to update their dagster.yaml to reflect the change. this seemed more explicit and avoids us having to keep track of two different families of subprocesses in one launcher, but happy to make this change if you think it's worthwhile.
yep, not an option if we want something that acts like an interrupt. the only solution for soft termination that i can see is roughly https://excalidraw.com/#json=5646104355930112,oYttHEA0LtS-GfJlpOzmLw
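roughly the idea, as an illustrative sketch (not the actual implementation): "acts like an interrupt" means the run process installs a SIGTERM handler that raises, so termination unwinds in-flight work through ordinary exception handling instead of killing the process outright.

```python
import os
import signal

# illustrative sketch (not the actual implementation): install a SIGTERM
# handler that raises KeyboardInterrupt, so a soft termination request
# unwinds in-flight work via normal exception handling (finally blocks,
# resource cleanup) rather than the process dying mid-step.
def _term_handler(signum, frame):
    raise KeyboardInterrupt("soft termination requested")

signal.signal(signal.SIGTERM, _term_handler)

caught = False
try:
    os.kill(os.getpid(), signal.SIGTERM)  # stand-in for the launcher's signal
except KeyboardInterrupt:
    caught = True  # the run can now clean up and report a canceled status
```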
this should work since the pid is the process group id and we are now starting with the process group flags set correctly
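i.e. roughly this (illustrative sketch; POSIX-only, and the child command is a placeholder):

```python
import os
import signal
import subprocess

# illustrative sketch: with start_new_session=True (equivalent to calling
# os.setsid in the child), the child's pid doubles as its process group id,
# so signaling the group via os.killpg(proc.pid, ...) reaches the child and
# any grandchildren it spawned. POSIX-only; "sleep 60" is a placeholder.
proc = subprocess.Popen(["sleep", "60"], start_new_session=True)

assert os.getpgid(proc.pid) == proc.pid  # pid == process group id

os.killpg(proc.pid, signal.SIGTERM)  # terminate the whole group
proc.wait()
```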
im not certain what path is best - something to hash out as we figure out the details of our cut-over / migration / opt-in strategy.
Also, container-based stuff is going to make us reconsider a lot of things, so that might influence this as well.
alright - looking at the diagram, it looks like  will happen in a subprocess - which feels especially right for long-running servers. Once that happens maybe the bidirectional comms would make sense, but not sure. Stuff to hash out later.
did you also try sending over the instance ref and re-creating the instance in the threads?
(On lines 109–114 of Diff #18039)
oof brutal - I wonder what else we can do about this
if we are going to land this, we should at least filter down on the message, like we did for the exceptions below, just to prevent hiding some other problem
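i.e. something like this (the matched message text below is made up for illustration):

```python
# illustrative sketch: swallow only the specific expected error, matched on
# its message, and re-raise everything else so unrelated problems still
# surface. "Stream removed" is a made-up example message.
EXPECTED_MESSAGE = "Stream removed"

def swallow_expected_shutdown_error(fn):
    try:
        fn()
    except Exception as exc:
        if EXPECTED_MESSAGE not in str(exc):
            raise  # don't hide some other problem

def raises_expected():
    raise RuntimeError("Stream removed")

swallow_expected_shutdown_error(raises_expected)  # silently ignored
```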
(On line 143 of Diff #18039)
oh boy, ok - after a second pass i see what's happening now. missing the instance_ref above was my bad, but the current control flow scheme here of calling back onto this class from  is pretty wild.
Could we instead manage the api client creation synchronously here, and then only hand the execute_run invocation over to a thread? I think that would clean up these odd control flow issues.
That said, why use a streaming request at all if we are ignoring the events? Should we just have unary start_run and terminate_run instead?
unfortunately i think this whole scheme isn't viable -- we need to revert to the approach in https://dagster.phacility.com/D3662?id=17660
(On line 143 of Diff #18039)
i.e., this got interrupted -- we asked for 100k responses but only got a few hundred
alright well apologies for the long journey back to the start
maybe a more specific name, since this is ephemeral / cli grpc and not more general
also, a little comment block describing it can't hurt
I think this aspect should get dropped, since marking the pipeline run failed when the watcher / client process exits unexpectedly doesn't quite make sense.
as leor recently pointed out, these types of tests are brittle with respect to adding new events. This is better than the mystery-number indexing, but if you can think of a better approach it's probably worth making the change
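e.g. something like looking events up by type instead of by position (event shape below is faked for illustration):

```python
# illustrative sketch: instead of asserting on a hard-coded event index
# (which breaks whenever a new event type gets emitted), filter the event
# list down by type first. The dict-shaped events are faked for illustration.
def events_of_type(events, event_type):
    return [e for e in events if e["event_type"] == event_type]

events = [
    {"event_type": "ENGINE_EVENT", "message": "process started"},
    {"event_type": "STEP_SUCCESS", "message": "step finished"},
    {"event_type": "PIPELINE_SUCCESS", "message": "run finished"},
]

successes = events_of_type(events, "PIPELINE_SUCCESS")
assert len(successes) == 1  # stays true even if new events appear earlier
```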