Page MenuHomeElementl

Use grpc rather than ipc to decide if a grpc server is running

Authored by dgibson on Jul 23 2021, 6:55 PM.
Referenced Files
Unknown Object (File)
Fri, Mar 17, 3:05 PM
Unknown Object (File)
Fri, Mar 17, 5:45 AM
Unknown Object (File)
Thu, Mar 16, 5:15 AM
Unknown Object (File)
Tue, Mar 14, 9:48 PM
Unknown Object (File)
Fri, Mar 10, 4:04 PM
Unknown Object (File)
Feb 14 2023, 1:35 PM
Unknown Object (File)
Feb 14 2023, 2:23 AM
Unknown Object (File)
Feb 10 2023, 8:47 PM



A user is reporting that writing to a tempdir on their machine takes 6 minutes. This is a nice excuse to clean up the way we do gRPC server startup - we have a perfectly good way to check if a server is running, so let's use it.

The one thing that we lose here is reporting a load error on startup in a structured way, but the --lazy-load-user-code

Test Plan

BK, run dagit locally, gRPC servers still load, simulate load failure, get a clear error in dagit

Diff Detail

R1 dagster
grpcserver (branched from master)
Lint Passed
No Test Coverage

Event Timeline

this means we now need to lazy-load the server so that the error can be passed along in a structured way later, but that's fine

Harbormaster returned this revision to the author for changes because remote builds failed.Jul 23 2021, 7:31 PM
Harbormaster failed remote builds in B34157: Diff 42222!
dgibson added inline comments.

this is for idempoetency - weirdly the error changes to be less useful the second time you try to load it


when is this False? should we drop the bool arg for GrpcServerProcess?


hmm this seems odd - do you have an example?

Should encode probably a comment here

also worth changing the name of _serializable_load_error to be more specific since it only applies to one of this classes N responses-which-may-error


@alangenfeld example of what you're asking for is here.

First load says "object is not callable", second load says (paraphrased) "error_repo object does not exist in module" (I think somewhere deep in python module loading code it doesn't bother to try again loading a module that it loaded before when there was a load error)

ah ok so it is the module load - the error condition you describe makes sense then but definitely worth commenting in [1] that we are assuming any exception that occurs is coming from the target code and due to python module loading behavior we capture, keep, and always re-throw that original error.



it's surprising to me looking at this code that the python module load happens in one of these properties - I assume due to work happening in a @property? Might be worth making it a getter for clarity.


looking at [2] can this not be scoped to DagsterUserCodeProcessError? At least for the load problem behavior we are targeting with this idempotent capture?



the broad except + assumptions spooks me a bit, so at least make sure the assumptions are clearly commented in-case someone bumps in to this debugging

This revision is now accepted and ready to land.Jul 26 2021, 4:27 PM

DagsterUserCodeProcessError is thrown a layer above this (in the client making the gRPC call)

rebase, feedback, add comments

This revision was landed with ongoing or failed builds.Jul 27 2021, 2:26 PM
This revision was automatically updated to reflect the committed changes.