This adds a new argument that you can pass to termination to make it force-mark-as-failed. It will still try to terminate, but as long as the run isn't finished it won't require the termination to be possible (or wait for it to finish) before marking the run as failed.
Details
Diff Detail
- Repository
- R1 dagster
- Lint
Automatic diff as part of commit; lint not applicable. - Unit
Automatic diff as part of commit; unit tests not applicable.
Event Timeline
python_modules/dagster-graphql/dagster_graphql/implementation/execution/__init__.py | ||
---|---|---|
102–106 | another problem here is I think even in the success case we still want to immediately mark the run as failed, because sometimes the actual termination can take a while (or even just hang forever and never actually terminate in certain edge cases) |
I think the new error messages for report_fun_failed here reflect what we can actually guarantee to the user in this case
From the front end perspective, how would this interact with canTerminate? Can you not mark a run as failed that the run launcher in use says you can't terminate?
I think it needs to be two different options - one soft terminate, one hard terminate, and we don't allow you to soft terminate if canTerminate is False (but we always allow you to hard terminate if the run hasn't finished). And we prefer that you soft terminate when possible. I don't think we should fall-back to hard-terminate automatically without some kind of action from the user, because it really is riskier and might not be what the user expected.
We've also talked about a UI where if soft terminate fails, we then give you the option to hard terminate in a dialog or something (cc @dish).
cool cool. I think a separate mutation would make more sense to me, but by a negligible amount. As long as the dagit experience is good as described it doesn't really matter if its one mutation with an enum or two mutations.
(an alternative would be to move this into an entirely different mutation that doesn't even try to terminate and just marks it as failed no matter what, while providing a similar warning in the event log - I think it's still valuable to try to terminate, but I could be convinced to that too)
python_modules/dagster-graphql/dagster_graphql/implementation/execution/__init__.py | ||
---|---|---|
50 | "successfully sending a termination request." -> "successfully processing a termination request." or "successfully handling a termination request."? | |
53 | "created the" -> "created by the" "To avoid this in the future, consider updating the terminate(...) method on the RunLauncher"? | |
102 | I think this can cause report_run_failed(run) to be called twice. (ie from _force_mark_as_failed and cancel_run) | |
python_modules/dagster-graphql/dagster_graphql/schema/roots.py | ||
476 | nit: "SAFE_TERMINATE" / "TERMINATE_SAFELY"? just to make it really clear that this is preferred | |
python_modules/dagster-graphql/dagster_graphql_tests/graphql/test_run_cancellation.py | ||
162 | 👀 |
python_modules/dagster-graphql/dagster_graphql/implementation/execution/__init__.py | ||
---|---|---|
53 | I tweaked this language a bit to avoid talking about success or failure, that'll be handled by other events in the event log before this |
python_modules/dagster-graphql/dagster_graphql/implementation/execution/__init__.py | ||
---|---|---|
53 | may not have been fully |