Page MenuHomePhabricator

New mutation for force-mark-as-failed during termination

Authored by dgibson on Dec 7 2020, 10:56 PM.



This adds a new argument that you can pass to termination to make it force-mark-as-failed. It will still try to terminate, but as long as the run isn't finished it won't require the termination to be possible (or wait for it to finish) before marking the run as failed.

Test Plan


Diff Detail

R1 dagster
Automatic diff as part of commit; lint not applicable.
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

dgibson published this revision for review.Dec 7 2020, 11:06 PM

opening for Cs


we could include some indication here that it was terminated under duress


probably need to tweak this message


another problem here is I think even in the success case we still want to immediately mark the run as failed, because sometimes the actual termination can take a while (or even just hang forever and never actually terminate in certain edge cases)

I think the new error messages for report_fun_failed here reflect what we can actually guarantee to the user in this case

From the front end perspective, how would this interact with canTerminate? Can you not mark a run as failed that the run launcher in use says you can't terminate?

I think it needs to be two different options - one soft terminate, one hard terminate, and we don't allow you to soft terminate if canTerminate is False (but we always allow you to hard terminate if the run hasn't finished). And we prefer that you soft terminate when possible. I don't think we should fall-back to hard-terminate automatically without some kind of action from the user, because it really is riskier and might not be what the user expected.

We've also talked about a UI where if soft terminate fails, we then give you the option to hard terminate in a dialog or something (cc @dish).

cool cool. I think a separate mutation would make more sense to me, but by a negligible amount. As long as the dagit experience is good as described it doesn't really matter if its one mutation with an enum or two mutations.

(an alternative would be to move this into an entirely different mutation that doesn't even try to terminate and just marks it as failed no matter what, while providing a similar warning in the event log - I think it's still valuable to try to terminate, but I could be convinced to that too)

catherinewu added inline comments.

"successfully sending a termination request." -> "successfully processing a termination request." or "successfully handling a termination request."?


"created the" -> "created by the"

"To avoid this in the future, consider updating the terminate(...) method on the RunLauncher"?


I think this can cause report_run_failed(run) to be called twice. (ie from _force_mark_as_failed and cancel_run)


nit: "SAFE_TERMINATE" / "TERMINATE_SAFELY"? just to make it really clear that this is preferred



This revision now requires changes to proceed.Dec 11 2020, 10:58 PM

(rebase, feedback, incorporate new CANCELED status)


I tweaked this language a bit to avoid talking about success or failure, that'll be handled by other events in the event log before this

dgibson retitled this revision from RFC: New mutation for force-mark-as-failed during termination to New mutation for force-mark-as-failed during termination.Dec 15 2020, 2:58 PM
dgibson edited the summary of this revision. (Show Details)
dgibson edited the test plan for this revision. (Show Details)
prha added inline comments.

may not have been fully

This revision is now accepted and ready to land.Dec 16 2020, 2:20 AM
dgibson edited the summary of this revision. (Show Details)

update copy

This revision was landed with ongoing or failed builds.Dec 16 2020, 10:50 PM
This revision was automatically updated to reflect the committed changes.