
Mark pipeline as failed when there's a failure after launch but before execution starts
Closed, Public

Authored by dgibson on Oct 26 2020, 3:27 PM.

Details

Summary

king reported this in https://github.com/dagster-io/dagster/issues/3016 - you'll get a hanging run if the gRPC server can't load the instance for some reason. Deal with that by catching failures in the instance.launch_run method and marking the run as failed when that happens. (Failures that happen during execution are responsible for marking the run as failed themselves; this change covers only the case where we weren't able to even start execution. We mark the failure in the run-launching process because, if the issue is loading the instance, we have no other way to find the run and mark it as failed.)
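The pattern described above can be sketched in plain Python. This is a minimal illustration of wrapping the launch path, not Dagster's actual API; `LaunchError`, `Instance`, and `report_run_failed` are hypothetical stand-ins:

```python
# Sketch of "mark the run failed if launch fails before execution starts".
# All names here are hypothetical stand-ins, not Dagster's real classes.

class LaunchError(Exception):
    """Raised when a run can't be launched (e.g. the instance fails to load)."""

class Instance:
    def __init__(self):
        self.run_status = {}

    def report_run_failed(self, run_id, message):
        # Record the failure so the run doesn't hang in a "launching" state.
        self.run_status[run_id] = ("FAILURE", message)

    def launch_run(self, run_id, launcher):
        try:
            launcher(run_id)
            self.run_status[run_id] = ("STARTED", None)
        except Exception as exc:
            # If launch fails before execution begins, the launched process
            # never gets a chance to mark the run failed itself, so the
            # launching side has to do it.
            self.report_run_failed(run_id, str(exc))
            raise
```

The key design point is that the exception is re-raised after reporting, so the caller still sees the launch failure while the run record is left in a terminal state instead of hanging.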

Test Plan

New BK test

Diff Detail

Repository
R1 dagster
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

python_modules/dagster/dagster/core/storage/event_log/sqlite/consolidated_sqlite_event_log.py
69

the str => StringSource changes here were just to make the test I added possible, but they seem harmless and backwards compatible. The docs claim: "Note that Dagster supports retrieving instance YAML values from environment variables, using env: instead of a string literal. An example dagster.yaml is below:"
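The `env:` indirection that StringSource enables can be illustrated with a small resolver sketch. This is a hypothetical helper for illustration only, not Dagster's actual implementation: a config value is either a plain string or an `{"env": "VAR_NAME"}` mapping resolved from the environment.

```python
import os

def resolve_string_source(value):
    """Resolve a StringSource-style config value (hypothetical sketch).

    A plain string is returned as-is; a {"env": "VAR_NAME"} mapping is
    looked up in the process environment.
    """
    if isinstance(value, dict) and set(value) == {"env"}:
        var = value["env"]
        if var not in os.environ:
            raise KeyError(f"environment variable {var} is not set")
        return os.environ[var]
    return value
```

This shows why the change is backwards compatible: existing string literals pass through unchanged, and only the new mapping form triggers environment lookup.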

python_modules/dagster/dagster/scheduler/scheduler.py
408

not sure why I had this here - KeyboardInterrupt is a BaseException, not an Exception, so an `except Exception` clause never catches it
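The distinction noted above is a standard Python exception-hierarchy fact and is easy to demonstrate (the helper name here is just for illustration):

```python
# KeyboardInterrupt subclasses BaseException directly, not Exception,
# so a bare `except Exception` clause will not catch it.

def which_handler_catches(exc):
    """Return the name of the first handler that catches `exc`."""
    try:
        raise exc
    except Exception:
        return "Exception"
    except BaseException:
        return "BaseException"
```

So a `KeyboardInterrupt` raised inside a `try` falls straight past an `except Exception` clause, which is why catching it there is pointless.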

dgibson published this revision for review. Oct 26 2020, 4:08 PM

yeah I think this is the right thing to do

python_modules/dagster/dagster/core/storage/event_log/sqlite/consolidated_sqlite_event_log.py
69

👍

This revision is now accepted and ready to land. Oct 26 2020, 4:30 PM