also renamed methods on ExternalExecutionPlan to pair to the equivalent methods on ExecutionPlan (open to discussion)
Details
Diff Detail
- Repository
- R1 dagster
- Branch
- yuhan/reexecution-from-failure
- Lint
Lint Passed - Unit
No Test Coverage
Event Timeline
I would like to discuss alternative ways to expose this. My hesitation towards this approach comes from using a bool argument that conflicts with another argument [1]. I think this represents an ergonomic / feel issue as well as point at a lack of extensibility if we desire other times of re-execution where steps are selected by the system.
one alternative would be to extend step_selection to also take an enum entry or marker type for describing a step selection to be calculated by dagster.
open to other ideas and could also be convinced the bool is right, but i feel like theres a nicer solution to be had.
python_modules/dagster/dagster/core/execution/api.py | ||
---|---|---|
449–457 | agh looks like this got accidentally out of sync in D7044 | |
979–982 | [1] |
one alternative would be to extend step_selection to also take an enum entry or marker type for describing a step selection to be calculated by dagster.
we could
- extend step_selection to take StepSelection.FROM_FAILURE), step_selection=StepSelection.from_selection(["some_solid*"]), and ["some_solid*"] (backcompat)
- or a more generic arg reexecute_option and deprecate step_selection, it'd take ReexecuteOption.FROM_FAILURE, ReexecuteOption.from_selection(["some_solid*"]))
i could see another advantage of introducing a class (e.g. an enum entry) is that we could consolidate all the parsing/resolving logics into that class - for example, internally, we could pass around StepSelection.step_keys_to_execute as opposed to figure out whether the selection is pre-transformation or post-transformation.
but the cost is to expose a new top-level api and the api being verbose. i'm pretty much on the fence, slightly leaning towards the bool argument as it keeps the apis simple.
but the cost is to expose a new top-level api and the api being verbose
Ya that nails the trade-off here. I have a strong personal dislike of conflicting bool arguments, which is why I at least wanted to talk it out, but agree the cost of adding a new public enum/class isn't a slamdunk.
add some more reviewers for other opinions
Just to riff on options :
# current reexecute_pipeline(( my_pipeline, previous_run_id, from_failure=True # conflicty bool ) # enum reexecute_pipeline(( my_pipeline, previous_run_id, step_selection=StepSelection.RETRY_FROM_FAILURE # new api surface area ) # semantic run id arg reexecute_pipeline(( my_pipeline, retry_from_failed_run=previous_run_id, # still conflicty, have to make parent_run_id optional ) # stringly reexecute_pipeline(( my_pipeline, previous_run_id, step_selection="retry_from_failure" # list distinguishes step selection syntax from "algorithm choice" )
python_modules/dagster/dagster/core/execution/api.py | ||
---|---|---|
560–561 | ^ also needs to get re-corrected |
i think the enum solution is the cleanest here; it certainly adds API surface but if we retain backcompatibility with the old list of strings, that feels fine -- it adds API surface in lockstep with new functionality