Page MenuHomeElementl

add run group tags to backfill jobs
ClosedPublic

Authored by yuhan on May 6 2021, 5:32 PM.

Details

Summary

this fixes a bug where partial re-execution via backfills fails because it's unable to find the source run id using run group info
the root cause was instance.get_run_group fetches the info based on tags but the pipeline runs created via the backfill re-execution flow didn't have the corespondng tags (i.e. PARENT_RUN_ID and ROOT_RUN_ID)

the better solution is to not let get_run_group depend on the tags info but that may require a change in the runs table, so im not going with that for now - wanted to get the fix in today's release

Test Plan

local dagit - this only happens in sql storage bc of how the get_run_group was implemented, so it's a bit tricky to mock everything in the situation in a unit test

repro the bug: launch backfill -> fail the job -> reexecute from failure. the run didn't have the parent/root tags

Screen Shot 2021-05-06 at 10.25.27 AM.png (2×4 px, 875 KB)

the fix fixed the bug: re-execution succeeded. the runs have the tags

Screen Shot 2021-05-06 at 10.25.04 AM.png (718×4 px, 304 KB)

Diff Detail

Repository
R1 dagster
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

yuhan published this revision for review.May 6 2021, 5:36 PM
yuhan edited the test plan for this revision. (Show Details)

would like @prha to sign off on this too

python_modules/dagster/dagster/core/execution/context/system.py
471–477

if we fall through and return None here - what will happen?

python_modules/dagster/dagster/core/storage/runs/sql_run_storage.py
329

what utility does this comment have? maybe file an issue if you think there is a real problem to be fixed

yuhan marked an inline comment as done.May 6 2021, 5:45 PM
yuhan added inline comments.
python_modules/dagster/dagster/core/execution/context/system.py
471–477

the InputContext for load_input will get run_id=None which is optional in the commented case, i.e. the io manager used in that case wouldn't depend on the source run id associated to the input

python_modules/dagster/dagster/core/execution/context/system.py
471–477

right but now that behavior will also trigger for run_id dependent io managers when we fail to find the source run id - is that right?

python_modules/dagster/dagster/core/execution/context/system.py
471–477

yes but that shouldn't happen - i was erroring it but then it broke the commented case so i loosed it. if that happens, the run_id dependent io managers will error when _get_path saying run_id is None

python_modules/dagster/dagster/core/execution/context/system.py
471–477

ya i guess my thinking here is whether its better to error (with something better than a KeyError) here or will we be able to work backwards to this from the downstream error that occurs

yes but that shouldn't happen

maybe best to have check.invariant with a useful message then

the backfill changes look good.... thanks for fixing...

can resolve my error concerns in a follow up - your call

This revision is now accepted and ready to land.May 6 2021, 6:17 PM
yuhan marked an inline comment as done.May 6 2021, 6:20 PM
yuhan added inline comments.
python_modules/dagster/dagster/core/execution/context/system.py
471–477

added a check in the downstream

yuhan marked an inline comment as done.

check invariant in downstream

backoff :/ - will handle error in a follow up

This revision was automatically updated to reflect the committed changes.