HomePhabricator

(5/n k8s-pod-deletion) Prevent step re-executions due to pod deletions

Description

(5/n k8s-pod-deletion) Prevent step re-executions due to pod deletions

Summary:
This diff adds a should_verify_step argument to ExecuteStepArgs. If this argument is set, then we check the event log to make sure that execute_step_with_structured_logs has not been called before with the given step.

I decided to gate this check behind a flag that's only set in the k8s executor to perform an additional query when using other executors that call execute_step_with_structured_logs when we're not seeing any problems with the other executors.

More details in an inline comment.

Test Plan:
Manual testing by deleting pods. Existing integration tests cover the happy path.

Scenarios tested:

  • Make sure runs still succeed when pods are not deleted (bk)
  • Delete a pod on first attempt, make sure correct error message is emitted and run fails.
  • Delete a pod on retry N, make sure correct error message is emitted and run fails.

Reviewers: #platform, alangenfeld

Reviewed By: #platform, alangenfeld

Subscribers: alangenfeld

Differential Revision: https://dagster.phacility.com/D4855

Details

Provenance
sashankAuthored on Oct 21 2020, 10:17 AM
Reviewer
Restricted Project
Differential Revision
D4855: (5/n k8s-pod-deletion) Prevent step re-executions due to pod deletions
Parents
R1:a63beea2f675: [Experimental] Helm deployment of dagster daemon
Branches
Unknown
Tags
Unknown