Page MenuHomePhabricator

refactor some k8s error handling
Needs RevisionPublic

Authored by catherinewu on Tue, Sep 29, 10:36 PM.

Details

Reviewers
alangenfeld
Summary

division of responsibility after this diff is that the k8s client is responsible to raising errors that it encounters and the celery k8s executor is responsible for handling the error correctly. Also consolidated the logic within the celery k8s executor + consolidated error handling to afterwards.

would love to hear thoughts before adding the tests

really sketchy to return the serialized events since it swallows errors :( but the core celery loop crashes if it doesnt get an iterable

Test Plan

bk?

need to add tests for each error case

Diff Detail

Repository
R1 dagster
Branch
refactor-k8s-errors
Lint
Lint OK
Unit
No Unit Test Coverage

Event Timeline

Harbormaster returned this revision to the author for changes because remote builds failed.Tue, Sep 29, 11:23 PM
Harbormaster failed remote builds in B18880: Diff 22930!
Harbormaster returned this revision to the author for changes because remote builds failed.Wed, Sep 30, 2:53 AM
Harbormaster failed remote builds in B18888: Diff 22941!
Harbormaster returned this revision to the author for changes because remote builds failed.Wed, Sep 30, 8:12 AM
Harbormaster failed remote builds in B18890: Diff 22943!
Harbormaster returned this revision to the author for changes because remote builds failed.Wed, Sep 30, 7:32 PM
Harbormaster failed remote builds in B18922: Diff 22978!
Harbormaster returned this revision to the author for changes because remote builds failed.Wed, Sep 30, 9:14 PM
Harbormaster failed remote builds in B18935: Diff 22996!
Harbormaster returned this revision to the author for changes because remote builds failed.Thu, Oct 1, 12:41 AM
Harbormaster failed remote builds in B18950: Diff 23016!
Harbormaster returned this revision to the author for changes because remote builds failed.Thu, Oct 1, 2:06 AM
Harbormaster failed remote builds in B18952: Diff 23018!
Harbormaster returned this revision to the author for changes because remote builds failed.Thu, Oct 1, 4:10 AM
Harbormaster failed remote builds in B18954: Diff 23021!
catherinewu added a reviewer: alangenfeld.
catherinewu edited the summary of this revision. (Show Details)
catherinewu edited the test plan for this revision. (Show Details)
catherinewu edited the summary of this revision. (Show Details)

this seems like a good clean up to me overall

really sketchy to return the serialized events since it swallows errors :( but the core celery loop crashes if it doesnt get an iterable

can you talk more about this?

python_modules/libraries/dagster-celery-k8s/dagster_celery_k8s/executor.py
326–328

can this get refactored up and out in to a not-inline function ?

338

I think its clearer to just run this in the except block callsites instead of behind a bool arg

341

we dont want this marker repeated, should only be on the first one

python_modules/libraries/dagster-k8s/dagster_k8s/client.py
21–33

nit: feel like we should stay consistent and use Error or Exception, I believe we use Error as a suffix more often

python_modules/libraries/dagster-k8s/dagster_k8s/utils.py
13–37

hmm why do we have these instead of creating a DagsterKubernetesClient.production_client() once and re-using it?

to your queue

This revision now requires changes to proceed.Tue, Oct 6, 6:39 PM