Page MenuHomeElementl

[RFC] Celery-less k8s executor

Authored by johann on Apr 27 2021, 6:13 PM.



A new stacking of classes to make up Executors:

The new CoreExecutor holds the generic executor behavior, while the specifics are encapsulated in StepHandlers.


  • reduce check_step_health polling rate to reduce calls to k8s api
  • more robust k8s error handling
  • the multiprocess step handler is mostly for faster debugging for now, will remove it from this diff. Still curious if you have comments on that approach. The k8s step handler doesn't need to keep anything in memory (the init context can be refactored out, while the multiprocess has a bunch of internal state. It's possible that could be refactored to just storing a PID, but that seems non trivial)
Test Plan


Diff Detail

R1 dagster
Lint Not Applicable
Tests Not Applicable

Event Timeline

Harbormaster returned this revision to the author for changes because remote builds failed.Apr 27 2021, 6:32 PM
Harbormaster failed remote builds in B29582: Diff 36323!
Harbormaster failed remote builds in B29583: Diff 36324!
Harbormaster returned this revision to the author for changes because remote builds failed.Apr 27 2021, 9:55 PM
Harbormaster failed remote builds in B29609: Diff 36353!
johann retitled this revision from core executor to [RFC] Celery-less k8s executor.May 4 2021, 5:31 AM
johann edited the test plan for this revision. (Show Details)
johann edited the summary of this revision. (Show Details)


Harbormaster returned this revision to the author for changes because remote builds failed.May 4 2021, 6:12 AM
Harbormaster failed remote builds in B29899: Diff 36725!
johann requested review of this revision.May 4 2021, 3:00 PM

nice, I think this turned out quite clean. My only real comments are about the initialize() method and maybe some naming. @alangenfeld probably has some good thoughts there as well

264–276 ↗(On Diff #36731)


1 ↗(On Diff #36731)

dunno about 'extendable', how about 'base' or 'core'?

16 ↗(On Diff #36731)

maybe StepHandlerExecutor? @alangenfeld may have suggestions too. "Core" is a little vague

40 ↗(On Diff #36731)

the init vs initialize here is a little strange, I think I see why you have both though. is there a more specific name like... start_execution, i duno


you could store the specific fields that you need here vs slapping the whole InitExecutorContext on there.

Incidentally instance is also on InitExecutorContext, and retries is already on the base class- you could store those here at this stage, and have your 'initailize' method be just for the pipeline context?

would be good to include mypy return types as well

16 ↗(On Diff #36731)

StepDelegatingExecutor is my vote

StepIsolatedExecutor could be good too

22 ↗(On Diff #36746)

hm interesting choice whether to structure it like this versus having a subclass

58–88 ↗(On Diff #36746)

ya we definitely want some rate control for the main executor loop here - pop_events can be costly as well as check_step_health

10 ↗(On Diff #36746)

related to above comment - its not obvious to me why its factored out this way versus abstract methods + subclassing the executor

but seems fine too so no need to change it if you like this set up


K8sJobStep I think

dgibson requested changes to this revision.May 4 2021, 6:58 PM

for that round of feedback

This revision now requires changes to proceed.May 4 2021, 6:58 PM



confusing name since it returns a pipeline that's called an executor




should this be configurable? seems like a pretty tight loop


in the other step handler you are writing we will want this pretty quickly I think for the 'in-process' / 'no step isolation' case

This revision is now accepted and ready to land.May 6 2021, 3:40 PM

I think some form of exponential backoff with checking for new events might be ideal, but yeah

This revision was automatically updated to reflect the committed changes.