Page MenuHomeElementl

Create a new task definition for each launch
ClosedPublic

Authored by jordansanders on Jun 21 2021, 9:42 PM.

Details

Summary

When originally conceived, the EcsRunLauncher launched tasks using the
same task definition used by its current process. This was to punt on
needing to register new task definitions because:

  1. There's not a great way to identify if, given a set of input parameters, a task definition that satisfies those constraints already exists.
  2. There's a lot of variety in how you can configure a task definition and I thought constraining our options to just a known valid configuration would make development easier.

The fatal flaw of this plan is that the current process that calls
launch_run is usually the daemon. If your daemon container doesn't also
include your repository and pipeline code, then the ECS task correctly
stands up but the Dagster step fails execution because it can't
load the pipeline definition.

ECS allows you to override a lot of container behavior - but it doesn't
allow you to override the actual image. So we're forced to either find a
suitable task definition that already exists or bit the bullet and
create a new suitable task definition.

We can know the pipeline's repository origin's image at the time we want
to launch the run because it can be set via DAGSTER_CURRENT_IMAGE:

https://github.com/dagster-io/dagster/commit/90079c4bf5e57be591ef0c44c1a1da96b3bbd7ad

We could list all running tasks and find ones that use that image. But
it's possible that multiple tasks uses the image and I could see it
getting confusing if we chose the "wrong" one (even though technically
things would still work. For example, if both a repo1 task and a repo2
task use the same image and we're trying to run a pipeline from repo1,
we could accidentally do so using a task definition for repo2. Blergh.

So instead, I've decided to forge ahead on creating a task definition
for each run. We start with the parent task definition just like we
previously did. But then we munge it so that it's suitable to pass back
in as arguments to ecs.register_task_definition(). We remove the
"daemon" container and add a new "run" container.

There are two major optimizations that we should make to this before
recommending it for production use:

  1. Don't create a new task definition if a suitable active one already exists. This is perhaps easier said than done because ECS doesn't provide a mechanism for checking if a given task definition exists. So we'll probably need to either read through the ECS documentation and hardcode its default behaviors or we'll need to loosen our definition of "matching."
  2. Garbage collect unused revisions. Once we're done running our task, we should deregister its task definition so users' AWS accounts aren't littered with tons of active but outdated task definitions.
  3. Allow the task definition to be overridden. This can be done fairly trivially by changing:

    ` self.ecs.register_task_definition(task_definition) ` to ` self.ecs.register_task_definition{{task_definition, overrides}}) `

    although we'll want to give some thought and care to the exact implementation.
Test Plan

unit

Diff Detail

Repository
R1 dagster
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

this makes sense to me. Allowing users to pass in a custom task definition could be good? You could even have it configured on the grpc server in the same way that you can configure DAGSTER_CURRENT_IMAGE to specify the image to use when launching runs

python_modules/libraries/dagster-aws/dagster_aws/ecs/launcher.py
94

nit inherit

96

nit pipeline origin's image

This revision is now accepted and ready to land.Jun 23 2021, 2:49 PM