Return false if describe_tasks isn't consistent
Closed, Public

Authored by jordansanders on Jul 27 2021, 10:48 PM.

Details

Summary

Even after

00d73bb346011231f3fb1c43d4f32bfdb63cebe3 and
538c27bcada05674077612eabba7c8566988495f

ECS continues to run into list index errors:

https://dagster.slack.com/archives/C01U954MEER/p1627421095083100

I haven't been able to reproduce the issue, but my best guess is that we're
running into eventual consistency issues with ECS. That would match the behavior
described in these ECS docs:

https://docs.aws.amazon.com/AmazonECS/latest/APIReference/API_RunTask.html

AWS suggests an exponential backoff of up to 5 minutes. I think that's a little
extreme for our use case, particularly because we don't want to block the
GraphQL query from resolving.

Instead, I'm changing the behavior of .can_terminate to return False if we
run into this eventual consistency. This means that occasionally, truly
cancellable pipelines will show as unable to cancel. Fortunately, the value of
.can_terminate isn't memoized, so a run won't be stuck as uncancellable for the
entire lifetime of the pipeline run.
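
A minimal sketch of the idea, not the actual diff: the standalone can_terminate
function, the FakeEcsClient stub, and the "STOPPED" check are assumptions made
for illustration. Only the describe_tasks call shape (cluster=..., tasks=[...],
returning a dict with a "tasks" list) follows the boto3 ECS client.

```python
def can_terminate(ecs_client, cluster, task_arn):
    """Return False instead of raising when ECS has no record of the task yet."""
    response = ecs_client.describe_tasks(cluster=cluster, tasks=[task_arn])
    tasks = response.get("tasks", [])
    if not tasks:
        # Eventual consistency: the task was started but describe_tasks does
        # not return it yet. Indexing tasks[0] here would raise IndexError.
        return False
    # Assumption: a task that has not reached STOPPED can still be terminated.
    return tasks[0].get("lastStatus") != "STOPPED"


class FakeEcsClient:
    """Hypothetical stand-in for a boto3 ECS client, used only for this sketch."""

    def __init__(self, responses):
        self._responses = list(responses)

    def describe_tasks(self, cluster, tasks):
        # Hand back the canned responses in order, one per call.
        return self._responses.pop(0)


client = FakeEcsClient(
    [
        {"tasks": [], "failures": [{"reason": "MISSING"}]},  # not visible yet
        {"tasks": [{"lastStatus": "RUNNING"}], "failures": []},
    ]
)
first = can_terminate(client, "my-cluster", "my-task-arn")   # False
second = can_terminate(client, "my-cluster", "my-task-arn")  # True
```

Because the value is recomputed on every call, the second query picks up the
task once ECS reports it, which is why the temporary False is tolerable.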

Test Plan

unit

Diff Detail

Repository
R1 dagster
Branch
jordan-ecs-eventually-consistent (branched from master)
Lint
Lint Passed
Unit
No Test Coverage