Return false if describe_tasks isn't consistent
Closed · Public

Authored by jordansanders on Jul 27 2021, 10:48 PM.

Details

Summary

Even after 00d73bb346011231f3fb1c43d4f32bfdb63cebe3 and
538c27bcada05674077612eabba7c8566988495f, ECS continues to run into list
index errors:

https://dagster.slack.com/archives/C01U954MEER/p1627421095083100

I haven't been able to reproduce the issue, but my best guess is that we're
running into ECS's eventual consistency. That would match these ECS docs:

https://docs.aws.amazon.com/AmazonECS/latest/APIReference/API_RunTask.html
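The suspected failure mode can be sketched with a stubbed response. This is an assumption, not something confirmed against the run launcher's code: if DescribeTasks is called before a freshly started task is visible, the "tasks" list comes back empty, and any code that indexes its first element raises IndexError. The stub function, the ARN, and the "MISSING" failure reason below are illustrative only.

```python
from typing import Any, Dict, List

# Made-up task ARN used only for this illustration.
TASK_ARN = "arn:aws:ecs:us-east-1:123456789012:task/my-cluster/abc123"


def stale_describe_tasks(cluster: str, tasks: List[str]) -> Dict[str, Any]:
    """Hypothetical stand-in for DescribeTasks called right after RunTask.

    Because reads are eventually consistent, the new task may not be visible
    yet, so it is reported under "failures" instead of "tasks".
    """
    return {"tasks": [], "failures": [{"arn": arn, "reason": "MISSING"} for arn in tasks]}


def first_task_status(response: Dict[str, Any]) -> str:
    # Naive access pattern: assumes the task we asked about is always present.
    return response["tasks"][0]["lastStatus"]


response = stale_describe_tasks(cluster="my-cluster", tasks=[TASK_ARN])
error_message = None
try:
    first_task_status(response)
except IndexError as err:
    error_message = str(err)

print(error_message)  # list index out of range
```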

AWS suggests exponential backoff of up to 5 minutes. I think that's a little
extreme for our use case, particularly because we don't want to block the
GraphQL query from resolving.
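To put the 5-minute figure in perspective, here is a rough sketch of a doubling backoff schedule whose total wait is capped at 300 seconds. The 1-second starting delay and factor of 2 are assumptions chosen for illustration, not values taken from the AWS docs.

```python
from typing import List


def backoff_delays(base: float = 1.0, factor: float = 2.0, max_total: float = 300.0) -> List[float]:
    """Exponential backoff delays, stopping before the total wait exceeds max_total seconds."""
    delays: List[float] = []
    total = 0.0
    delay = base
    # Keep doubling the delay until the next one would push us past the cap.
    while total + delay <= max_total:
        delays.append(delay)
        total += delay
        delay *= factor
    return delays


schedule = backoff_delays()
print(schedule)       # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0, 128.0]
print(sum(schedule))  # 255.0
```

Under these assumptions, a resolver that exhausted its retries would sleep for over four minutes before giving up, which is the kind of blocking this change avoids.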

Instead, I'm changing .can_terminate to return False when describe_tasks
returns an inconsistent result. This means that, occasionally, truly
cancellable pipelines will show as unable to cancel. Fortunately, the value of
.can_terminate isn't memoized, so it won't be stuck as uncancellable for the
entire lifetime of the pipeline run.
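A minimal sketch of the guard, assuming a boto3-style ECS client whose describe_tasks(cluster=..., tasks=[...]) returns a dict with a "tasks" list. The helper name task_can_terminate and the StubEcsClient class are hypothetical and are not the actual run launcher code. Because nothing is cached, a call that hits the inconsistency returns False, and a later call can return True once ECS catches up.

```python
from typing import Any, Dict, List, Optional

ARN = "arn:aws:ecs:us-east-1:123456789012:task/my-cluster/abc123"  # made-up ARN


def task_can_terminate(ecs_client: Any, cluster: str, task_arn: Optional[str]) -> bool:
    """Hypothetical helper: True only if ECS reports the task and it hasn't stopped."""
    if not task_arn:
        return False
    response = ecs_client.describe_tasks(cluster=cluster, tasks=[task_arn])
    tasks = response.get("tasks", [])
    if not tasks:
        # Eventual consistency: ECS may not list a task it just started.
        # Report "not cancellable" for now instead of raising IndexError.
        return False
    status = tasks[0].get("lastStatus")
    return status is not None and status != "STOPPED"


class StubEcsClient:
    """Stand-in for a boto3 ECS client; returns canned DescribeTasks responses in order."""

    def __init__(self, responses: List[Dict[str, Any]]):
        self._responses = list(responses)

    def describe_tasks(self, cluster: str, tasks: List[str]) -> Dict[str, Any]:
        return self._responses.pop(0)


client = StubEcsClient([
    {"tasks": [], "failures": [{"arn": ARN, "reason": "MISSING"}]},  # not visible yet
    {"tasks": [{"taskArn": ARN, "lastStatus": "RUNNING"}], "failures": []},
])
print(task_can_terminate(client, "my-cluster", ARN))  # False
print(task_can_terminate(client, "my-cluster", ARN))  # True
```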

Test Plan

unit

Diff Detail

Repository
R1 dagster
Lint
Lint Not Applicable
Unit
Tests Not Applicable