Even after
00d73bb346011231f3fb1c43d4f32bfdb63cebe3 and
538c27bcada05674077612eabba7c8566988495f
ECS continues to run into list index errors:
https://dagster.slack.com/archives/C01U954MEER/p1627421095083100
I haven't been able to reproduce the issue, but my best guess is that we're
running into eventual consistency issues with ECS. That would match what these
ECS docs describe:
https://docs.aws.amazon.com/AmazonECS/latest/APIReference/API_RunTask.html
AWS suggests an exponential backoff of up to 5 minutes. I think that's a little
extreme for our use case - particularly because we don't want to block the
GraphQL query from resolving.
Instead, I'm changing the behavior of .can_terminate to return False if we
run into this eventual consistency window. This means that occasionally, truly
cancellable pipelines will show as unable to cancel. Fortunately, the value of
.can_terminate isn't memoized, so it won't be stuck as uncancellable for the
entire lifetime of the pipeline run.
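
For illustration, here is a minimal sketch of the idea, not the actual diff. It
assumes the failure shows up as an empty "tasks" list in the describe_tasks
response, which is my best guess rather than something I've reproduced. The
standalone function signature and the FakeEcsClient stand-in are hypothetical
and exist only to make the sketch runnable without AWS.

```python
def can_terminate(ecs_client, cluster, task_arn):
    """Return True only if ECS reports the task and it has not stopped.

    Hypothetical standalone version of the check; the real method lives on
    the run launcher and looks up the task ARN from the run itself.
    """
    if not task_arn:
        return False

    response = ecs_client.describe_tasks(cluster=cluster, tasks=[task_arn])
    tasks = response.get("tasks", [])

    if not tasks:
        # Assumed eventual consistency case: ECS does not know about the task
        # yet. Report "not terminable" instead of indexing into an empty list.
        return False

    return tasks[0].get("lastStatus") != "STOPPED"


class FakeEcsClient:
    """Hypothetical stand-in that replays canned describe_tasks responses."""

    def __init__(self, responses):
        self._responses = iter(responses)

    def describe_tasks(self, cluster, tasks):
        return next(self._responses)
```

Because nothing is cached, a later call against the same task can return True
once ECS catches up, which is why the False result is only temporary.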