Page MenuHomeElementl

Test ECS reference deployment
ClosedPublic

Authored by jordansanders on Jun 28 2021, 6:54 PM.
Tags
None
Referenced Files
F2307109: D8575.id.diff
Mon, Jul 4, 9:57 PM
F2304888: D8575.id40432.diff
Mon, Jul 4, 10:11 AM
Unknown Object (File)
Thu, Jun 23, 3:27 AM
Unknown Object (File)
Tue, Jun 21, 3:49 PM
Unknown Object (File)
Mon, Jun 20, 3:36 PM
Unknown Object (File)
Mon, Jun 20, 12:10 PM
Unknown Object (File)
Sun, Jun 19, 3:04 AM
Unknown Object (File)
Sat, Jun 18, 4:03 PM
Subscribers
None

Details

Summary

Use the docker-compose ECS local simulator to test our ECS deployment.
This substitutes the DefaultRunLauncher for the EcsRunLauncher because
the latter cannot run locally and it builds the images from source.

Eventually, we can extend this to run on real ECS as well.

Additionally, this adds a pattern for doing our usual networking
shenanigans but from within PyTest fixtures. I found the approach to
be more flexible and if it's something we like, we can generalize it
and use it across all of our Buildkite docker-compose-in-docker use
cases.

Test Plan

unit, integration

Diff Detail

Repository
R1 dagster
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

Requesting feedback before I've gotten a clean build because I could use some advice on how we want to handle a particularly hair docker-compose-in-docker situation.

examples/deploy_ecs/tests/test_deploy.py
34

I'm way down a rabbit hole on a particularly hair docker-in-docker situation here. ECS Local Simulation is trying to mount credentials. Pytest (where this command is being invoked) is running inside a container. Docker compose will get run inside a container. But the Docker socket that it's using is running on the Buildkite host machine. So when it tries to mount the volume $tmpdir/.aws, it's trying to mount the host machine's $tmpdir/.aws (which doesn't exist), not the container's $tmpdir/.aws (which does exist).

I'm a little at a loss for how to fix this. A particularly bad solution that I could do is to make a directory at /tmp/.aws in our Buildkite AMI and then set $HOME to /tmp so that the directory would indeed exist on all host machines.

Another possible solution would be to skip using ECS local entirely and jump straight to the real deal - running these integration tests against real ECS. That's something I was hoping to avoid initially until I get a better sense of how reliable the cleanup is (I don't want a bunch of old ECS resources polluting our AWS account). Plus, it's a lot slower.

Anybody have other ideas?

56

The alternative to this approach is the separate "from_source" directory that we have in our deploy_docker example. I prefer this approach where we have one canonical set of files that we overwrite via fixtures rather than a set of files that's under test and a set of files that isn't but I'm curious to hear if we find this to not be worth the complexity.

I ran into the 'cant mount volumes' issue in https://dagster.phacility.com/D8485 as well - finding some kind of workaround that lets us mount volumes seems valuable, but I don't really have any particular insights on how to do that

This update fixes the docker mount issues by:

  • mounting volume /tmp from the host machine to /tmp in the container
  • setting the PYTEST_DEBUG_TEMPROOT to /tmp

The former ensures that a process in the container can write to a directory that's accessible from the host machine. There were a few other ways that I explored doing this:

  • tmpfs initially sounded appealing because it's temporary in nature: https://github.com/buildkite-plugins/docker-buildkite-plugin#tmpfs-optional-array. But because it's all in-memory, our host machine was still unable to recognize it as a volume.
  • /workdir is already a mounted volume between the host machine and the container. But the directory on the host machine has a different name than /workdir which made it dfficult to correctly resolve the correct volume name. In our test, we'd need to write to /workdir. But we'd then need to somehow tell docker to mount a volume from the host machine's real directory name.

The latter ensures that any time we uses PyTest's tmpdir or tmp_path fixtures, they're nested inside /tmp.

Additionally, this update cleans up the network connection between our Buildkite container and our docker compose network before trying to run docker compose down. Otherwise, removing the network would fail because it'd have active endpoints.

Finally, this switches the behavior of our networking fixture to return a dictionary of container names to hostnames. Previously, it was incorrectly returning just the hostname of our Buildkite container. Now, you can do stuff like hostnames["dagit"] to get the hostname of the dagit container.

examples/deploy_ecs/tests/test_deploy.py
151

Once this lands, I think I'll extract some of these fixtures (particularly the networking and docker compose up/down related ones) into something more generalizable that we can use instead custom building buildkite steps every time we want to do docker stuff.

examples/deploy_ecs/tests/test_deploy.py
34

I ended up going with a simple solution that only came to me after stepping back from the problem for a bit: just mount /tmp from the host machine to the container.

examples/deploy_ecs/tox.ini
15

Not capturing string output ended up being helpful during the development of this although it isn't strictly necessary to keep. It might make things a little easier going forward if the test ever breaks and people need to investigate?

Ignore containers that aren't connected to the network

dgibson added inline comments.
.buildkite/dagster-buildkite/dagster_buildkite/step_builder.py
66–77

comment here explaining what's going on would be lovely

73–77

comment here explaining what's going on would be lovely, I don't think PYTEST_DEBUG_TEMPROOT is widely known

examples/deploy_ecs/tests/test_deploy.py
96–99

this part is a bit jankier / more magical... maybe replace a commented out line (like a template kidn of) rather than always assuming the line after WORKDIR $DAGSTER_HOME is the right place

106

explain why?

149–150

doing this in pytest seems nice / preferable to wtihin bk, any plans to do a widespread application of this pattern elsewhere? If so this could go somewhere broader than deploy_ecs

This revision is now accepted and ready to land.Jun 30 2021, 5:18 PM
.buildkite/dagster-buildkite/dagster_buildkite/step_builder.py
73–77

Ah yes - excellent suggestion. I keep forgetting that including commit-like context in diff updates in phabricator is basically just shouting into the void 🙃

This update fixes the docker mount issues by:

  • mounting volume /tmp from the host machine to /tmp in the container
  • setting the PYTEST_DEBUG_TEMPROOT to /tmp

The former ensures that a process in the container can write to a directory that's accessible from the host machine. There were a few other ways that I explored doing this:

  • tmpfs initially sounded appealing because it's temporary in nature: https://github.com/buildkite-plugins/docker-buildkite-plugin#tmpfs-optional-array. But because it's all in-memory, our host machine was still unable to recognize it as a volume.
  • /workdir is already a mounted volume between the host machine and the container. But the directory on the host machine has a different name than /workdir which made it dfficult to correctly resolve the correct volume name. In our test, we'd need to write to /workdir. But we'd then need to somehow tell docker to mount a volume from the host machine's real directory name.

The latter ensures that any time we uses PyTest's tmpdir or tmp_path fixtures, they're nested inside /tmp.

Additionally, this update cleans up the network connection between our Buildkite container and our docker compose network before trying to run docker compose down. Otherwise, removing the network would fail because it'd have active endpoints.

Finally, this switches the behavior of our networking fixture to return a dictionary of container names to hostnames. Previously, it was incorrectly returning just the hostname of our Buildkite container. Now, you can do stuff like hostnames["dagit"] to get the hostname of the dagit container.

examples/deploy_ecs/tests/test_deploy.py
96–99

Hm. I'd prefer if the real resources (like Dockerfile) are as clean as possible without having magic stuff in them to enable testing - but your point is well taken. I'll think about how I can make this less confusing.

149–150

Yup - but out of scope. I'll extract/move it when I get around to generalizing the pattern.

examples/deploy_ecs/tests/test_deploy.py
96–99

Complexity noted but I think I'm going to land this as is. Especially because:

rather than always assuming the line after WORKDIR $DAGSTER_HOME is the right place

The test should break if this is the case. And it can be run locally during development and during CI/CD on push to verify this is the case. I think the benefit of keeping the user-facing reference documentation simple affords us a little bit of wiggle-room in test complexity.

This revision was automatically updated to reflect the committed changes.