
Run k8s tests from python
ClosedPublic

Authored by nate on Dec 22 2019, 6:08 AM.

Details

Summary

Depends on D1737

Fixes https://github.com/dagster-io/dagster/issues/2026 and https://github.com/dagster-io/dagster/issues/2028

This expresses the kubernetes kind/helm tests as pytest fixtures and cleans up the shell scripting we were using previously. The tests can now be run locally in the integration image with:

docker run -v /var/run/docker.sock:/var/run/docker.sock -v ~/src/dagster:/workdir -it 968703565975.dkr.ecr.us-west-1.amazonaws.com/buildkite-integration:py3.7.6-2019-12-29T212402 /bin/bash

pip install -e python_modules/dagster -e python_modules/dagster-graphql -e python_modules/libraries/dagster-k8s

export AWS_ACCESS_KEY_ID="..."
export AWS_SECRET_ACCESS_KEY="..."
export DAGSTER_DOCKER_IMAGE_TAG="..."
export DAGSTER_DOCKER_REPOSITORY="..."

pytest python_modules/libraries/dagster-k8s/dagster_k8s_tests/ -s
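
Roughly, the fixture-based structure looks like the sketch below; the names and details are illustrative only (the real fixtures live in dagster_k8s_tests/conftest.py):

# Illustrative sketch, not the actual conftest.py contents.
import subprocess

import pytest


@pytest.fixture(scope="session")
def kind_cluster():
    """Spin up a throwaway kind cluster for the test session, then tear it down."""
    cluster_name = "dagster-test"
    subprocess.check_call(["kind", "create", "cluster", "--name", cluster_name])
    try:
        yield cluster_name
    finally:
        subprocess.check_call(["kind", "delete", "cluster", "--name", cluster_name])


def test_pipeline_runs_on_kind(kind_cluster):
    # Tests depend on the cluster simply by requesting the fixture.
    assert kind_cluster == "dagster-test"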

Note that while D1737 improves performance for the test image build step upstream of this test, the test itself still takes ~4m30s on average. Because it is serialized after the test image build step, and our current longest tests (the airline demo) run for ~6m30s, it adds a small amount of extra time to our build critical path.

Options for further build time improvements are:

  1. Reduce the size of the integration image; between the first stage docker image build and this set of tests, extracting the integration image adds ~2m30s to the critical path of our builds
  2. (long term) Enable using a vanilla Dagit image in Helm (i.e. a Dagit container built without any client code), instead of rebuilding Dagit in the test images every build. This would save ~2m30s from the critical path.
  3. Treat k8s tests as non-blocking for merges, whether via cultural means, by permitting merges with ongoing k8s builds, or by creating a separate build pipeline that is triggered by the primary (I don't love the latter option, it'd be hard to associate the child build with the parent build when debugging a failure)
  4. (not recommended, I think this would be more trouble than it is worth) Set up a standing EKS kubernetes cluster with scale-to-zero node pools for CI/CD, instead of using kind. Might save a small amount of time, especially with a warm cluster, but that depends on EKS auto-scaling behavior. Also, build isolation would be a real challenge; there's a high risk of build interactions causing confusing errors.
  5. Don't test kubernetes or helm in CI. Reduces build time by ~5m.
Test Plan

buildkite

Diff Detail

Repository
R1 dagster
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

nate updated this revision to Diff 8159.Dec 23 2019, 6:20 PM

debugging

nate updated this revision to Diff 8171.Dec 23 2019, 11:12 PM

refactor to use pytest fixtures

nate edited the summary of this revision. (Show Details)Dec 23 2019, 11:13 PM
nate updated this revision to Diff 8175.Dec 24 2019, 12:30 AM

rebase and up

nate edited the summary of this revision. (Show Details)Dec 24 2019, 12:51 AM
nate updated this revision to Diff 8179.Dec 24 2019, 12:56 AM
nate edited the summary of this revision. (Show Details)

up

nate edited the summary of this revision. (Show Details)Dec 24 2019, 1:14 AM
nate edited the summary of this revision. (Show Details)
nate edited the summary of this revision. (Show Details)
nate updated this revision to Diff 8181.Dec 24 2019, 5:33 AM
nate edited the summary of this revision. (Show Details)

up

nate edited the summary of this revision. (Show Details)Dec 24 2019, 5:19 PM
nate added reviewers: schrockn, alangenfeld.

Seems like having more library-specific containers with preinstalled deps would help a lot. If the actual pipeline being tested had zero deps beyond dagster, and the built image had deps preinstalled, we would be in a much better place, no?

I'm pretty uncomfortable checking in this large of a regression to our already pretty heavy unit test suite. It would be a significant tax on productivity.

python_modules/libraries/dagster-k8s/dagster_k8s_tests/test_launcher.py
129

Not a big fan of the exact match here (or of checking the message at all). I'd rather not test the error message here, and it's another thing to update (on a long-running test, no less) if you decide to change some message verbiage.

schrockn requested changes to this revision.Dec 28 2019, 12:24 AM
This revision now requires changes to proceed.Dec 28 2019, 12:24 AM
nate edited the summary of this revision. (Show Details)Dec 28 2019, 5:30 AM
nate updated this revision to Diff 8197.Dec 28 2019, 5:45 AM

comments

nate marked an inline comment as done.Dec 28 2019, 5:47 AM
nate added inline comments.
python_modules/libraries/dagster-k8s/dagster_k8s_tests/test_launcher.py
129

yeah good call - I've updated to replace these with more specific checks similar to what we have in dagster_graphql/dagster_graphql_tests/graphql/test_config_types.py
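
For illustration only (everything below is hypothetical and just shows the shape of the change; the real assertions are in dagster_k8s_tests/test_launcher.py):

import pytest


def launch_run_with_bad_config():
    # Hypothetical stand-in for the launcher call under test.
    raise Exception("Errors in run_launcher config: Missing required field 'image'")


def test_launch_with_bad_config():
    with pytest.raises(Exception) as exc_info:
        launch_run_with_bad_config()

    # Before: exact match on the full message, brittle to any verbiage change
    # (and this is a slow suite to re-run).
    # assert str(exc_info.value) == "Errors in run_launcher config: Missing required field 'image'"

    # After: assert only on the fragments that matter to the behavior under test.
    assert "run_launcher" in str(exc_info.value)
    assert "Missing required field" in str(exc_info.value)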

nate updated this revision to Diff 8198.Dec 28 2019, 5:58 AM
nate marked an inline comment as done.

up

nate updated this revision to Diff 8199.Dec 28 2019, 6:00 AM

rebase

nate added a comment.Dec 28 2019, 6:18 AM

If the actual pipeline being tested had zero deps beyond dagster, and the built image had deps preinstalled, we would be in a much better place, no?

The pipeline being tested is built by test_image buildkite tasks, and now only depends on these dagster components.

The total test time for this diff on warm buildkite infra is about 8 minutes: ~4m for building test_image (see here), and ~4m for running k8s tests (see here).

See the enumerated options in the summary for how we might improve build times. In terms of the breakdown of where time is spent on the critical path of this diff, in sorted order:

  • 3m: kind cluster setup and run dagster-k8s tests
  • 2m30s: rebuilding dagit every build in test_image builds
  • 0 - 3 mins: extracting the integration images on the buildkite machines
  • 45-60s: building test_image docker image
  • 40s: dagster-k8s python env setup (e.g. installing dagster/dev-requirements.txt)
  • 20s: pushing test_image docker image
nate added a comment.Dec 28 2019, 6:21 AM

Actually FWIW it seems like dagster-airflow tests take longer to run than these, so we may want to start w/ optimization there: https://buildkite.com/dagster/dagster/builds/8791#50a5e6aa-f98b-4251-9b11-f62d3c146db5

nate updated this revision to Diff 8208.Dec 28 2019, 5:03 PM

test no build dagit

schrockn accepted this revision.Dec 30 2019, 12:12 AM

Cool. Glad to see this under test.

python_modules/libraries/dagster-k8s/dagster_k8s_tests/conftest.py
40

One thing that would be nice to do is to put the os.environ calls in a function and then have a test that they exist and have a correct value. Then have the failure message explain what they need to be in order to function.

Our test environments are gathering undocumented requirements that make our test suite not a function of a hermetically sealed repo, but instead of a bunch of environment variables and state. This is both annoying to deal with as a dagster team member and kind of hostile towards OSS contributors. (They should be able to run all tests without any creds and magic environment variables.)
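
A minimal sketch of what that could look like (names are illustrative, not the actual conftest.py; the env vars match the ones listed in the summary):

import os

REQUIRED_TEST_ENV_VARS = [
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "DAGSTER_DOCKER_IMAGE_TAG",
    "DAGSTER_DOCKER_REPOSITORY",
]


def get_test_environ():
    """Read every environment variable the dagster-k8s tests rely on, in one place."""
    return {var: os.environ.get(var) for var in REQUIRED_TEST_ENV_VARS}


def test_required_env_vars_are_set():
    missing = [var for var, value in get_test_environ().items() if not value]
    message = (
        "The dagster-k8s tests require the following environment variables, which are "
        "not set: {}. See the revision summary for how they are used."
    ).format(", ".join(missing))
    assert not missing, message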

python_modules/libraries/dagster-k8s/dagster_k8s_tests/utils.py
10

would prefer a name like "remove_none_recursively" or something, to more accurately describe what is going on.

22

this really does seem very, very fragile

This revision is now accepted and ready to land.Dec 30 2019, 12:12 AM
nate edited the summary of this revision. (Show Details)Dec 30 2019, 3:30 AM
nate marked 3 inline comments as done.Dec 30 2019, 6:44 PM
nate added inline comments.
python_modules/libraries/dagster-k8s/dagster_k8s_tests/conftest.py
40

Good call here. I've put these in test fixtures and added asserts, and also removed extraneous env vars where I could.

I wholeheartedly agree with this comment; I think this is worth a longer discussion (and would love @alangenfeld's ideas here also). Right now we're pretty tightly coupled to running this set of tests on AWS, because we need ECR as the go-between for publishing and retrieving test images.

(the main reason being that it's actually much faster to publish to a registry and then pull down in kind than it is to pull from the local docker cache... go figure)

However, there's no reason you shouldn't be able to run these tests without ECR, maybe using GCR instead - or even in the ideal case we could fall back to transferring the images locally. Even if that means a 15min+ test run, at least it'll work without external deps.
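
Something like the following could make that fallback explicit (a sketch only; the helper name is hypothetical and the real image handling lives in the test setup):

import os
import subprocess


def make_image_available_to_kind(image, cluster_name="dagster-test"):
    """Prefer the registry round-trip (fast); fall back to a local side-load (slow, but no external deps)."""
    if os.environ.get("DAGSTER_DOCKER_REPOSITORY"):
        # Registry path: push once, and the kind nodes pull it like any other image.
        subprocess.check_call(["docker", "push", image])
    else:
        # No registry configured: copy the image from the local docker cache
        # straight into the kind nodes.
        subprocess.check_call(["kind", "load", "docker-image", image, "--name", cluster_name])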

python_modules/libraries/dagster-k8s/dagster_k8s_tests/utils.py
10

cool, done

22

yeah, we should revisit this everywhere we are currently using docker to invoke things - this is also how the airflow stuff is structured and I'm generally pretty uncomfortable with us hearing back from the containers this way.
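
One less fragile shape would be to lean on the container's exit code via the docker SDK instead of scraping its output; a rough sketch of that alternative (not what dagster-airflow or dagster-k8s do today):

import docker


def run_in_container(image, command):
    client = docker.from_env()
    container = client.containers.run(image, command, detach=True)
    try:
        result = container.wait()  # blocks until the container exits
        logs = container.logs().decode("utf-8")
        if result["StatusCode"] != 0:
            raise Exception(
                "Container exited with status {}:\n{}".format(result["StatusCode"], logs)
            )
        return logs
    finally:
        container.remove(force=True)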

nate updated this revision to Diff 8255.Dec 30 2019, 6:50 PM
nate marked 3 inline comments as done.

comments

nate edited the summary of this revision. (Show Details)Dec 30 2019, 6:50 PM
This revision was automatically updated to reflect the committed changes.