Depends on D1737
This expresses the kubernetes kind/helm tests as pytest fixtures and cleans up the shell scripting we were using previously. The tests can now be run locally in the integration image with:
  docker run -v /var/run/docker.sock:/var/run/docker.sock -v ~/src/dagster:/workdir -it 968703565975.dkr.ecr.us-west-1.amazonaws.com/buildkite-integration:py3.7.6-2019-12-29T212402 /bin/bash
  pip install -e python_modules/dagster -e python_modules/dagster-graphql -e python_modules/libraries/dagster-k8s
  export AWS_ACCESS_KEY_ID="..."
  export AWS_SECRET_ACCESS_KEY="..."
  export DAGSTER_DOCKER_IMAGE_TAG="..."
  export DAGSTER_DOCKER_REPOSITORY="..."
  pytest python_modules/libraries/dagster-k8s/dagster_k8s_tests/ -s
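For reviewers less familiar with the fixture pattern, here is a minimal sketch of how a kind cluster's lifecycle can be wrapped in a pytest fixture. This is an illustration only, not the code in this diff: the names `kind_commands`, `kind_cluster`, and `cluster`, and the `kind-test-` name prefix, are hypothetical. It assumes the `kind` binary is on the PATH and uses only `kind create cluster --name` and `kind delete cluster --name`.

```python
import contextlib
import subprocess
import uuid

import pytest


def kind_commands(cluster_name):
    """Build the kind CLI invocations for creating and deleting a cluster."""
    create = ["kind", "create", "cluster", "--name", cluster_name]
    delete = ["kind", "delete", "cluster", "--name", cluster_name]
    return create, delete


@contextlib.contextmanager
def kind_cluster(cluster_name, run=subprocess.check_call):
    """Create a kind cluster, yield its name, and always delete it afterwards.

    `run` is injectable so the lifecycle can be exercised without kind installed.
    """
    create, delete = kind_commands(cluster_name)
    run(create)
    try:
        yield cluster_name
    finally:
        # Teardown runs even if the tests using the cluster fail.
        run(delete)


@pytest.fixture(scope="session")
def cluster():
    """Hypothetical session-scoped fixture: one throwaway cluster per test run."""
    name = "kind-test-{}".format(uuid.uuid4().hex[:8])
    with kind_cluster(name) as cluster_name:
        yield cluster_name
```

A test would then take `cluster` as an argument and receive the cluster name. Session scope means the cluster is created once per pytest run rather than once per test, which matters when cluster creation dominates the runtime.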
Note that while D1737 improves performance for the test image build step that is upstream of this test, this test still takes ~4m30s on average. It therefore adds a small amount of extra time to our build critical path: our current longest tests (airline demo) run for ~6m30s, and this test is serialized after the test image build step.
Options for further build time improvements are:
- Reduce the size of the integration image; between the first stage docker image build and this set of tests, extracting the integration image adds ~2m30s to the critical path of our builds
- (long term) Enable using a vanilla Dagit image in Helm (i.e. a Dagit container built without any client code), instead of rebuilding Dagit in the test images every build. This would save ~2m30s from the critical path.
- Treat k8s tests as non-blocking for merges, whether via cultural means, by permitting merges with ongoing k8s builds, or by creating a separate build pipeline that is triggered by the primary (I don't love the latter option; it'd be hard to associate the child build with the parent build when debugging a failure)
- (not recommended; I think this would be more trouble than it is worth) Set up a standing EKS kubernetes cluster with scale-to-zero node pools for CI/CD, instead of using kind. This might save a small amount of build time, especially with a warm cluster, but it depends on EKS auto-scaling behavior. Build isolation would also be a real challenge, with a high risk of build interactions causing confusing errors.
- Don't test kubernetes or helm in CI. Reduces build time by ~5m