Details
Diff Detail
- Repository
- R1 dagster
- Branch
- rl/test-speedup (branched from master)
- Lint
No Lint Coverage - Unit
No Test Coverage
Event Timeline
@max has tried this before, and we had to revert it due to increased flakiness. It does seem to have worked here though, so maybe unrelated fixes have fixed the flakiness?
let's kick off some more runs just to see how this holds up against flakiness. was honestly expecting more of an performance gain - current tests are running at around 20-23m and this went to 19m. not a bad win, but will probably need to apply this to all tox.ini files
probably need to apply this to all tox.ini files
the long poll of the build suite is core_tests which depends on the step to build the test image. I am not sure why core depends on the test image build where most of the other tests do not.
https://buildkite.com/dagster/dagster-diffs/builds/18738/waterfall
https://buildkite.com/dagster/dagster-diffs/builds/18738/dag
This diff would get us ~15% speed up for low effort assuming its stable, though it looks like we caught 1 failure 6 runs.
To get the total test suite time down, I think getting the subset of core_tests that need the test image in to their own test suite could go a long way
yeah I think the failure in https://buildkite.com/dagster/dagster-diffs/builds/18742#6bddd65b-2c1f-4201-9776-5675ec89abf6 is representative of last time we tried this unfortunately. It kind of makes sense that tests that involve processes and interrupts would interfere with each other sometimes when running in different threads in the same process.
I think one thing we could do is take tests that depend on spinning up subprocesses or threads and move those into integration_tests, and be more aggro about what we run on every diff as fast unit tests (trading off completeness for test speed basically). Most of dagster-graphql should probably be considered an integration test, for example, and the tests in that failure that spin up actual gRPC subprocesses probably should be considered integration tests. That's a much bigger overhaul though. A lot of those tests probably overlap with the top slowest tests in alex's other diff thuogh.
This definitely looks cleaner than the last time I tried it. I'd want to see ~100 test runs succeed before approving this. The failure above might be due to a conflict with the GRPC generation test, test_compiled_protobuf, which could definitely be isolated but is definitely not isolated at present. As you've discovered, the big wins in test time will come from more intelligent use of fixtures and from writing more true unit tests, not from multiprocessing.