Page MenuHomeElementl

Fix inclusion of partitions mapped from execution time
AbandonedPublic

Authored by rexledesma on May 19 2021, 7:39 AM.
Tags
None
Referenced Files
Unknown Object (File)
Mon, May 22, 3:30 AM
Unknown Object (File)
Wed, May 10, 10:53 AM
Unknown Object (File)
Wed, May 10, 10:48 AM
Unknown Object (File)
Tue, May 9, 11:55 AM
Unknown Object (File)
Apr 19 2023, 9:41 AM
Unknown Object (File)
Apr 16 2023, 4:37 AM
Unknown Object (File)
Apr 6 2023, 6:31 AM
Unknown Object (File)
Mar 19 2023, 7:07 PM
Subscribers
None

Details

Summary

Depends on D7654.

A previous attempt was at https://dagster.phacility.com/D7656.

Inclusion of executed partitions from a schedule tick depends on the
partition offset. However, we naively just popped off the last partition
if any offset existed. If the offset is greater than 1, this logic
would not make any sense.

One approach to fix this is to apply the execution_time_to_partition_fn
on the end time of the schedule tick iterator. This way, any partition
in the range between the offset and the end time would not be included.

Test Plan

Add test cases that would have failed before, fix old tests

Diff Detail

Repository
R1 dagster
Branch
rl/fix-inclusive-logic
Lint
Lint Passed
Unit
No Test Coverage

Event Timeline

will probably need to patch D7654 along with this diff, since it can't stand alone anymore after the changes to serialize execution_time_to_partition_fn in the partition params

Harbormaster returned this revision to the author for changes because remote builds failed.May 19 2021, 8:16 AM
Harbormaster failed remote builds in B30800: Diff 37884!

this change makes sense to me and is smart - see one inline thing about how we should handle end_date being explicitly set, as I don't think that case should use the offset

python_modules/dagster/dagster/core/definitions/decorators/schedule.py
506–531

oops this is repeated

python_modules/dagster/dagster/core/definitions/partition.py
70

this parameter is no longer used and should disappear right?

110–111

this seems right in the case where end was not explicitly passed in and is intended to represent the current time - but if it was passed in, end is (I believe) intended to be an end partition time, not an end execution time.

(i.e. if I do daily_schedule(start=today, end=2 days from now, day_offset=1)

The expected behavior is that it run for 3 days starting tomorrow (not 2 days)

Otherwise this makes a ton of sense to me, in the common case where you set a start time but no end time, no point in returning a partition that hasn't executed yet (at least where we have the cron scheduler which is always looking at the last partition)

This diff has made me realize that the cron scheduler might just not be working at all when you set an explicit end_date, incidentally... probably not the end of the world since we are about to remove the cron scheduler entirely

python_modules/dagster/dagster_tests/core_tests/definitions_tests/test_decorators.py
575

my feedback above is making me realize that 'a schedule with an end_date explicitly set' is conspiciously absent from these tests :/ maybe include one here just for daily? And as per feedback above, that would want to go through the end_date, not partition_days_offset days before the end_date (i.e. that test should have existed before and this change as currently written probably should have broken it)

This revision now requires changes to proceed.May 21 2021, 4:11 PM
python_modules/dagster/dagster_tests/scheduler_tests/test_scheduler_timezones.py
826–830

Hey @rexledesma, I missed the earlier diff that this was based on, so forgive me for using this diff to supply a more general comment: This change makes a lot of sense to me. My main feedback is on the name. I’m having trouble getting into the headspace where the name “partition params” describes an object that’s responsible for generating a collection of partitions. If we hadn’t already used the name PartitionSet, I feel like that would be ideal. Maybe just Partitions?