Page MenuHomeElementl

Save partition offset in partition parameters
AbandonedPublic

Authored by rexledesma on May 19 2021, 1:40 AM.
Tags
None
Referenced Files
Unknown Object (File)
Tue, May 23, 10:26 PM
Unknown Object (File)
Mon, May 8, 11:29 PM
Unknown Object (File)
Apr 7 2023, 8:29 PM
Unknown Object (File)
Apr 7 2023, 10:33 AM
Unknown Object (File)
Apr 6 2023, 1:38 AM
Unknown Object (File)
Mar 29 2023, 5:15 AM
Unknown Object (File)
Mar 27 2023, 7:39 PM
Unknown Object (File)
Mar 19 2023, 5:59 PM
Subscribers
None

Details

Summary

Depends on D7621.

schedule_partition_range requires a function to map execution time
to the partition being filled, so save all the information required to
create this function.

Test Plan

pytest

Diff Detail

Repository
R1 dagster
Branch
rl/partition-set-refactor-2
Lint
Lint Passed
Unit
No Test Coverage

Event Timeline

python_modules/dagster/dagster/core/definitions/partition.py
162–174

I'm a little worried about these getting unwieldy... do we just expect most of the time-based partition params to be constructed by one of the schedule decorators?

python_modules/dagster/dagster/core/definitions/partition.py
170–174

@prha would having a separate PartitionOffset namedtuple and PartitionExecutionTime namedtuple help with the unwieldiness? Kind of a cosmetic difference but a lot of the explosion of fields seems to stem from those two structures having a lot of optiosn

python_modules/dagster/dagster/core/definitions/partition.py
170–174

I'm mostly concerned about the API docs. If we feel like a user could look at the docs site and figure out what they should use without getting overwhelmed, I think this is fine. I was thinking that it might be easier to describe the containing offset parameter and the containing execution time parameter rather than each individual value, but agree that it's mostly cosmetic.

I was going to suggest writing out the exploded api docs out in this diff, but then realized this isn't even exported.

If we don't expect users to populate these directly, I don't care even a little about the unwieldiness.

python_modules/dagster/dagster/core/definitions/partition.py
170–174

I'm actually a little unclear on whether users create these directly - these are intended to be kind of like serializable versions of schedule_partition_range / date_partition_range right rex? What do you expect the way for a user is to create a time-based partition set (and a schedule based on a time-based partition set) once all is said and done?

python_modules/dagster/dagster/core/definitions/partition.py
170–174

My current thinking is that TimeBasedPartitionParams will be firstly be used by the schedule decorators. It's basically a serialized cron schedule, so any partition set that can be expressed through a cron schedule can be expressed through this class.

The schedule decorators thread these partition params through to the current PartitionSetDefinition in a partition_params field, as a replacement for the user specified partition_fn. In the future, we'll just expose docs on how set partition_params and then deprecate partition_fn fully (as its behavior is fully encapsulated by Static/TimeBased/Dynamic partition params).

A couple of relatively cosmetic things (TimeBased => ScheduleBased, coalescing the partition_offset fields) and I think this makes a ton of sense)

python_modules/dagster/dagster/core/definitions/partition.py
156

kind of to phil's point - I think this name would make a lot more sense to me as ScheduleBasedPartitionParams or just SchedulePartitionParams (since it's a partition set that is generated starting from a schedule). With TimeBasedPartitionParams I get tripped up imagining how date_partition_range is going to map onto this (I imagine we would actually end up with a different, simpler DateRangePartitionParams class to support that, since the two cases are obnoxiously, subtlely different due to timezone shenanigans).

156–174

I think this really should be ScheduleBasedPartitionParams

170–174

having a separate partition_offset namedtuple (with optional minutes/hours/days/etc. fields) will help make this more comprehensible in the docs I think. I feel less strongly about execution_datetime but that could be an option as well. I also think you could consider a cronstring field and do some string parsing/validation, but I don't feel strongly about that either.

288–299

doing this in reverse and deriving the values from the cron string would also be an option?

This revision now requires changes to proceed.May 21 2021, 4:36 PM