Page MenuHomeElementl

feat: retrieve partition set names from external partition set in schedule based partitions
Needs RevisionPublic

Authored by rexledesma on Jun 3 2021, 4:24 AM.



Here, we serialize all the necessary information to reconstruct a schedule based partition set from the
parameters in the external partition set, rather than hydrating these values from the grpc server.

Since datetimes cannot be serialized, we deconstruct datetimes as components that can be serialized, and
use that to reconstruct the original datetime.

Test Plan


Diff Detail

R1 dagster
Lint Passed
No Test Coverage

Event Timeline

rexledesma retitled this revision from Add partition params to external partition set to feat: retrieve partition set names from external partition set in schedule based partitions.Jun 11 2021, 5:47 PM
rexledesma edited the summary of this revision. (Show Details)
dgibson requested changes to this revision.Jun 15 2021, 4:09 AM

overall this makes sense to me - tweak some names and remove some extra timezone fields and i think we're good


is this "if" backwards? this would make sense to me if it were if *not* isinstance (pendulum timestamps implement timestamp() correctly)


i don't think we need to include three timezones in this object - a start UTC timestamp, end UTC timestamp, and execution timezone should be sufficient. We can assume that the execution timezone is always UTC if it is not set, and assume that the start and end dates either have no timezone (in which case we can assume they are expressed in the execution timezone) or must be pendulum datetimes with a timezone that matches the execution timezone.

so i think the logic is something like in schedule_partition_range

tz = execution_timezone if execution_timezone else "UTC"

_start = (
    to_timezone(start_date, tz)
    if isinstance(start, PendulumDateTime)
    else pendulum.instance(start, tz=tz).  # Change "2020-01-01" to the same date, but in the right timezone
start_timestamp = _start.timestamp()

comment explaining the format here? day of month / day of week / etc. depending on the schedule type right?

(you could consider having separate fields so that they can be validated differently -day of month can go past 6)


as above, i think we can make this start_timestamp / end_timestamp / timezone and we're good


this doesn't have to be a perfect recreation of what they passed into the schedule function. We can coerce it into a correctly-timezoned pendulum time here even if they originally passed in a naive datetime




get_external_partition_names (although as per above I think we need to adjust the name here to clarify that this won't always be there for all partition sets)


this should probably be called partition_names not partition_set_names

can we do more to explain here when/why this is set on some partition sets but not others? I think as a new person to this function I would be very surprised to find that this was not set on every partition set

I would say static_partition_names but that doesn't really encompass the time-based case

Is there a succinct way to express 'doesnt require running user code' and including that in the name here? I think that would help a lot with the confusion


some naming tweaks will make this logic make a lot more sense I think, if we can just find the right names... I will sleep on it

if has_****_partition_names():
  compute it locally
  go out to gRPC

has_locally_computable_partition_names? Bit of a mouthful, but maybe more clear?


its too bad that this case does not get to take advantage of your stuff. are we still planning to introduce a breaking change to create_schedule_definition at some point to make it only work with time based partition param partition sets? (not blocking for this diff)

This revision now requires changes to proceed.Jun 15 2021, 4:09 AM