Depends on D8276.
This RFC seeks to answer the question: what do we do with partition sets in the world of crag?
Right now, partition sets are standalone objects that kind of float around. They're awkward in that there can theoretically be multiple per mode, but in practice it almost never makes sense to have more than one. The problem is arguably worse in crag: if a job is meant to be a single thing that can be executed, it's weird for the job page to have a drop-down for partition sets.
Conceptually, partition sets are part of a job in the same way that config mappings are part of a job: they essentially define an interface for parameterizing the job. In the case of config, the input space is infinite. In the case of a partition set, it's finite.
This RFC proposes making a partition set just a function that you can attach to a job and that generates a list of config values.
So, where before you would do this:
```python
my_partition_set = PartitionSetDefinition(
    name="date_partition_set",
    pipeline_name="my_pipeline",
    partition_fn=get_date_partitions,
    run_config_fn_for_partition=run_config_for_date_partition,
)
```
Instead you could do:
```python
my_job = my_graph.to_job(
    config_fn=run_config_for_date_partition,
    partitions=get_date_partitions,
)
```
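Neither snippet defines the two functions it references. A minimal sketch of what they might look like (the date range and the config shape here are hypothetical, not part of the proposal):

```python
from datetime import date, timedelta

def get_date_partitions():
    # hypothetical: one partition per day for the first week of 2021
    start = date(2021, 1, 1)
    return [start + timedelta(days=i) for i in range(7)]

def run_config_for_date_partition(partition):
    # hypothetical config shape: pass the partition's date to an "ingest" op
    return {"ops": {"ingest": {"config": {"date": partition.isoformat()}}}}
```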
The job is then launchable by just supplying a partition as config, and it's also easily testable:
```python
def test_my_job():
    # make sure the first partition appropriately parameterizes the job
    validate_run_config(my_job, run_config=my_job.partition_fn()[0])

    # execute the job with the first partition
    execute_job(my_job, run_config=my_job.partition_fn()[0])
```
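The test above leans on `validate_run_config` and `execute_job`, but the pattern itself can be exercised framework-free. A toy model with stand-ins for those two functions (everything here is illustrative, not the actual API):

```python
def partition_fn():
    # hypothetical: each partition is already a run-config dict
    return [{"date": "2021-01-01"}, {"date": "2021-01-02"}]

def validate_run_config(job, run_config):
    # stand-in validation: require the key the job expects
    assert "date" in run_config, "missing required config key"
    return run_config

def execute_job(job, run_config):
    # stand-in execution: report which partition ran
    return f"ran {job} for {run_config['date']}"

def test_my_job():
    # make sure the first partition appropriately parameterizes the job
    first = partition_fn()[0]
    validate_run_config("my_job", run_config=first)

    # execute the job with the first partition
    result = execute_job("my_job", run_config=first)
    assert result == "ran my_job for 2021-01-01"

test_my_job()
```

The point of the pattern: because the partition function is just an attribute of the job, a test can reach it directly instead of looking up a separately-registered partition set.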