From a situation hit during ... "user research": make it so InvokedSolidOutputHandle does not accidentally destructure into two strings, by making it an object and overriding methods to provide good error messages.
- Group Reviewers
- R1:53730741720c: [composition] improve output node misuse errors
new test cases
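A hypothetical sketch of the approach the summary describes (this is not the actual dagster source; the class body, error type, and message text are illustrative): the handle overrides `__iter__` and `__getitem__` so that accidental destructuring fails loudly with a descriptive error instead of silently yielding two strings.

```python
# Sketch only: illustrative stand-in for dagster's InvokedSolidOutputHandle.
class InvokedSolidOutputHandle:
    def __init__(self, solid_name, output_name):
        self.solid_name = solid_name
        self.output_name = output_name

    def _misuse_error(self, operation):
        return TypeError(
            "Attempted to {op} an InvokedSolidOutputHandle for output "
            "'{out}' of solid '{solid}'. This handle represents a single "
            "output; define multiple outputs on the solid if you want to "
            "destructure the invocation result.".format(
                op=operation, out=self.output_name, solid=self.solid_name
            )
        )

    def __iter__(self):
        # `x, y = handle` calls __iter__ first, so this is where accidental
        # destructuring is intercepted.
        raise self._misuse_error("iterate over")

    def __getitem__(self, key):
        raise self._misuse_error("index into")


handle = InvokedSolidOutputHandle("fancy_sk_learn_function", "result")
```

With a plain namedtuple, `x_train, x_test = handle` would have quietly bound two strings; with the override it raises immediately and names the solid and output involved.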
Hmmmm... I get the use case of avoiding accidental destructuring of tuples, but I wonder if we are also imposing too much rigidity as a result. I foresee many use cases (specifically in ML) where a common pattern is to return a tuple and have it destructured so that pieces can be passed into different parts of the control flow. Here is an example. Let's say I have a solid that downloads a dataset and loads it into a dataframe. In the simplest use case, I want to split that dataframe into a test set and a training set. These functions generally look like the following:
```python
from sklearn import fancy_sk_learn_function

def split_test_train(training_set):
    X, y = training_set[[FEATURE_COLUMNS]], training_set[LABEL_COLUMN]
    x_train, y_train, x_test, y_test = fancy_sk_learn_function(training_set)
    return x_train, y_train, x_test, y_test
```
The training sets (x_train, y_train) are then fed into a training function, whereas the test sets (x_test, y_test) go into a model evaluation function. This gets even more complicated when you further split your training set into k randomized cross-validation datasets. I don't have a very clear mental model for what solids look like when we remove the ability to destructure, or at the very least `__getitem__`.
Implementation wise, this looks legit!
> a common pattern is to return a tuple and have it destructured so that pieces can be passed into different parts of the control flow
So this doesn't prevent that - this is just catching an error where someone could be trying to do it incorrectly. The complexity here comes from how "composition" functions work (`@pipeline`, `@composite_solid`), where we use a Python function to capture the DAG, vs. regular solids, where the body of the function does the compute. This change only affects composition functions, not regular solids.
So before this diff, this would cause an error:
```python
@solid
def fancy_sk_learn_function(_) -> tuple:
    # do stuff
    return (x_train, x_test)

@pipeline
def split_test_train():
    ...
    # In composition functions, solid invocations return InvokedSolidOutputHandles,
    # which we track and use to wire up the DAG. Since the handle happened to be a
    # tuple, this destructuring would work, but our tracking would get strings
    # instead of InvokedSolidOutputHandles and throw a confusing error.
    x_train, x_test = fancy_sk_learn_function(training_set)
```
The fix, to be able to send different pieces of information through the pipeline, is to declare multiple output definitions and yield multiple Outputs, aligning the output defs accordingly:
```python
@solid(output_defs=[OutputDefinition(Train, 'x_train'), OutputDefinition(Test, 'x_test')])
def fancy_sk_learn_function(_):
    # do stuff
    yield Output(x_train, 'x_train')
    yield Output(x_test, 'x_test')

@pipeline
def split_test_train():
    ...
    # All gravy: we return a namedtuple of InvokedSolidOutputHandles when a solid
    # is defined to have multiple outputs.
    x_train, x_test = fancy_sk_learn_function(training_set)
```
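To illustrate why destructuring stays safe in the multi-output case, here is a hedged sketch (the helper name and class are illustrative, not dagster internals): when a solid declares several outputs, the composition-time invocation can return a namedtuple of handles, so ordinary tuple unpacking hands each downstream solid its own handle.

```python
from collections import namedtuple

class InvokedSolidOutputHandle:
    """Illustrative stand-in: identifies a (solid, output) pair for DAG wiring."""
    def __init__(self, solid_name, output_name):
        self.solid_name = solid_name
        self.output_name = output_name

def invoke_multi_output_solid(solid_name, output_names):
    # Field names come from the declared output defs, so both positional
    # destructuring and attribute access (result.x_train) work.
    Outputs = namedtuple(solid_name + "_outputs", output_names)
    return Outputs(
        *(InvokedSolidOutputHandle(solid_name, name) for name in output_names)
    )

# Mirrors the destructuring in the pipeline body above: each variable is a
# distinct handle rather than half of a single accidentally-split handle.
x_train, x_test = invoke_multi_output_solid(
    "fancy_sk_learn_function", ["x_train", "x_test"]
)
```

The design point: unpacking is only allowed where the solid's own definition says there are multiple outputs, so the DAG tracking never receives bare strings.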