Page MenuHomePhabricator

up pickle protocol

Authored by catherinewu on Oct 28 2020, 3:56 PM.



User was getting "OSError: [Errno 22] Invalid argument" when the default intermediate storage called pickle.dump with PICKLE_PROTOCOL = 2 which can happen for very large objects. protocol 4 adds support for pickling very large objects:

there are potential issues if the pickle writer gets ahead of the pickle reader. I think there are some rare edge cases where this is unsafe, such as if you use python2 to re-execute a pipeline run that was originally executed in python3

separately "OSError: [Errno 22] Invalid argument" from the default intermediate storage is a pretty confusing error and it's not clear for the user what they should do next (set up a diff intermediate storage)

Test Plan


Diff Detail

R1 dagster
Automatic diff as part of commit; lint not applicable.
Automatic diff as part of commit; unit tests not applicable.

Event Timeline


v5 means that there also wont be compat from 3.8 back to previous versions of 3


yeah i think my primary concern is that new dagster users who are processing lots of data can easily hit this error and it's unclear what the next steps they should take are. i think the increase in data size supported from protocol 2->4 is substantial so it would mitigate the underlying problem. im starting to think we should have a page like

we definitely can't do this

we can catch the OSError in the default storage and provide hints around it as to what to do, but we can't do this.

get some other intermediate folks in the mix about what to do


I'm not super familiar with pickle: @max do you know whether going up to 4 would cause similar problems? we'd lose compatibility with python 2?


yeah we would lose interop between python 2 and python 3 processes

pinning and accepting breakage is one thing, but HIGHEST_PROTOCOL is going to be different on different minor versions of py3

Ok, since we've dropped 2.7, we can go to pickle protocol 4, but not to HIGHEST_PROTOCOL

rebase + use pickle protocol 4

worth mentioning this in changelog or

This revision is now accepted and ready to land.Nov 30 2020, 11:54 PM
This revision was automatically updated to reflect the committed changes.