- User Since: Oct 22 2019, 4:39 PM (17 w, 6 d)
Fri, Feb 21
Good catch! I'm not actually sure I did this right... the problem goes a lot deeper. It turns out solid_subset isn't being threaded through at all. Here's my best guess at what the fix is.
- thread into schedule definition and add test
Thu, Feb 20
- made pylint disabling more human readable
Talked with @schrockn. The intermediate system should not be relied on after the compute step, which means we need to explicitly persist the ML model and yield the materialization. I'll take on the eventing story in another revision because it requires a lot of changes.
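A minimal sketch of the pattern described above, written in plain Python so it doesn't depend on dagster's event API: the compute step writes the model to an explicit location and emits a record of that write, instead of counting on intermediate storage after the step ends. `MaterializationRecord`, `OutputRecord`, and `train_and_persist` are hypothetical stand-ins, not dagster classes, and pickle is just an assumed serialization format.

```python
import os
import pickle
from dataclasses import dataclass
from typing import Any, Iterator, Union


@dataclass(frozen=True)
class MaterializationRecord:
    """Hypothetical stand-in for a materialization event: what was persisted and where."""
    label: str
    path: str


@dataclass(frozen=True)
class OutputRecord:
    """Hypothetical stand-in for the step's output value."""
    value: Any


def train_and_persist(model: Any, output_dir: str) -> Iterator[Union[MaterializationRecord, OutputRecord]]:
    # Persist the model explicitly instead of trusting intermediate storage
    # to still have it once the compute step has finished.
    path = os.path.join(output_dir, "model.pkl")
    with open(path, "wb") as f:
        pickle.dump(model, f)
    # Announce the persisted artifact first, then hand downstream steps the path.
    yield MaterializationRecord(label="ml_model", path=path)
    yield OutputRecord(value=path)
```

Downstream steps receive the path and load the model from there, so nothing depends on the intermediate system outliving the step.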
- got rid of silly name keyword param and made it required
- made docs fixes and added comment
Wed, Feb 19
Holy shit. In all the excitement, I never checked out a branch... uhhhh, gonna abandon this and put up a new revision with the fixes. I'll link this revision to that one, but it will need another LGTM.
- rebasing to pick up buildkite changes.
- renamed to more explicit val
Regardless of whether the pandas API changes in the future, we need to be compatible with all versions of pandas, which makes a strictly typed schema difficult, right? It's a tradeoff, but I'm not sure we want to tie ourselves to maintaining it.
- addressed feedback
- brought in LocalClient resource
Done because I realized I need to make a core change.
- rebased with master to pick up buildkite changes
Tue, Feb 18
- moved issues to the right locations
- add tracking issues and resolve feedback
- switched to commenting out and fixed issue with control flow logic
- patched all gcs calls
- forgot to run make black
- rebased and fixed mypy issue
Mon, Feb 17
Re: naming. I'm not married to serialization_options; however, kwargs seems a bit confusing, right? Here is an example. What about read_csv_kwargs and to_csv_kwargs, just so we're being hella explicit about what's happening here?
The problem with strongly typing the config is that while, yes, you catch errors early, you also create a really brittle API. The pandas API changes pretty frequently, and new kwargs are added all the time. What happens if the user uses a version of read_csv that differs from the version we built a strongly typed config for? Or worse, what if a version of pandas makes a backward-incompatible change? Our current version is the least opinionated about implementation details, and if users pin a specific version of pandas and want to be opinionated, they can make their own. Right now I feel more people would make their own rather than use the default, which feels wrong.
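A minimal sketch of the untyped passthrough being argued for, using the read_csv_kwargs / to_csv_kwargs names proposed above: the config carries an arbitrary dict that is handed straight to pandas, so whatever kwargs the user's pinned pandas version accepts just work. `load_dataframe` and `save_dataframe` are hypothetical helpers, not part of dagster_pandas.

```python
from typing import Any, Dict, Optional

import pandas as pd


def load_dataframe(source: Any, read_csv_kwargs: Optional[Dict[str, Any]] = None) -> pd.DataFrame:
    # Forward the user's kwargs untouched; pandas itself validates them,
    # so new or version-specific read_csv options need no schema change here.
    return pd.read_csv(source, **(read_csv_kwargs or {}))


def save_dataframe(df: pd.DataFrame, target: Any, to_csv_kwargs: Optional[Dict[str, Any]] = None) -> None:
    # Same idea on the output side: to_csv_kwargs goes straight to DataFrame.to_csv.
    df.to_csv(target, **(to_csv_kwargs or {}))
```

For example, `load_dataframe("data.csv", {"sep": ";"})` reads a semicolon-separated file. The cost of this design is that a misspelled kwarg only surfaces at run time, as the `TypeError` pandas raises for an unexpected keyword argument, rather than at config validation time.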
Just to clean this up, should we abandon this?
Gonna abandon this in favor of a different approach, to discuss at a different time.
Ok let me take a crack at an idea I have and put it up in a few! This actually might simplify things a lot.
That's a good point. Hmmm, we could have create_dagster_pandas_dataframe_type take an input_hydration_config and an output_materialization_config as params. Then these selectors could live in their respective libraries. However, that means pandas would have to be a dependency of gcp/aws/azure/...
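A rough, dagster-free sketch of that idea: the type factory accepts the loading and materializing hooks as parameters, so storage-specific implementations can be defined in the storage library rather than in the pandas library. The parameter names mirror the comment, but everything below is a hypothetical plain-Python stand-in and is not meant to reflect dagster's actual config API. It also shows the dependency direction in question: the GCS-side code is the part that would need pandas.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional

Loader = Callable[[Dict[str, Any]], Any]
Materializer = Callable[[Dict[str, Any], Any], None]


@dataclass(frozen=True)
class DataFrameType:
    name: str
    input_hydration_config: Optional[Loader] = None
    output_materialization_config: Optional[Materializer] = None


def create_dagster_pandas_dataframe_type_sketch(
    name: str,
    input_hydration_config: Optional[Loader] = None,
    output_materialization_config: Optional[Materializer] = None,
) -> DataFrameType:
    # The pandas library only wires the hooks in; it knows nothing about any storage backend.
    return DataFrameType(name, input_hydration_config, output_materialization_config)


# --- would live in a hypothetical gcp library, which therefore depends on pandas ---
def gcs_csv_loader(config: Dict[str, Any]) -> Any:
    # Placeholder: a real loader would download the object and call pd.read_csv on it.
    return "gs://{}/{}".format(config["bucket"], config["key"])


GcsDataFrame = create_dagster_pandas_dataframe_type_sketch(
    "GcsDataFrame", input_hydration_config=gcs_csv_loader
)
```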
- fixed final testing call site
- made black fixes
- fixed tests call sites
- added google dependency to dagster pandas lib
- fixed lint bugs
Sat, Feb 15
This seems legit to me. Watched Max and saw it work in action.
Fri, Feb 14
Thu, Feb 13
Abandoning in favor of tweaking the tutorial to dump artifacts to a data directory. Will put up a new revision soon.
Rubber stamp. Nick did something too, so make sure this doesn’t conflict with that.
Wed, Feb 12
- use safe_tempfile_path instead of raw named temporary file
- got rid of type annotations to be python2 compliant
- got rid of todo
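For context, a minimal sketch of what a helper like safe_tempfile_path provides over a raw NamedTemporaryFile, assuming it yields a plain path and cleans up afterward. This is an illustrative reimplementation, not dagster's code. Closing the descriptor right away matters because, per the Python docs, a file opened by NamedTemporaryFile cannot be opened a second time on Windows while it is still open. It also avoids type annotations, in line with the Python 2 note above.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def safe_tempfile_path_sketch():
    # mkstemp creates the file and returns an open OS-level descriptor plus its path.
    fd, path = tempfile.mkstemp()
    # Close our handle immediately so other code (or another process) can open the path freely.
    os.close(fd)
    try:
        yield path
    finally:
        # Clean up even if the body raised; the body may already have deleted the file.
        if os.path.exists(path):
            os.unlink(path)
```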
Since we're making sure we emit human-readable error strings, we ought to encode them in tests as well, i.e., check that the call raises a DagsterInvariantViolationError with the error message mentioned above.
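A sketch of encoding the human-readable error in a test. DagsterInvariantViolationError is defined locally as a stand-in so the snippet runs without dagster installed, and `validate_solid_name` plus its message are hypothetical. In the real test you would import the error from dagster and assert on the actual message. With pytest, `pytest.raises(DagsterInvariantViolationError, match=...)` does the same check more concisely.

```python
import re


class DagsterInvariantViolationError(Exception):
    """Local stand-in for dagster's error class so this sketch runs on its own."""


def validate_solid_name(name):
    # Hypothetical validator that raises with a human-readable message.
    if not name:
        raise DagsterInvariantViolationError("Solid name must be a non-empty string.")
    return name


def test_empty_name_error_message():
    try:
        validate_solid_name("")
    except DagsterInvariantViolationError as exc:
        # Assert on the message text, not just the exception type.
        assert re.search(r"must be a non-empty string", str(exc))
    else:
        raise AssertionError("expected DagsterInvariantViolationError")
```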
We really really really need to do some sort of snapshot deploy that gets triggered iff there was a code change to the tutorial docs and include it with the buildkite thing. Currently, it's too easy to mess things up.
Abandoning because I was convinced this was not the right way to do this.