
Snapshot parallel timeseries to make feeding data into LSTM easier.
ClosedPublic

Authored by themissinghlink on Nov 28 2019, 12:47 AM.

Details

Summary

Changelog

  • Moved constants out to a new file.
  • Creates Timeseries and MultivariateTimeseries objects that handle the logic, and the associated pandas integrations, for transforming a sequence into the snapshot sequence used by the LSTM.
  • Adds unit tests
  • Adds a TrainingSet type along with unit tests for the type checking code.
  • Adds a solid needed to produce a training set from the actual code
  • Adds a matrix_param check (along with a suite of unit tests) that handles invariant checks for matrix-ish lists of lists, and uses it in MultivariateTimeseries.
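
To make the snapshot idea in the changelog concrete, here is a minimal pure-Python sketch of turning parallel timeseries into a snapshot sequence, guarded by a matrix-shape check. It is not the code in this diff: the names check_matrix, snapshot, and snapshot_parallel, as well as the memory_length parameter, are hypothetical, and the real Timeseries and MultivariateTimeseries objects integrate with pandas rather than plain lists.

```python
from typing import Any, List, Sequence


def check_matrix(rows: Sequence[Sequence[Any]]) -> List[List[Any]]:
    """Hypothetical stand-in for a matrix-ish invariant check.

    Requires a non-empty sequence of lists/tuples that all share one length.
    """
    if not rows:
        raise ValueError("matrix must contain at least one row")
    for row in rows:
        if not isinstance(row, (list, tuple)):
            raise TypeError("each row must be a list or tuple")
    width = len(rows[0])
    for row in rows:
        if len(row) != width:
            raise ValueError("all rows must have the same length")
    return [list(row) for row in rows]


def snapshot(sequence: Sequence[Any], memory_length: int) -> List[List[Any]]:
    """Return every run of memory_length consecutive items, in order."""
    if memory_length < 1 or memory_length > len(sequence):
        raise ValueError("memory_length must be between 1 and len(sequence)")
    last_start = len(sequence) - memory_length
    return [list(sequence[i:i + memory_length]) for i in range(last_start + 1)]


def snapshot_parallel(series: Sequence[Sequence[Any]], memory_length: int) -> List[List[List[Any]]]:
    """Snapshot several equal-length timeseries together.

    series holds one sequence per variable. Each returned window is a list of
    timesteps, and each timestep holds one value per variable.
    """
    matrix = check_matrix(series)
    # Transpose so that each element is one timestep across all variables.
    timesteps = [list(step) for step in zip(*matrix)]
    return snapshot(timesteps, memory_length)


windows = snapshot_parallel([[1, 2, 3, 4], [10, 20, 30, 40]], memory_length=2)
```

With two variables observed over four timesteps and a memory length of 2, this yields three overlapping windows of two timesteps each, which is the (samples, timesteps, features) layout that LSTM layers commonly expect.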
Test Plan

Unit tests.

Diff Detail

Repository
R1 dagster
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

  • added tests for timeseries and also added checks
  • added test cases for training set type
themissinghlink edited the summary of this revision. Dec 2 2019, 9:28 PM
themissinghlink added reviewers: max, nate.
max accepted this revision. Dec 3 2019, 9:42 PM
max added inline comments.
examples/dagster_examples/bay_bikes/solids.py
276

cool, or a "rolling window"

examples/dagster_examples/bay_bikes/types.py
291

are these three different things or two?

313

how does this API feel?

This revision is now accepted and ready to land. Dec 3 2019, 9:42 PM
themissinghlink added inline comments. Dec 3 2019, 9:55 PM
examples/dagster_examples/bay_bikes/types.py
291

You are totally right that X and y are different things, but they come from the same dataset A: X is a transformed subset of A, whereas y is a transformed column from A. It would be odd to represent them as semantically separate, because each is meaningless without its complement. That is why I thought a tuple would be the best way to couple the two loosely, though I am unsure whether it was the right call.
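
As a rough illustration of that coupling (a sketch, not the code under review), the snippet below derives X and y from one dataset and returns them as a single tuple. The function names, the target_index and memory_length parameters, and the choice to predict the value one step after each window are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple

# Hypothetical alias: X is a list of windows, y holds one target per window.
TrainingSet = Tuple[List[List[List[float]]], List[float]]


def make_training_set(
    rows: Sequence[Sequence[float]], target_index: int, memory_length: int
) -> TrainingSet:
    """Build (X, y) from a single dataset A, given as one row per timestep.

    X[i] is the window of memory_length consecutive rows starting at i.
    y[i] is the target column of the row right after that window (assumed).
    """
    if memory_length < 1 or memory_length >= len(rows):
        raise ValueError("memory_length must be between 1 and len(rows) - 1")
    count = len(rows) - memory_length
    X = [[list(r) for r in rows[i:i + memory_length]] for i in range(count)]
    y = [rows[i + memory_length][target_index] for i in range(count)]
    return X, y


def is_training_set(value: object) -> bool:
    """Loose check: a 2-tuple whose halves have the same number of samples."""
    return (
        isinstance(value, tuple)
        and len(value) == 2
        and len(value[0]) == len(value[1])
    )


A = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
X, y = make_training_set(A, target_index=1, memory_length=2)
```

Keeping both halves in one tuple lets a single type check confirm that X and y stay aligned sample for sample, which is the property that makes each one meaningful only alongside the other.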

313

I am a huge fan of it! I will say that we ought to provide some sugar to make DQ reusable; right now it's too freeform, and I feel like I am typing a lot. I have some ideas specifically for dataframes, but we can discuss them in an RFC later.