Just a draft:
- need to update the Spark installation to fix the failing spark_dag test
- maybe move the Airflow DAGs under examples? But I also like having an airflow_playground directory for easier local dev
With your example, it would be good to demonstrate what it looks like in the Airflow world.
I think you'll want to chop this up a bit too. Suggestions:
- One diff for the dependency structure translation (rough sketch after this list)
- One diff for inner execution
- One diff for execution_date
- One diff for all the testing infra you'll need
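For the dependency structure translation, something like the following is what I have in mind. This is just a sketch with made-up task/solid names, not code from this PR, and the Dagster spelling here may be from a later release than this branch targets:

```python
from datetime import datetime

# Airflow world (1.x-era spelling): dependencies are wired between operators
# after they are constructed.
from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator

dag = DAG("example_dag", start_date=datetime(2019, 1, 1), schedule_interval=None)
extract = DummyOperator(task_id="extract", dag=dag)
load = DummyOperator(task_id="load", dag=dag)
extract >> load  # load runs downstream of extract

# Dagster world: the same dependency falls out of the data flow in the
# pipeline body.
from dagster import pipeline, solid

@solid
def extract_solid(context):
    return [1, 2, 3]

@solid
def load_solid(context, rows):
    context.log.info("loaded %d rows" % len(rows))

@pipeline
def example_pipeline():
    load_solid(extract_solid())
```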
We definitely will not want to check this in.
For inputs, I would have a *single* input, and then rely on our fan-in feature.
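To spell out what I mean by fan-in: give the downstream solid one list-typed input and pass it the collected upstream outputs. A rough sketch, with made-up solid names and a spelling that may differ from the Dagster version on this branch:

```python
from dagster import InputDefinition, Int, List, pipeline, solid

@solid
def emit_value(context):
    return 1

# One list-typed input instead of many scalar inputs.
@solid(input_defs=[InputDefinition("values", List[Int])])
def total(context, values):
    return sum(values)

@pipeline
def fan_in_pipeline():
    # Multiple upstream outputs fan in to the single `values` input.
    total([emit_value.alias("a")(), emit_value.alias("b")()])
```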
Agh, lost the most important comment.
We are going to need to look at this. It is a requirement that the *same* execution_date flows through an entire computation: it is generated once at the beginning, and every solid/step receives that same date. This covers the case where, for example, an hourly job takes more than an hour; we don't want it to just start writing to the next partition magically. A delay in scheduling could cause the same issue.
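To make the requirement concrete, here is a minimal pure-Python illustration (names hypothetical) of why the date must be generated exactly once per run rather than inside each step:

```python
from datetime import datetime, timezone

def partition_for(dt: datetime) -> str:
    # Hourly partition key derived from the execution date.
    return dt.strftime("%Y-%m-%dT%H")

# WRONG: each step picks its own date, so a slow hourly run drifts into the
# next partition partway through the computation.
def step_wrong(name):
    print(name, "writes to", partition_for(datetime.now(timezone.utc)))

# RIGHT: generate the date exactly once at the start of the run and hand the
# same value to every solid/step.
def run_pipeline(steps):
    execution_date = datetime.now(timezone.utc)  # generated exactly once
    for step in steps:
        print(step, "writes to", partition_for(execution_date))
```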
@nate can provide more context here and validate or invalidate this point.
Yeah, you're correct here. IIRC it's the responsibility of a DagRun object to own that single execution_date. This is an area of Airflow internals I haven't looked at in a while, but you might need to construct one of those and associate it with these task instances?
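Roughly like this, I think. A sketch against Airflow 1.x-era internals (it assumes an initialized Airflow metadata DB, and signatures vary by version, so treat it as a starting point rather than a working recipe):

```python
from datetime import datetime

from airflow import DAG
from airflow.models import TaskInstance
from airflow.operators.dummy_operator import DummyOperator
from airflow.utils.state import State

execution_date = datetime(2019, 1, 1)  # generated once for the whole run

dag = DAG("example_dag", start_date=execution_date, schedule_interval=None)
a = DummyOperator(task_id="a", dag=dag)
b = DummyOperator(task_id="b", dag=dag)
a >> b

# The DagRun owns the single execution_date for the run.
dag_run = dag.create_dagrun(
    run_id="manual__{}".format(execution_date.isoformat()),
    execution_date=execution_date,
    state=State.RUNNING,
)

# TaskInstances constructed with that same execution_date are associated
# with this DagRun, so every task sees one consistent date.
ti_a = TaskInstance(a, execution_date)
ti_b = TaskInstance(b, execution_date)
```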