Use tz_localize to get naive local timestamp

Authored by jordansanders on Jul 2 2021, 9:04 PM.

Pandas 1.3.0 introduces a number of changes and deprecations to how
timestamps are handled. With these changes, calling
Timestamp.min.replace() or Timestamp.max.replace() raises an
out-of-bounds error:

pandas._libs.tslibs.np_datetime.OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1677-09-21 00:12:43

This is presumably the same kind of error that the surrounding code is
trying to avoid by converting the min and max timestamps to values
slightly inside the true bounds.

It appears that Timestamp.min.replace(tzinfo=None) and
Timestamp.min.tz_localize(None) are semantically equivalent: both return
a naive Timestamp that keeps the local wall-clock time. As an added
bonus, the latter is more performant and doesn't hit the out-of-bounds
error.
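A minimal sketch of the equivalence claimed above, assuming a recent pandas install. The timezone "America/New_York" and the sample timestamp are illustrative choices, not taken from the dagster code. It shows that both calls drop the timezone while keeping the local wall-clock time, and that tz_localize(None) leaves the already-naive Timestamp.min and Timestamp.max unchanged. Whether replace() itself raises on the bounds depends on the pandas version, so that case is not exercised here.

```python
import pandas as pd

# Illustrative tz-aware timestamp: 9:04 PM local time in New York.
aware = pd.Timestamp("2021-07-02 21:04", tz="America/New_York")

# Both calls strip the timezone and keep the local wall-clock time (21:04),
# rather than converting to UTC first.
via_replace = aware.replace(tzinfo=None)
via_localize = aware.tz_localize(None)

# Timestamp.min and Timestamp.max are already naive, so tz_localize(None)
# returns the same value and stays inside the nanosecond bounds.
naive_min = pd.Timestamp.min.tz_localize(None)
naive_max = pd.Timestamp.max.tz_localize(None)

print(via_replace, via_localize)
print(naive_min, naive_max)
```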

I believe the risk of this change causing issues is low: if I understand
the surrounding code correctly, it only matters when validating
DataFrames that contain Timestamps very close to the minimum or maximum
valid Timestamp.

Test Plan


Diff Detail

R1 dagster
Lint Not Applicable
Tests Not Applicable