Fix: date_part to extract only the requested part (not the overall interval)#7189
Fix: date_part to extract only the requested part (not the overall interval)#7189alamb merged 11 commits intoapache:mainfrom
date_part to extract only the requested part (not the overall interval)#7189Conversation
| match part { | ||
| DatePart::Year => Ok(self.unary_opt(|d: IntervalMonthDayNano| Some(d.months / 12))), | ||
| DatePart::Month => Ok(self.unary_opt(|d: IntervalMonthDayNano| Some(d.months))), | ||
| DatePart::Month => Ok(self.unary_opt(|d: IntervalMonthDayNano| Some(d.months % 12))), |
There was a problem hiding this comment.
D SELECT datepart('year', interval '12 month');
┌────────────────────────────────────────────────┐
│ datepart('year', CAST('12 month' AS INTERVAL)) │
│ int64 │
├────────────────────────────────────────────────┤
│ 1 │
└────────────────────────────────────────────────┘
D SELECT datepart('month', interval '12 month');
┌─────────────────────────────────────────────────┐
│ datepart('month', CAST('12 month' AS INTERVAL)) │
│ int64 │
├─────────────────────────────────────────────────┤
│ 0 │
└─────────────────────────────────────────────────┘
There was a problem hiding this comment.
Postgres agrees
postgres=# SELECT extract (month from interval '12 month');
extract
---------
0
(1 row)
Before this PR arrow also currently says 12.
> SELECT datepart('month', interval '12 month');
+---------------------------------------------------------------------------------------------------------------+
| date_part(Utf8("month"),IntervalMonthDayNano("IntervalMonthDayNano { months: 12, days: 0, nanoseconds: 0 }")) |
+---------------------------------------------------------------------------------------------------------------+
| 12 |
+---------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.001 seconds.
I think this change makes sense
| } | ||
| DatePart::Nanosecond => { | ||
| Ok(self.unary_opt(|d| (d.milliseconds % (60 * 1_000)).checked_mul(1_000_000))) | ||
| } |
There was a problem hiding this comment.
D SELECT datepart('m', interval '1m 61s 33ms 44us');
┌─────────────────────────────────────────────────────┐
│ datepart('m', CAST('1m 61s 33ms 44us' AS INTERVAL)) │
│ int64 │
├─────────────────────────────────────────────────────┤
│ 2 │
└─────────────────────────────────────────────────────┘
D SELECT datepart('s', interval '1m 61s 33ms 44us');
┌─────────────────────────────────────────────────────┐
│ datepart('s', CAST('1m 61s 33ms 44us' AS INTERVAL)) │
│ int64 │
├─────────────────────────────────────────────────────┤
│ 1 │
└─────────────────────────────────────────────────────┘
D SELECT datepart('ms', interval '1m 61s 33ms 44us');
┌──────────────────────────────────────────────────────┐
│ datepart('ms', CAST('1m 61s 33ms 44us' AS INTERVAL)) │
│ int64 │
├──────────────────────────────────────────────────────┤
│ 1033 │
└──────────────────────────────────────────────────────┘
D SELECT datepart('us', interval '1m 61s 33ms 44us');
┌──────────────────────────────────────────────────────┐
│ datepart('us', CAST('1m 61s 33ms 44us' AS INTERVAL)) │
│ int64 │
├──────────────────────────────────────────────────────┤
│ 1033044 │
│ (1.03 million) │
└──────────────────────────────────────────────────────┘
There was a problem hiding this comment.
This is a behavior change for arrow-rs.
duckdb says 1
SELECT datepart('s', interval '1m 61s 33ms 44us'); ┌─────────────────────────────────────────────────────┐ │ datepart('s', CAST('1m 61s 33ms 44us' AS INTERVAL)) │ │ int64 │ ├─────────────────────────────────────────────────────┤ │ 1 │ └─────────────────────────────────────────────────────┘
postgres says almost 1:
postgres=# SELECT extract (seconds from interval '1m 61s 33ms 44us');
extract
----------
1.033044
(1 row)
Arrow says 121 (before this PR)
> SELECT datepart('s', interval '1m 61s 33ms 44us');
+---------------------------------------------------------------------------------------------------------------------+
| date_part(Utf8("s"),IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 0, nanoseconds: 121033044000 }")) |
+---------------------------------------------------------------------------------------------------------------------+
| 121 |
+---------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.004 seconds.
There was a problem hiding this comment.
Same thing for millisecond and microseconds
| @@ -464,11 +464,15 @@ impl ExtractDatePartExt for PrimitiveArray<IntervalDayTimeType> { | |||
| DatePart::Week => Ok(self.unary_opt(|d| Some(d.days / 7))), | |||
| DatePart::Day => Ok(self.unary_opt(|d| Some(d.days))), | |||
| DatePart::Hour => Ok(self.unary_opt(|d| Some(d.milliseconds / (60 * 60 * 1_000)))), | |||
There was a problem hiding this comment.
D SELECT datepart('hour', interval '25 hour');
┌───────────────────────────────────────────────┐
│ datepart('hour', CAST('25 hour' AS INTERVAL)) │
│ int64 │
├───────────────────────────────────────────────┤
│ 25 │
└───────────────────────────────────────────────┘
D SELECT datepart('day', interval '25 hour');
┌──────────────────────────────────────────────┐
│ datepart('day', CAST('25 hour' AS INTERVAL)) │
│ int64 │
├──────────────────────────────────────────────┤
│ 0 │
└──────────────────────────────────────────────┘
There was a problem hiding this comment.
For the record, the current arrow behavior is the same it seems:
> SELECT datepart('hour', interval '25 hour');
+--------------------------------------------------------------------------------------------------------------------------+
| date_part(Utf8("hour"),IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 0, nanoseconds: 90000000000000 }")) |
+--------------------------------------------------------------------------------------------------------------------------+
| 25 |
+--------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.030 seconds.
> SELECT datepart('day', interval '25 hour');
+-------------------------------------------------------------------------------------------------------------------------+
| date_part(Utf8("day"),IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 0, nanoseconds: 90000000000000 }")) |
+-------------------------------------------------------------------------------------------------------------------------+
| 0 |
+-------------------------------------------------------------------------------------------------------------------------+
date_part to extract only the requested part (not the overall interval)
| @@ -464,11 +464,15 @@ impl ExtractDatePartExt for PrimitiveArray<IntervalDayTimeType> { | |||
| DatePart::Week => Ok(self.unary_opt(|d| Some(d.days / 7))), | |||
| DatePart::Day => Ok(self.unary_opt(|d| Some(d.days))), | |||
| DatePart::Hour => Ok(self.unary_opt(|d| Some(d.milliseconds / (60 * 60 * 1_000)))), | |||
There was a problem hiding this comment.
For the record, the current arrow behavior is the same it seems:
> SELECT datepart('hour', interval '25 hour');
+--------------------------------------------------------------------------------------------------------------------------+
| date_part(Utf8("hour"),IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 0, nanoseconds: 90000000000000 }")) |
+--------------------------------------------------------------------------------------------------------------------------+
| 25 |
+--------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.030 seconds.
> SELECT datepart('day', interval '25 hour');
+-------------------------------------------------------------------------------------------------------------------------+
| date_part(Utf8("day"),IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 0, nanoseconds: 90000000000000 }")) |
+-------------------------------------------------------------------------------------------------------------------------+
| 0 |
+-------------------------------------------------------------------------------------------------------------------------+
| } | ||
| DatePart::Nanosecond => { | ||
| Ok(self.unary_opt(|d| (d.milliseconds % (60 * 1_000)).checked_mul(1_000_000))) | ||
| } |
There was a problem hiding this comment.
This is a behavior change for arrow-rs.
duckdb says 1
SELECT datepart('s', interval '1m 61s 33ms 44us'); ┌─────────────────────────────────────────────────────┐ │ datepart('s', CAST('1m 61s 33ms 44us' AS INTERVAL)) │ │ int64 │ ├─────────────────────────────────────────────────────┤ │ 1 │ └─────────────────────────────────────────────────────┘
postgres says almost 1:
postgres=# SELECT extract (seconds from interval '1m 61s 33ms 44us');
extract
----------
1.033044
(1 row)
Arrow says 121 (before this PR)
> SELECT datepart('s', interval '1m 61s 33ms 44us');
+---------------------------------------------------------------------------------------------------------------------+
| date_part(Utf8("s"),IntervalMonthDayNano("IntervalMonthDayNano { months: 0, days: 0, nanoseconds: 121033044000 }")) |
+---------------------------------------------------------------------------------------------------------------------+
| 121 |
+---------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.004 seconds.
| } | ||
| DatePart::Nanosecond => { | ||
| Ok(self.unary_opt(|d| (d.milliseconds % (60 * 1_000)).checked_mul(1_000_000))) | ||
| } |
There was a problem hiding this comment.
Same thing for millisecond and microseconds
| match part { | ||
| DatePart::Year => Ok(self.unary_opt(|d: IntervalMonthDayNano| Some(d.months / 12))), | ||
| DatePart::Month => Ok(self.unary_opt(|d: IntervalMonthDayNano| Some(d.months))), | ||
| DatePart::Month => Ok(self.unary_opt(|d: IntervalMonthDayNano| Some(d.months % 12))), |
There was a problem hiding this comment.
Postgres agrees
postgres=# SELECT extract (month from interval '12 month');
extract
---------
0
(1 row)
Before this PR arrow also currently says 12.
> SELECT datepart('month', interval '12 month');
+---------------------------------------------------------------------------------------------------------------+
| date_part(Utf8("month"),IntervalMonthDayNano("IntervalMonthDayNano { months: 12, days: 0, nanoseconds: 0 }")) |
+---------------------------------------------------------------------------------------------------------------+
| 12 |
+---------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.001 seconds.
I think this change makes sense
|
I agree this change makes sense for consistency with established systems. |
|
As this is a behaviour change, even if the consensus appears to be that the current behaviour is somewhere between undesirable and a bug, should this wait until the next major release? |
That probably makes sense That will be in April so not that long: |
|
Marking with label |
|
🚀 -- thank you for your patience @delamarch3 |
Which issue does this PR close?
date_partusing an interval returns an incorrect result #7182Rationale for this change
date_partwith an interval returns results inconsistent with implementations in duckdb and postgres.What changes are included in this PR?
Parts are excluded from the interval:
Milliseconds, microseconds and nanoseconds will exclude the minutes and hours from the date part, but not seconds, so
date_part('ms', interval '61s')will return 1000Seconds will exclude minutes and hours, so
date_part('s', interval '3600s')will return 0Hours will not exclude days, so
date_part('h', interval '25h')returns 25Months will exclude years, so
date_part('month', interval '13 months')returns 1Are there any user-facing changes?
Yes,
date_partwithIntervalDayTimeTypeandIntervalMonthDayNanoTypewill return a different result