Hi all,
I am playing around with the Table API, and I have a question about the temporal OVERLAPS operator. In particular, a test in scalarFunctionsTest.testOverlaps expects false for the following intervals:

    testAllApis(
      temporalOverlaps("2011-03-10 05:02:02".toTimestamp, 0.second,
        "2011-03-10 05:02:02".toTimestamp, "2011-03-10 05:02:01".toTimestamp),
      "temporalOverlaps(toTimestamp('2011-03-10 05:02:02'), 0.second, " +
        "'2011-03-10 05:02:02'.toTimestamp, '2011-03-10 05:02:01'.toTimestamp)",
      "(TIMESTAMP '2011-03-10 05:02:02', INTERVAL '0' SECOND) OVERLAPS " +
        "(TIMESTAMP '2011-03-10 05:02:02', TIMESTAMP '2011-03-10 05:02:01')",
      "false")

Basically, the compared intervals touch at only one of their endpoints. The time.scala implementation translates the expression to

    AND(
      >=(DATETIME_PLUS(CAST('2011-03-10 05:02:02'):TIMESTAMP(3) NOT NULL, 0), CAST('2011-03-10 05:02:02'):TIMESTAMP(3) NOT NULL),
      >=(CAST('2011-03-10 05:02:01'):TIMESTAMP(3) NOT NULL, CAST('2011-03-10 05:02:02'):TIMESTAMP(3) NOT NULL)
    ),

where the result is false because the second clause is not satisfied.

However, the latest Calcite master compiles the OVERLAPS as follows:

    [AND
      (
        >=( CASE(
              <=(2011-03-10 05:02:02, DATETIME_PLUS(2011-03-10 05:02:02, 0)), DATETIME_PLUS(2011-03-10 05:02:02, 0), 2011-03-10 05:02:02
            ),
            CASE(
              <=(2011-03-10 05:02:02, 2011-03-10 05:02:01), 2011-03-10 05:02:02, 2011-03-10 05:02:01
            )
        ),
        >=( CASE(
              <=(2011-03-10 05:02:02, 2011-03-10 05:02:01), 2011-03-10 05:02:01, 2011-03-10 05:02:02
            ),
            CASE(
              <=(2011-03-10 05:02:02, DATETIME_PLUS(2011-03-10 05:02:02, 0)), 2011-03-10 05:02:02, DATETIME_PLUS(2011-03-10 05:02:02, 0)
            )
        )
      )
    ]

where the result is true.

I believe the issue is whether the endpoints are treated as part of the overlapping intervals or not. Flink does not consider these intervals as overlapping (as the test shows), whereas Calcite's implementation includes the endpoints.

Which behavior should be preserved?

I think that the Calcite implementation is correct, and that intervals sharing an endpoint should be considered overlapping. What do you think?
Best,
Stefano
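For readers who want to reproduce the two results outside of Flink, the two expansions quoted above can be sketched as plain Scala predicates. This is only an illustrative model, not code from Flink or Calcite: the object and method names (OverlapsSketch, overlapsAsGiven, overlapsNormalized) are hypothetical, and java.time.LocalDateTime stands in for TIMESTAMP. Read this way, both expansions compare with >=; the visible difference is that the Calcite expansion first orders each pair of endpoints through the CASE expressions, so the second interval, whose end is given before its start, is treated as running from 05:02:01 to 05:02:02.

```scala
import java.time.{Duration, LocalDateTime}

// Hypothetical model of the two expansions quoted in the message above.
object OverlapsSketch {

  // Model of the time.scala expansion: endpoints are compared as given.
  // AND(leftEnd >= rightStart, rightEnd >= leftStart)
  def overlapsAsGiven(ls: LocalDateTime, le: LocalDateTime,
                      rs: LocalDateTime, re: LocalDateTime): Boolean =
    !le.isBefore(rs) && !re.isBefore(ls)

  // Model of the Calcite master expansion: each pair is ordered first
  // (the CASE expressions), then compared with the same >= tests.
  def overlapsNormalized(ls: LocalDateTime, le: LocalDateTime,
                         rs: LocalDateTime, re: LocalDateTime): Boolean = {
    def min(a: LocalDateTime, b: LocalDateTime) = if (!a.isAfter(b)) a else b
    def max(a: LocalDateTime, b: LocalDateTime) = if (!a.isAfter(b)) b else a
    !max(ls, le).isBefore(min(rs, re)) && !max(rs, re).isBefore(min(ls, le))
  }

  def main(args: Array[String]): Unit = {
    val t02 = LocalDateTime.parse("2011-03-10T05:02:02")
    val t01 = LocalDateTime.parse("2011-03-10T05:02:01")
    val leftEnd = t02.plus(Duration.ZERO) // start + INTERVAL '0' SECOND

    println(overlapsAsGiven(t02, leftEnd, t02, t01))    // false
    println(overlapsNormalized(t02, leftEnd, t02, t01)) // true
  }
}
```

With the timestamps from the test, the first predicate fails on its second clause (05:02:01 >= 05:02:02), while the second one passes both clauses after reordering, which matches the false and true results reported above.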
Hi Stefano,
I implemented the overlap according to Calcite's implementation. Maybe they changed the behavior in the meantime. I agree we should try to stay in sync with Calcite. What do other DB vendors do?

Feel free to open an issue about this.

Regards,
Timo

On 30.05.17 at 14:24, Stefano Bortoli wrote:
> Which one should be preserved?
>
> I think that calcite implementation is correct, and overlapping extremes should be considered. What do you think?