[DISCUSS] FLIP-37: Rework of the Table API Type System (Part 1)

[DISCUSS] FLIP-37: Rework of the Table API Type System (Part 1)

Timo Walther-2
Hi everyone,

Some of you might have already read FLIP-32 [1], in which we described an
approximate roadmap for handling the big Blink SQL contribution and for
making the Table & SQL API equally important to the existing DataStream API.

As mentioned there (Advance the API and Unblock New Features, Item 1),
the rework of the Table/SQL type system is a crucial step for unblocking
future contributions. In particular, Flink's current type system has many
shortcomings that make integration with other systems (such as Hive),
DDL statements, and a unified API for Java/Scala difficult. We propose a
new type system that is closer to the SQL standard, integrates better
with other SQL vendors, and solves most of the type-related issues we
have had in the past.
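To make this a bit more concrete, here is a minimal, illustrative sketch of
how SQL-standard-aligned type declarations could look with the proposed type
system. The names used below (DataTypes, DECIMAL, notNull(), bridgedTo(), ...)
follow the direction of the design document and should be read as assumptions,
not as a final API:

import static org.apache.flink.table.api.DataTypes.*;

import org.apache.flink.table.types.DataType;

public class ProposedTypesSketch {
    public static void main(String[] args) {
        // SQL DECIMAL(10, 2) NOT NULL, with precision and scale as in the standard
        DataType amount = DECIMAL(10, 2).notNull();

        // SQL TIMESTAMP(3), i.e. millisecond precision
        DataType timestamp = TIMESTAMP(3);

        // a structured ROW type with an explicit bridge to a concrete Java class
        DataType row = ROW(
            FIELD("amount", amount),
            FIELD("ts", timestamp)
        ).bridgedTo(org.apache.flink.types.Row.class);

        System.out.println(row);
    }
}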

The design document for FLIP-37 can be found here:

https://docs.google.com/document/d/1a9HUb6OaBIoj9IRfbILcMFPrOL7ALeZ3rVI66dvA2_U/edit?usp=sharing

I'm looking forward to your feedback.

Thanks,
Timo

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-32%3A+Restructure+flink-table+for+future+contributions

Re: [DISCUSS] FLIP-37: Rework of the Table API Type System (Part 1)

Kurt Young
Big +1 to this! I left some comments in the Google doc.

Best,
Kurt

Re: [DISCUSS] FLIP-37: Rework of the Table API Type System (Part 1)

dwysakowicz
Another big +1 from my side. Thank you, Timo, for preparing the document!

I am really looking forward to having a standardized way of type
handling; it should solve a lot of problems. I really like the separation
of the logical type from its physical representation, and I think we
should aim to introduce that distinction and keep the two separated.

Best,

Dawid

Re: [DISCUSS] FLIP-37: Rework of the Table API Type System (Part 1)

Timo Walther-2
To give some background on Dawid's latest email:

Kurt raised some good points regarding the conversion of data types at
the boundaries of the API and SPI. After that, Dawid and I had a long
discussion about how users can define those boundaries in a nicer way.
The outcome of this discussion was similar to Blink's current distinction
between InternalTypes and ExternalTypes. I updated the document with an
improved structure of DataTypes (for users, the API, and the SPI,
carrying conversion information) and LogicalTypes (used internally and
close to standard SQL types).
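To illustrate the intended split (names again follow the updated document and
are assumptions rather than a final API): a DataType adds conversion
information for users, the API, and the SPI on top of a purely logical,
SQL-like type:

import org.apache.flink.table.api.DataTypes;
import org.apache.flink.table.types.DataType;
import org.apache.flink.table.types.logical.LogicalType;

public class TypeSeparationSketch {
    public static void main(String[] args) {
        // DataType: user/API/SPI facing, carries the physical conversion class
        DataType ts = DataTypes.TIMESTAMP(3).bridgedTo(java.sql.Timestamp.class);

        // LogicalType: used internally, close to standard SQL, no physical details
        LogicalType logical = ts.getLogicalType();    // TIMESTAMP(3)
        Class<?> physical = ts.getConversionClass();  // java.sql.Timestamp

        System.out.println(logical.asSummaryString() + " <-> " + physical.getName());
    }
}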

Thanks for the feedback so far,
Timo

Re: [DISCUSS] FLIP-37: Rework of the Table API Type System (Part 1)

Rong Rong
Thanks @Timo for starting this effort and preparing the document :-)

I took a pass and left some comments. I also very much like the idea of
the DataType and LogicalType separation. As explained in the doc, we have
also been looking into ways to improve the type system, so a huge +1 from
our side.

One question I have: since this touches many external systems (such as
Hive) and the comparison with Blink, does it make sense to share it with
a broader audience (such as user@) later to gather more feedback?

Looking forward to this change and would love to contribute in any way I can!

Best,
Rong

Re: [DISCUSS] FLIP-37: Rework of the Table API Type System (Part 1)

Timo Walther-2
Hi everyone,

Thanks for the valuable feedback so far. I updated the design document
in several places based on the comments I received online and offline.

In general, the feedback was very positive. It seems there is consensus
to perform a big rework of the type system, with a better long-term
vision and semantics closer to those of other SQL vendors and the
standard itself.

Since my last mail, we have improved the date-time types (especially due
to the cross-platform discussions [0]) and the general interoperability
with UDF implementations and Java classes.
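For illustration only (the default conversion classes noted in the comments
are assumptions based on the current state of the discussion, not a finalized
mapping), the SQL-standard date-time types could be bridged to java.time
classes by default, with an explicit bridge to legacy classes such as
java.sql.Timestamp where a UDF or connector still needs them:

import org.apache.flink.table.api.DataTypes;
import org.apache.flink.table.types.DataType;

public class DateTimeTypesSketch {
    public static void main(String[] args) {
        DataType date = DataTypes.DATE();                              // e.g. java.time.LocalDate
        DataType time = DataTypes.TIME(0);                             // e.g. java.time.LocalTime
        DataType ts = DataTypes.TIMESTAMP(3);                          // e.g. java.time.LocalDateTime
        DataType tsLtz = DataTypes.TIMESTAMP_WITH_LOCAL_TIME_ZONE(3);  // e.g. java.time.Instant

        // explicit bridge for code that still expects the legacy JDBC class
        DataType legacyTs = DataTypes.TIMESTAMP(3).bridgedTo(java.sql.Timestamp.class);

        System.out.println(tsLtz + " / " + legacyTs);
    }
}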

I would like to convert the design document [1] into a FLIP soon and
start with an implementation of the basic structure. I am sure we will
have subsequent discussions about certain types or semantics, but those
can also happen in the corresponding issues/PRs.

@Rong: Sorry for not responding earlier. I think we should avoid
cross-posting design discussions to both mailing lists, because there
are a lot of them right now. People who are interested should follow
this ML.

Thanks,
Timo

[0]
https://docs.google.com/document/d/1gNRww9mZJcHvUDCXklzjFEQGpefsuR_akCDfWsdE35Q/edit#
[1]
https://docs.google.com/document/d/1a9HUb6OaBIoj9IRfbILcMFPrOL7ALeZ3rVI66dvA2_U/edit#