(DEPRECATED) Apache Flink Mailing List archive.

[DISCUSS] Improvements to the Unified SQL Connector API

Timo Walther-2

[DISCUSS] Improvements to the Unified SQL Connector API

Hi everyone,

as some of you might have noticed, in the last two releases we aimed to
unify SQL connectors and make them more modular. The first connectors
and formats have been implemented and are usable via the SQL Client and
Java/Scala/SQL APIs.

However, after writing more connectors/example programs and talking to
users, there are still a couple of improvements that should be applied
to the unified SQL connector API.

I wrote a design document [1] that discusses limitations that I have
observed and considers feedback that I have collected over the last
months. I don't know whether we will implement all of these
improvements, but it would be great to get feedback for a satisfactory
API and for future prioritization.

The general goal should be to connect to external systems as
conveniently and type-safely as possible. Any feedback is highly appreciated.

Thanks,

Timo

[1]
https://docs.google.com/document/d/1Yaxp1UJUFW-peGLt8EIidwKIZEWrrA-pznWLuvaH39Y/edit?usp=sharing

Fabian Hueske-2

Re: [DISCUSS] Improvements to the Unified SQL Connector API

Thanks for the proposal Timo!

I've done a pass and added some comments (mostly asking for clarification
and details).
Overall, this is going in a very good direction.
I think that tables which are stored in different systems, and using a format
definition to define other formats, require some more discussion.
However, these are also not the features that we would start with.

From a compatibility point of view, an important question to answer is
whether we can drop the support for field mapping, i.e., do we have
users who take advantage of mapping format fields to fields with a
different name in the schema?
Besides that, all existing functionality is preserved, although the syntax
changes a bit.

Best,
Fabian

On Mon, Oct 1, 2018 at 10:53, Timo Walther <[hidden email]> wrote:

Timo Walther-2

Re: [DISCUSS] Improvements to the Unified SQL Connector API

Thanks for the feedback Fabian. I updated the document and addressed
your comments.

I agree that tables which are stored in different systems need more
discussion. I would suggest deprecating the field mapping interfaces in
this release and removing them in the next release.

Regards,
Timo

On Oct 2, 2018 at 11:06, Fabian Hueske wrote:

Aljoscha Krettek-2

Re: [DISCUSS] Improvements to the Unified SQL Connector API

Thanks for the proposal!

I like the proposed changes a lot. In particular, support for reading and writing the key data of systems that have a key/value split will be very nice to have.

> On 2. Oct 2018, at 11:58, Timo Walther <[hidden email]> wrote:

Shuyi Chen

Re: [DISCUSS] Improvements to the Unified SQL Connector API

In reply to this post by Timo Walther-2

Thanks a lot for the proposal, Timo. I left a few comments. Also, it seems
the example in the doc does not have the table type (source, sink, or both)
property anymore. Are you suggesting that we drop it? I think the table type
property is still useful, as it can restrict a certain connector to be
source-only or sink-only. For example, we usually want a Kafka topic to be
either read-only or write-only, but not both.

Shuyi

On Mon, Oct 1, 2018 at 1:53 AM Timo Walther <[hidden email]> wrote:

--
"So you have to trust that the dots will somehow connect in your future."

Hequn Cheng

Re: [DISCUSS] Improvements to the Unified SQL Connector API

Hi,

Thanks a lot for the proposal. I like the idea of unifying table definitions.
I think we can drop the table type since the type can be derived from the
SQL, i.e., a table that is inserted into can only be a sink table.
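
To illustrate the idea, here is a minimal sketch of how a table's role could be derived from a statement. It is not Flink code: the function name `infer_table_roles`, the table names, and the regex-based matching are assumptions made only for this example, and a real implementation would work on the parsed query rather than on raw text.

```python
import re


def infer_table_roles(statement: str) -> dict:
    """Map each referenced table name to the set of roles the statement implies."""
    roles = {}
    # A table named right after INSERT INTO is written to, so it acts as a sink.
    for name in re.findall(r"\bINSERT\s+INTO\s+(\w+)", statement, flags=re.IGNORECASE):
        roles.setdefault(name, set()).add("sink")
    # A table named right after FROM or JOIN is read from, so it acts as a source.
    for name in re.findall(r"\b(?:FROM|JOIN)\s+(\w+)", statement, flags=re.IGNORECASE):
        roles.setdefault(name, set()).add("source")
    return roles


# Hypothetical statement: Clicks is only written to, Visits is only read from.
roles = infer_table_roles("INSERT INTO Clicks SELECT user_id FROM Visits")
# A table can end up with both roles if it appears on both sides.
self_roles = infer_table_roles("INSERT INTO T SELECT a FROM T")
```

With this approach no explicit table type is needed in the definition; the same table definition serves as a source or a sink depending on where it appears in the query.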

I left some minor suggestions in the document, mainly:
- Maybe we also need to allow defining properties for tables.
- Support specifying computed columns in a table.
- Support defining keys for sources.

Best, Hequn

On Thu, Oct 4, 2018 at 4:09 PM Shuyi Chen <[hidden email]> wrote:

Rong Rong

Re: [DISCUSS] Improvements to the Unified SQL Connector API

Hi Timo,

Thanks for putting together the proposal!
I really love the idea of combining solutions for historic and recent data,
and I left some suggestions on that part.

Regarding the table type, e.g. for Kafka streams, I agree with @Hequn's
idea that it should be pretty much inferable from the SQL context.
I think there might be some questions that need to be addressed when unifying
the definition, for example:
- Should a Kafka table used in an "INSERT INTO" statement be usable again in
a "FROM" statement, and vice versa?
- How do we enforce checks in combo-table use cases?
- Can a user change the way a table is used (e.g. source/sink) in an
interactive environment such as the SQL Client?

Thanks,
Rong

On Thu, Oct 4, 2018 at 7:31 AM Hequn Cheng <[hidden email]> wrote:

Shuyi Chen

Re: [DISCUSS] Improvements to the Unified SQL Connector API

In reply to this post by Hequn Cheng

In the case of a normal Flink job, I agree we can infer the table type from
the queries. However, for the SQL Client, queries are ad hoc and not known
beforehand. In such a case, we might want to enforce the table open mode at
startup time, so users won't accidentally write to a Kafka topic that is
supposed to be written only by some producer. What do you guys think?
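
As a rough sketch of what such an enforced open mode could look like (plain Python, not the SQL Client's actual behavior; `TableDefinition`, `check_access`, the mode names, and the topic name are all hypothetical), the mode would be fixed when the table is defined and every ad hoc statement would be checked against it:

```python
from dataclasses import dataclass

ALLOWED_MODES = {"source", "sink", "both"}


@dataclass(frozen=True)
class TableDefinition:
    """A table whose open mode is fixed once, when the environment is loaded."""
    name: str
    mode: str

    def __post_init__(self):
        if self.mode not in ALLOWED_MODES:
            raise ValueError(f"unknown table mode: {self.mode!r}")


def check_access(table: TableDefinition, operation: str) -> None:
    """Raise PermissionError if 'read' or 'write' conflicts with the declared mode."""
    required = {"read": "source", "write": "sink"}[operation]
    if table.mode not in (required, "both"):
        raise PermissionError(
            f"table {table.name!r} is declared as {table.mode!r}; {operation} is not allowed"
        )


# Hypothetical read-only Kafka topic: reading passes, writing is rejected.
orders = TableDefinition(name="orders_topic", mode="source")
check_access(orders, "read")
try:
    check_access(orders, "write")
    write_rejected = False
except PermissionError:
    write_rejected = True
```

The point of the sketch is only that the check happens against a declared mode rather than against what the query happens to do, which is what protects a topic from an accidental INSERT INTO.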

Shuyi

On Thu, Oct 4, 2018 at 7:31 AM Hequn Cheng <[hidden email]> wrote:

Hequn Cheng

Re: [DISCUSS] Improvements to the Unified SQL Connector API

Hi,

How to avoid accidentally writing to a table is a good question.
I think there are other ways to solve the problem. For example, we could
provide users with a view instead of a table, or add a table constraint.

Best,
Hequn

On Fri, Oct 5, 2018 at 1:30 PM Shuyi Chen <[hidden email]> wrote:

Timo Walther-2

Re: [DISCUSS] Improvements to the Unified SQL Connector API

In reply to this post by Shuyi Chen

Hi everyone,

thanks for the feedback that we got so far. I will update the document
in the next couple of hours so that we can continue with the discussion.

Regarding the table type: Actually, I just didn't mention it in the
document because the table type is a property specific to the SQL
Client/external catalog interface that is evaluated before the unified
connector API (depending on the table type, a source and/or sink is
discovered). I agree with Shuyi's comments that it should be possible to
restrict read/write access. The general goal should be that properties
defined in the design document apply to both sources and sinks, i.e., no
special source-only or sink-only properties.
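
A minimal sketch of that separation, under stated assumptions: the function `discover_connectors` and the property keys below are illustrative only and not the actual discovery code. The table type decides which sides are discovered, while the connector properties themselves are shared and contain nothing specific to one side.

```python
def discover_connectors(table_type: str, properties: dict) -> dict:
    """Return the sides (source and/or sink) to discover for a table.

    The table type is evaluated first; both sides then receive the same
    properties, so there are no source-only or sink-only keys.
    """
    if table_type not in ("source", "sink", "both"):
        raise ValueError(f"unknown table type: {table_type!r}")
    discovered = {}
    if table_type in ("source", "both"):
        discovered["source"] = dict(properties)
    if table_type in ("sink", "both"):
        discovered["sink"] = dict(properties)
    return discovered


# Illustrative property keys, chosen only for this example.
props = {"connector.type": "kafka", "connector.topic": "clicks"}
both = discover_connectors("both", props)
sink_only = discover_connectors("sink", props)
```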

@Rong: Currently, a user cannot change how a table is used in
the interactive shell. Tables defined in an environment file are
immutable. This will be possible using a SQL DDL in the future.

Regards,
Timo

On Oct 5, 2018 at 07:30, Shuyi Chen wrote:
