[DISCUSS] Remove registration of TableSource/TableSink in Table Env and ConnectTableDescriptor

classic Classic list List threaded Threaded
11 messages Options
Reply | Threaded
Open this post in threaded view
|

[DISCUSS] Remove registration of TableSource/TableSink in Table Env and ConnectTableDescriptor

Kurt Young
Hi all,

I'd like to bring up a discussion about removing registration of
TableSource and
TableSink in TableEnvironment as well as in ConnectTableDescriptor. The
affected
method would be:

TableEnvironment::registerTableSource
TableEnvironment::fromTableSource
TableEnvironment::registerTableSink
ConnectTableDescriptor::registerTableSource
ConnectTableDescriptor::registerTableSink
ConnectTableDescriptor::registerTableSourceAndSink

(Most of them are already deprecated, except for
TableEnvironment::fromTableSource,
which was intended to deprecate but missed by accident).

FLIP-64 [1] already explained why we want to deprecate TableSource &
TableSink from
user's interface. In a short word, these interfaces should only read &
write the physical
representation of the table, and they are not fitting well after we already
introduced some
logical table fields such as computed column, watermarks.

Another reason is the exposure of registerTableSource in Table Env just
make the whole
SQL protocol opposite. TableSource should be used as a reader of table, it
should rely on
other metadata information held by framework, which eventually comes from
DDL or
ConnectDescriptor. But if we register a TableSource to Table Env, we have
no choice but
have to rely on TableSource::getTableSchema. It will make the design
obscure, sometimes
TableSource should trust the information comes from framework, but
sometimes it should
also generate its own schema information.

Furthermore, if the authority about schema information is not clear, it
will make things much
more complicated if we want to improve the table api usability such as
introducing automatic
schema inference in the near future.

Since this is an API break change, I've also included user mailing list to
gather more feedbacks.

Best,
Kurt

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] Remove registration of TableSource/TableSink in Table Env and ConnectTableDescriptor

dwysakowicz
Hi Kurt,

I fully agree with the proposal. Yes it was an omission that we did not
deprecate the TableEnvironment#fromTableSource in the previous version.

I would vote to remove all those methods altogether.

Best,

Dawid

On 05/02/2020 07:36, Kurt Young wrote:

> Hi all,
>
> I'd like to bring up a discussion about removing registration of
> TableSource and
> TableSink in TableEnvironment as well as in ConnectTableDescriptor. The
> affected
> method would be:
>
> TableEnvironment::registerTableSource
> TableEnvironment::fromTableSource
> TableEnvironment::registerTableSink
> ConnectTableDescriptor::registerTableSource
> ConnectTableDescriptor::registerTableSink
> ConnectTableDescriptor::registerTableSourceAndSink
>
> (Most of them are already deprecated, except for
> TableEnvironment::fromTableSource,
> which was intended to deprecate but missed by accident).
>
> FLIP-64 [1] already explained why we want to deprecate TableSource &
> TableSink from
> user's interface. In a short word, these interfaces should only read &
> write the physical
> representation of the table, and they are not fitting well after we already
> introduced some
> logical table fields such as computed column, watermarks.
>
> Another reason is the exposure of registerTableSource in Table Env just
> make the whole
> SQL protocol opposite. TableSource should be used as a reader of table, it
> should rely on
> other metadata information held by framework, which eventually comes from
> DDL or
> ConnectDescriptor. But if we register a TableSource to Table Env, we have
> no choice but
> have to rely on TableSource::getTableSchema. It will make the design
> obscure, sometimes
> TableSource should trust the information comes from framework, but
> sometimes it should
> also generate its own schema information.
>
> Furthermore, if the authority about schema information is not clear, it
> will make things much
> more complicated if we want to improve the table api usability such as
> introducing automatic
> schema inference in the near future.
>
> Since this is an API break change, I've also included user mailing list to
> gather more feedbacks.
>
> Best,
> Kurt
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module
>


signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] Remove registration of TableSource/TableSink in Table Env and ConnectTableDescriptor

Danny Chan
Thanks for driving this Kurt.

Because we already have DDL and Descriptor as an alternative of these deprecated methods, removing them would reduce ambiguity and make the near future work more easier.

As we discussed offline, although some of the connectors may still have attributes that cannot be specified though properties, removing them force us to re-think the TableFactory/properties interface.

Best,
Danny Chan
在 2020年2月5日 +0800 PM3:58,Dawid Wysakowicz <[hidden email]>,写道:

> Hi Kurt,
>
> I fully agree with the proposal. Yes it was an omission that we did not
> deprecate the TableEnvironment#fromTableSource in the previous version.
>
> I would vote to remove all those methods altogether.
>
> Best,
>
> Dawid
>
> On 05/02/2020 07:36, Kurt Young wrote:
> > Hi all,
> >
> > I'd like to bring up a discussion about removing registration of
> > TableSource and
> > TableSink in TableEnvironment as well as in ConnectTableDescriptor. The
> > affected
> > method would be:
> >
> > TableEnvironment::registerTableSource
> > TableEnvironment::fromTableSource
> > TableEnvironment::registerTableSink
> > ConnectTableDescriptor::registerTableSource
> > ConnectTableDescriptor::registerTableSink
> > ConnectTableDescriptor::registerTableSourceAndSink
> >
> > (Most of them are already deprecated, except for
> > TableEnvironment::fromTableSource,
> > which was intended to deprecate but missed by accident).
> >
> > FLIP-64 [1] already explained why we want to deprecate TableSource &
> > TableSink from
> > user's interface. In a short word, these interfaces should only read &
> > write the physical
> > representation of the table, and they are not fitting well after we already
> > introduced some
> > logical table fields such as computed column, watermarks.
> >
> > Another reason is the exposure of registerTableSource in Table Env just
> > make the whole
> > SQL protocol opposite. TableSource should be used as a reader of table, it
> > should rely on
> > other metadata information held by framework, which eventually comes from
> > DDL or
> > ConnectDescriptor. But if we register a TableSource to Table Env, we have
> > no choice but
> > have to rely on TableSource::getTableSchema. It will make the design
> > obscure, sometimes
> > TableSource should trust the information comes from framework, but
> > sometimes it should
> > also generate its own schema information.
> >
> > Furthermore, if the authority about schema information is not clear, it
> > will make things much
> > more complicated if we want to improve the table api usability such as
> > introducing automatic
> > schema inference in the near future.
> >
> > Since this is an API break change, I've also included user mailing list to
> > gather more feedbacks.
> >
> > Best,
> > Kurt
> >
> > [1]
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module
> >
>
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] Remove registration of TableSource/TableSink in Table Env and ConnectTableDescriptor

Jingsong Li
In reply to this post by dwysakowicz
HI Kurt,

+1 to remove these methods.

But one concern is that some of the current TableSource/TableSink may not
be ready, such as the JDBCUpsertTableSink, which accepts a JDBCDialect, but
through the TableFactory, there is no way to pass in the JDBCDialect at
present. But I also believe we have enough time on 1.11 to prepare them.
Then unified to TableFactory.

I think there may be complaints from users because they used to only
implement a TableSource or TableSink, but now they have to implement
TableFactory. But I think it's also good for conceptual clarity to force
them to implement TableFactory.

Another idea is can we provide some utils to help user implement
TableFactory? The current method may be a little bit complex, and it needs
to be added to
the "META_INF/services/org.apache.flink.table.factories.TableFactory" file.

Best,
Jingsong Lee

On Wed, Feb 5, 2020 at 3:58 PM Dawid Wysakowicz <[hidden email]>
wrote:

> Hi Kurt,
>
> I fully agree with the proposal. Yes it was an omission that we did not
> deprecate the TableEnvironment#fromTableSource in the previous version.
>
> I would vote to remove all those methods altogether.
>
> Best,
>
> Dawid
>
> On 05/02/2020 07:36, Kurt Young wrote:
> > Hi all,
> >
> > I'd like to bring up a discussion about removing registration of
> > TableSource and
> > TableSink in TableEnvironment as well as in ConnectTableDescriptor. The
> > affected
> > method would be:
> >
> > TableEnvironment::registerTableSource
> > TableEnvironment::fromTableSource
> > TableEnvironment::registerTableSink
> > ConnectTableDescriptor::registerTableSource
> > ConnectTableDescriptor::registerTableSink
> > ConnectTableDescriptor::registerTableSourceAndSink
> >
> > (Most of them are already deprecated, except for
> > TableEnvironment::fromTableSource,
> > which was intended to deprecate but missed by accident).
> >
> > FLIP-64 [1] already explained why we want to deprecate TableSource &
> > TableSink from
> > user's interface. In a short word, these interfaces should only read &
> > write the physical
> > representation of the table, and they are not fitting well after we
> already
> > introduced some
> > logical table fields such as computed column, watermarks.
> >
> > Another reason is the exposure of registerTableSource in Table Env just
> > make the whole
> > SQL protocol opposite. TableSource should be used as a reader of table,
> it
> > should rely on
> > other metadata information held by framework, which eventually comes from
> > DDL or
> > ConnectDescriptor. But if we register a TableSource to Table Env, we have
> > no choice but
> > have to rely on TableSource::getTableSchema. It will make the design
> > obscure, sometimes
> > TableSource should trust the information comes from framework, but
> > sometimes it should
> > also generate its own schema information.
> >
> > Furthermore, if the authority about schema information is not clear, it
> > will make things much
> > more complicated if we want to improve the table api usability such as
> > introducing automatic
> > schema inference in the near future.
> >
> > Since this is an API break change, I've also included user mailing list
> to
> > gather more feedbacks.
> >
> > Best,
> > Kurt
> >
> > [1]
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module
> >
>
>

--
Best, Jingsong Lee
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] Remove registration of TableSource/TableSink in Table Env and ConnectTableDescriptor

Zhenghua Gao
In reply to this post by Kurt Young
+1 to remove these methods.

One concern about invocations of TableSource::getTableSchema:
By removing such methods, we can stop calling TableSource::getTableSchema
in some place(such
as BatchTableEnvImpl/TableEnvironmentImpl#validateTableSource,
ConnectorCatalogTable, TableSourceQueryOperation).

But in other place we need field types and names of the table source(such
as
BatchExecLookupJoinRule/StreamExecLookupJoinRule,
PushProjectIntoTableSourceScanRule,
CommonLookupJoin).  So how should we deal with this?

*Best Regards,*
*Zhenghua Gao*


On Wed, Feb 5, 2020 at 2:36 PM Kurt Young <[hidden email]> wrote:

> Hi all,
>
> I'd like to bring up a discussion about removing registration of
> TableSource and
> TableSink in TableEnvironment as well as in ConnectTableDescriptor. The
> affected
> method would be:
>
> TableEnvironment::registerTableSource
> TableEnvironment::fromTableSource
> TableEnvironment::registerTableSink
> ConnectTableDescriptor::registerTableSource
> ConnectTableDescriptor::registerTableSink
> ConnectTableDescriptor::registerTableSourceAndSink
>
> (Most of them are already deprecated, except for
> TableEnvironment::fromTableSource,
> which was intended to deprecate but missed by accident).
>
> FLIP-64 [1] already explained why we want to deprecate TableSource &
> TableSink from
> user's interface. In a short word, these interfaces should only read &
> write the physical
> representation of the table, and they are not fitting well after we already
> introduced some
> logical table fields such as computed column, watermarks.
>
> Another reason is the exposure of registerTableSource in Table Env just
> make the whole
> SQL protocol opposite. TableSource should be used as a reader of table, it
> should rely on
> other metadata information held by framework, which eventually comes from
> DDL or
> ConnectDescriptor. But if we register a TableSource to Table Env, we have
> no choice but
> have to rely on TableSource::getTableSchema. It will make the design
> obscure, sometimes
> TableSource should trust the information comes from framework, but
> sometimes it should
> also generate its own schema information.
>
> Furthermore, if the authority about schema information is not clear, it
> will make things much
> more complicated if we want to improve the table api usability such as
> introducing automatic
> schema inference in the near future.
>
> Since this is an API break change, I've also included user mailing list to
> gather more feedbacks.
>
> Best,
> Kurt
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module
>
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] Remove registration of TableSource/TableSink in Table Env and ConnectTableDescriptor

Kurt Young
Hi Jingsong,

Yes current TableFactory is not ideal for users to use either. I think we
should
also spend some time in 1.11 to improve the usability of TableEnvironment
when
users trying to read or write something. Automatic scheme inference would
be
one of them. Other from this, we also support convert a DataStream to
Table, which
can serve some flexible requirements to read or write data.

Best,
Kurt


On Wed, Feb 5, 2020 at 7:29 PM Zhenghua Gao <[hidden email]> wrote:

> +1 to remove these methods.
>
> One concern about invocations of TableSource::getTableSchema:
> By removing such methods, we can stop calling TableSource::getTableSchema
> in some place(such
> as BatchTableEnvImpl/TableEnvironmentImpl#validateTableSource,
> ConnectorCatalogTable, TableSourceQueryOperation).
>
> But in other place we need field types and names of the table source(such
> as
> BatchExecLookupJoinRule/StreamExecLookupJoinRule,
> PushProjectIntoTableSourceScanRule,
> CommonLookupJoin).  So how should we deal with this?
>
> *Best Regards,*
> *Zhenghua Gao*
>
>
> On Wed, Feb 5, 2020 at 2:36 PM Kurt Young <[hidden email]> wrote:
>
> > Hi all,
> >
> > I'd like to bring up a discussion about removing registration of
> > TableSource and
> > TableSink in TableEnvironment as well as in ConnectTableDescriptor. The
> > affected
> > method would be:
> >
> > TableEnvironment::registerTableSource
> > TableEnvironment::fromTableSource
> > TableEnvironment::registerTableSink
> > ConnectTableDescriptor::registerTableSource
> > ConnectTableDescriptor::registerTableSink
> > ConnectTableDescriptor::registerTableSourceAndSink
> >
> > (Most of them are already deprecated, except for
> > TableEnvironment::fromTableSource,
> > which was intended to deprecate but missed by accident).
> >
> > FLIP-64 [1] already explained why we want to deprecate TableSource &
> > TableSink from
> > user's interface. In a short word, these interfaces should only read &
> > write the physical
> > representation of the table, and they are not fitting well after we
> already
> > introduced some
> > logical table fields such as computed column, watermarks.
> >
> > Another reason is the exposure of registerTableSource in Table Env just
> > make the whole
> > SQL protocol opposite. TableSource should be used as a reader of table,
> it
> > should rely on
> > other metadata information held by framework, which eventually comes from
> > DDL or
> > ConnectDescriptor. But if we register a TableSource to Table Env, we have
> > no choice but
> > have to rely on TableSource::getTableSchema. It will make the design
> > obscure, sometimes
> > TableSource should trust the information comes from framework, but
> > sometimes it should
> > also generate its own schema information.
> >
> > Furthermore, if the authority about schema information is not clear, it
> > will make things much
> > more complicated if we want to improve the table api usability such as
> > introducing automatic
> > schema inference in the near future.
> >
> > Since this is an API break change, I've also included user mailing list
> to
> > gather more feedbacks.
> >
> > Best,
> > Kurt
> >
> > [1]
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module
> >
>
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] Remove registration of TableSource/TableSink in Table Env and ConnectTableDescriptor

Kurt Young
Hi Zhenghua,

After removing TableSource::getTableSchema, during optimization, I could
imagine
the schema information might come from relational nodes such as TableScan.

Best,
Kurt


On Wed, Feb 5, 2020 at 8:24 PM Kurt Young <[hidden email]> wrote:

> Hi Jingsong,
>
> Yes current TableFactory is not ideal for users to use either. I think we
> should
> also spend some time in 1.11 to improve the usability of TableEnvironment
> when
> users trying to read or write something. Automatic scheme inference would
> be
> one of them. Other from this, we also support convert a DataStream to
> Table, which
> can serve some flexible requirements to read or write data.
>
> Best,
> Kurt
>
>
> On Wed, Feb 5, 2020 at 7:29 PM Zhenghua Gao <[hidden email]> wrote:
>
>> +1 to remove these methods.
>>
>> One concern about invocations of TableSource::getTableSchema:
>> By removing such methods, we can stop calling TableSource::getTableSchema
>> in some place(such
>> as BatchTableEnvImpl/TableEnvironmentImpl#validateTableSource,
>> ConnectorCatalogTable, TableSourceQueryOperation).
>>
>> But in other place we need field types and names of the table source(such
>> as
>> BatchExecLookupJoinRule/StreamExecLookupJoinRule,
>> PushProjectIntoTableSourceScanRule,
>> CommonLookupJoin).  So how should we deal with this?
>>
>> *Best Regards,*
>> *Zhenghua Gao*
>>
>>
>> On Wed, Feb 5, 2020 at 2:36 PM Kurt Young <[hidden email]> wrote:
>>
>> > Hi all,
>> >
>> > I'd like to bring up a discussion about removing registration of
>> > TableSource and
>> > TableSink in TableEnvironment as well as in ConnectTableDescriptor. The
>> > affected
>> > method would be:
>> >
>> > TableEnvironment::registerTableSource
>> > TableEnvironment::fromTableSource
>> > TableEnvironment::registerTableSink
>> > ConnectTableDescriptor::registerTableSource
>> > ConnectTableDescriptor::registerTableSink
>> > ConnectTableDescriptor::registerTableSourceAndSink
>> >
>> > (Most of them are already deprecated, except for
>> > TableEnvironment::fromTableSource,
>> > which was intended to deprecate but missed by accident).
>> >
>> > FLIP-64 [1] already explained why we want to deprecate TableSource &
>> > TableSink from
>> > user's interface. In a short word, these interfaces should only read &
>> > write the physical
>> > representation of the table, and they are not fitting well after we
>> already
>> > introduced some
>> > logical table fields such as computed column, watermarks.
>> >
>> > Another reason is the exposure of registerTableSource in Table Env just
>> > make the whole
>> > SQL protocol opposite. TableSource should be used as a reader of table,
>> it
>> > should rely on
>> > other metadata information held by framework, which eventually comes
>> from
>> > DDL or
>> > ConnectDescriptor. But if we register a TableSource to Table Env, we
>> have
>> > no choice but
>> > have to rely on TableSource::getTableSchema. It will make the design
>> > obscure, sometimes
>> > TableSource should trust the information comes from framework, but
>> > sometimes it should
>> > also generate its own schema information.
>> >
>> > Furthermore, if the authority about schema information is not clear, it
>> > will make things much
>> > more complicated if we want to improve the table api usability such as
>> > introducing automatic
>> > schema inference in the near future.
>> >
>> > Since this is an API break change, I've also included user mailing list
>> to
>> > gather more feedbacks.
>> >
>> > Best,
>> > Kurt
>> >
>> > [1]
>> >
>> >
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module
>> >
>>
>
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] Remove registration of TableSource/TableSink in Table Env and ConnectTableDescriptor

Kurt Young
Thanks all for your feedback, since no objection has been raised, I've
created
https://issues.apache.org/jira/browse/FLINK-15950 to track this issue.

Since this issue would require lots of tests adjustment before it really
happen,
it won't be done in a short time. Feel free to give feedback anytime here
or in jira
if you have other opinions.

Best,
Kurt


On Wed, Feb 5, 2020 at 8:26 PM Kurt Young <[hidden email]> wrote:

> Hi Zhenghua,
>
> After removing TableSource::getTableSchema, during optimization, I could
> imagine
> the schema information might come from relational nodes such as TableScan.
>
> Best,
> Kurt
>
>
> On Wed, Feb 5, 2020 at 8:24 PM Kurt Young <[hidden email]> wrote:
>
>> Hi Jingsong,
>>
>> Yes current TableFactory is not ideal for users to use either. I think we
>> should
>> also spend some time in 1.11 to improve the usability of TableEnvironment
>> when
>> users trying to read or write something. Automatic scheme inference would
>> be
>> one of them. Other from this, we also support convert a DataStream to
>> Table, which
>> can serve some flexible requirements to read or write data.
>>
>> Best,
>> Kurt
>>
>>
>> On Wed, Feb 5, 2020 at 7:29 PM Zhenghua Gao <[hidden email]> wrote:
>>
>>> +1 to remove these methods.
>>>
>>> One concern about invocations of TableSource::getTableSchema:
>>> By removing such methods, we can stop calling TableSource::getTableSchema
>>> in some place(such
>>> as BatchTableEnvImpl/TableEnvironmentImpl#validateTableSource,
>>> ConnectorCatalogTable, TableSourceQueryOperation).
>>>
>>> But in other place we need field types and names of the table source(such
>>> as
>>> BatchExecLookupJoinRule/StreamExecLookupJoinRule,
>>> PushProjectIntoTableSourceScanRule,
>>> CommonLookupJoin).  So how should we deal with this?
>>>
>>> *Best Regards,*
>>> *Zhenghua Gao*
>>>
>>>
>>> On Wed, Feb 5, 2020 at 2:36 PM Kurt Young <[hidden email]> wrote:
>>>
>>> > Hi all,
>>> >
>>> > I'd like to bring up a discussion about removing registration of
>>> > TableSource and
>>> > TableSink in TableEnvironment as well as in ConnectTableDescriptor. The
>>> > affected
>>> > method would be:
>>> >
>>> > TableEnvironment::registerTableSource
>>> > TableEnvironment::fromTableSource
>>> > TableEnvironment::registerTableSink
>>> > ConnectTableDescriptor::registerTableSource
>>> > ConnectTableDescriptor::registerTableSink
>>> > ConnectTableDescriptor::registerTableSourceAndSink
>>> >
>>> > (Most of them are already deprecated, except for
>>> > TableEnvironment::fromTableSource,
>>> > which was intended to deprecate but missed by accident).
>>> >
>>> > FLIP-64 [1] already explained why we want to deprecate TableSource &
>>> > TableSink from
>>> > user's interface. In a short word, these interfaces should only read &
>>> > write the physical
>>> > representation of the table, and they are not fitting well after we
>>> already
>>> > introduced some
>>> > logical table fields such as computed column, watermarks.
>>> >
>>> > Another reason is the exposure of registerTableSource in Table Env just
>>> > make the whole
>>> > SQL protocol opposite. TableSource should be used as a reader of
>>> table, it
>>> > should rely on
>>> > other metadata information held by framework, which eventually comes
>>> from
>>> > DDL or
>>> > ConnectDescriptor. But if we register a TableSource to Table Env, we
>>> have
>>> > no choice but
>>> > have to rely on TableSource::getTableSchema. It will make the design
>>> > obscure, sometimes
>>> > TableSource should trust the information comes from framework, but
>>> > sometimes it should
>>> > also generate its own schema information.
>>> >
>>> > Furthermore, if the authority about schema information is not clear, it
>>> > will make things much
>>> > more complicated if we want to improve the table api usability such as
>>> > introducing automatic
>>> > schema inference in the near future.
>>> >
>>> > Since this is an API break change, I've also included user mailing
>>> list to
>>> > gather more feedbacks.
>>> >
>>> > Best,
>>> > Kurt
>>> >
>>> > [1]
>>> >
>>> >
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module
>>> >
>>>
>>
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] Remove registration of TableSource/TableSink in Table Env and ConnectTableDescriptor

Timo Walther-2
Hi Kurt,

Dawid is currently working on making a tableEnv.fromElements/values()
kind of source possible in the future. We can use this to replace some
of the tests. Otherwise I guess we should come up with a better test
infrastructure to make defining source not necessary anymore.

Regards,
Timo


On 07.02.20 11:24, Kurt Young wrote:

> Thanks all for your feedback, since no objection has been raised, I've
> created
> https://issues.apache.org/jira/browse/FLINK-15950 to track this issue.
>
> Since this issue would require lots of tests adjustment before it really
> happen,
> it won't be done in a short time. Feel free to give feedback anytime here
> or in jira
> if you have other opinions.
>
> Best,
> Kurt
>
>
> On Wed, Feb 5, 2020 at 8:26 PM Kurt Young <[hidden email]> wrote:
>
>> Hi Zhenghua,
>>
>> After removing TableSource::getTableSchema, during optimization, I could
>> imagine
>> the schema information might come from relational nodes such as TableScan.
>>
>> Best,
>> Kurt
>>
>>
>> On Wed, Feb 5, 2020 at 8:24 PM Kurt Young <[hidden email]> wrote:
>>
>>> Hi Jingsong,
>>>
>>> Yes current TableFactory is not ideal for users to use either. I think we
>>> should
>>> also spend some time in 1.11 to improve the usability of TableEnvironment
>>> when
>>> users trying to read or write something. Automatic scheme inference would
>>> be
>>> one of them. Other from this, we also support convert a DataStream to
>>> Table, which
>>> can serve some flexible requirements to read or write data.
>>>
>>> Best,
>>> Kurt
>>>
>>>
>>> On Wed, Feb 5, 2020 at 7:29 PM Zhenghua Gao <[hidden email]> wrote:
>>>
>>>> +1 to remove these methods.
>>>>
>>>> One concern about invocations of TableSource::getTableSchema:
>>>> By removing such methods, we can stop calling TableSource::getTableSchema
>>>> in some place(such
>>>> as BatchTableEnvImpl/TableEnvironmentImpl#validateTableSource,
>>>> ConnectorCatalogTable, TableSourceQueryOperation).
>>>>
>>>> But in other place we need field types and names of the table source(such
>>>> as
>>>> BatchExecLookupJoinRule/StreamExecLookupJoinRule,
>>>> PushProjectIntoTableSourceScanRule,
>>>> CommonLookupJoin).  So how should we deal with this?
>>>>
>>>> *Best Regards,*
>>>> *Zhenghua Gao*
>>>>
>>>>
>>>> On Wed, Feb 5, 2020 at 2:36 PM Kurt Young <[hidden email]> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I'd like to bring up a discussion about removing registration of
>>>>> TableSource and
>>>>> TableSink in TableEnvironment as well as in ConnectTableDescriptor. The
>>>>> affected
>>>>> method would be:
>>>>>
>>>>> TableEnvironment::registerTableSource
>>>>> TableEnvironment::fromTableSource
>>>>> TableEnvironment::registerTableSink
>>>>> ConnectTableDescriptor::registerTableSource
>>>>> ConnectTableDescriptor::registerTableSink
>>>>> ConnectTableDescriptor::registerTableSourceAndSink
>>>>>
>>>>> (Most of them are already deprecated, except for
>>>>> TableEnvironment::fromTableSource,
>>>>> which was intended to deprecate but missed by accident).
>>>>>
>>>>> FLIP-64 [1] already explained why we want to deprecate TableSource &
>>>>> TableSink from
>>>>> user's interface. In a short word, these interfaces should only read &
>>>>> write the physical
>>>>> representation of the table, and they are not fitting well after we
>>>> already
>>>>> introduced some
>>>>> logical table fields such as computed column, watermarks.
>>>>>
>>>>> Another reason is the exposure of registerTableSource in Table Env just
>>>>> make the whole
>>>>> SQL protocol opposite. TableSource should be used as a reader of
>>>> table, it
>>>>> should rely on
>>>>> other metadata information held by framework, which eventually comes
>>>> from
>>>>> DDL or
>>>>> ConnectDescriptor. But if we register a TableSource to Table Env, we
>>>> have
>>>>> no choice but
>>>>> have to rely on TableSource::getTableSchema. It will make the design
>>>>> obscure, sometimes
>>>>> TableSource should trust the information comes from framework, but
>>>>> sometimes it should
>>>>> also generate its own schema information.
>>>>>
>>>>> Furthermore, if the authority about schema information is not clear, it
>>>>> will make things much
>>>>> more complicated if we want to improve the table api usability such as
>>>>> introducing automatic
>>>>> schema inference in the near future.
>>>>>
>>>>> Since this is an API break change, I've also included user mailing
>>>> list to
>>>>> gather more feedbacks.
>>>>>
>>>>> Best,
>>>>> Kurt
>>>>>
>>>>> [1]
>>>>>
>>>>>
>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module
>>>>>
>>>>
>>>
>

Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] Remove registration of TableSource/TableSink in Table Env and ConnectTableDescriptor

Kurt Young
Hi Timo,

tableEnv.fromElements/values() sounds good, do we have a jira ticket to
track the issue?

Best,
Kurt


On Fri, Feb 7, 2020 at 10:56 PM Timo Walther <[hidden email]> wrote:

> Hi Kurt,
>
> Dawid is currently working on making a tableEnv.fromElements/values()
> kind of source possible in the future. We can use this to replace some
> of the tests. Otherwise I guess we should come up with a better test
> infrastructure to make defining source not necessary anymore.
>
> Regards,
> Timo
>
>
> On 07.02.20 11:24, Kurt Young wrote:
> > Thanks all for your feedback, since no objection has been raised, I've
> > created
> > https://issues.apache.org/jira/browse/FLINK-15950 to track this issue.
> >
> > Since this issue would require lots of tests adjustment before it really
> > happen,
> > it won't be done in a short time. Feel free to give feedback anytime here
> > or in jira
> > if you have other opinions.
> >
> > Best,
> > Kurt
> >
> >
> > On Wed, Feb 5, 2020 at 8:26 PM Kurt Young <[hidden email]> wrote:
> >
> >> Hi Zhenghua,
> >>
> >> After removing TableSource::getTableSchema, during optimization, I could
> >> imagine
> >> the schema information might come from relational nodes such as
> TableScan.
> >>
> >> Best,
> >> Kurt
> >>
> >>
> >> On Wed, Feb 5, 2020 at 8:24 PM Kurt Young <[hidden email]> wrote:
> >>
> >>> Hi Jingsong,
> >>>
> >>> Yes current TableFactory is not ideal for users to use either. I think
> we
> >>> should
> >>> also spend some time in 1.11 to improve the usability of
> TableEnvironment
> >>> when
> >>> users trying to read or write something. Automatic scheme inference
> would
> >>> be
> >>> one of them. Other from this, we also support convert a DataStream to
> >>> Table, which
> >>> can serve some flexible requirements to read or write data.
> >>>
> >>> Best,
> >>> Kurt
> >>>
> >>>
> >>> On Wed, Feb 5, 2020 at 7:29 PM Zhenghua Gao <[hidden email]> wrote:
> >>>
> >>>> +1 to remove these methods.
> >>>>
> >>>> One concern about invocations of TableSource::getTableSchema:
> >>>> By removing such methods, we can stop calling
> TableSource::getTableSchema
> >>>> in some place(such
> >>>> as BatchTableEnvImpl/TableEnvironmentImpl#validateTableSource,
> >>>> ConnectorCatalogTable, TableSourceQueryOperation).
> >>>>
> >>>> But in other place we need field types and names of the table
> source(such
> >>>> as
> >>>> BatchExecLookupJoinRule/StreamExecLookupJoinRule,
> >>>> PushProjectIntoTableSourceScanRule,
> >>>> CommonLookupJoin).  So how should we deal with this?
> >>>>
> >>>> *Best Regards,*
> >>>> *Zhenghua Gao*
> >>>>
> >>>>
> >>>> On Wed, Feb 5, 2020 at 2:36 PM Kurt Young <[hidden email]> wrote:
> >>>>
> >>>>> Hi all,
> >>>>>
> >>>>> I'd like to bring up a discussion about removing registration of
> >>>>> TableSource and
> >>>>> TableSink in TableEnvironment as well as in ConnectTableDescriptor.
> The
> >>>>> affected
> >>>>> method would be:
> >>>>>
> >>>>> TableEnvironment::registerTableSource
> >>>>> TableEnvironment::fromTableSource
> >>>>> TableEnvironment::registerTableSink
> >>>>> ConnectTableDescriptor::registerTableSource
> >>>>> ConnectTableDescriptor::registerTableSink
> >>>>> ConnectTableDescriptor::registerTableSourceAndSink
> >>>>>
> >>>>> (Most of them are already deprecated, except for
> >>>>> TableEnvironment::fromTableSource,
> >>>>> which was intended to deprecate but missed by accident).
> >>>>>
> >>>>> FLIP-64 [1] already explained why we want to deprecate TableSource &
> >>>>> TableSink from
> >>>>> user's interface. In a short word, these interfaces should only read
> &
> >>>>> write the physical
> >>>>> representation of the table, and they are not fitting well after we
> >>>> already
> >>>>> introduced some
> >>>>> logical table fields such as computed column, watermarks.
> >>>>>
> >>>>> Another reason is the exposure of registerTableSource in Table Env
> just
> >>>>> make the whole
> >>>>> SQL protocol opposite. TableSource should be used as a reader of
> >>>> table, it
> >>>>> should rely on
> >>>>> other metadata information held by framework, which eventually comes
> >>>> from
> >>>>> DDL or
> >>>>> ConnectDescriptor. But if we register a TableSource to Table Env, we
> >>>> have
> >>>>> no choice but
> >>>>> have to rely on TableSource::getTableSchema. It will make the design
> >>>>> obscure, sometimes
> >>>>> TableSource should trust the information comes from framework, but
> >>>>> sometimes it should
> >>>>> also generate its own schema information.
> >>>>>
> >>>>> Furthermore, if the authority about schema information is not clear,
> it
> >>>>> will make things much
> >>>>> more complicated if we want to improve the table api usability such
> as
> >>>>> introducing automatic
> >>>>> schema inference in the near future.
> >>>>>
> >>>>> Since this is an API break change, I've also included user mailing
> >>>> list to
> >>>>> gather more feedbacks.
> >>>>>
> >>>>> Best,
> >>>>> Kurt
> >>>>>
> >>>>> [1]
> >>>>>
> >>>>>
> >>>>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module
> >>>>>
> >>>>
> >>>
> >
>
>
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] Remove registration of TableSource/TableSink in Table Env and ConnectTableDescriptor

Timo Walther-2
Hi Kurt,

no there is no JIRA ticket yet. But in any case, I think it is better to
have good testing infrastructure that abstracts source generation, sink
generation, testing data etc. If we will introduce tableEnv.values() it
will also not solve everything because time-based operations might need
time attributes and so on.

Using DDL in tests should also be avoided because strings are even more
difficult to maintain.

Regards,
Timo


On 08.02.20 04:29, Kurt Young wrote:

> Hi Timo,
>
> tableEnv.fromElements/values() sounds good, do we have a jira ticket to
> track the issue?
>
> Best,
> Kurt
>
>
> On Fri, Feb 7, 2020 at 10:56 PM Timo Walther <[hidden email]> wrote:
>
>> Hi Kurt,
>>
>> Dawid is currently working on making a tableEnv.fromElements/values()
>> kind of source possible in the future. We can use this to replace some
>> of the tests. Otherwise I guess we should come up with a better test
>> infrastructure to make defining source not necessary anymore.
>>
>> Regards,
>> Timo
>>
>>
>> On 07.02.20 11:24, Kurt Young wrote:
>>> Thanks all for your feedback, since no objection has been raised, I've
>>> created
>>> https://issues.apache.org/jira/browse/FLINK-15950 to track this issue.
>>>
>>> Since this issue would require lots of tests adjustment before it really
>>> happen,
>>> it won't be done in a short time. Feel free to give feedback anytime here
>>> or in jira
>>> if you have other opinions.
>>>
>>> Best,
>>> Kurt
>>>
>>>
>>> On Wed, Feb 5, 2020 at 8:26 PM Kurt Young <[hidden email]> wrote:
>>>
>>>> Hi Zhenghua,
>>>>
>>>> After removing TableSource::getTableSchema, during optimization, I could
>>>> imagine
>>>> the schema information might come from relational nodes such as
>> TableScan.
>>>>
>>>> Best,
>>>> Kurt
>>>>
>>>>
>>>> On Wed, Feb 5, 2020 at 8:24 PM Kurt Young <[hidden email]> wrote:
>>>>
>>>>> Hi Jingsong,
>>>>>
>>>>> Yes current TableFactory is not ideal for users to use either. I think
>> we
>>>>> should
>>>>> also spend some time in 1.11 to improve the usability of
>> TableEnvironment
>>>>> when
>>>>> users trying to read or write something. Automatic scheme inference
>> would
>>>>> be
>>>>> one of them. Other from this, we also support convert a DataStream to
>>>>> Table, which
>>>>> can serve some flexible requirements to read or write data.
>>>>>
>>>>> Best,
>>>>> Kurt
>>>>>
>>>>>
>>>>> On Wed, Feb 5, 2020 at 7:29 PM Zhenghua Gao <[hidden email]> wrote:
>>>>>
>>>>>> +1 to remove these methods.
>>>>>>
>>>>>> One concern about invocations of TableSource::getTableSchema:
>>>>>> By removing such methods, we can stop calling
>> TableSource::getTableSchema
>>>>>> in some place(such
>>>>>> as BatchTableEnvImpl/TableEnvironmentImpl#validateTableSource,
>>>>>> ConnectorCatalogTable, TableSourceQueryOperation).
>>>>>>
>>>>>> But in other place we need field types and names of the table
>> source(such
>>>>>> as
>>>>>> BatchExecLookupJoinRule/StreamExecLookupJoinRule,
>>>>>> PushProjectIntoTableSourceScanRule,
>>>>>> CommonLookupJoin).  So how should we deal with this?
>>>>>>
>>>>>> *Best Regards,*
>>>>>> *Zhenghua Gao*
>>>>>>
>>>>>>
>>>>>> On Wed, Feb 5, 2020 at 2:36 PM Kurt Young <[hidden email]> wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I'd like to bring up a discussion about removing registration of
>>>>>>> TableSource and
>>>>>>> TableSink in TableEnvironment as well as in ConnectTableDescriptor.
>> The
>>>>>>> affected
>>>>>>> method would be:
>>>>>>>
>>>>>>> TableEnvironment::registerTableSource
>>>>>>> TableEnvironment::fromTableSource
>>>>>>> TableEnvironment::registerTableSink
>>>>>>> ConnectTableDescriptor::registerTableSource
>>>>>>> ConnectTableDescriptor::registerTableSink
>>>>>>> ConnectTableDescriptor::registerTableSourceAndSink
>>>>>>>
>>>>>>> (Most of them are already deprecated, except for
>>>>>>> TableEnvironment::fromTableSource,
>>>>>>> which was intended to deprecate but missed by accident).
>>>>>>>
>>>>>>> FLIP-64 [1] already explained why we want to deprecate TableSource &
>>>>>>> TableSink from
>>>>>>> user's interface. In a short word, these interfaces should only read
>> &
>>>>>>> write the physical
>>>>>>> representation of the table, and they are not fitting well after we
>>>>>> already
>>>>>>> introduced some
>>>>>>> logical table fields such as computed column, watermarks.
>>>>>>>
>>>>>>> Another reason is the exposure of registerTableSource in Table Env
>> just
>>>>>>> make the whole
>>>>>>> SQL protocol opposite. TableSource should be used as a reader of
>>>>>> table, it
>>>>>>> should rely on
>>>>>>> other metadata information held by framework, which eventually comes
>>>>>> from
>>>>>>> DDL or
>>>>>>> ConnectDescriptor. But if we register a TableSource to Table Env, we
>>>>>> have
>>>>>>> no choice but
>>>>>>> have to rely on TableSource::getTableSchema. It will make the design
>>>>>>> obscure, sometimes
>>>>>>> TableSource should trust the information comes from framework, but
>>>>>>> sometimes it should
>>>>>>> also generate its own schema information.
>>>>>>>
>>>>>>> Furthermore, if the authority about schema information is not clear,
>> it
>>>>>>> will make things much
>>>>>>> more complicated if we want to improve the table api usability such
>> as
>>>>>>> introducing automatic
>>>>>>> schema inference in the near future.
>>>>>>>
>>>>>>> Since this is an API break change, I've also included user mailing
>>>>>> list to
>>>>>>> gather more feedbacks.
>>>>>>>
>>>>>>> Best,
>>>>>>> Kurt
>>>>>>>
>>>>>>> [1]
>>>>>>>
>>>>>>>
>>>>>>
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module
>>>>>>>
>>>>>>
>>>>>
>>>
>>
>>
>