Hi all,
I'd like to bring up a discussion about removing registration of TableSource and TableSink in TableEnvironment as well as in ConnectTableDescriptor. The affected method would be: TableEnvironment::registerTableSource TableEnvironment::fromTableSource TableEnvironment::registerTableSink ConnectTableDescriptor::registerTableSource ConnectTableDescriptor::registerTableSink ConnectTableDescriptor::registerTableSourceAndSink (Most of them are already deprecated, except for TableEnvironment::fromTableSource, which was intended to deprecate but missed by accident). FLIP-64 [1] already explained why we want to deprecate TableSource & TableSink from user's interface. In a short word, these interfaces should only read & write the physical representation of the table, and they are not fitting well after we already introduced some logical table fields such as computed column, watermarks. Another reason is the exposure of registerTableSource in Table Env just make the whole SQL protocol opposite. TableSource should be used as a reader of table, it should rely on other metadata information held by framework, which eventually comes from DDL or ConnectDescriptor. But if we register a TableSource to Table Env, we have no choice but have to rely on TableSource::getTableSchema. It will make the design obscure, sometimes TableSource should trust the information comes from framework, but sometimes it should also generate its own schema information. Furthermore, if the authority about schema information is not clear, it will make things much more complicated if we want to improve the table api usability such as introducing automatic schema inference in the near future. Since this is an API break change, I've also included user mailing list to gather more feedbacks. Best, Kurt [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module |
Hi Kurt,
I fully agree with the proposal. Yes it was an omission that we did not deprecate the TableEnvironment#fromTableSource in the previous version. I would vote to remove all those methods altogether. Best, Dawid On 05/02/2020 07:36, Kurt Young wrote: > Hi all, > > I'd like to bring up a discussion about removing registration of > TableSource and > TableSink in TableEnvironment as well as in ConnectTableDescriptor. The > affected > method would be: > > TableEnvironment::registerTableSource > TableEnvironment::fromTableSource > TableEnvironment::registerTableSink > ConnectTableDescriptor::registerTableSource > ConnectTableDescriptor::registerTableSink > ConnectTableDescriptor::registerTableSourceAndSink > > (Most of them are already deprecated, except for > TableEnvironment::fromTableSource, > which was intended to deprecate but missed by accident). > > FLIP-64 [1] already explained why we want to deprecate TableSource & > TableSink from > user's interface. In a short word, these interfaces should only read & > write the physical > representation of the table, and they are not fitting well after we already > introduced some > logical table fields such as computed column, watermarks. > > Another reason is the exposure of registerTableSource in Table Env just > make the whole > SQL protocol opposite. TableSource should be used as a reader of table, it > should rely on > other metadata information held by framework, which eventually comes from > DDL or > ConnectDescriptor. But if we register a TableSource to Table Env, we have > no choice but > have to rely on TableSource::getTableSchema. It will make the design > obscure, sometimes > TableSource should trust the information comes from framework, but > sometimes it should > also generate its own schema information. > > Furthermore, if the authority about schema information is not clear, it > will make things much > more complicated if we want to improve the table api usability such as > introducing automatic > schema inference in the near future. > > Since this is an API break change, I've also included user mailing list to > gather more feedbacks. > > Best, > Kurt > > [1] > https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module > signature.asc (849 bytes) Download Attachment |
Thanks for driving this Kurt.
Because we already have DDL and Descriptor as an alternative of these deprecated methods, removing them would reduce ambiguity and make the near future work more easier. As we discussed offline, although some of the connectors may still have attributes that cannot be specified though properties, removing them force us to re-think the TableFactory/properties interface. Best, Danny Chan 在 2020年2月5日 +0800 PM3:58,Dawid Wysakowicz <[hidden email]>,写道: > Hi Kurt, > > I fully agree with the proposal. Yes it was an omission that we did not > deprecate the TableEnvironment#fromTableSource in the previous version. > > I would vote to remove all those methods altogether. > > Best, > > Dawid > > On 05/02/2020 07:36, Kurt Young wrote: > > Hi all, > > > > I'd like to bring up a discussion about removing registration of > > TableSource and > > TableSink in TableEnvironment as well as in ConnectTableDescriptor. The > > affected > > method would be: > > > > TableEnvironment::registerTableSource > > TableEnvironment::fromTableSource > > TableEnvironment::registerTableSink > > ConnectTableDescriptor::registerTableSource > > ConnectTableDescriptor::registerTableSink > > ConnectTableDescriptor::registerTableSourceAndSink > > > > (Most of them are already deprecated, except for > > TableEnvironment::fromTableSource, > > which was intended to deprecate but missed by accident). > > > > FLIP-64 [1] already explained why we want to deprecate TableSource & > > TableSink from > > user's interface. In a short word, these interfaces should only read & > > write the physical > > representation of the table, and they are not fitting well after we already > > introduced some > > logical table fields such as computed column, watermarks. > > > > Another reason is the exposure of registerTableSource in Table Env just > > make the whole > > SQL protocol opposite. TableSource should be used as a reader of table, it > > should rely on > > other metadata information held by framework, which eventually comes from > > DDL or > > ConnectDescriptor. But if we register a TableSource to Table Env, we have > > no choice but > > have to rely on TableSource::getTableSchema. It will make the design > > obscure, sometimes > > TableSource should trust the information comes from framework, but > > sometimes it should > > also generate its own schema information. > > > > Furthermore, if the authority about schema information is not clear, it > > will make things much > > more complicated if we want to improve the table api usability such as > > introducing automatic > > schema inference in the near future. > > > > Since this is an API break change, I've also included user mailing list to > > gather more feedbacks. > > > > Best, > > Kurt > > > > [1] > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module > > > |
In reply to this post by dwysakowicz
HI Kurt,
+1 to remove these methods. But one concern is that some of the current TableSource/TableSink may not be ready, such as the JDBCUpsertTableSink, which accepts a JDBCDialect, but through the TableFactory, there is no way to pass in the JDBCDialect at present. But I also believe we have enough time on 1.11 to prepare them. Then unified to TableFactory. I think there may be complaints from users because they used to only implement a TableSource or TableSink, but now they have to implement TableFactory. But I think it's also good for conceptual clarity to force them to implement TableFactory. Another idea is can we provide some utils to help user implement TableFactory? The current method may be a little bit complex, and it needs to be added to the "META_INF/services/org.apache.flink.table.factories.TableFactory" file. Best, Jingsong Lee On Wed, Feb 5, 2020 at 3:58 PM Dawid Wysakowicz <[hidden email]> wrote: > Hi Kurt, > > I fully agree with the proposal. Yes it was an omission that we did not > deprecate the TableEnvironment#fromTableSource in the previous version. > > I would vote to remove all those methods altogether. > > Best, > > Dawid > > On 05/02/2020 07:36, Kurt Young wrote: > > Hi all, > > > > I'd like to bring up a discussion about removing registration of > > TableSource and > > TableSink in TableEnvironment as well as in ConnectTableDescriptor. The > > affected > > method would be: > > > > TableEnvironment::registerTableSource > > TableEnvironment::fromTableSource > > TableEnvironment::registerTableSink > > ConnectTableDescriptor::registerTableSource > > ConnectTableDescriptor::registerTableSink > > ConnectTableDescriptor::registerTableSourceAndSink > > > > (Most of them are already deprecated, except for > > TableEnvironment::fromTableSource, > > which was intended to deprecate but missed by accident). > > > > FLIP-64 [1] already explained why we want to deprecate TableSource & > > TableSink from > > user's interface. In a short word, these interfaces should only read & > > write the physical > > representation of the table, and they are not fitting well after we > already > > introduced some > > logical table fields such as computed column, watermarks. > > > > Another reason is the exposure of registerTableSource in Table Env just > > make the whole > > SQL protocol opposite. TableSource should be used as a reader of table, > it > > should rely on > > other metadata information held by framework, which eventually comes from > > DDL or > > ConnectDescriptor. But if we register a TableSource to Table Env, we have > > no choice but > > have to rely on TableSource::getTableSchema. It will make the design > > obscure, sometimes > > TableSource should trust the information comes from framework, but > > sometimes it should > > also generate its own schema information. > > > > Furthermore, if the authority about schema information is not clear, it > > will make things much > > more complicated if we want to improve the table api usability such as > > introducing automatic > > schema inference in the near future. > > > > Since this is an API break change, I've also included user mailing list > to > > gather more feedbacks. > > > > Best, > > Kurt > > > > [1] > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module > > > > -- Best, Jingsong Lee |
In reply to this post by Kurt Young
+1 to remove these methods.
One concern about invocations of TableSource::getTableSchema: By removing such methods, we can stop calling TableSource::getTableSchema in some place(such as BatchTableEnvImpl/TableEnvironmentImpl#validateTableSource, ConnectorCatalogTable, TableSourceQueryOperation). But in other place we need field types and names of the table source(such as BatchExecLookupJoinRule/StreamExecLookupJoinRule, PushProjectIntoTableSourceScanRule, CommonLookupJoin). So how should we deal with this? *Best Regards,* *Zhenghua Gao* On Wed, Feb 5, 2020 at 2:36 PM Kurt Young <[hidden email]> wrote: > Hi all, > > I'd like to bring up a discussion about removing registration of > TableSource and > TableSink in TableEnvironment as well as in ConnectTableDescriptor. The > affected > method would be: > > TableEnvironment::registerTableSource > TableEnvironment::fromTableSource > TableEnvironment::registerTableSink > ConnectTableDescriptor::registerTableSource > ConnectTableDescriptor::registerTableSink > ConnectTableDescriptor::registerTableSourceAndSink > > (Most of them are already deprecated, except for > TableEnvironment::fromTableSource, > which was intended to deprecate but missed by accident). > > FLIP-64 [1] already explained why we want to deprecate TableSource & > TableSink from > user's interface. In a short word, these interfaces should only read & > write the physical > representation of the table, and they are not fitting well after we already > introduced some > logical table fields such as computed column, watermarks. > > Another reason is the exposure of registerTableSource in Table Env just > make the whole > SQL protocol opposite. TableSource should be used as a reader of table, it > should rely on > other metadata information held by framework, which eventually comes from > DDL or > ConnectDescriptor. But if we register a TableSource to Table Env, we have > no choice but > have to rely on TableSource::getTableSchema. It will make the design > obscure, sometimes > TableSource should trust the information comes from framework, but > sometimes it should > also generate its own schema information. > > Furthermore, if the authority about schema information is not clear, it > will make things much > more complicated if we want to improve the table api usability such as > introducing automatic > schema inference in the near future. > > Since this is an API break change, I've also included user mailing list to > gather more feedbacks. > > Best, > Kurt > > [1] > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module > |
Hi Jingsong,
Yes current TableFactory is not ideal for users to use either. I think we should also spend some time in 1.11 to improve the usability of TableEnvironment when users trying to read or write something. Automatic scheme inference would be one of them. Other from this, we also support convert a DataStream to Table, which can serve some flexible requirements to read or write data. Best, Kurt On Wed, Feb 5, 2020 at 7:29 PM Zhenghua Gao <[hidden email]> wrote: > +1 to remove these methods. > > One concern about invocations of TableSource::getTableSchema: > By removing such methods, we can stop calling TableSource::getTableSchema > in some place(such > as BatchTableEnvImpl/TableEnvironmentImpl#validateTableSource, > ConnectorCatalogTable, TableSourceQueryOperation). > > But in other place we need field types and names of the table source(such > as > BatchExecLookupJoinRule/StreamExecLookupJoinRule, > PushProjectIntoTableSourceScanRule, > CommonLookupJoin). So how should we deal with this? > > *Best Regards,* > *Zhenghua Gao* > > > On Wed, Feb 5, 2020 at 2:36 PM Kurt Young <[hidden email]> wrote: > > > Hi all, > > > > I'd like to bring up a discussion about removing registration of > > TableSource and > > TableSink in TableEnvironment as well as in ConnectTableDescriptor. The > > affected > > method would be: > > > > TableEnvironment::registerTableSource > > TableEnvironment::fromTableSource > > TableEnvironment::registerTableSink > > ConnectTableDescriptor::registerTableSource > > ConnectTableDescriptor::registerTableSink > > ConnectTableDescriptor::registerTableSourceAndSink > > > > (Most of them are already deprecated, except for > > TableEnvironment::fromTableSource, > > which was intended to deprecate but missed by accident). > > > > FLIP-64 [1] already explained why we want to deprecate TableSource & > > TableSink from > > user's interface. In a short word, these interfaces should only read & > > write the physical > > representation of the table, and they are not fitting well after we > already > > introduced some > > logical table fields such as computed column, watermarks. > > > > Another reason is the exposure of registerTableSource in Table Env just > > make the whole > > SQL protocol opposite. TableSource should be used as a reader of table, > it > > should rely on > > other metadata information held by framework, which eventually comes from > > DDL or > > ConnectDescriptor. But if we register a TableSource to Table Env, we have > > no choice but > > have to rely on TableSource::getTableSchema. It will make the design > > obscure, sometimes > > TableSource should trust the information comes from framework, but > > sometimes it should > > also generate its own schema information. > > > > Furthermore, if the authority about schema information is not clear, it > > will make things much > > more complicated if we want to improve the table api usability such as > > introducing automatic > > schema inference in the near future. > > > > Since this is an API break change, I've also included user mailing list > to > > gather more feedbacks. > > > > Best, > > Kurt > > > > [1] > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module > > > |
Hi Zhenghua,
After removing TableSource::getTableSchema, during optimization, I could imagine the schema information might come from relational nodes such as TableScan. Best, Kurt On Wed, Feb 5, 2020 at 8:24 PM Kurt Young <[hidden email]> wrote: > Hi Jingsong, > > Yes current TableFactory is not ideal for users to use either. I think we > should > also spend some time in 1.11 to improve the usability of TableEnvironment > when > users trying to read or write something. Automatic scheme inference would > be > one of them. Other from this, we also support convert a DataStream to > Table, which > can serve some flexible requirements to read or write data. > > Best, > Kurt > > > On Wed, Feb 5, 2020 at 7:29 PM Zhenghua Gao <[hidden email]> wrote: > >> +1 to remove these methods. >> >> One concern about invocations of TableSource::getTableSchema: >> By removing such methods, we can stop calling TableSource::getTableSchema >> in some place(such >> as BatchTableEnvImpl/TableEnvironmentImpl#validateTableSource, >> ConnectorCatalogTable, TableSourceQueryOperation). >> >> But in other place we need field types and names of the table source(such >> as >> BatchExecLookupJoinRule/StreamExecLookupJoinRule, >> PushProjectIntoTableSourceScanRule, >> CommonLookupJoin). So how should we deal with this? >> >> *Best Regards,* >> *Zhenghua Gao* >> >> >> On Wed, Feb 5, 2020 at 2:36 PM Kurt Young <[hidden email]> wrote: >> >> > Hi all, >> > >> > I'd like to bring up a discussion about removing registration of >> > TableSource and >> > TableSink in TableEnvironment as well as in ConnectTableDescriptor. The >> > affected >> > method would be: >> > >> > TableEnvironment::registerTableSource >> > TableEnvironment::fromTableSource >> > TableEnvironment::registerTableSink >> > ConnectTableDescriptor::registerTableSource >> > ConnectTableDescriptor::registerTableSink >> > ConnectTableDescriptor::registerTableSourceAndSink >> > >> > (Most of them are already deprecated, except for >> > TableEnvironment::fromTableSource, >> > which was intended to deprecate but missed by accident). >> > >> > FLIP-64 [1] already explained why we want to deprecate TableSource & >> > TableSink from >> > user's interface. In a short word, these interfaces should only read & >> > write the physical >> > representation of the table, and they are not fitting well after we >> already >> > introduced some >> > logical table fields such as computed column, watermarks. >> > >> > Another reason is the exposure of registerTableSource in Table Env just >> > make the whole >> > SQL protocol opposite. TableSource should be used as a reader of table, >> it >> > should rely on >> > other metadata information held by framework, which eventually comes >> from >> > DDL or >> > ConnectDescriptor. But if we register a TableSource to Table Env, we >> have >> > no choice but >> > have to rely on TableSource::getTableSchema. It will make the design >> > obscure, sometimes >> > TableSource should trust the information comes from framework, but >> > sometimes it should >> > also generate its own schema information. >> > >> > Furthermore, if the authority about schema information is not clear, it >> > will make things much >> > more complicated if we want to improve the table api usability such as >> > introducing automatic >> > schema inference in the near future. >> > >> > Since this is an API break change, I've also included user mailing list >> to >> > gather more feedbacks. >> > >> > Best, >> > Kurt >> > >> > [1] >> > >> > >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module >> > >> > |
Thanks all for your feedback, since no objection has been raised, I've
created https://issues.apache.org/jira/browse/FLINK-15950 to track this issue. Since this issue would require lots of tests adjustment before it really happen, it won't be done in a short time. Feel free to give feedback anytime here or in jira if you have other opinions. Best, Kurt On Wed, Feb 5, 2020 at 8:26 PM Kurt Young <[hidden email]> wrote: > Hi Zhenghua, > > After removing TableSource::getTableSchema, during optimization, I could > imagine > the schema information might come from relational nodes such as TableScan. > > Best, > Kurt > > > On Wed, Feb 5, 2020 at 8:24 PM Kurt Young <[hidden email]> wrote: > >> Hi Jingsong, >> >> Yes current TableFactory is not ideal for users to use either. I think we >> should >> also spend some time in 1.11 to improve the usability of TableEnvironment >> when >> users trying to read or write something. Automatic scheme inference would >> be >> one of them. Other from this, we also support convert a DataStream to >> Table, which >> can serve some flexible requirements to read or write data. >> >> Best, >> Kurt >> >> >> On Wed, Feb 5, 2020 at 7:29 PM Zhenghua Gao <[hidden email]> wrote: >> >>> +1 to remove these methods. >>> >>> One concern about invocations of TableSource::getTableSchema: >>> By removing such methods, we can stop calling TableSource::getTableSchema >>> in some place(such >>> as BatchTableEnvImpl/TableEnvironmentImpl#validateTableSource, >>> ConnectorCatalogTable, TableSourceQueryOperation). >>> >>> But in other place we need field types and names of the table source(such >>> as >>> BatchExecLookupJoinRule/StreamExecLookupJoinRule, >>> PushProjectIntoTableSourceScanRule, >>> CommonLookupJoin). So how should we deal with this? >>> >>> *Best Regards,* >>> *Zhenghua Gao* >>> >>> >>> On Wed, Feb 5, 2020 at 2:36 PM Kurt Young <[hidden email]> wrote: >>> >>> > Hi all, >>> > >>> > I'd like to bring up a discussion about removing registration of >>> > TableSource and >>> > TableSink in TableEnvironment as well as in ConnectTableDescriptor. The >>> > affected >>> > method would be: >>> > >>> > TableEnvironment::registerTableSource >>> > TableEnvironment::fromTableSource >>> > TableEnvironment::registerTableSink >>> > ConnectTableDescriptor::registerTableSource >>> > ConnectTableDescriptor::registerTableSink >>> > ConnectTableDescriptor::registerTableSourceAndSink >>> > >>> > (Most of them are already deprecated, except for >>> > TableEnvironment::fromTableSource, >>> > which was intended to deprecate but missed by accident). >>> > >>> > FLIP-64 [1] already explained why we want to deprecate TableSource & >>> > TableSink from >>> > user's interface. In a short word, these interfaces should only read & >>> > write the physical >>> > representation of the table, and they are not fitting well after we >>> already >>> > introduced some >>> > logical table fields such as computed column, watermarks. >>> > >>> > Another reason is the exposure of registerTableSource in Table Env just >>> > make the whole >>> > SQL protocol opposite. TableSource should be used as a reader of >>> table, it >>> > should rely on >>> > other metadata information held by framework, which eventually comes >>> from >>> > DDL or >>> > ConnectDescriptor. But if we register a TableSource to Table Env, we >>> have >>> > no choice but >>> > have to rely on TableSource::getTableSchema. It will make the design >>> > obscure, sometimes >>> > TableSource should trust the information comes from framework, but >>> > sometimes it should >>> > also generate its own schema information. >>> > >>> > Furthermore, if the authority about schema information is not clear, it >>> > will make things much >>> > more complicated if we want to improve the table api usability such as >>> > introducing automatic >>> > schema inference in the near future. >>> > >>> > Since this is an API break change, I've also included user mailing >>> list to >>> > gather more feedbacks. >>> > >>> > Best, >>> > Kurt >>> > >>> > [1] >>> > >>> > >>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module >>> > >>> >> |
Hi Kurt,
Dawid is currently working on making a tableEnv.fromElements/values() kind of source possible in the future. We can use this to replace some of the tests. Otherwise I guess we should come up with a better test infrastructure to make defining source not necessary anymore. Regards, Timo On 07.02.20 11:24, Kurt Young wrote: > Thanks all for your feedback, since no objection has been raised, I've > created > https://issues.apache.org/jira/browse/FLINK-15950 to track this issue. > > Since this issue would require lots of tests adjustment before it really > happen, > it won't be done in a short time. Feel free to give feedback anytime here > or in jira > if you have other opinions. > > Best, > Kurt > > > On Wed, Feb 5, 2020 at 8:26 PM Kurt Young <[hidden email]> wrote: > >> Hi Zhenghua, >> >> After removing TableSource::getTableSchema, during optimization, I could >> imagine >> the schema information might come from relational nodes such as TableScan. >> >> Best, >> Kurt >> >> >> On Wed, Feb 5, 2020 at 8:24 PM Kurt Young <[hidden email]> wrote: >> >>> Hi Jingsong, >>> >>> Yes current TableFactory is not ideal for users to use either. I think we >>> should >>> also spend some time in 1.11 to improve the usability of TableEnvironment >>> when >>> users trying to read or write something. Automatic scheme inference would >>> be >>> one of them. Other from this, we also support convert a DataStream to >>> Table, which >>> can serve some flexible requirements to read or write data. >>> >>> Best, >>> Kurt >>> >>> >>> On Wed, Feb 5, 2020 at 7:29 PM Zhenghua Gao <[hidden email]> wrote: >>> >>>> +1 to remove these methods. >>>> >>>> One concern about invocations of TableSource::getTableSchema: >>>> By removing such methods, we can stop calling TableSource::getTableSchema >>>> in some place(such >>>> as BatchTableEnvImpl/TableEnvironmentImpl#validateTableSource, >>>> ConnectorCatalogTable, TableSourceQueryOperation). >>>> >>>> But in other place we need field types and names of the table source(such >>>> as >>>> BatchExecLookupJoinRule/StreamExecLookupJoinRule, >>>> PushProjectIntoTableSourceScanRule, >>>> CommonLookupJoin). So how should we deal with this? >>>> >>>> *Best Regards,* >>>> *Zhenghua Gao* >>>> >>>> >>>> On Wed, Feb 5, 2020 at 2:36 PM Kurt Young <[hidden email]> wrote: >>>> >>>>> Hi all, >>>>> >>>>> I'd like to bring up a discussion about removing registration of >>>>> TableSource and >>>>> TableSink in TableEnvironment as well as in ConnectTableDescriptor. The >>>>> affected >>>>> method would be: >>>>> >>>>> TableEnvironment::registerTableSource >>>>> TableEnvironment::fromTableSource >>>>> TableEnvironment::registerTableSink >>>>> ConnectTableDescriptor::registerTableSource >>>>> ConnectTableDescriptor::registerTableSink >>>>> ConnectTableDescriptor::registerTableSourceAndSink >>>>> >>>>> (Most of them are already deprecated, except for >>>>> TableEnvironment::fromTableSource, >>>>> which was intended to deprecate but missed by accident). >>>>> >>>>> FLIP-64 [1] already explained why we want to deprecate TableSource & >>>>> TableSink from >>>>> user's interface. In a short word, these interfaces should only read & >>>>> write the physical >>>>> representation of the table, and they are not fitting well after we >>>> already >>>>> introduced some >>>>> logical table fields such as computed column, watermarks. >>>>> >>>>> Another reason is the exposure of registerTableSource in Table Env just >>>>> make the whole >>>>> SQL protocol opposite. TableSource should be used as a reader of >>>> table, it >>>>> should rely on >>>>> other metadata information held by framework, which eventually comes >>>> from >>>>> DDL or >>>>> ConnectDescriptor. But if we register a TableSource to Table Env, we >>>> have >>>>> no choice but >>>>> have to rely on TableSource::getTableSchema. It will make the design >>>>> obscure, sometimes >>>>> TableSource should trust the information comes from framework, but >>>>> sometimes it should >>>>> also generate its own schema information. >>>>> >>>>> Furthermore, if the authority about schema information is not clear, it >>>>> will make things much >>>>> more complicated if we want to improve the table api usability such as >>>>> introducing automatic >>>>> schema inference in the near future. >>>>> >>>>> Since this is an API break change, I've also included user mailing >>>> list to >>>>> gather more feedbacks. >>>>> >>>>> Best, >>>>> Kurt >>>>> >>>>> [1] >>>>> >>>>> >>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module >>>>> >>>> >>> > |
Hi Timo,
tableEnv.fromElements/values() sounds good, do we have a jira ticket to track the issue? Best, Kurt On Fri, Feb 7, 2020 at 10:56 PM Timo Walther <[hidden email]> wrote: > Hi Kurt, > > Dawid is currently working on making a tableEnv.fromElements/values() > kind of source possible in the future. We can use this to replace some > of the tests. Otherwise I guess we should come up with a better test > infrastructure to make defining source not necessary anymore. > > Regards, > Timo > > > On 07.02.20 11:24, Kurt Young wrote: > > Thanks all for your feedback, since no objection has been raised, I've > > created > > https://issues.apache.org/jira/browse/FLINK-15950 to track this issue. > > > > Since this issue would require lots of tests adjustment before it really > > happen, > > it won't be done in a short time. Feel free to give feedback anytime here > > or in jira > > if you have other opinions. > > > > Best, > > Kurt > > > > > > On Wed, Feb 5, 2020 at 8:26 PM Kurt Young <[hidden email]> wrote: > > > >> Hi Zhenghua, > >> > >> After removing TableSource::getTableSchema, during optimization, I could > >> imagine > >> the schema information might come from relational nodes such as > TableScan. > >> > >> Best, > >> Kurt > >> > >> > >> On Wed, Feb 5, 2020 at 8:24 PM Kurt Young <[hidden email]> wrote: > >> > >>> Hi Jingsong, > >>> > >>> Yes current TableFactory is not ideal for users to use either. I think > we > >>> should > >>> also spend some time in 1.11 to improve the usability of > TableEnvironment > >>> when > >>> users trying to read or write something. Automatic scheme inference > would > >>> be > >>> one of them. Other from this, we also support convert a DataStream to > >>> Table, which > >>> can serve some flexible requirements to read or write data. > >>> > >>> Best, > >>> Kurt > >>> > >>> > >>> On Wed, Feb 5, 2020 at 7:29 PM Zhenghua Gao <[hidden email]> wrote: > >>> > >>>> +1 to remove these methods. > >>>> > >>>> One concern about invocations of TableSource::getTableSchema: > >>>> By removing such methods, we can stop calling > TableSource::getTableSchema > >>>> in some place(such > >>>> as BatchTableEnvImpl/TableEnvironmentImpl#validateTableSource, > >>>> ConnectorCatalogTable, TableSourceQueryOperation). > >>>> > >>>> But in other place we need field types and names of the table > source(such > >>>> as > >>>> BatchExecLookupJoinRule/StreamExecLookupJoinRule, > >>>> PushProjectIntoTableSourceScanRule, > >>>> CommonLookupJoin). So how should we deal with this? > >>>> > >>>> *Best Regards,* > >>>> *Zhenghua Gao* > >>>> > >>>> > >>>> On Wed, Feb 5, 2020 at 2:36 PM Kurt Young <[hidden email]> wrote: > >>>> > >>>>> Hi all, > >>>>> > >>>>> I'd like to bring up a discussion about removing registration of > >>>>> TableSource and > >>>>> TableSink in TableEnvironment as well as in ConnectTableDescriptor. > The > >>>>> affected > >>>>> method would be: > >>>>> > >>>>> TableEnvironment::registerTableSource > >>>>> TableEnvironment::fromTableSource > >>>>> TableEnvironment::registerTableSink > >>>>> ConnectTableDescriptor::registerTableSource > >>>>> ConnectTableDescriptor::registerTableSink > >>>>> ConnectTableDescriptor::registerTableSourceAndSink > >>>>> > >>>>> (Most of them are already deprecated, except for > >>>>> TableEnvironment::fromTableSource, > >>>>> which was intended to deprecate but missed by accident). > >>>>> > >>>>> FLIP-64 [1] already explained why we want to deprecate TableSource & > >>>>> TableSink from > >>>>> user's interface. In a short word, these interfaces should only read > & > >>>>> write the physical > >>>>> representation of the table, and they are not fitting well after we > >>>> already > >>>>> introduced some > >>>>> logical table fields such as computed column, watermarks. > >>>>> > >>>>> Another reason is the exposure of registerTableSource in Table Env > just > >>>>> make the whole > >>>>> SQL protocol opposite. TableSource should be used as a reader of > >>>> table, it > >>>>> should rely on > >>>>> other metadata information held by framework, which eventually comes > >>>> from > >>>>> DDL or > >>>>> ConnectDescriptor. But if we register a TableSource to Table Env, we > >>>> have > >>>>> no choice but > >>>>> have to rely on TableSource::getTableSchema. It will make the design > >>>>> obscure, sometimes > >>>>> TableSource should trust the information comes from framework, but > >>>>> sometimes it should > >>>>> also generate its own schema information. > >>>>> > >>>>> Furthermore, if the authority about schema information is not clear, > it > >>>>> will make things much > >>>>> more complicated if we want to improve the table api usability such > as > >>>>> introducing automatic > >>>>> schema inference in the near future. > >>>>> > >>>>> Since this is an API break change, I've also included user mailing > >>>> list to > >>>>> gather more feedbacks. > >>>>> > >>>>> Best, > >>>>> Kurt > >>>>> > >>>>> [1] > >>>>> > >>>>> > >>>> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module > >>>>> > >>>> > >>> > > > > |
Hi Kurt,
no there is no JIRA ticket yet. But in any case, I think it is better to have good testing infrastructure that abstracts source generation, sink generation, testing data etc. If we will introduce tableEnv.values() it will also not solve everything because time-based operations might need time attributes and so on. Using DDL in tests should also be avoided because strings are even more difficult to maintain. Regards, Timo On 08.02.20 04:29, Kurt Young wrote: > Hi Timo, > > tableEnv.fromElements/values() sounds good, do we have a jira ticket to > track the issue? > > Best, > Kurt > > > On Fri, Feb 7, 2020 at 10:56 PM Timo Walther <[hidden email]> wrote: > >> Hi Kurt, >> >> Dawid is currently working on making a tableEnv.fromElements/values() >> kind of source possible in the future. We can use this to replace some >> of the tests. Otherwise I guess we should come up with a better test >> infrastructure to make defining source not necessary anymore. >> >> Regards, >> Timo >> >> >> On 07.02.20 11:24, Kurt Young wrote: >>> Thanks all for your feedback, since no objection has been raised, I've >>> created >>> https://issues.apache.org/jira/browse/FLINK-15950 to track this issue. >>> >>> Since this issue would require lots of tests adjustment before it really >>> happen, >>> it won't be done in a short time. Feel free to give feedback anytime here >>> or in jira >>> if you have other opinions. >>> >>> Best, >>> Kurt >>> >>> >>> On Wed, Feb 5, 2020 at 8:26 PM Kurt Young <[hidden email]> wrote: >>> >>>> Hi Zhenghua, >>>> >>>> After removing TableSource::getTableSchema, during optimization, I could >>>> imagine >>>> the schema information might come from relational nodes such as >> TableScan. >>>> >>>> Best, >>>> Kurt >>>> >>>> >>>> On Wed, Feb 5, 2020 at 8:24 PM Kurt Young <[hidden email]> wrote: >>>> >>>>> Hi Jingsong, >>>>> >>>>> Yes current TableFactory is not ideal for users to use either. I think >> we >>>>> should >>>>> also spend some time in 1.11 to improve the usability of >> TableEnvironment >>>>> when >>>>> users trying to read or write something. Automatic scheme inference >> would >>>>> be >>>>> one of them. Other from this, we also support convert a DataStream to >>>>> Table, which >>>>> can serve some flexible requirements to read or write data. >>>>> >>>>> Best, >>>>> Kurt >>>>> >>>>> >>>>> On Wed, Feb 5, 2020 at 7:29 PM Zhenghua Gao <[hidden email]> wrote: >>>>> >>>>>> +1 to remove these methods. >>>>>> >>>>>> One concern about invocations of TableSource::getTableSchema: >>>>>> By removing such methods, we can stop calling >> TableSource::getTableSchema >>>>>> in some place(such >>>>>> as BatchTableEnvImpl/TableEnvironmentImpl#validateTableSource, >>>>>> ConnectorCatalogTable, TableSourceQueryOperation). >>>>>> >>>>>> But in other place we need field types and names of the table >> source(such >>>>>> as >>>>>> BatchExecLookupJoinRule/StreamExecLookupJoinRule, >>>>>> PushProjectIntoTableSourceScanRule, >>>>>> CommonLookupJoin). So how should we deal with this? >>>>>> >>>>>> *Best Regards,* >>>>>> *Zhenghua Gao* >>>>>> >>>>>> >>>>>> On Wed, Feb 5, 2020 at 2:36 PM Kurt Young <[hidden email]> wrote: >>>>>> >>>>>>> Hi all, >>>>>>> >>>>>>> I'd like to bring up a discussion about removing registration of >>>>>>> TableSource and >>>>>>> TableSink in TableEnvironment as well as in ConnectTableDescriptor. >> The >>>>>>> affected >>>>>>> method would be: >>>>>>> >>>>>>> TableEnvironment::registerTableSource >>>>>>> TableEnvironment::fromTableSource >>>>>>> TableEnvironment::registerTableSink >>>>>>> ConnectTableDescriptor::registerTableSource >>>>>>> ConnectTableDescriptor::registerTableSink >>>>>>> ConnectTableDescriptor::registerTableSourceAndSink >>>>>>> >>>>>>> (Most of them are already deprecated, except for >>>>>>> TableEnvironment::fromTableSource, >>>>>>> which was intended to deprecate but missed by accident). >>>>>>> >>>>>>> FLIP-64 [1] already explained why we want to deprecate TableSource & >>>>>>> TableSink from >>>>>>> user's interface. In a short word, these interfaces should only read >> & >>>>>>> write the physical >>>>>>> representation of the table, and they are not fitting well after we >>>>>> already >>>>>>> introduced some >>>>>>> logical table fields such as computed column, watermarks. >>>>>>> >>>>>>> Another reason is the exposure of registerTableSource in Table Env >> just >>>>>>> make the whole >>>>>>> SQL protocol opposite. TableSource should be used as a reader of >>>>>> table, it >>>>>>> should rely on >>>>>>> other metadata information held by framework, which eventually comes >>>>>> from >>>>>>> DDL or >>>>>>> ConnectDescriptor. But if we register a TableSource to Table Env, we >>>>>> have >>>>>>> no choice but >>>>>>> have to rely on TableSource::getTableSchema. It will make the design >>>>>>> obscure, sometimes >>>>>>> TableSource should trust the information comes from framework, but >>>>>>> sometimes it should >>>>>>> also generate its own schema information. >>>>>>> >>>>>>> Furthermore, if the authority about schema information is not clear, >> it >>>>>>> will make things much >>>>>>> more complicated if we want to improve the table api usability such >> as >>>>>>> introducing automatic >>>>>>> schema inference in the near future. >>>>>>> >>>>>>> Since this is an API break change, I've also included user mailing >>>>>> list to >>>>>>> gather more feedbacks. >>>>>>> >>>>>>> Best, >>>>>>> Kurt >>>>>>> >>>>>>> [1] >>>>>>> >>>>>>> >>>>>> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module >>>>>>> >>>>>> >>>>> >>> >> >> > |
Free forum by Nabble | Edit this page |