(DEPRECATED) Apache Flink Mailing List archive.

[DISCUSS] FLIP-68: Extend Core Table System with Modular Plugins

Classic

List

Threaded

21 messages Options

bowen.li

[DISCUSS] FLIP-68: Extend Core Table System with Modular Plugins

Hi devs,

We'd like to kick off a conversation on "FLIP-68: Extend Core Table System
with Modular Plugins" [1].

The modular approach was raised in discussion of how to support Hive
built-in functions in FLIP-57 [2]. As we discussed and looked deeper, we
think it’s a good opportunity to broaden the design and the corresponding
problem it aims to solve. The motivation is to expand Flink’s core table
system and enable users to do customizations by writing pluggable modules.

There are two aspects of the motivation:
1. Enpower users to write code and do customized developement for Flink
table core
2. Enable users to integrate Flink with cores and built-in objects of other
systems, so users can reuse what they are familiar with in other SQL
systems seamlessly as core and built-ins of Flink table

Please take a look, and feedbacks are welcome.

Bowen

[1]
https://docs.google.com/document/d/17CPMpMbPDjvM4selUVEfh_tqUK_oV0TODAUA9dfHakc/edit?usp=sharing
[2]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-57-Rework-FunctionCatalog-td32291.html

bowen.li

Re: [DISCUSS] FLIP-68: Extend Core Table System with Modular Plugins

Thanks everyone for your feedback. I've converted it to a FLIP wiki [1].

Please take another look. If there's no more concerns, I'd like to start a
voting thread for it.

Thanks

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-68%3A+Extend+Core+Table+System+with+Modular+Plugins

On Tue, Sep 17, 2019 at 11:25 AM Bowen Li <[hidden email]> wrote:

> Hi devs,
>
> We'd like to kick off a conversation on "FLIP-68: Extend Core Table
> System with Modular Plugins" [1].
>
> The modular approach was raised in discussion of how to support Hive
> built-in functions in FLIP-57 [2]. As we discussed and looked deeper, we
> think it’s a good opportunity to broaden the design and the corresponding
> problem it aims to solve. The motivation is to expand Flink’s core table
> system and enable users to do customizations by writing pluggable modules.
>
> There are two aspects of the motivation:
> 1. Enpower users to write code and do customized developement for Flink
> table core
> 2. Enable users to integrate Flink with cores and built-in objects of
> other systems, so users can reuse what they are familiar with in other SQL
> systems seamlessly as core and built-ins of Flink table
>
> Please take a look, and feedbacks are welcome.
>
> Bowen
>
> [1]
> https://docs.google.com/document/d/17CPMpMbPDjvM4selUVEfh_tqUK_oV0TODAUA9dfHakc/edit?usp=sharing
> [2]
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-57-Rework-FunctionCatalog-td32291.html
>

Timo Walther-2

Re: [DISCUSS] FLIP-68: Extend Core Table System with Modular Plugins

Hi Bowen,

thanks for this proposal after our discussion around the FunctionCatalog
rework. I like the architecture proposed in the FLIP because it is also
based on existing concepts and just slightly modifies the code base.
However, I would like to discuss some unanswered questions:

1) Terminology: Can we use the term "module" instead of "plugin"? Flink
also introduced the concept of a "plugin" recently for filesystems and
other purposes. So far our table extensions don't rely on this plugin
mechanism which means that classes in the classpath could potentially
clash. The term "module" also for function modules fits better here.

2) Intermediate plugin changes: Can a user call usePlugins(...) multiple
times within a session and change to completely different set of
modules? I guess we need to support this to have interactive sessions.
But what are the side effects if a plugin is dropped or a newly loaded
plugin overrides a function previous function name?

3) Missing CorePlugin: What happens if the CorePlugin is not loaded?
Will CAST, AS etc. not be available? This could lead to weird behavior
but I'm also fine with letting advanced users deal with the problem
themselves. They can still return function definitions from the
BuiltInFunctionDefinition class.

4) SQL: Can we already discuss how a SQL command for this would look
like? E.g.:
LOAD MODULE 'hive' [WITH ('prop'='myProp', ...)]; UNLOAD MODULE 'hive';

Thanks,
Timo

On 19.09.19 23:53, Bowen Li wrote:

> Thanks everyone for your feedback. I've converted it to a FLIP wiki [1].
>
> Please take another look. If there's no more concerns, I'd like to start a
> voting thread for it.
>
> Thanks
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-68%3A+Extend+Core+Table+System+with+Modular+Plugins
>
>
>
>
> On Tue, Sep 17, 2019 at 11:25 AM Bowen Li <[hidden email]> wrote:
>
>> Hi devs,
>>
>> We'd like to kick off a conversation on "FLIP-68: Extend Core Table
>> System with Modular Plugins" [1].
>>
>> The modular approach was raised in discussion of how to support Hive
>> built-in functions in FLIP-57 [2]. As we discussed and looked deeper, we
>> think it’s a good opportunity to broaden the design and the corresponding
>> problem it aims to solve. The motivation is to expand Flink’s core table
>> system and enable users to do customizations by writing pluggable modules.
>>
>> There are two aspects of the motivation:
>> 1. Enpower users to write code and do customized developement for Flink
>> table core
>> 2. Enable users to integrate Flink with cores and built-in objects of
>> other systems, so users can reuse what they are familiar with in other SQL
>> systems seamlessly as core and built-ins of Flink table
>>
>> Please take a look, and feedbacks are welcome.
>>
>> Bowen
>>
>> [1]
>> https://docs.google.com/document/d/17CPMpMbPDjvM4selUVEfh_tqUK_oV0TODAUA9dfHakc/edit?usp=sharing
>> [2]
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-57-Rework-FunctionCatalog-td32291.html
>>

bowen.li

Re: [DISCUSS] FLIP-68: Extend Core Table System with Modular Plugins

Hi Timo,

Re 1) I agree. I renamed the title to "Extend Core Table System with
Pluggable Modules" and all internal references

Re 2) First, I'll rename the API to useModules(). The design doesn't forbid
users to call useModules() multi times. Objects in modules are loaded on
demand instead of eagerly, so there won't be inconsistency. Users have to
be fully aware of the consequences of resetting modules as that might cause
that some objects can not be referenced anymore or resolution order of some
objects changes.

Re 3) Yes, we'd leave that to users.

Another approach can be to have a non-optional "Core" module for all
objects that cannot be overrode like "CAST" and "AS" functions, and have an
optional "ExtendedCore" module for other replaceable built-in objects.
"Core" should be positioned at the 1st in module list all the time.

I'm fine with either solution.

Re 4) It may sound like a nice-to-have advanced feature for 1.10, but we
can surely fully discuss it for the sake of feature completeness.

Unlike other configs, the order of modules would matter in Flink, and it
implies the LOAD/UNLOAD commands would not be equal in operation positions.
IIUYC, LOAD MODULE 'x' would be interpreted as appending x to the end of
module list, and UNLOAD MODULE 'x' would be interpreted as removing x from
any position in the list?

I'm thinking of the following list of commands:

SHOW MODULES - list modules in order
LOAD MODULE 'hive' [WITH ('prop'='myProp', ...)] - load and append the
module to end of the module list
UNLOAD MODULE 'hive' - remove the module from module list, and other
modules remain the same relative positions
USE MODULES 'x' 'y' 'z' (wondering can parser take "'x' 'y' 'z'"?), or USE
MODULES 'x,y,z' - to reorder module list completely

Timo Walther-2

Re: [DISCUSS] FLIP-68: Extend Core Table System with Modular Plugins

Hi Bowen,

thanks for your response.

Re 2) I also don't have a better approach for this issue. It is similar
to changing the general TableConfig between two statements. It would be
good to add your explanation to the design document.

Re 3) It would be interesting to know about which "core" functions we
are actually talking about. Also for the overriding built-in functions
that we discussed in the other FLIP. But I'm fine with leaving it to the
user for now. How about we just introduce loadModule(), unloadModule()
methods instead of useModules()? This would ensure that users don't
forget to add the core module when adding an additional module and they
need to explicitly call "unloadModule('core')".

Re 4) Every table environment feature should also be designed with SQL
statements in mind to verify the concept. SQL is also more popular that
Java/Scala API or YAML file. I would like to add it to 1.10 for marking
the feature as complete.

SHOW MODULES -> sounds good to me, we should add a listModules(): List<String> method to table environment

LOAD MODULE 'hive' [WITH ('prop'='myProp', ...)] --> we should add a loadModule() method to table environment

UNLOAD MODULE 'hive' --> we should add a unloadModule() method to table environment

I would not introduce `USE MODULES 'x' 'y' 'z'` for simplicity and
concise API. Users need to load the module anyway with properties. They
can also load them "in order" immediately. CREATE TABLE can also not
create multiple tables but only one at a time in that order.

One thing that came to my mind, shall we use a descriptor approach for
loadModule()? The past has shown that passing instances causes problems
when persisting objects. That's why we also want to get rid of
registerTableSource. I could image that users might want to persist a
table environment's state for later use in the future. Even though this
is future work, we should already keep such use cases in mind when
adding new API methods. What do you think?

Regards,
Timo

On 30.09.19 23:17, Bowen Li wrote:

> Hi Timo,
>
> Re 1) I agree. I renamed the title to "Extend Core Table System with
> Pluggable Modules" and all internal references
>
> Re 2) First, I'll rename the API to useModules(). The design doesn't forbid
> users to call useModules() multi times. Objects in modules are loaded on
> demand instead of eagerly, so there won't be inconsistency. Users have to
> be fully aware of the consequences of resetting modules as that might cause
> that some objects can not be referenced anymore or resolution order of some
> objects changes.
>
> Re 3) Yes, we'd leave that to users.
>
> Another approach can be to have a non-optional "Core" module for all
> objects that cannot be overrode like "CAST" and "AS" functions, and have an
> optional "ExtendedCore" module for other replaceable built-in objects.
> "Core" should be positioned at the 1st in module list all the time.
>
> I'm fine with either solution.
>
> Re 4) It may sound like a nice-to-have advanced feature for 1.10, but we
> can surely fully discuss it for the sake of feature completeness.
>
> Unlike other configs, the order of modules would matter in Flink, and it
> implies the LOAD/UNLOAD commands would not be equal in operation positions.
> IIUYC, LOAD MODULE 'x' would be interpreted as appending x to the end of
> module list, and UNLOAD MODULE 'x' would be interpreted as removing x from
> any position in the list?
>
> I'm thinking of the following list of commands:
>
> SHOW MODULES - list modules in order
> LOAD MODULE 'hive' [WITH ('prop'='myProp', ...)] - load and append the
> module to end of the module list
> UNLOAD MODULE 'hive' - remove the module from module list, and other
> modules remain the same relative positions
> USE MODULES 'x' 'y' 'z' (wondering can parser take "'x' 'y' 'z'"?), or USE
> MODULES 'x,y,z' - to reorder module list completely
>

dwysakowicz

Re: [DISCUSS] FLIP-68: Extend Core Table System with Modular Plugins

Hi Timo, Bowen,

Unfortunately I did not have enough time to go through all the
suggestions in details so I can not comment on the whole FLIP.

I just wanted to give my opinion on the "descriptor approach in
loadModule" part. I am not sure if we need it here. We might be
overthinking this a bit. It definitely makes sense for objects like
TableSource/TableSink etc. as they are logical definitions that nearly
always have to be persisted in a Catalog. I'm not sure if we really need
the same for a whole session. If we need a resume session feature, the
way to go would probably be to keep the session in memory on the server
side. I fear we will never be able to serialize the whole session
entirely (temporary objects, objects derived from DataStream etc.)

I think it is ok to use instances for objects like Catalogs or Modules
and have an overlay on top of that that can create instances from
properties.

Best,

Dawid

On 01/10/2019 11:28, Timo Walther wrote:

> Hi Bowen,
>
> thanks for your response.
>
> Re 2) I also don't have a better approach for this issue. It is
> similar to changing the general TableConfig between two statements. It
> would be good to add your explanation to the design document.
>
> Re 3) It would be interesting to know about which "core" functions we
> are actually talking about. Also for the overriding built-in functions
> that we discussed in the other FLIP. But I'm fine with leaving it to
> the user for now. How about we just introduce loadModule(),
> unloadModule() methods instead of useModules()? This would ensure that
> users don't forget to add the core module when adding an additional
> module and they need to explicitly call "unloadModule('core')".
>
> Re 4) Every table environment feature should also be designed with SQL
> statements in mind to verify the concept. SQL is also more popular
> that Java/Scala API or YAML file. I would like to add it to 1.10 for
> marking the feature as complete.
>
> SHOW MODULES -> sounds good to me, we should add a listModules():
> List<String> method to table environment
>
> LOAD MODULE 'hive' [WITH ('prop'='myProp', ...)] --> we should add a
> loadModule() method to table environment
>
> UNLOAD MODULE 'hive' --> we should add a unloadModule() method to
> table environment
>
> I would not introduce `USE MODULES 'x' 'y' 'z'` for simplicity and
> concise API. Users need to load the module anyway with properties.
> They can also load them "in order" immediately. CREATE TABLE can also
> not create multiple tables but only one at a time in that order.
>
> One thing that came to my mind, shall we use a descriptor approach for
> loadModule()? The past has shown that passing instances causes
> problems when persisting objects. That's why we also want to get rid
> of registerTableSource. I could image that users might want to persist
> a table environment's state for later use in the future. Even though
> this is future work, we should already keep such use cases in mind
> when adding new API methods. What do you think?
>
> Regards,
> Timo
>
>
> On 30.09.19 23:17, Bowen Li wrote:
>> Hi Timo,
>>
>> Re 1) I agree. I renamed the title to "Extend Core Table System with
>> Pluggable Modules" and all internal references
>>
>> Re 2) First, I'll rename the API to useModules(). The design doesn't
>> forbid
>> users to call useModules() multi times. Objects in modules are loaded on
>> demand instead of eagerly, so there won't be inconsistency. Users
>> have to
>> be fully aware of the consequences of resetting modules as that might
>> cause
>> that some objects can not be referenced anymore or resolution order
>> of some
>> objects changes.
>>
>> Re 3) Yes, we'd leave that to users.
>>
>> Another approach can be to have a non-optional "Core" module for all
>> objects that cannot be overrode like "CAST" and "AS" functions, and
>> have an
>> optional "ExtendedCore" module for other replaceable built-in objects.
>> "Core" should be positioned at the 1st in module list all the time.
>>
>> I'm fine with either solution.
>>
>> Re 4) It may sound like a nice-to-have advanced feature for 1.10, but we
>> can surely fully discuss it for the sake of feature completeness.
>>
>> Unlike other configs, the order of modules would matter in Flink, and it
>> implies the LOAD/UNLOAD commands would not be equal in operation
>> positions.
>> IIUYC, LOAD MODULE 'x' would be interpreted as appending x to the end of
>> module list, and UNLOAD MODULE 'x' would be interpreted as removing x
>> from
>> any position in the list?
>>
>> I'm thinking of the following list of commands:
>>
>> SHOW MODULES - list modules in order
>> LOAD MODULE 'hive' [WITH ('prop'='myProp', ...)] - load and append the
>> module to end of the module list
>> UNLOAD MODULE 'hive' - remove the module from module list, and other
>> modules remain the same relative positions
>> USE MODULES 'x' 'y' 'z' (wondering can parser take "'x' 'y' 'z'"?),
>> or USE
>> MODULES 'x,y,z' - to reorder module list completely
>>
>

signature.asc (849 bytes) Download Attachment

bowen.li

Re: [DISCUSS] FLIP-68: Extend Core Table System with Modular Plugins

Hi Timo, Dawid,

I've added the suggested SQL and related changes to TableEnvironment API
and other classes to the google doc. Also removed "USE MODULE" and its
APIs. Will update FLIP wiki once we have a consensus.

W.r.t. descriptor approach, my gut feeling is similar to Dawid's. Besides,
I feel yaml file would be a better solution to persist serializable state
of an environment as the file itself is in serializable format already.
Though yaml file only serves SQL CLI at this moment, we may be able to
extend its reach to Table API and allow users to load/offload a
TableEnvironment from/to yaml files, as something like "TableEnvironment
tEnv = TableEnvironment.loadFromYaml(<file_path>)" and
"tEnv.offloadToYaml(<file_path>)" to restore and persist state, and try to
make yaml file more expressive.

On Tue, Oct 1, 2019 at 6:47 AM Dawid Wysakowicz <[hidden email]>
wrote:

> Hi Timo, Bowen,
>
> Unfortunately I did not have enough time to go through all the
> suggestions in details so I can not comment on the whole FLIP.
>
> I just wanted to give my opinion on the "descriptor approach in
> loadModule" part. I am not sure if we need it here. We might be
> overthinking this a bit. It definitely makes sense for objects like
> TableSource/TableSink etc. as they are logical definitions that nearly
> always have to be persisted in a Catalog. I'm not sure if we really need
> the same for a whole session. If we need a resume session feature, the
> way to go would probably be to keep the session in memory on the server
> side. I fear we will never be able to serialize the whole session
> entirely (temporary objects, objects derived from DataStream etc.)
>
> I think it is ok to use instances for objects like Catalogs or Modules
> and have an overlay on top of that that can create instances from
> properties.
>
> Best,
>
> Dawid
>
> On 01/10/2019 11:28, Timo Walther wrote:
> > Hi Bowen,
> >
> > thanks for your response.
> >
> > Re 2) I also don't have a better approach for this issue. It is
> > similar to changing the general TableConfig between two statements. It
> > would be good to add your explanation to the design document.
> >
> > Re 3) It would be interesting to know about which "core" functions we
> > are actually talking about. Also for the overriding built-in functions
> > that we discussed in the other FLIP. But I'm fine with leaving it to
> > the user for now. How about we just introduce loadModule(),
> > unloadModule() methods instead of useModules()? This would ensure that
> > users don't forget to add the core module when adding an additional
> > module and they need to explicitly call "unloadModule('core')".
> >
> > Re 4) Every table environment feature should also be designed with SQL
> > statements in mind to verify the concept. SQL is also more popular
> > that Java/Scala API or YAML file. I would like to add it to 1.10 for
> > marking the feature as complete.
> >
> > SHOW MODULES -> sounds good to me, we should add a listModules():
> > List<String> method to table environment
> >
> > LOAD MODULE 'hive' [WITH ('prop'='myProp', ...)] --> we should add a
> > loadModule() method to table environment
> >
> > UNLOAD MODULE 'hive' --> we should add a unloadModule() method to
> > table environment
> >
> > I would not introduce `USE MODULES 'x' 'y' 'z'` for simplicity and
> > concise API. Users need to load the module anyway with properties.
> > They can also load them "in order" immediately. CREATE TABLE can also
> > not create multiple tables but only one at a time in that order.
> >
> > One thing that came to my mind, shall we use a descriptor approach for
> > loadModule()? The past has shown that passing instances causes
> > problems when persisting objects. That's why we also want to get rid
> > of registerTableSource. I could image that users might want to persist
> > a table environment's state for later use in the future. Even though
> > this is future work, we should already keep such use cases in mind
> > when adding new API methods. What do you think?
> >
> > Regards,
> > Timo
> >
> >
> > On 30.09.19 23:17, Bowen Li wrote:
> >> Hi Timo,
> >>
> >> Re 1) I agree. I renamed the title to "Extend Core Table System with
> >> Pluggable Modules" and all internal references
> >>
> >> Re 2) First, I'll rename the API to useModules(). The design doesn't
> >> forbid
> >> users to call useModules() multi times. Objects in modules are loaded on
> >> demand instead of eagerly, so there won't be inconsistency. Users
> >> have to
> >> be fully aware of the consequences of resetting modules as that might
> >> cause
> >> that some objects can not be referenced anymore or resolution order
> >> of some
> >> objects changes.
> >>
> >> Re 3) Yes, we'd leave that to users.
> >>
> >> Another approach can be to have a non-optional "Core" module for all
> >> objects that cannot be overrode like "CAST" and "AS" functions, and
> >> have an
> >> optional "ExtendedCore" module for other replaceable built-in objects.
> >> "Core" should be positioned at the 1st in module list all the time.
> >>
> >> I'm fine with either solution.
> >>
> >> Re 4) It may sound like a nice-to-have advanced feature for 1.10, but we
> >> can surely fully discuss it for the sake of feature completeness.
> >>
> >> Unlike other configs, the order of modules would matter in Flink, and it
> >> implies the LOAD/UNLOAD commands would not be equal in operation
> >> positions.
> >> IIUYC, LOAD MODULE 'x' would be interpreted as appending x to the end of
> >> module list, and UNLOAD MODULE 'x' would be interpreted as removing x
> >> from
> >> any position in the list?
> >>
> >> I'm thinking of the following list of commands:
> >>
> >> SHOW MODULES - list modules in order
> >> LOAD MODULE 'hive' [WITH ('prop'='myProp', ...)] - load and append the
> >> module to end of the module list
> >> UNLOAD MODULE 'hive' - remove the module from module list, and other
> >> modules remain the same relative positions
> >> USE MODULES 'x' 'y' 'z' (wondering can parser take "'x' 'y' 'z'"?),
> >> or USE
> >> MODULES 'x,y,z' - to reorder module list completely
> >>
> >
>
>

bowen.li

Re: [DISCUSS] FLIP-68: Extend Core Table System with Modular Plugins

If something like the yaml file is the way to go and achieve such
motivation, we would cover that with current design.

On Tue, Oct 1, 2019 at 12:05 Bowen Li <[hidden email]> wrote:

> Hi Timo, Dawid,
>
> I've added the suggested SQL and related changes to TableEnvironment API
> and other classes to the google doc. Also removed "USE MODULE" and its
> APIs. Will update FLIP wiki once we have a consensus.
>
> W.r.t. descriptor approach, my gut feeling is similar to Dawid's. Besides,
> I feel yaml file would be a better solution to persist serializable state
> of an environment as the file itself is in serializable format already.
> Though yaml file only serves SQL CLI at this moment, we may be able to
> extend its reach to Table API and allow users to load/offload a
> TableEnvironment from/to yaml files, as something like "TableEnvironment
> tEnv = TableEnvironment.loadFromYaml(<file_path>)" and
> "tEnv.offloadToYaml(<file_path>)" to restore and persist state, and try to
> make yaml file more expressive.
>
>
> On Tue, Oct 1, 2019 at 6:47 AM Dawid Wysakowicz <[hidden email]>
> wrote:
>
>> Hi Timo, Bowen,
>>
>> Unfortunately I did not have enough time to go through all the
>> suggestions in details so I can not comment on the whole FLIP.
>>
>> I just wanted to give my opinion on the "descriptor approach in
>> loadModule" part. I am not sure if we need it here. We might be
>> overthinking this a bit. It definitely makes sense for objects like
>> TableSource/TableSink etc. as they are logical definitions that nearly
>> always have to be persisted in a Catalog. I'm not sure if we really need
>> the same for a whole session. If we need a resume session feature, the
>> way to go would probably be to keep the session in memory on the server
>> side. I fear we will never be able to serialize the whole session
>> entirely (temporary objects, objects derived from DataStream etc.)
>>
>> I think it is ok to use instances for objects like Catalogs or Modules
>> and have an overlay on top of that that can create instances from
>> properties.
>>
>> Best,
>>
>> Dawid
>>
>> On 01/10/2019 11:28, Timo Walther wrote:
>> > Hi Bowen,
>> >
>> > thanks for your response.
>> >
>> > Re 2) I also don't have a better approach for this issue. It is
>> > similar to changing the general TableConfig between two statements. It
>> > would be good to add your explanation to the design document.
>> >
>> > Re 3) It would be interesting to know about which "core" functions we
>> > are actually talking about. Also for the overriding built-in functions
>> > that we discussed in the other FLIP. But I'm fine with leaving it to
>> > the user for now. How about we just introduce loadModule(),
>> > unloadModule() methods instead of useModules()? This would ensure that
>> > users don't forget to add the core module when adding an additional
>> > module and they need to explicitly call "unloadModule('core')".
>> >
>> > Re 4) Every table environment feature should also be designed with SQL
>> > statements in mind to verify the concept. SQL is also more popular
>> > that Java/Scala API or YAML file. I would like to add it to 1.10 for
>> > marking the feature as complete.
>> >
>> > SHOW MODULES -> sounds good to me, we should add a listModules():
>> > List<String> method to table environment
>> >
>> > LOAD MODULE 'hive' [WITH ('prop'='myProp', ...)] --> we should add a
>> > loadModule() method to table environment
>> >
>> > UNLOAD MODULE 'hive' --> we should add a unloadModule() method to
>> > table environment
>> >
>> > I would not introduce `USE MODULES 'x' 'y' 'z'` for simplicity and
>> > concise API. Users need to load the module anyway with properties.
>> > They can also load them "in order" immediately. CREATE TABLE can also
>> > not create multiple tables but only one at a time in that order.
>> >
>> > One thing that came to my mind, shall we use a descriptor approach for
>> > loadModule()? The past has shown that passing instances causes
>> > problems when persisting objects. That's why we also want to get rid
>> > of registerTableSource. I could image that users might want to persist
>> > a table environment's state for later use in the future. Even though
>> > this is future work, we should already keep such use cases in mind
>> > when adding new API methods. What do you think?
>> >
>> > Regards,
>> > Timo
>> >
>> >
>> > On 30.09.19 23:17, Bowen Li wrote:
>> >> Hi Timo,
>> >>
>> >> Re 1) I agree. I renamed the title to "Extend Core Table System with
>> >> Pluggable Modules" and all internal references
>> >>
>> >> Re 2) First, I'll rename the API to useModules(). The design doesn't
>> >> forbid
>> >> users to call useModules() multi times. Objects in modules are loaded
>> on
>> >> demand instead of eagerly, so there won't be inconsistency. Users
>> >> have to
>> >> be fully aware of the consequences of resetting modules as that might
>> >> cause
>> >> that some objects can not be referenced anymore or resolution order
>> >> of some
>> >> objects changes.
>> >>
>> >> Re 3) Yes, we'd leave that to users.
>> >>
>> >> Another approach can be to have a non-optional "Core" module for all
>> >> objects that cannot be overrode like "CAST" and "AS" functions, and
>> >> have an
>> >> optional "ExtendedCore" module for other replaceable built-in objects.
>> >> "Core" should be positioned at the 1st in module list all the time.
>> >>
>> >> I'm fine with either solution.
>> >>
>> >> Re 4) It may sound like a nice-to-have advanced feature for 1.10, but
>> we
>> >> can surely fully discuss it for the sake of feature completeness.
>> >>
>> >> Unlike other configs, the order of modules would matter in Flink, and
>> it
>> >> implies the LOAD/UNLOAD commands would not be equal in operation
>> >> positions.
>> >> IIUYC, LOAD MODULE 'x' would be interpreted as appending x to the end
>> of
>> >> module list, and UNLOAD MODULE 'x' would be interpreted as removing x
>> >> from
>> >> any position in the list?
>> >>
>> >> I'm thinking of the following list of commands:
>> >>
>> >> SHOW MODULES - list modules in order
>> >> LOAD MODULE 'hive' [WITH ('prop'='myProp', ...)] - load and append the
>> >> module to end of the module list
>> >> UNLOAD MODULE 'hive' - remove the module from module list, and other
>> >> modules remain the same relative positions
>> >> USE MODULES 'x' 'y' 'z' (wondering can parser take "'x' 'y' 'z'"?),
>> >> or USE
>> >> MODULES 'x,y,z' - to reorder module list completely
>> >>
>> >
>>
>>

Timo Walther-2

Re: [DISCUSS] FLIP-68: Extend Core Table System with Modular Plugins

Hi everyone,

first, I was also questioning my proposal. But Bowen's proposal of
`tEnv.offloadToYaml(<file_path>)` would not work with the current design
because we don't know how to serialize a catalog or module into
properties. Currently, there is no converter from instance to
properties. It is a one way conversion. We can add a `toProperties`
method to both Catalog and Module class in the future to solve this.
Solving the table environment serializability can be future work.

However, I find the current proposal for the TableEnvironment methods is
contradicting:

tableEnv.loadModule(new Yyy());
tableEnv.unloadModule(“Xxx”);

The loading is specified programmatically whereas the unloading requires
a string that is not specified in the module itself. But is defined in
the factory according to the current design.

SQL does it more consistently. There, the name `xxx` is used when
loading and unloading the module:

LOAD MODULE 'xxx' [WITH ('prop'='myProp', ...)]
UNLOAD MODULE 'xxx’

How about:

tableEnv.loadModule("xxx", new Yyy());
tableEnv.unloadModule(“xxx”);

This would be similar to the catalog interfaces. The name is not part of
the instance itself.

What do you think?

Thanks,
Timo

On 01.10.19 21:17, Bowen Li wrote:

> If something like the yaml file is the way to go and achieve such
> motivation, we would cover that with current design.
>
> On Tue, Oct 1, 2019 at 12:05 Bowen Li <[hidden email]> wrote:
>
>> Hi Timo, Dawid,
>>
>> I've added the suggested SQL and related changes to TableEnvironment API
>> and other classes to the google doc. Also removed "USE MODULE" and its
>> APIs. Will update FLIP wiki once we have a consensus.
>>
>> W.r.t. descriptor approach, my gut feeling is similar to Dawid's. Besides,
>> I feel yaml file would be a better solution to persist serializable state
>> of an environment as the file itself is in serializable format already.
>> Though yaml file only serves SQL CLI at this moment, we may be able to
>> extend its reach to Table API and allow users to load/offload a
>> TableEnvironment from/to yaml files, as something like "TableEnvironment
>> tEnv = TableEnvironment.loadFromYaml(<file_path>)" and
>> "tEnv.offloadToYaml(<file_path>)" to restore and persist state, and try to
>> make yaml file more expressive.
>>
>>
>> On Tue, Oct 1, 2019 at 6:47 AM Dawid Wysakowicz <[hidden email]>
>> wrote:
>>
>>> Hi Timo, Bowen,
>>>
>>> Unfortunately I did not have enough time to go through all the
>>> suggestions in details so I can not comment on the whole FLIP.
>>>
>>> I just wanted to give my opinion on the "descriptor approach in
>>> loadModule" part. I am not sure if we need it here. We might be
>>> overthinking this a bit. It definitely makes sense for objects like
>>> TableSource/TableSink etc. as they are logical definitions that nearly
>>> always have to be persisted in a Catalog. I'm not sure if we really need
>>> the same for a whole session. If we need a resume session feature, the
>>> way to go would probably be to keep the session in memory on the server
>>> side. I fear we will never be able to serialize the whole session
>>> entirely (temporary objects, objects derived from DataStream etc.)
>>>
>>> I think it is ok to use instances for objects like Catalogs or Modules
>>> and have an overlay on top of that that can create instances from
>>> properties.
>>>
>>> Best,
>>>
>>> Dawid
>>>
>>> On 01/10/2019 11:28, Timo Walther wrote:
>>>> Hi Bowen,
>>>>
>>>> thanks for your response.
>>>>
>>>> Re 2) I also don't have a better approach for this issue. It is
>>>> similar to changing the general TableConfig between two statements. It
>>>> would be good to add your explanation to the design document.
>>>>
>>>> Re 3) It would be interesting to know about which "core" functions we
>>>> are actually talking about. Also for the overriding built-in functions
>>>> that we discussed in the other FLIP. But I'm fine with leaving it to
>>>> the user for now. How about we just introduce loadModule(),
>>>> unloadModule() methods instead of useModules()? This would ensure that
>>>> users don't forget to add the core module when adding an additional
>>>> module and they need to explicitly call "unloadModule('core')".
>>>>
>>>> Re 4) Every table environment feature should also be designed with SQL
>>>> statements in mind to verify the concept. SQL is also more popular
>>>> that Java/Scala API or YAML file. I would like to add it to 1.10 for
>>>> marking the feature as complete.
>>>>
>>>> SHOW MODULES -> sounds good to me, we should add a listModules():
>>>> List<String> method to table environment
>>>>
>>>> LOAD MODULE 'hive' [WITH ('prop'='myProp', ...)] --> we should add a
>>>> loadModule() method to table environment
>>>>
>>>> UNLOAD MODULE 'hive' --> we should add a unloadModule() method to
>>>> table environment
>>>>
>>>> I would not introduce `USE MODULES 'x' 'y' 'z'` for simplicity and
>>>> concise API. Users need to load the module anyway with properties.
>>>> They can also load them "in order" immediately. CREATE TABLE can also
>>>> not create multiple tables but only one at a time in that order.
>>>>
>>>> One thing that came to my mind, shall we use a descriptor approach for
>>>> loadModule()? The past has shown that passing instances causes
>>>> problems when persisting objects. That's why we also want to get rid
>>>> of registerTableSource. I could image that users might want to persist
>>>> a table environment's state for later use in the future. Even though
>>>> this is future work, we should already keep such use cases in mind
>>>> when adding new API methods. What do you think?
>>>>
>>>> Regards,
>>>> Timo
>>>>
>>>>
>>>> On 30.09.19 23:17, Bowen Li wrote:
>>>>> Hi Timo,
>>>>>
>>>>> Re 1) I agree. I renamed the title to "Extend Core Table System with
>>>>> Pluggable Modules" and all internal references
>>>>>
>>>>> Re 2) First, I'll rename the API to useModules(). The design doesn't
>>>>> forbid
>>>>> users to call useModules() multi times. Objects in modules are loaded
>>> on
>>>>> demand instead of eagerly, so there won't be inconsistency. Users
>>>>> have to
>>>>> be fully aware of the consequences of resetting modules as that might
>>>>> cause
>>>>> that some objects can not be referenced anymore or resolution order
>>>>> of some
>>>>> objects changes.
>>>>>
>>>>> Re 3) Yes, we'd leave that to users.
>>>>>
>>>>> Another approach can be to have a non-optional "Core" module for all
>>>>> objects that cannot be overrode like "CAST" and "AS" functions, and
>>>>> have an
>>>>> optional "ExtendedCore" module for other replaceable built-in objects.
>>>>> "Core" should be positioned at the 1st in module list all the time.
>>>>>
>>>>> I'm fine with either solution.
>>>>>
>>>>> Re 4) It may sound like a nice-to-have advanced feature for 1.10, but
>>> we
>>>>> can surely fully discuss it for the sake of feature completeness.
>>>>>
>>>>> Unlike other configs, the order of modules would matter in Flink, and
>>> it
>>>>> implies the LOAD/UNLOAD commands would not be equal in operation
>>>>> positions.
>>>>> IIUYC, LOAD MODULE 'x' would be interpreted as appending x to the end
>>> of
>>>>> module list, and UNLOAD MODULE 'x' would be interpreted as removing x
>>>>> from
>>>>> any position in the list?
>>>>>
>>>>> I'm thinking of the following list of commands:
>>>>>
>>>>> SHOW MODULES - list modules in order
>>>>> LOAD MODULE 'hive' [WITH ('prop'='myProp', ...)] - load and append the
>>>>> module to end of the module list
>>>>> UNLOAD MODULE 'hive' - remove the module from module list, and other
>>>>> modules remain the same relative positions
>>>>> USE MODULES 'x' 'y' 'z' (wondering can parser take "'x' 'y' 'z'"?),
>>>>> or USE
>>>>> MODULES 'x,y,z' - to reorder module list completely
>>>>>
>>>

Xuefu Z

Re: [DISCUSS] FLIP-68: Extend Core Table System with Modular Plugins

I agree with Timo that the new table APIs need to be consistent. I'd go
further that an name (or id) is needed for module definition in YAML file.
In the current design, name is skipped and type has binary meanings.

Thanks,
Xuefu

On Fri, Oct 4, 2019 at 5:24 AM Timo Walther <[hidden email]> wrote:

> Hi everyone,
>
> first, I was also questioning my proposal. But Bowen's proposal of
> `tEnv.offloadToYaml(<file_path>)` would not work with the current design
> because we don't know how to serialize a catalog or module into
> properties. Currently, there is no converter from instance to
> properties. It is a one way conversion. We can add a `toProperties`
> method to both Catalog and Module class in the future to solve this.
> Solving the table environment serializability can be future work.
>
> However, I find the current proposal for the TableEnvironment methods is
> contradicting:
>
> tableEnv.loadModule(new Yyy());
> tableEnv.unloadModule(“Xxx”);
>
> The loading is specified programmatically whereas the unloading requires
> a string that is not specified in the module itself. But is defined in
> the factory according to the current design.
>
> SQL does it more consistently. There, the name `xxx` is used when
> loading and unloading the module:
>
> LOAD MODULE 'xxx' [WITH ('prop'='myProp', ...)]
> UNLOAD MODULE 'xxx’
>
> How about:
>
> tableEnv.loadModule("xxx", new Yyy());
> tableEnv.unloadModule(“xxx”);
>
> This would be similar to the catalog interfaces. The name is not part of
> the instance itself.
>
> What do you think?
>
> Thanks,
> Timo
>
>
>
>
> On 01.10.19 21:17, Bowen Li wrote:
> > If something like the yaml file is the way to go and achieve such
> > motivation, we would cover that with current design.
> >
> > On Tue, Oct 1, 2019 at 12:05 Bowen Li <[hidden email]> wrote:
> >
> >> Hi Timo, Dawid,
> >>
> >> I've added the suggested SQL and related changes to TableEnvironment API
> >> and other classes to the google doc. Also removed "USE MODULE" and its
> >> APIs. Will update FLIP wiki once we have a consensus.
> >>
> >> W.r.t. descriptor approach, my gut feeling is similar to Dawid's.
> Besides,
> >> I feel yaml file would be a better solution to persist serializable
> state
> >> of an environment as the file itself is in serializable format already.
> >> Though yaml file only serves SQL CLI at this moment, we may be able to
> >> extend its reach to Table API and allow users to load/offload a
> >> TableEnvironment from/to yaml files, as something like "TableEnvironment
> >> tEnv = TableEnvironment.loadFromYaml(<file_path>)" and
> >> "tEnv.offloadToYaml(<file_path>)" to restore and persist state, and try
> to
> >> make yaml file more expressive.
> >>
> >>
> >> On Tue, Oct 1, 2019 at 6:47 AM Dawid Wysakowicz <[hidden email]
> >
> >> wrote:
> >>
> >>> Hi Timo, Bowen,
> >>>
> >>> Unfortunately I did not have enough time to go through all the
> >>> suggestions in details so I can not comment on the whole FLIP.
> >>>
> >>> I just wanted to give my opinion on the "descriptor approach in
> >>> loadModule" part. I am not sure if we need it here. We might be
> >>> overthinking this a bit. It definitely makes sense for objects like
> >>> TableSource/TableSink etc. as they are logical definitions that nearly
> >>> always have to be persisted in a Catalog. I'm not sure if we really
> need
> >>> the same for a whole session. If we need a resume session feature, the
> >>> way to go would probably be to keep the session in memory on the server
> >>> side. I fear we will never be able to serialize the whole session
> >>> entirely (temporary objects, objects derived from DataStream etc.)
> >>>
> >>> I think it is ok to use instances for objects like Catalogs or Modules
> >>> and have an overlay on top of that that can create instances from
> >>> properties.
> >>>
> >>> Best,
> >>>
> >>> Dawid
> >>>
> >>> On 01/10/2019 11:28, Timo Walther wrote:
> >>>> Hi Bowen,
> >>>>
> >>>> thanks for your response.
> >>>>
> >>>> Re 2) I also don't have a better approach for this issue. It is
> >>>> similar to changing the general TableConfig between two statements. It
> >>>> would be good to add your explanation to the design document.
> >>>>
> >>>> Re 3) It would be interesting to know about which "core" functions we
> >>>> are actually talking about. Also for the overriding built-in functions
> >>>> that we discussed in the other FLIP. But I'm fine with leaving it to
> >>>> the user for now. How about we just introduce loadModule(),
> >>>> unloadModule() methods instead of useModules()? This would ensure that
> >>>> users don't forget to add the core module when adding an additional
> >>>> module and they need to explicitly call "unloadModule('core')".
> >>>>
> >>>> Re 4) Every table environment feature should also be designed with SQL
> >>>> statements in mind to verify the concept. SQL is also more popular
> >>>> that Java/Scala API or YAML file. I would like to add it to 1.10 for
> >>>> marking the feature as complete.
> >>>>
> >>>> SHOW MODULES -> sounds good to me, we should add a listModules():
> >>>> List<String> method to table environment
> >>>>
> >>>> LOAD MODULE 'hive' [WITH ('prop'='myProp', ...)] --> we should add a
> >>>> loadModule() method to table environment
> >>>>
> >>>> UNLOAD MODULE 'hive' --> we should add a unloadModule() method to
> >>>> table environment
> >>>>
> >>>> I would not introduce `USE MODULES 'x' 'y' 'z'` for simplicity and
> >>>> concise API. Users need to load the module anyway with properties.
> >>>> They can also load them "in order" immediately. CREATE TABLE can also
> >>>> not create multiple tables but only one at a time in that order.
> >>>>
> >>>> One thing that came to my mind, shall we use a descriptor approach for
> >>>> loadModule()? The past has shown that passing instances causes
> >>>> problems when persisting objects. That's why we also want to get rid
> >>>> of registerTableSource. I could image that users might want to persist
> >>>> a table environment's state for later use in the future. Even though
> >>>> this is future work, we should already keep such use cases in mind
> >>>> when adding new API methods. What do you think?
> >>>>
> >>>> Regards,
> >>>> Timo
> >>>>
> >>>>
> >>>> On 30.09.19 23:17, Bowen Li wrote:
> >>>>> Hi Timo,
> >>>>>
> >>>>> Re 1) I agree. I renamed the title to "Extend Core Table System with
> >>>>> Pluggable Modules" and all internal references
> >>>>>
> >>>>> Re 2) First, I'll rename the API to useModules(). The design doesn't
> >>>>> forbid
> >>>>> users to call useModules() multi times. Objects in modules are loaded
> >>> on
> >>>>> demand instead of eagerly, so there won't be inconsistency. Users
> >>>>> have to
> >>>>> be fully aware of the consequences of resetting modules as that might
> >>>>> cause
> >>>>> that some objects can not be referenced anymore or resolution order
> >>>>> of some
> >>>>> objects changes.
> >>>>>
> >>>>> Re 3) Yes, we'd leave that to users.
> >>>>>
> >>>>> Another approach can be to have a non-optional "Core" module for all
> >>>>> objects that cannot be overrode like "CAST" and "AS" functions, and
> >>>>> have an
> >>>>> optional "ExtendedCore" module for other replaceable built-in
> objects.
> >>>>> "Core" should be positioned at the 1st in module list all the time.
> >>>>>
> >>>>> I'm fine with either solution.
> >>>>>
> >>>>> Re 4) It may sound like a nice-to-have advanced feature for 1.10, but
> >>> we
> >>>>> can surely fully discuss it for the sake of feature completeness.
> >>>>>
> >>>>> Unlike other configs, the order of modules would matter in Flink, and
> >>> it
> >>>>> implies the LOAD/UNLOAD commands would not be equal in operation
> >>>>> positions.
> >>>>> IIUYC, LOAD MODULE 'x' would be interpreted as appending x to the end
> >>> of
> >>>>> module list, and UNLOAD MODULE 'x' would be interpreted as removing x
> >>>>> from
> >>>>> any position in the list?
> >>>>>
> >>>>> I'm thinking of the following list of commands:
> >>>>>
> >>>>> SHOW MODULES - list modules in order
> >>>>> LOAD MODULE 'hive' [WITH ('prop'='myProp', ...)] - load and append
> the
> >>>>> module to end of the module list
> >>>>> UNLOAD MODULE 'hive' - remove the module from module list, and other
> >>>>> modules remain the same relative positions
> >>>>> USE MODULES 'x' 'y' 'z' (wondering can parser take "'x' 'y' 'z'"?),
> >>>>> or USE
> >>>>> MODULES 'x,y,z' - to reorder module list completely
> >>>>>
> >>>
>
>

--
Xuefu Zhang

"In Honey We Trust!"

Jark Wu-2

Re: [DISCUSS] FLIP-68: Extend Core Table System with Modular Plugins

Hi Bowen,

Thanks for the proposal. I have two thoughts:

1) Regarding to "loadModule", how about
tableEnv.loadModule("xxx" [, propertiesMap]);
tableEnv.unloadModule(“xxx”);

This makes the API similar to SQL. IMO, instance of Module is not needed
and verbose as parameter.
And this makes it easier to load a simple module without any additional
properties, e.g. tEnv.loadModule("GEO"), tEnv.unloadModule("GEO")

2) In current design, the module interface only defines function metadata,
but no implementations.
I'm wondering how to call/map the implementations in runtime? Am I missing
something?

Besides, I left some minor comments in the doc.

Best,
Jark

On Sat, 5 Oct 2019 at 08:42, Xuefu Z <[hidden email]> wrote:

> I agree with Timo that the new table APIs need to be consistent. I'd go
> further that an name (or id) is needed for module definition in YAML file.
> In the current design, name is skipped and type has binary meanings.
>
> Thanks,
> Xuefu
>
> On Fri, Oct 4, 2019 at 5:24 AM Timo Walther <[hidden email]> wrote:
>
> > Hi everyone,
> >
> > first, I was also questioning my proposal. But Bowen's proposal of
> > `tEnv.offloadToYaml(<file_path>)` would not work with the current design
> > because we don't know how to serialize a catalog or module into
> > properties. Currently, there is no converter from instance to
> > properties. It is a one way conversion. We can add a `toProperties`
> > method to both Catalog and Module class in the future to solve this.
> > Solving the table environment serializability can be future work.
> >
> > However, I find the current proposal for the TableEnvironment methods is
> > contradicting:
> >
> > tableEnv.loadModule(new Yyy());
> > tableEnv.unloadModule(“Xxx”);
> >
> > The loading is specified programmatically whereas the unloading requires
> > a string that is not specified in the module itself. But is defined in
> > the factory according to the current design.
> >
> > SQL does it more consistently. There, the name `xxx` is used when
> > loading and unloading the module:
> >
> > LOAD MODULE 'xxx' [WITH ('prop'='myProp', ...)]
> > UNLOAD MODULE 'xxx’
> >
> > How about:
> >
> > tableEnv.loadModule("xxx", new Yyy());
> > tableEnv.unloadModule(“xxx”);
> >
> > This would be similar to the catalog interfaces. The name is not part of
> > the instance itself.
> >
> > What do you think?
> >
> > Thanks,
> > Timo
> >
> >
> >
> >
> > On 01.10.19 21:17, Bowen Li wrote:
> > > If something like the yaml file is the way to go and achieve such
> > > motivation, we would cover that with current design.
> > >
> > > On Tue, Oct 1, 2019 at 12:05 Bowen Li <[hidden email]> wrote:
> > >
> > >> Hi Timo, Dawid,
> > >>
> > >> I've added the suggested SQL and related changes to TableEnvironment
> API
> > >> and other classes to the google doc. Also removed "USE MODULE" and its
> > >> APIs. Will update FLIP wiki once we have a consensus.
> > >>
> > >> W.r.t. descriptor approach, my gut feeling is similar to Dawid's.
> > Besides,
> > >> I feel yaml file would be a better solution to persist serializable
> > state
> > >> of an environment as the file itself is in serializable format
> already.
> > >> Though yaml file only serves SQL CLI at this moment, we may be able to
> > >> extend its reach to Table API and allow users to load/offload a
> > >> TableEnvironment from/to yaml files, as something like
> "TableEnvironment
> > >> tEnv = TableEnvironment.loadFromYaml(<file_path>)" and
> > >> "tEnv.offloadToYaml(<file_path>)" to restore and persist state, and
> try
> > to
> > >> make yaml file more expressive.
> > >>
> > >>
> > >> On Tue, Oct 1, 2019 at 6:47 AM Dawid Wysakowicz <
> [hidden email]
> > >
> > >> wrote:
> > >>
> > >>> Hi Timo, Bowen,
> > >>>
> > >>> Unfortunately I did not have enough time to go through all the
> > >>> suggestions in details so I can not comment on the whole FLIP.
> > >>>
> > >>> I just wanted to give my opinion on the "descriptor approach in
> > >>> loadModule" part. I am not sure if we need it here. We might be
> > >>> overthinking this a bit. It definitely makes sense for objects like
> > >>> TableSource/TableSink etc. as they are logical definitions that
> nearly
> > >>> always have to be persisted in a Catalog. I'm not sure if we really
> > need
> > >>> the same for a whole session. If we need a resume session feature,
> the
> > >>> way to go would probably be to keep the session in memory on the
> server
> > >>> side. I fear we will never be able to serialize the whole session
> > >>> entirely (temporary objects, objects derived from DataStream etc.)
> > >>>
> > >>> I think it is ok to use instances for objects like Catalogs or
> Modules
> > >>> and have an overlay on top of that that can create instances from
> > >>> properties.
> > >>>
> > >>> Best,
> > >>>
> > >>> Dawid
> > >>>
> > >>> On 01/10/2019 11:28, Timo Walther wrote:
> > >>>> Hi Bowen,
> > >>>>
> > >>>> thanks for your response.
> > >>>>
> > >>>> Re 2) I also don't have a better approach for this issue. It is
> > >>>> similar to changing the general TableConfig between two statements.
> It
> > >>>> would be good to add your explanation to the design document.
> > >>>>
> > >>>> Re 3) It would be interesting to know about which "core" functions
> we
> > >>>> are actually talking about. Also for the overriding built-in
> functions
> > >>>> that we discussed in the other FLIP. But I'm fine with leaving it to
> > >>>> the user for now. How about we just introduce loadModule(),
> > >>>> unloadModule() methods instead of useModules()? This would ensure
> that
> > >>>> users don't forget to add the core module when adding an additional
> > >>>> module and they need to explicitly call "unloadModule('core')".
> > >>>>
> > >>>> Re 4) Every table environment feature should also be designed with
> SQL
> > >>>> statements in mind to verify the concept. SQL is also more popular
> > >>>> that Java/Scala API or YAML file. I would like to add it to 1.10 for
> > >>>> marking the feature as complete.
> > >>>>
> > >>>> SHOW MODULES -> sounds good to me, we should add a listModules():
> > >>>> List<String> method to table environment
> > >>>>
> > >>>> LOAD MODULE 'hive' [WITH ('prop'='myProp', ...)] --> we should add a
> > >>>> loadModule() method to table environment
> > >>>>
> > >>>> UNLOAD MODULE 'hive' --> we should add a unloadModule() method to
> > >>>> table environment
> > >>>>
> > >>>> I would not introduce `USE MODULES 'x' 'y' 'z'` for simplicity and
> > >>>> concise API. Users need to load the module anyway with properties.
> > >>>> They can also load them "in order" immediately. CREATE TABLE can
> also
> > >>>> not create multiple tables but only one at a time in that order.
> > >>>>
> > >>>> One thing that came to my mind, shall we use a descriptor approach
> for
> > >>>> loadModule()? The past has shown that passing instances causes
> > >>>> problems when persisting objects. That's why we also want to get rid
> > >>>> of registerTableSource. I could image that users might want to
> persist
> > >>>> a table environment's state for later use in the future. Even though
> > >>>> this is future work, we should already keep such use cases in mind
> > >>>> when adding new API methods. What do you think?
> > >>>>
> > >>>> Regards,
> > >>>> Timo
> > >>>>
> > >>>>
> > >>>> On 30.09.19 23:17, Bowen Li wrote:
> > >>>>> Hi Timo,
> > >>>>>
> > >>>>> Re 1) I agree. I renamed the title to "Extend Core Table System
> with
> > >>>>> Pluggable Modules" and all internal references
> > >>>>>
> > >>>>> Re 2) First, I'll rename the API to useModules(). The design
> doesn't
> > >>>>> forbid
> > >>>>> users to call useModules() multi times. Objects in modules are
> loaded
> > >>> on
> > >>>>> demand instead of eagerly, so there won't be inconsistency. Users
> > >>>>> have to
> > >>>>> be fully aware of the consequences of resetting modules as that
> might
> > >>>>> cause
> > >>>>> that some objects can not be referenced anymore or resolution order
> > >>>>> of some
> > >>>>> objects changes.
> > >>>>>
> > >>>>> Re 3) Yes, we'd leave that to users.
> > >>>>>
> > >>>>> Another approach can be to have a non-optional "Core" module for
> all
> > >>>>> objects that cannot be overrode like "CAST" and "AS" functions, and
> > >>>>> have an
> > >>>>> optional "ExtendedCore" module for other replaceable built-in
> > objects.
> > >>>>> "Core" should be positioned at the 1st in module list all the time.
> > >>>>>
> > >>>>> I'm fine with either solution.
> > >>>>>
> > >>>>> Re 4) It may sound like a nice-to-have advanced feature for 1.10,
> but
> > >>> we
> > >>>>> can surely fully discuss it for the sake of feature completeness.
> > >>>>>
> > >>>>> Unlike other configs, the order of modules would matter in Flink,
> and
> > >>> it
> > >>>>> implies the LOAD/UNLOAD commands would not be equal in operation
> > >>>>> positions.
> > >>>>> IIUYC, LOAD MODULE 'x' would be interpreted as appending x to the
> end
> > >>> of
> > >>>>> module list, and UNLOAD MODULE 'x' would be interpreted as
> removing x
> > >>>>> from
> > >>>>> any position in the list?
> > >>>>>
> > >>>>> I'm thinking of the following list of commands:
> > >>>>>
> > >>>>> SHOW MODULES - list modules in order
> > >>>>> LOAD MODULE 'hive' [WITH ('prop'='myProp', ...)] - load and append
> > the
> > >>>>> module to end of the module list
> > >>>>> UNLOAD MODULE 'hive' - remove the module from module list, and
> other
> > >>>>> modules remain the same relative positions
> > >>>>> USE MODULES 'x' 'y' 'z' (wondering can parser take "'x' 'y' 'z'"?),
> > >>>>> or USE
> > >>>>> MODULES 'x,y,z' - to reorder module list completely
> > >>>>>
> > >>>
> >
> >
>
> --
> Xuefu Zhang
>
> "In Honey We Trust!"
>

bowen.li

Re: [DISCUSS] FLIP-68: Extend Core Table System with Modular Plugins

Thanks everyone for your review.

After discussing with Timo and Dawid offline, as well as incorporating
feedback from Xuefu and Jark on mailing list, I decided to make a few
critical changes to the proposal.

- renamed the keyword "type" to "kind". The community has plan to have
"type" keyword in yaml/descriptor refer to data types exclusively in the
near future. We should cater to that change in our design
- allowed specifying names for modules to simplify and unify module
loading/unloading syntax between programming and SQL. Here're the proposed
changes:
SQL:
LOAD MODULE "name" WITH ("kind"="xxx" [, (properties)])
UNLOAD MODULE "name";
Table:
tEnv.loadModule("name", new Xxx(properties));
tEnv.unloadModule("name");

I have completely updated the google doc [1]. Please take another look, and
let me know if you have any other questions. Thanks!

[1]
https://docs.google.com/document/d/17CPMpMbPDjvM4selUVEfh_tqUK_oV0TODAUA9dfHakc/edit#

On Tue, Oct 8, 2019 at 6:26 AM Jark Wu <[hidden email]> wrote:

> Hi Bowen,
>
> Thanks for the proposal. I have two thoughts:
>
> 1) Regarding to "loadModule", how about
> tableEnv.loadModule("xxx" [, propertiesMap]);
> tableEnv.unloadModule(“xxx”);
>
> This makes the API similar to SQL. IMO, instance of Module is not needed
> and verbose as parameter.
> And this makes it easier to load a simple module without any additional
> properties, e.g. tEnv.loadModule("GEO"), tEnv.unloadModule("GEO")
>
> 2) In current design, the module interface only defines function metadata,
> but no implementations.
> I'm wondering how to call/map the implementations in runtime? Am I missing
> something?
>
> Besides, I left some minor comments in the doc.
>
> Best,
> Jark
>
>
> On Sat, 5 Oct 2019 at 08:42, Xuefu Z <[hidden email]> wrote:
>
> > I agree with Timo that the new table APIs need to be consistent. I'd go
> > further that an name (or id) is needed for module definition in YAML
> file.
> > In the current design, name is skipped and type has binary meanings.
> >
> > Thanks,
> > Xuefu
> >
> > On Fri, Oct 4, 2019 at 5:24 AM Timo Walther <[hidden email]> wrote:
> >
> > > Hi everyone,
> > >
> > > first, I was also questioning my proposal. But Bowen's proposal of
> > > `tEnv.offloadToYaml(<file_path>)` would not work with the current
> design
> > > because we don't know how to serialize a catalog or module into
> > > properties. Currently, there is no converter from instance to
> > > properties. It is a one way conversion. We can add a `toProperties`
> > > method to both Catalog and Module class in the future to solve this.
> > > Solving the table environment serializability can be future work.
> > >
> > > However, I find the current proposal for the TableEnvironment methods
> is
> > > contradicting:
> > >
> > > tableEnv.loadModule(new Yyy());
> > > tableEnv.unloadModule(“Xxx”);
> > >
> > > The loading is specified programmatically whereas the unloading
> requires
> > > a string that is not specified in the module itself. But is defined in
> > > the factory according to the current design.
> > >
> > > SQL does it more consistently. There, the name `xxx` is used when
> > > loading and unloading the module:
> > >
> > > LOAD MODULE 'xxx' [WITH ('prop'='myProp', ...)]
> > > UNLOAD MODULE 'xxx’
> > >
> > > How about:
> > >
> > > tableEnv.loadModule("xxx", new Yyy());
> > > tableEnv.unloadModule(“xxx”);
> > >
> > > This would be similar to the catalog interfaces. The name is not part
> of
> > > the instance itself.
> > >
> > > What do you think?
> > >
> > > Thanks,
> > > Timo
> > >
> > >
> > >
> > >
> > > On 01.10.19 21:17, Bowen Li wrote:
> > > > If something like the yaml file is the way to go and achieve such
> > > > motivation, we would cover that with current design.
> > > >
> > > > On Tue, Oct 1, 2019 at 12:05 Bowen Li <[hidden email]> wrote:
> > > >
> > > >> Hi Timo, Dawid,
> > > >>
> > > >> I've added the suggested SQL and related changes to TableEnvironment
> > API
> > > >> and other classes to the google doc. Also removed "USE MODULE" and
> its
> > > >> APIs. Will update FLIP wiki once we have a consensus.
> > > >>
> > > >> W.r.t. descriptor approach, my gut feeling is similar to Dawid's.
> > > Besides,
> > > >> I feel yaml file would be a better solution to persist serializable
> > > state
> > > >> of an environment as the file itself is in serializable format
> > already.
> > > >> Though yaml file only serves SQL CLI at this moment, we may be able
> to
> > > >> extend its reach to Table API and allow users to load/offload a
> > > >> TableEnvironment from/to yaml files, as something like
> > "TableEnvironment
> > > >> tEnv = TableEnvironment.loadFromYaml(<file_path>)" and
> > > >> "tEnv.offloadToYaml(<file_path>)" to restore and persist state, and
> > try
> > > to
> > > >> make yaml file more expressive.
> > > >>
> > > >>
> > > >> On Tue, Oct 1, 2019 at 6:47 AM Dawid Wysakowicz <
> > [hidden email]
> > > >
> > > >> wrote:
> > > >>
> > > >>> Hi Timo, Bowen,
> > > >>>
> > > >>> Unfortunately I did not have enough time to go through all the
> > > >>> suggestions in details so I can not comment on the whole FLIP.
> > > >>>
> > > >>> I just wanted to give my opinion on the "descriptor approach in
> > > >>> loadModule" part. I am not sure if we need it here. We might be
> > > >>> overthinking this a bit. It definitely makes sense for objects like
> > > >>> TableSource/TableSink etc. as they are logical definitions that
> > nearly
> > > >>> always have to be persisted in a Catalog. I'm not sure if we really
> > > need
> > > >>> the same for a whole session. If we need a resume session feature,
> > the
> > > >>> way to go would probably be to keep the session in memory on the
> > server
> > > >>> side. I fear we will never be able to serialize the whole session
> > > >>> entirely (temporary objects, objects derived from DataStream etc.)
> > > >>>
> > > >>> I think it is ok to use instances for objects like Catalogs or
> > Modules
> > > >>> and have an overlay on top of that that can create instances from
> > > >>> properties.
> > > >>>
> > > >>> Best,
> > > >>>
> > > >>> Dawid
> > > >>>
> > > >>> On 01/10/2019 11:28, Timo Walther wrote:
> > > >>>> Hi Bowen,
> > > >>>>
> > > >>>> thanks for your response.
> > > >>>>
> > > >>>> Re 2) I also don't have a better approach for this issue. It is
> > > >>>> similar to changing the general TableConfig between two
> statements.
> > It
> > > >>>> would be good to add your explanation to the design document.
> > > >>>>
> > > >>>> Re 3) It would be interesting to know about which "core" functions
> > we
> > > >>>> are actually talking about. Also for the overriding built-in
> > functions
> > > >>>> that we discussed in the other FLIP. But I'm fine with leaving it
> to
> > > >>>> the user for now. How about we just introduce loadModule(),
> > > >>>> unloadModule() methods instead of useModules()? This would ensure
> > that
> > > >>>> users don't forget to add the core module when adding an
> additional
> > > >>>> module and they need to explicitly call "unloadModule('core')".
> > > >>>>
> > > >>>> Re 4) Every table environment feature should also be designed with
> > SQL
> > > >>>> statements in mind to verify the concept. SQL is also more popular
> > > >>>> that Java/Scala API or YAML file. I would like to add it to 1.10
> for
> > > >>>> marking the feature as complete.
> > > >>>>
> > > >>>> SHOW MODULES -> sounds good to me, we should add a listModules():
> > > >>>> List<String> method to table environment
> > > >>>>
> > > >>>> LOAD MODULE 'hive' [WITH ('prop'='myProp', ...)] --> we should
> add a
> > > >>>> loadModule() method to table environment
> > > >>>>
> > > >>>> UNLOAD MODULE 'hive' --> we should add a unloadModule() method to
> > > >>>> table environment
> > > >>>>
> > > >>>> I would not introduce `USE MODULES 'x' 'y' 'z'` for simplicity and
> > > >>>> concise API. Users need to load the module anyway with properties.
> > > >>>> They can also load them "in order" immediately. CREATE TABLE can
> > also
> > > >>>> not create multiple tables but only one at a time in that order.
> > > >>>>
> > > >>>> One thing that came to my mind, shall we use a descriptor approach
> > for
> > > >>>> loadModule()? The past has shown that passing instances causes
> > > >>>> problems when persisting objects. That's why we also want to get
> rid
> > > >>>> of registerTableSource. I could image that users might want to
> > persist
> > > >>>> a table environment's state for later use in the future. Even
> though
> > > >>>> this is future work, we should already keep such use cases in mind
> > > >>>> when adding new API methods. What do you think?
> > > >>>>
> > > >>>> Regards,
> > > >>>> Timo
> > > >>>>
> > > >>>>
> > > >>>> On 30.09.19 23:17, Bowen Li wrote:
> > > >>>>> Hi Timo,
> > > >>>>>
> > > >>>>> Re 1) I agree. I renamed the title to "Extend Core Table System
> > with
> > > >>>>> Pluggable Modules" and all internal references
> > > >>>>>
> > > >>>>> Re 2) First, I'll rename the API to useModules(). The design
> > doesn't
> > > >>>>> forbid
> > > >>>>> users to call useModules() multi times. Objects in modules are
> > loaded
> > > >>> on
> > > >>>>> demand instead of eagerly, so there won't be inconsistency. Users
> > > >>>>> have to
> > > >>>>> be fully aware of the consequences of resetting modules as that
> > might
> > > >>>>> cause
> > > >>>>> that some objects can not be referenced anymore or resolution
> order
> > > >>>>> of some
> > > >>>>> objects changes.
> > > >>>>>
> > > >>>>> Re 3) Yes, we'd leave that to users.
> > > >>>>>
> > > >>>>> Another approach can be to have a non-optional "Core" module for
> > all
> > > >>>>> objects that cannot be overrode like "CAST" and "AS" functions,
> and
> > > >>>>> have an
> > > >>>>> optional "ExtendedCore" module for other replaceable built-in
> > > objects.
> > > >>>>> "Core" should be positioned at the 1st in module list all the
> time.
> > > >>>>>
> > > >>>>> I'm fine with either solution.
> > > >>>>>
> > > >>>>> Re 4) It may sound like a nice-to-have advanced feature for 1.10,
> > but
> > > >>> we
> > > >>>>> can surely fully discuss it for the sake of feature completeness.
> > > >>>>>
> > > >>>>> Unlike other configs, the order of modules would matter in Flink,
> > and
> > > >>> it
> > > >>>>> implies the LOAD/UNLOAD commands would not be equal in operation
> > > >>>>> positions.
> > > >>>>> IIUYC, LOAD MODULE 'x' would be interpreted as appending x to the
> > end
> > > >>> of
> > > >>>>> module list, and UNLOAD MODULE 'x' would be interpreted as
> > removing x
> > > >>>>> from
> > > >>>>> any position in the list?
> > > >>>>>
> > > >>>>> I'm thinking of the following list of commands:
> > > >>>>>
> > > >>>>> SHOW MODULES - list modules in order
> > > >>>>> LOAD MODULE 'hive' [WITH ('prop'='myProp', ...)] - load and
> append
> > > the
> > > >>>>> module to end of the module list
> > > >>>>> UNLOAD MODULE 'hive' - remove the module from module list, and
> > other
> > > >>>>> modules remain the same relative positions
> > > >>>>> USE MODULES 'x' 'y' 'z' (wondering can parser take "'x' 'y'
> 'z'"?),
> > > >>>>> or USE
> > > >>>>> MODULES 'x,y,z' - to reorder module list completely
> > > >>>>>
> > > >>>
> > >
> > >
> >
> > --
> > Xuefu Zhang
> >
> > "In Honey We Trust!"
> >
>

Jark Wu-2

Re: [DISCUSS] FLIP-68: Extend Core Table System with Modular Plugins

Thanks Bowen for the updating.

I have some different opinions on the change.
IIUC, in the previous design, the "name" is also the "id" or "type" to
identify which module to load. Which means we can only load one instance of
a module.
In the new design, the "name" is just an alias to the module instance, the
"kind" is used to identify modules. Which means we can load different
instances of a module.
However, what's the "name" or alias used for? Do we need to support loading
different instances of a module? From my point of view, it brings more
complexity and confusion.
For example, if we load a "hive121" which uses HiveModule with version
1.2.1 and load a "hive234" which uses HiveModule with version 2.3.4, then
how to solve the class conflict problem?

IMO, a module can only be load once in a session, so "name" maybe useless.
So my proposal is similar to the previous one, but only change "name" to
"kind".

SQL:
LOAD MODULE "kind" [WITH (properties)];
UNLOAD MODULE "kind";
Table:
tEnv.loadModule("kind" [, properties]);
tEnv.unloadModule("kind");

What do you think?

Best,
Jark

On Wed, 9 Oct 2019 at 20:38, Bowen Li <[hidden email]> wrote:

> Thanks everyone for your review.
>
> After discussing with Timo and Dawid offline, as well as incorporating
> feedback from Xuefu and Jark on mailing list, I decided to make a few
> critical changes to the proposal.
>
> - renamed the keyword "type" to "kind". The community has plan to have
> "type" keyword in yaml/descriptor refer to data types exclusively in the
> near future. We should cater to that change in our design
> - allowed specifying names for modules to simplify and unify module
> loading/unloading syntax between programming and SQL. Here're the proposed
> changes:
> SQL:
> LOAD MODULE "name" WITH ("kind"="xxx" [, (properties)])
> UNLOAD MODULE "name";
> Table:
> tEnv.loadModule("name", new Xxx(properties));
> tEnv.unloadModule("name");
>
> I have completely updated the google doc [1]. Please take another look, and
> let me know if you have any other questions. Thanks!
>
> [1]
>
> https://docs.google.com/document/d/17CPMpMbPDjvM4selUVEfh_tqUK_oV0TODAUA9dfHakc/edit#
>
>
> On Tue, Oct 8, 2019 at 6:26 AM Jark Wu <[hidden email]> wrote:
>
> > Hi Bowen,
> >
> > Thanks for the proposal. I have two thoughts:
> >
> > 1) Regarding to "loadModule", how about
> > tableEnv.loadModule("xxx" [, propertiesMap]);
> > tableEnv.unloadModule(“xxx”);
> >
> > This makes the API similar to SQL. IMO, instance of Module is not needed
> > and verbose as parameter.
> > And this makes it easier to load a simple module without any additional
> > properties, e.g. tEnv.loadModule("GEO"), tEnv.unloadModule("GEO")
> >
> > 2) In current design, the module interface only defines function
> metadata,
> > but no implementations.
> > I'm wondering how to call/map the implementations in runtime? Am I
> missing
> > something?
> >
> > Besides, I left some minor comments in the doc.
> >
> > Best,
> > Jark
> >
> >
> > On Sat, 5 Oct 2019 at 08:42, Xuefu Z <[hidden email]> wrote:
> >
> > > I agree with Timo that the new table APIs need to be consistent. I'd go
> > > further that an name (or id) is needed for module definition in YAML
> > file.
> > > In the current design, name is skipped and type has binary meanings.
> > >
> > > Thanks,
> > > Xuefu
> > >
> > > On Fri, Oct 4, 2019 at 5:24 AM Timo Walther <[hidden email]>
> wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > first, I was also questioning my proposal. But Bowen's proposal of
> > > > `tEnv.offloadToYaml(<file_path>)` would not work with the current
> > design
> > > > because we don't know how to serialize a catalog or module into
> > > > properties. Currently, there is no converter from instance to
> > > > properties. It is a one way conversion. We can add a `toProperties`
> > > > method to both Catalog and Module class in the future to solve this.
> > > > Solving the table environment serializability can be future work.
> > > >
> > > > However, I find the current proposal for the TableEnvironment methods
> > is
> > > > contradicting:
> > > >
> > > > tableEnv.loadModule(new Yyy());
> > > > tableEnv.unloadModule(“Xxx”);
> > > >
> > > > The loading is specified programmatically whereas the unloading
> > requires
> > > > a string that is not specified in the module itself. But is defined
> in
> > > > the factory according to the current design.
> > > >
> > > > SQL does it more consistently. There, the name `xxx` is used when
> > > > loading and unloading the module:
> > > >
> > > > LOAD MODULE 'xxx' [WITH ('prop'='myProp', ...)]
> > > > UNLOAD MODULE 'xxx’
> > > >
> > > > How about:
> > > >
> > > > tableEnv.loadModule("xxx", new Yyy());
> > > > tableEnv.unloadModule(“xxx”);
> > > >
> > > > This would be similar to the catalog interfaces. The name is not part
> > of
> > > > the instance itself.
> > > >
> > > > What do you think?
> > > >
> > > > Thanks,
> > > > Timo
> > > >
> > > >
> > > >
> > > >
> > > > On 01.10.19 21:17, Bowen Li wrote:
> > > > > If something like the yaml file is the way to go and achieve such
> > > > > motivation, we would cover that with current design.
> > > > >
> > > > > On Tue, Oct 1, 2019 at 12:05 Bowen Li <[hidden email]> wrote:
> > > > >
> > > > >> Hi Timo, Dawid,
> > > > >>
> > > > >> I've added the suggested SQL and related changes to
> TableEnvironment
> > > API
> > > > >> and other classes to the google doc. Also removed "USE MODULE" and
> > its
> > > > >> APIs. Will update FLIP wiki once we have a consensus.
> > > > >>
> > > > >> W.r.t. descriptor approach, my gut feeling is similar to Dawid's.
> > > > Besides,
> > > > >> I feel yaml file would be a better solution to persist
> serializable
> > > > state
> > > > >> of an environment as the file itself is in serializable format
> > > already.
> > > > >> Though yaml file only serves SQL CLI at this moment, we may be
> able
> > to
> > > > >> extend its reach to Table API and allow users to load/offload a
> > > > >> TableEnvironment from/to yaml files, as something like
> > > "TableEnvironment
> > > > >> tEnv = TableEnvironment.loadFromYaml(<file_path>)" and
> > > > >> "tEnv.offloadToYaml(<file_path>)" to restore and persist state,
> and
> > > try
> > > > to
> > > > >> make yaml file more expressive.
> > > > >>
> > > > >>
> > > > >> On Tue, Oct 1, 2019 at 6:47 AM Dawid Wysakowicz <
> > > [hidden email]
> > > > >
> > > > >> wrote:
> > > > >>
> > > > >>> Hi Timo, Bowen,
> > > > >>>
> > > > >>> Unfortunately I did not have enough time to go through all the
> > > > >>> suggestions in details so I can not comment on the whole FLIP.
> > > > >>>
> > > > >>> I just wanted to give my opinion on the "descriptor approach in
> > > > >>> loadModule" part. I am not sure if we need it here. We might be
> > > > >>> overthinking this a bit. It definitely makes sense for objects
> like
> > > > >>> TableSource/TableSink etc. as they are logical definitions that
> > > nearly
> > > > >>> always have to be persisted in a Catalog. I'm not sure if we
> really
> > > > need
> > > > >>> the same for a whole session. If we need a resume session
> feature,
> > > the
> > > > >>> way to go would probably be to keep the session in memory on the
> > > server
> > > > >>> side. I fear we will never be able to serialize the whole session
> > > > >>> entirely (temporary objects, objects derived from DataStream
> etc.)
> > > > >>>
> > > > >>> I think it is ok to use instances for objects like Catalogs or
> > > Modules
> > > > >>> and have an overlay on top of that that can create instances from
> > > > >>> properties.
> > > > >>>
> > > > >>> Best,
> > > > >>>
> > > > >>> Dawid
> > > > >>>
> > > > >>> On 01/10/2019 11:28, Timo Walther wrote:
> > > > >>>> Hi Bowen,
> > > > >>>>
> > > > >>>> thanks for your response.
> > > > >>>>
> > > > >>>> Re 2) I also don't have a better approach for this issue. It is
> > > > >>>> similar to changing the general TableConfig between two
> > statements.
> > > It
> > > > >>>> would be good to add your explanation to the design document.
> > > > >>>>
> > > > >>>> Re 3) It would be interesting to know about which "core"
> functions
> > > we
> > > > >>>> are actually talking about. Also for the overriding built-in
> > > functions
> > > > >>>> that we discussed in the other FLIP. But I'm fine with leaving
> it
> > to
> > > > >>>> the user for now. How about we just introduce loadModule(),
> > > > >>>> unloadModule() methods instead of useModules()? This would
> ensure
> > > that
> > > > >>>> users don't forget to add the core module when adding an
> > additional
> > > > >>>> module and they need to explicitly call "unloadModule('core')".
> > > > >>>>
> > > > >>>> Re 4) Every table environment feature should also be designed
> with
> > > SQL
> > > > >>>> statements in mind to verify the concept. SQL is also more
> popular
> > > > >>>> that Java/Scala API or YAML file. I would like to add it to 1.10
> > for
> > > > >>>> marking the feature as complete.
> > > > >>>>
> > > > >>>> SHOW MODULES -> sounds good to me, we should add a
> listModules():
> > > > >>>> List<String> method to table environment
> > > > >>>>
> > > > >>>> LOAD MODULE 'hive' [WITH ('prop'='myProp', ...)] --> we should
> > add a
> > > > >>>> loadModule() method to table environment
> > > > >>>>
> > > > >>>> UNLOAD MODULE 'hive' --> we should add a unloadModule() method
> to
> > > > >>>> table environment
> > > > >>>>
> > > > >>>> I would not introduce `USE MODULES 'x' 'y' 'z'` for simplicity
> and
> > > > >>>> concise API. Users need to load the module anyway with
> properties.
> > > > >>>> They can also load them "in order" immediately. CREATE TABLE can
> > > also
> > > > >>>> not create multiple tables but only one at a time in that order.
> > > > >>>>
> > > > >>>> One thing that came to my mind, shall we use a descriptor
> approach
> > > for
> > > > >>>> loadModule()? The past has shown that passing instances causes
> > > > >>>> problems when persisting objects. That's why we also want to get
> > rid
> > > > >>>> of registerTableSource. I could image that users might want to
> > > persist
> > > > >>>> a table environment's state for later use in the future. Even
> > though
> > > > >>>> this is future work, we should already keep such use cases in
> mind
> > > > >>>> when adding new API methods. What do you think?
> > > > >>>>
> > > > >>>> Regards,
> > > > >>>> Timo
> > > > >>>>
> > > > >>>>
> > > > >>>> On 30.09.19 23:17, Bowen Li wrote:
> > > > >>>>> Hi Timo,
> > > > >>>>>
> > > > >>>>> Re 1) I agree. I renamed the title to "Extend Core Table System
> > > with
> > > > >>>>> Pluggable Modules" and all internal references
> > > > >>>>>
> > > > >>>>> Re 2) First, I'll rename the API to useModules(). The design
> > > doesn't
> > > > >>>>> forbid
> > > > >>>>> users to call useModules() multi times. Objects in modules are
> > > loaded
> > > > >>> on
> > > > >>>>> demand instead of eagerly, so there won't be inconsistency.
> Users
> > > > >>>>> have to
> > > > >>>>> be fully aware of the consequences of resetting modules as that
> > > might
> > > > >>>>> cause
> > > > >>>>> that some objects can not be referenced anymore or resolution
> > order
> > > > >>>>> of some
> > > > >>>>> objects changes.
> > > > >>>>>
> > > > >>>>> Re 3) Yes, we'd leave that to users.
> > > > >>>>>
> > > > >>>>> Another approach can be to have a non-optional "Core" module
> for
> > > all
> > > > >>>>> objects that cannot be overrode like "CAST" and "AS" functions,
> > and
> > > > >>>>> have an
> > > > >>>>> optional "ExtendedCore" module for other replaceable built-in
> > > > objects.
> > > > >>>>> "Core" should be positioned at the 1st in module list all the
> > time.
> > > > >>>>>
> > > > >>>>> I'm fine with either solution.
> > > > >>>>>
> > > > >>>>> Re 4) It may sound like a nice-to-have advanced feature for
> 1.10,
> > > but
> > > > >>> we
> > > > >>>>> can surely fully discuss it for the sake of feature
> completeness.
> > > > >>>>>
> > > > >>>>> Unlike other configs, the order of modules would matter in
> Flink,
> > > and
> > > > >>> it
> > > > >>>>> implies the LOAD/UNLOAD commands would not be equal in
> operation
> > > > >>>>> positions.
> > > > >>>>> IIUYC, LOAD MODULE 'x' would be interpreted as appending x to
> the
> > > end
> > > > >>> of
> > > > >>>>> module list, and UNLOAD MODULE 'x' would be interpreted as
> > > removing x
> > > > >>>>> from
> > > > >>>>> any position in the list?
> > > > >>>>>
> > > > >>>>> I'm thinking of the following list of commands:
> > > > >>>>>
> > > > >>>>> SHOW MODULES - list modules in order
> > > > >>>>> LOAD MODULE 'hive' [WITH ('prop'='myProp', ...)] - load and
> > append
> > > > the
> > > > >>>>> module to end of the module list
> > > > >>>>> UNLOAD MODULE 'hive' - remove the module from module list, and
> > > other
> > > > >>>>> modules remain the same relative positions
> > > > >>>>> USE MODULES 'x' 'y' 'z' (wondering can parser take "'x' 'y'
> > 'z'"?),
> > > > >>>>> or USE
> > > > >>>>> MODULES 'x,y,z' - to reorder module list completely
> > > > >>>>>
> > > > >>>
> > > >
> > > >
> > >
> > > --
> > > Xuefu Zhang
> > >
> > > "In Honey We Trust!"
> > >
> >
>

Xuefu Z

Re: [DISCUSS] FLIP-68: Extend Core Table System with Modular Plugins

Jark has a good point. However, I think validation logic can put in place
to restrict one instance per type. Maybe the doc needs to be specific on
this.

Thanks,
Xuefu

On Wed, Oct 9, 2019 at 7:41 PM Jark Wu <[hidden email]> wrote:

> Thanks Bowen for the updating.
>
> I have some different opinions on the change.
> IIUC, in the previous design, the "name" is also the "id" or "type" to
> identify which module to load. Which means we can only load one instance of
> a module.
> In the new design, the "name" is just an alias to the module instance, the
> "kind" is used to identify modules. Which means we can load different
> instances of a module.
> However, what's the "name" or alias used for? Do we need to support loading
> different instances of a module? From my point of view, it brings more
> complexity and confusion.
> For example, if we load a "hive121" which uses HiveModule with version
> 1.2.1 and load a "hive234" which uses HiveModule with version 2.3.4, then
> how to solve the class conflict problem?
>
> IMO, a module can only be load once in a session, so "name" maybe useless.
> So my proposal is similar to the previous one, but only change "name" to
> "kind".
>
> SQL:
> LOAD MODULE "kind" [WITH (properties)];
> UNLOAD MODULE "kind";
> Table:
> tEnv.loadModule("kind" [, properties]);
> tEnv.unloadModule("kind");
>
> What do you think?
>
>
> Best,
> Jark
>
>
>
>
>
> On Wed, 9 Oct 2019 at 20:38, Bowen Li <[hidden email]> wrote:
>
> > Thanks everyone for your review.
> >
> > After discussing with Timo and Dawid offline, as well as incorporating
> > feedback from Xuefu and Jark on mailing list, I decided to make a few
> > critical changes to the proposal.
> >
> > - renamed the keyword "type" to "kind". The community has plan to have
> > "type" keyword in yaml/descriptor refer to data types exclusively in the
> > near future. We should cater to that change in our design
> > - allowed specifying names for modules to simplify and unify module
> > loading/unloading syntax between programming and SQL. Here're the
> proposed
> > changes:
> > SQL:
> > LOAD MODULE "name" WITH ("kind"="xxx" [, (properties)])
> > UNLOAD MODULE "name";
> > Table:
> > tEnv.loadModule("name", new Xxx(properties));
> > tEnv.unloadModule("name");
> >
> > I have completely updated the google doc [1]. Please take another look,
> and
> > let me know if you have any other questions. Thanks!
> >
> > [1]
> >
> >
> https://docs.google.com/document/d/17CPMpMbPDjvM4selUVEfh_tqUK_oV0TODAUA9dfHakc/edit#
> >
> >
> > On Tue, Oct 8, 2019 at 6:26 AM Jark Wu <[hidden email]> wrote:
> >
> > > Hi Bowen,
> > >
> > > Thanks for the proposal. I have two thoughts:
> > >
> > > 1) Regarding to "loadModule", how about
> > > tableEnv.loadModule("xxx" [, propertiesMap]);
> > > tableEnv.unloadModule(“xxx”);
> > >
> > > This makes the API similar to SQL. IMO, instance of Module is not
> needed
> > > and verbose as parameter.
> > > And this makes it easier to load a simple module without any additional
> > > properties, e.g. tEnv.loadModule("GEO"), tEnv.unloadModule("GEO")
> > >
> > > 2) In current design, the module interface only defines function
> > metadata,
> > > but no implementations.
> > > I'm wondering how to call/map the implementations in runtime? Am I
> > missing
> > > something?
> > >
> > > Besides, I left some minor comments in the doc.
> > >
> > > Best,
> > > Jark
> > >
> > >
> > > On Sat, 5 Oct 2019 at 08:42, Xuefu Z <[hidden email]> wrote:
> > >
> > > > I agree with Timo that the new table APIs need to be consistent. I'd
> go
> > > > further that an name (or id) is needed for module definition in YAML
> > > file.
> > > > In the current design, name is skipped and type has binary meanings.
> > > >
> > > > Thanks,
> > > > Xuefu
> > > >
> > > > On Fri, Oct 4, 2019 at 5:24 AM Timo Walther <[hidden email]>
> > wrote:
> > > >
> > > > > Hi everyone,
> > > > >
> > > > > first, I was also questioning my proposal. But Bowen's proposal of
> > > > > `tEnv.offloadToYaml(<file_path>)` would not work with the current
> > > design
> > > > > because we don't know how to serialize a catalog or module into
> > > > > properties. Currently, there is no converter from instance to
> > > > > properties. It is a one way conversion. We can add a `toProperties`
> > > > > method to both Catalog and Module class in the future to solve
> this.
> > > > > Solving the table environment serializability can be future work.
> > > > >
> > > > > However, I find the current proposal for the TableEnvironment
> methods
> > > is
> > > > > contradicting:
> > > > >
> > > > > tableEnv.loadModule(new Yyy());
> > > > > tableEnv.unloadModule(“Xxx”);
> > > > >
> > > > > The loading is specified programmatically whereas the unloading
> > > requires
> > > > > a string that is not specified in the module itself. But is defined
> > in
> > > > > the factory according to the current design.
> > > > >
> > > > > SQL does it more consistently. There, the name `xxx` is used when
> > > > > loading and unloading the module:
> > > > >
> > > > > LOAD MODULE 'xxx' [WITH ('prop'='myProp', ...)]
> > > > > UNLOAD MODULE 'xxx’
> > > > >
> > > > > How about:
> > > > >
> > > > > tableEnv.loadModule("xxx", new Yyy());
> > > > > tableEnv.unloadModule(“xxx”);
> > > > >
> > > > > This would be similar to the catalog interfaces. The name is not
> part
> > > of
> > > > > the instance itself.
> > > > >
> > > > > What do you think?
> > > > >
> > > > > Thanks,
> > > > > Timo
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On 01.10.19 21:17, Bowen Li wrote:
> > > > > > If something like the yaml file is the way to go and achieve such
> > > > > > motivation, we would cover that with current design.
> > > > > >
> > > > > > On Tue, Oct 1, 2019 at 12:05 Bowen Li <[hidden email]>
> wrote:
> > > > > >
> > > > > >> Hi Timo, Dawid,
> > > > > >>
> > > > > >> I've added the suggested SQL and related changes to
> > TableEnvironment
> > > > API
> > > > > >> and other classes to the google doc. Also removed "USE MODULE"
> and
> > > its
> > > > > >> APIs. Will update FLIP wiki once we have a consensus.
> > > > > >>
> > > > > >> W.r.t. descriptor approach, my gut feeling is similar to
> Dawid's.
> > > > > Besides,
> > > > > >> I feel yaml file would be a better solution to persist
> > serializable
> > > > > state
> > > > > >> of an environment as the file itself is in serializable format
> > > > already.
> > > > > >> Though yaml file only serves SQL CLI at this moment, we may be
> > able
> > > to
> > > > > >> extend its reach to Table API and allow users to load/offload a
> > > > > >> TableEnvironment from/to yaml files, as something like
> > > > "TableEnvironment
> > > > > >> tEnv = TableEnvironment.loadFromYaml(<file_path>)" and
> > > > > >> "tEnv.offloadToYaml(<file_path>)" to restore and persist state,
> > and
> > > > try
> > > > > to
> > > > > >> make yaml file more expressive.
> > > > > >>
> > > > > >>
> > > > > >> On Tue, Oct 1, 2019 at 6:47 AM Dawid Wysakowicz <
> > > > [hidden email]
> > > > > >
> > > > > >> wrote:
> > > > > >>
> > > > > >>> Hi Timo, Bowen,
> > > > > >>>
> > > > > >>> Unfortunately I did not have enough time to go through all the
> > > > > >>> suggestions in details so I can not comment on the whole FLIP.
> > > > > >>>
> > > > > >>> I just wanted to give my opinion on the "descriptor approach in
> > > > > >>> loadModule" part. I am not sure if we need it here. We might be
> > > > > >>> overthinking this a bit. It definitely makes sense for objects
> > like
> > > > > >>> TableSource/TableSink etc. as they are logical definitions that
> > > > nearly
> > > > > >>> always have to be persisted in a Catalog. I'm not sure if we
> > really
> > > > > need
> > > > > >>> the same for a whole session. If we need a resume session
> > feature,
> > > > the
> > > > > >>> way to go would probably be to keep the session in memory on
> the
> > > > server
> > > > > >>> side. I fear we will never be able to serialize the whole
> session
> > > > > >>> entirely (temporary objects, objects derived from DataStream
> > etc.)
> > > > > >>>
> > > > > >>> I think it is ok to use instances for objects like Catalogs or
> > > > Modules
> > > > > >>> and have an overlay on top of that that can create instances
> from
> > > > > >>> properties.
> > > > > >>>
> > > > > >>> Best,
> > > > > >>>
> > > > > >>> Dawid
> > > > > >>>
> > > > > >>> On 01/10/2019 11:28, Timo Walther wrote:
> > > > > >>>> Hi Bowen,
> > > > > >>>>
> > > > > >>>> thanks for your response.
> > > > > >>>>
> > > > > >>>> Re 2) I also don't have a better approach for this issue. It
> is
> > > > > >>>> similar to changing the general TableConfig between two
> > > statements.
> > > > It
> > > > > >>>> would be good to add your explanation to the design document.
> > > > > >>>>
> > > > > >>>> Re 3) It would be interesting to know about which "core"
> > functions
> > > > we
> > > > > >>>> are actually talking about. Also for the overriding built-in
> > > > functions
> > > > > >>>> that we discussed in the other FLIP. But I'm fine with leaving
> > it
> > > to
> > > > > >>>> the user for now. How about we just introduce loadModule(),
> > > > > >>>> unloadModule() methods instead of useModules()? This would
> > ensure
> > > > that
> > > > > >>>> users don't forget to add the core module when adding an
> > > additional
> > > > > >>>> module and they need to explicitly call
> "unloadModule('core')".
> > > > > >>>>
> > > > > >>>> Re 4) Every table environment feature should also be designed
> > with
> > > > SQL
> > > > > >>>> statements in mind to verify the concept. SQL is also more
> > popular
> > > > > >>>> that Java/Scala API or YAML file. I would like to add it to
> 1.10
> > > for
> > > > > >>>> marking the feature as complete.
> > > > > >>>>
> > > > > >>>> SHOW MODULES -> sounds good to me, we should add a
> > listModules():
> > > > > >>>> List<String> method to table environment
> > > > > >>>>
> > > > > >>>> LOAD MODULE 'hive' [WITH ('prop'='myProp', ...)] --> we should
> > > add a
> > > > > >>>> loadModule() method to table environment
> > > > > >>>>
> > > > > >>>> UNLOAD MODULE 'hive' --> we should add a unloadModule() method
> > to
> > > > > >>>> table environment
> > > > > >>>>
> > > > > >>>> I would not introduce `USE MODULES 'x' 'y' 'z'` for simplicity
> > and
> > > > > >>>> concise API. Users need to load the module anyway with
> > properties.
> > > > > >>>> They can also load them "in order" immediately. CREATE TABLE
> can
> > > > also
> > > > > >>>> not create multiple tables but only one at a time in that
> order.
> > > > > >>>>
> > > > > >>>> One thing that came to my mind, shall we use a descriptor
> > approach
> > > > for
> > > > > >>>> loadModule()? The past has shown that passing instances causes
> > > > > >>>> problems when persisting objects. That's why we also want to
> get
> > > rid
> > > > > >>>> of registerTableSource. I could image that users might want to
> > > > persist
> > > > > >>>> a table environment's state for later use in the future. Even
> > > though
> > > > > >>>> this is future work, we should already keep such use cases in
> > mind
> > > > > >>>> when adding new API methods. What do you think?
> > > > > >>>>
> > > > > >>>> Regards,
> > > > > >>>> Timo
> > > > > >>>>
> > > > > >>>>
> > > > > >>>> On 30.09.19 23:17, Bowen Li wrote:
> > > > > >>>>> Hi Timo,
> > > > > >>>>>
> > > > > >>>>> Re 1) I agree. I renamed the title to "Extend Core Table
> System
> > > > with
> > > > > >>>>> Pluggable Modules" and all internal references
> > > > > >>>>>
> > > > > >>>>> Re 2) First, I'll rename the API to useModules(). The design
> > > > doesn't
> > > > > >>>>> forbid
> > > > > >>>>> users to call useModules() multi times. Objects in modules
> are
> > > > loaded
> > > > > >>> on
> > > > > >>>>> demand instead of eagerly, so there won't be inconsistency.
> > Users
> > > > > >>>>> have to
> > > > > >>>>> be fully aware of the consequences of resetting modules as
> that
> > > > might
> > > > > >>>>> cause
> > > > > >>>>> that some objects can not be referenced anymore or resolution
> > > order
> > > > > >>>>> of some
> > > > > >>>>> objects changes.
> > > > > >>>>>
> > > > > >>>>> Re 3) Yes, we'd leave that to users.
> > > > > >>>>>
> > > > > >>>>> Another approach can be to have a non-optional "Core" module
> > for
> > > > all
> > > > > >>>>> objects that cannot be overrode like "CAST" and "AS"
> functions,
> > > and
> > > > > >>>>> have an
> > > > > >>>>> optional "ExtendedCore" module for other replaceable built-in
> > > > > objects.
> > > > > >>>>> "Core" should be positioned at the 1st in module list all the
> > > time.
> > > > > >>>>>
> > > > > >>>>> I'm fine with either solution.
> > > > > >>>>>
> > > > > >>>>> Re 4) It may sound like a nice-to-have advanced feature for
> > 1.10,
> > > > but
> > > > > >>> we
> > > > > >>>>> can surely fully discuss it for the sake of feature
> > completeness.
> > > > > >>>>>
> > > > > >>>>> Unlike other configs, the order of modules would matter in
> > Flink,
> > > > and
> > > > > >>> it
> > > > > >>>>> implies the LOAD/UNLOAD commands would not be equal in
> > operation
> > > > > >>>>> positions.
> > > > > >>>>> IIUYC, LOAD MODULE 'x' would be interpreted as appending x to
> > the
> > > > end
> > > > > >>> of
> > > > > >>>>> module list, and UNLOAD MODULE 'x' would be interpreted as
> > > > removing x
> > > > > >>>>> from
> > > > > >>>>> any position in the list?
> > > > > >>>>>
> > > > > >>>>> I'm thinking of the following list of commands:
> > > > > >>>>>
> > > > > >>>>> SHOW MODULES - list modules in order
> > > > > >>>>> LOAD MODULE 'hive' [WITH ('prop'='myProp', ...)] - load and
> > > append
> > > > > the
> > > > > >>>>> module to end of the module list
> > > > > >>>>> UNLOAD MODULE 'hive' - remove the module from module list,
> and
> > > > other
> > > > > >>>>> modules remain the same relative positions
> > > > > >>>>> USE MODULES 'x' 'y' 'z' (wondering can parser take "'x' 'y'
> > > 'z'"?),
> > > > > >>>>> or USE
> > > > > >>>>> MODULES 'x,y,z' - to reorder module list completely
> > > > > >>>>>
> > > > > >>>
> > > > >
> > > > >
> > > >
> > > > --
> > > > Xuefu Zhang
> > > >
> > > > "In Honey We Trust!"
> > > >
> > >
> >
>

--
Xuefu Zhang

"In Honey We Trust!"

Jark Wu-2

Re: [DISCUSS] FLIP-68: Extend Core Table System with Modular Plugins

Hi Xuefu,

If there is only one instance per type, then what's the "name" used for?
Could we remove it and only keep "type" or "kind" to identify modules?

Best,
Jark

On Thu, 10 Oct 2019 at 11:21, Xuefu Z <[hidden email]> wrote:

> Jark has a good point. However, I think validation logic can put in place
> to restrict one instance per type. Maybe the doc needs to be specific on
> this.
>
> Thanks,
> Xuefu
>
> On Wed, Oct 9, 2019 at 7:41 PM Jark Wu <[hidden email]> wrote:
>
> > Thanks Bowen for the updating.
> >
> > I have some different opinions on the change.
> > IIUC, in the previous design, the "name" is also the "id" or "type" to
> > identify which module to load. Which means we can only load one instance
> of
> > a module.
> > In the new design, the "name" is just an alias to the module instance,
> the
> > "kind" is used to identify modules. Which means we can load different
> > instances of a module.
> > However, what's the "name" or alias used for? Do we need to support
> loading
> > different instances of a module? From my point of view, it brings more
> > complexity and confusion.
> > For example, if we load a "hive121" which uses HiveModule with version
> > 1.2.1 and load a "hive234" which uses HiveModule with version 2.3.4, then
> > how to solve the class conflict problem?
> >
> > IMO, a module can only be load once in a session, so "name" maybe
> useless.
> > So my proposal is similar to the previous one, but only change "name" to
> > "kind".
> >
> > SQL:
> > LOAD MODULE "kind" [WITH (properties)];
> > UNLOAD MODULE "kind";
> > Table:
> > tEnv.loadModule("kind" [, properties]);
> > tEnv.unloadModule("kind");
> >
> > What do you think?
> >
> >
> > Best,
> > Jark
> >
> >
> >
> >
> >
> > On Wed, 9 Oct 2019 at 20:38, Bowen Li <[hidden email]> wrote:
> >
> > > Thanks everyone for your review.
> > >
> > > After discussing with Timo and Dawid offline, as well as incorporating
> > > feedback from Xuefu and Jark on mailing list, I decided to make a few
> > > critical changes to the proposal.
> > >
> > > - renamed the keyword "type" to "kind". The community has plan to have
> > > "type" keyword in yaml/descriptor refer to data types exclusively in
> the
> > > near future. We should cater to that change in our design
> > > - allowed specifying names for modules to simplify and unify module
> > > loading/unloading syntax between programming and SQL. Here're the
> > proposed
> > > changes:
> > > SQL:
> > > LOAD MODULE "name" WITH ("kind"="xxx" [, (properties)])
> > > UNLOAD MODULE "name";
> > > Table:
> > > tEnv.loadModule("name", new Xxx(properties));
> > > tEnv.unloadModule("name");
> > >
> > > I have completely updated the google doc [1]. Please take another look,
> > and
> > > let me know if you have any other questions. Thanks!
> > >
> > > [1]
> > >
> > >
> >
> https://docs.google.com/document/d/17CPMpMbPDjvM4selUVEfh_tqUK_oV0TODAUA9dfHakc/edit#
> > >
> > >
> > > On Tue, Oct 8, 2019 at 6:26 AM Jark Wu <[hidden email]> wrote:
> > >
> > > > Hi Bowen,
> > > >
> > > > Thanks for the proposal. I have two thoughts:
> > > >
> > > > 1) Regarding to "loadModule", how about
> > > > tableEnv.loadModule("xxx" [, propertiesMap]);
> > > > tableEnv.unloadModule(“xxx”);
> > > >
> > > > This makes the API similar to SQL. IMO, instance of Module is not
> > needed
> > > > and verbose as parameter.
> > > > And this makes it easier to load a simple module without any
> additional
> > > > properties, e.g. tEnv.loadModule("GEO"), tEnv.unloadModule("GEO")
> > > >
> > > > 2) In current design, the module interface only defines function
> > > metadata,
> > > > but no implementations.
> > > > I'm wondering how to call/map the implementations in runtime? Am I
> > > missing
> > > > something?
> > > >
> > > > Besides, I left some minor comments in the doc.
> > > >
> > > > Best,
> > > > Jark
> > > >
> > > >
> > > > On Sat, 5 Oct 2019 at 08:42, Xuefu Z <[hidden email]> wrote:
> > > >
> > > > > I agree with Timo that the new table APIs need to be consistent.
> I'd
> > go
> > > > > further that an name (or id) is needed for module definition in
> YAML
> > > > file.
> > > > > In the current design, name is skipped and type has binary
> meanings.
> > > > >
> > > > > Thanks,
> > > > > Xuefu
> > > > >
> > > > > On Fri, Oct 4, 2019 at 5:24 AM Timo Walther <[hidden email]>
> > > wrote:
> > > > >
> > > > > > Hi everyone,
> > > > > >
> > > > > > first, I was also questioning my proposal. But Bowen's proposal
> of
> > > > > > `tEnv.offloadToYaml(<file_path>)` would not work with the current
> > > > design
> > > > > > because we don't know how to serialize a catalog or module into
> > > > > > properties. Currently, there is no converter from instance to
> > > > > > properties. It is a one way conversion. We can add a
> `toProperties`
> > > > > > method to both Catalog and Module class in the future to solve
> > this.
> > > > > > Solving the table environment serializability can be future work.
> > > > > >
> > > > > > However, I find the current proposal for the TableEnvironment
> > methods
> > > > is
> > > > > > contradicting:
> > > > > >
> > > > > > tableEnv.loadModule(new Yyy());
> > > > > > tableEnv.unloadModule(“Xxx”);
> > > > > >
> > > > > > The loading is specified programmatically whereas the unloading
> > > > requires
> > > > > > a string that is not specified in the module itself. But is
> defined
> > > in
> > > > > > the factory according to the current design.
> > > > > >
> > > > > > SQL does it more consistently. There, the name `xxx` is used when
> > > > > > loading and unloading the module:
> > > > > >
> > > > > > LOAD MODULE 'xxx' [WITH ('prop'='myProp', ...)]
> > > > > > UNLOAD MODULE 'xxx’
> > > > > >
> > > > > > How about:
> > > > > >
> > > > > > tableEnv.loadModule("xxx", new Yyy());
> > > > > > tableEnv.unloadModule(“xxx”);
> > > > > >
> > > > > > This would be similar to the catalog interfaces. The name is not
> > part
> > > > of
> > > > > > the instance itself.
> > > > > >
> > > > > > What do you think?
> > > > > >
> > > > > > Thanks,
> > > > > > Timo
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > On 01.10.19 21:17, Bowen Li wrote:
> > > > > > > If something like the yaml file is the way to go and achieve
> such
> > > > > > > motivation, we would cover that with current design.
> > > > > > >
> > > > > > > On Tue, Oct 1, 2019 at 12:05 Bowen Li <[hidden email]>
> > wrote:
> > > > > > >
> > > > > > >> Hi Timo, Dawid,
> > > > > > >>
> > > > > > >> I've added the suggested SQL and related changes to
> > > TableEnvironment
> > > > > API
> > > > > > >> and other classes to the google doc. Also removed "USE MODULE"
> > and
> > > > its
> > > > > > >> APIs. Will update FLIP wiki once we have a consensus.
> > > > > > >>
> > > > > > >> W.r.t. descriptor approach, my gut feeling is similar to
> > Dawid's.
> > > > > > Besides,
> > > > > > >> I feel yaml file would be a better solution to persist
> > > serializable
> > > > > > state
> > > > > > >> of an environment as the file itself is in serializable format
> > > > > already.
> > > > > > >> Though yaml file only serves SQL CLI at this moment, we may be
> > > able
> > > > to
> > > > > > >> extend its reach to Table API and allow users to load/offload
> a
> > > > > > >> TableEnvironment from/to yaml files, as something like
> > > > > "TableEnvironment
> > > > > > >> tEnv = TableEnvironment.loadFromYaml(<file_path>)" and
> > > > > > >> "tEnv.offloadToYaml(<file_path>)" to restore and persist
> state,
> > > and
> > > > > try
> > > > > > to
> > > > > > >> make yaml file more expressive.
> > > > > > >>
> > > > > > >>
> > > > > > >> On Tue, Oct 1, 2019 at 6:47 AM Dawid Wysakowicz <
> > > > > [hidden email]
> > > > > > >
> > > > > > >> wrote:
> > > > > > >>
> > > > > > >>> Hi Timo, Bowen,
> > > > > > >>>
> > > > > > >>> Unfortunately I did not have enough time to go through all
> the
> > > > > > >>> suggestions in details so I can not comment on the whole
> FLIP.
> > > > > > >>>
> > > > > > >>> I just wanted to give my opinion on the "descriptor approach
> in
> > > > > > >>> loadModule" part. I am not sure if we need it here. We might
> be
> > > > > > >>> overthinking this a bit. It definitely makes sense for
> objects
> > > like
> > > > > > >>> TableSource/TableSink etc. as they are logical definitions
> that
> > > > > nearly
> > > > > > >>> always have to be persisted in a Catalog. I'm not sure if we
> > > really
> > > > > > need
> > > > > > >>> the same for a whole session. If we need a resume session
> > > feature,
> > > > > the
> > > > > > >>> way to go would probably be to keep the session in memory on
> > the
> > > > > server
> > > > > > >>> side. I fear we will never be able to serialize the whole
> > session
> > > > > > >>> entirely (temporary objects, objects derived from DataStream
> > > etc.)
> > > > > > >>>
> > > > > > >>> I think it is ok to use instances for objects like Catalogs
> or
> > > > > Modules
> > > > > > >>> and have an overlay on top of that that can create instances
> > from
> > > > > > >>> properties.
> > > > > > >>>
> > > > > > >>> Best,
> > > > > > >>>
> > > > > > >>> Dawid
> > > > > > >>>
> > > > > > >>> On 01/10/2019 11:28, Timo Walther wrote:
> > > > > > >>>> Hi Bowen,
> > > > > > >>>>
> > > > > > >>>> thanks for your response.
> > > > > > >>>>
> > > > > > >>>> Re 2) I also don't have a better approach for this issue. It
> > is
> > > > > > >>>> similar to changing the general TableConfig between two
> > > > statements.
> > > > > It
> > > > > > >>>> would be good to add your explanation to the design
> document.
> > > > > > >>>>
> > > > > > >>>> Re 3) It would be interesting to know about which "core"
> > > functions
> > > > > we
> > > > > > >>>> are actually talking about. Also for the overriding built-in
> > > > > functions
> > > > > > >>>> that we discussed in the other FLIP. But I'm fine with
> leaving
> > > it
> > > > to
> > > > > > >>>> the user for now. How about we just introduce loadModule(),
> > > > > > >>>> unloadModule() methods instead of useModules()? This would
> > > ensure
> > > > > that
> > > > > > >>>> users don't forget to add the core module when adding an
> > > > additional
> > > > > > >>>> module and they need to explicitly call
> > "unloadModule('core')".
> > > > > > >>>>
> > > > > > >>>> Re 4) Every table environment feature should also be
> designed
> > > with
> > > > > SQL
> > > > > > >>>> statements in mind to verify the concept. SQL is also more
> > > popular
> > > > > > >>>> that Java/Scala API or YAML file. I would like to add it to
> > 1.10
> > > > for
> > > > > > >>>> marking the feature as complete.
> > > > > > >>>>
> > > > > > >>>> SHOW MODULES -> sounds good to me, we should add a
> > > listModules():
> > > > > > >>>> List<String> method to table environment
> > > > > > >>>>
> > > > > > >>>> LOAD MODULE 'hive' [WITH ('prop'='myProp', ...)] --> we
> should
> > > > add a
> > > > > > >>>> loadModule() method to table environment
> > > > > > >>>>
> > > > > > >>>> UNLOAD MODULE 'hive' --> we should add a unloadModule()
> method
> > > to
> > > > > > >>>> table environment
> > > > > > >>>>
> > > > > > >>>> I would not introduce `USE MODULES 'x' 'y' 'z'` for
> simplicity
> > > and
> > > > > > >>>> concise API. Users need to load the module anyway with
> > > properties.
> > > > > > >>>> They can also load them "in order" immediately. CREATE TABLE
> > can
> > > > > also
> > > > > > >>>> not create multiple tables but only one at a time in that
> > order.
> > > > > > >>>>
> > > > > > >>>> One thing that came to my mind, shall we use a descriptor
> > > approach
> > > > > for
> > > > > > >>>> loadModule()? The past has shown that passing instances
> causes
> > > > > > >>>> problems when persisting objects. That's why we also want to
> > get
> > > > rid
> > > > > > >>>> of registerTableSource. I could image that users might want
> to
> > > > > persist
> > > > > > >>>> a table environment's state for later use in the future.
> Even
> > > > though
> > > > > > >>>> this is future work, we should already keep such use cases
> in
> > > mind
> > > > > > >>>> when adding new API methods. What do you think?
> > > > > > >>>>
> > > > > > >>>> Regards,
> > > > > > >>>> Timo
> > > > > > >>>>
> > > > > > >>>>
> > > > > > >>>> On 30.09.19 23:17, Bowen Li wrote:
> > > > > > >>>>> Hi Timo,
> > > > > > >>>>>
> > > > > > >>>>> Re 1) I agree. I renamed the title to "Extend Core Table
> > System
> > > > > with
> > > > > > >>>>> Pluggable Modules" and all internal references
> > > > > > >>>>>
> > > > > > >>>>> Re 2) First, I'll rename the API to useModules(). The
> design
> > > > > doesn't
> > > > > > >>>>> forbid
> > > > > > >>>>> users to call useModules() multi times. Objects in modules
> > are
> > > > > loaded
> > > > > > >>> on
> > > > > > >>>>> demand instead of eagerly, so there won't be inconsistency.
> > > Users
> > > > > > >>>>> have to
> > > > > > >>>>> be fully aware of the consequences of resetting modules as
> > that
> > > > > might
> > > > > > >>>>> cause
> > > > > > >>>>> that some objects can not be referenced anymore or
> resolution
> > > > order
> > > > > > >>>>> of some
> > > > > > >>>>> objects changes.
> > > > > > >>>>>
> > > > > > >>>>> Re 3) Yes, we'd leave that to users.
> > > > > > >>>>>
> > > > > > >>>>> Another approach can be to have a non-optional "Core"
> module
> > > for
> > > > > all
> > > > > > >>>>> objects that cannot be overrode like "CAST" and "AS"
> > functions,
> > > > and
> > > > > > >>>>> have an
> > > > > > >>>>> optional "ExtendedCore" module for other replaceable
> built-in
> > > > > > objects.
> > > > > > >>>>> "Core" should be positioned at the 1st in module list all
> the
> > > > time.
> > > > > > >>>>>
> > > > > > >>>>> I'm fine with either solution.
> > > > > > >>>>>
> > > > > > >>>>> Re 4) It may sound like a nice-to-have advanced feature for
> > > 1.10,
> > > > > but
> > > > > > >>> we
> > > > > > >>>>> can surely fully discuss it for the sake of feature
> > > completeness.
> > > > > > >>>>>
> > > > > > >>>>> Unlike other configs, the order of modules would matter in
> > > Flink,
> > > > > and
> > > > > > >>> it
> > > > > > >>>>> implies the LOAD/UNLOAD commands would not be equal in
> > > operation
> > > > > > >>>>> positions.
> > > > > > >>>>> IIUYC, LOAD MODULE 'x' would be interpreted as appending x
> to
> > > the
> > > > > end
> > > > > > >>> of
> > > > > > >>>>> module list, and UNLOAD MODULE 'x' would be interpreted as
> > > > > removing x
> > > > > > >>>>> from
> > > > > > >>>>> any position in the list?
> > > > > > >>>>>
> > > > > > >>>>> I'm thinking of the following list of commands:
> > > > > > >>>>>
> > > > > > >>>>> SHOW MODULES - list modules in order
> > > > > > >>>>> LOAD MODULE 'hive' [WITH ('prop'='myProp', ...)] - load and
> > > > append
> > > > > > the
> > > > > > >>>>> module to end of the module list
> > > > > > >>>>> UNLOAD MODULE 'hive' - remove the module from module list,
> > and
> > > > > other
> > > > > > >>>>> modules remain the same relative positions
> > > > > > >>>>> USE MODULES 'x' 'y' 'z' (wondering can parser take "'x' 'y'
> > > > 'z'"?),
> > > > > > >>>>> or USE
> > > > > > >>>>> MODULES 'x,y,z' - to reorder module list completely
> > > > > > >>>>>
> > > > > > >>>
> > > > > >
> > > > > >
> > > > >
> > > > > --
> > > > > Xuefu Zhang
> > > > >
> > > > > "In Honey We Trust!"
> > > > >
> > > >
> > >
> >
>
>
> --
> Xuefu Zhang
>
> "In Honey We Trust!"
>

Timo Walther-2

Re: [DISCUSS] FLIP-68: Extend Core Table System with Modular Plugins

Hi Jark,

we had a long offline discussion yesterday where we considered all
options again. The reasons why we decided for the updated design that
Bowen suggested:

- Both Dawid and Xuefu argued that in the old design "kind" has binary
meanings. I agree here.
- Compared to other SQL concepts such as tables/functions/catalogs, a
"name" is never part of the object itself but always specified when
registering the object. We should have the same behavior for modules
because of consistency reasons. It also makes the "name" explicit when
it comes to unloading the module compared to the previous design.
- Regarding, "How to solve the class conflict problem?" this is an
orthogonal issue that will be fixed in future versions once we use
Flink's new plugin architecture. If a module is parameterizable, it is
the responsibility of the module to prevent class conflicts. If the Hive
classes are hidden in a module classloader, it should be possible to
load both hive121 and hive234. But in general this problem is unsolved
for now, also Kafka tables could clash if you read from two Kafka
clusters with different versions.

Regards,
Timo

On 10.10.19 08:01, Jark Wu wrote:

> Hi Xuefu,
>
> If there is only one instance per type, then what's the "name" used for?
> Could we remove it and only keep "type" or "kind" to identify modules?
>
> Best,
> Jark
>
> On Thu, 10 Oct 2019 at 11:21, Xuefu Z <[hidden email]> wrote:
>
>> Jark has a good point. However, I think validation logic can put in place
>> to restrict one instance per type. Maybe the doc needs to be specific on
>> this.
>>
>> Thanks,
>> Xuefu
>>
>> On Wed, Oct 9, 2019 at 7:41 PM Jark Wu <[hidden email]> wrote:
>>
>>> Thanks Bowen for the updating.
>>>
>>> I have some different opinions on the change.
>>> IIUC, in the previous design, the "name" is also the "id" or "type" to
>>> identify which module to load. Which means we can only load one instance
>> of
>>> a module.
>>> In the new design, the "name" is just an alias to the module instance,
>> the
>>> "kind" is used to identify modules. Which means we can load different
>>> instances of a module.
>>> However, what's the "name" or alias used for? Do we need to support
>> loading
>>> different instances of a module? From my point of view, it brings more
>>> complexity and confusion.
>>> For example, if we load a "hive121" which uses HiveModule with version
>>> 1.2.1 and load a "hive234" which uses HiveModule with version 2.3.4, then
>>> how to solve the class conflict problem?
>>>
>>> IMO, a module can only be load once in a session, so "name" maybe
>> useless.
>>> So my proposal is similar to the previous one, but only change "name" to
>>> "kind".
>>>
>>> SQL:
>>> LOAD MODULE "kind" [WITH (properties)];
>>> UNLOAD MODULE "kind";
>>> Table:
>>> tEnv.loadModule("kind" [, properties]);
>>> tEnv.unloadModule("kind");
>>>
>>> What do you think?
>>>
>>>
>>> Best,
>>> Jark
>>>
>>>
>>>
>>>
>>>
>>> On Wed, 9 Oct 2019 at 20:38, Bowen Li <[hidden email]> wrote:
>>>
>>>> Thanks everyone for your review.
>>>>
>>>> After discussing with Timo and Dawid offline, as well as incorporating
>>>> feedback from Xuefu and Jark on mailing list, I decided to make a few
>>>> critical changes to the proposal.
>>>>
>>>> - renamed the keyword "type" to "kind". The community has plan to have
>>>> "type" keyword in yaml/descriptor refer to data types exclusively in
>> the
>>>> near future. We should cater to that change in our design
>>>> - allowed specifying names for modules to simplify and unify module
>>>> loading/unloading syntax between programming and SQL. Here're the
>>> proposed
>>>> changes:
>>>> SQL:
>>>> LOAD MODULE "name" WITH ("kind"="xxx" [, (properties)])
>>>> UNLOAD MODULE "name";
>>>> Table:
>>>> tEnv.loadModule("name", new Xxx(properties));
>>>> tEnv.unloadModule("name");
>>>>
>>>> I have completely updated the google doc [1]. Please take another look,
>>> and
>>>> let me know if you have any other questions. Thanks!
>>>>
>>>> [1]
>>>>
>>>>
>> https://docs.google.com/document/d/17CPMpMbPDjvM4selUVEfh_tqUK_oV0TODAUA9dfHakc/edit#
>>>>
>>>> On Tue, Oct 8, 2019 at 6:26 AM Jark Wu <[hidden email]> wrote:
>>>>
>>>>> Hi Bowen,
>>>>>
>>>>> Thanks for the proposal. I have two thoughts:
>>>>>
>>>>> 1) Regarding to "loadModule", how about
>>>>> tableEnv.loadModule("xxx" [, propertiesMap]);
>>>>> tableEnv.unloadModule(“xxx”);
>>>>>
>>>>> This makes the API similar to SQL. IMO, instance of Module is not
>>> needed
>>>>> and verbose as parameter.
>>>>> And this makes it easier to load a simple module without any
>> additional
>>>>> properties, e.g. tEnv.loadModule("GEO"), tEnv.unloadModule("GEO")
>>>>>
>>>>> 2) In current design, the module interface only defines function
>>>> metadata,
>>>>> but no implementations.
>>>>> I'm wondering how to call/map the implementations in runtime? Am I
>>>> missing
>>>>> something?
>>>>>
>>>>> Besides, I left some minor comments in the doc.
>>>>>
>>>>> Best,
>>>>> Jark
>>>>>
>>>>>
>>>>> On Sat, 5 Oct 2019 at 08:42, Xuefu Z <[hidden email]> wrote:
>>>>>
>>>>>> I agree with Timo that the new table APIs need to be consistent.
>> I'd
>>> go
>>>>>> further that an name (or id) is needed for module definition in
>> YAML
>>>>> file.
>>>>>> In the current design, name is skipped and type has binary
>> meanings.
>>>>>> Thanks,
>>>>>> Xuefu
>>>>>>
>>>>>> On Fri, Oct 4, 2019 at 5:24 AM Timo Walther <[hidden email]>
>>>> wrote:
>>>>>>> Hi everyone,
>>>>>>>
>>>>>>> first, I was also questioning my proposal. But Bowen's proposal
>> of
>>>>>>> `tEnv.offloadToYaml(<file_path>)` would not work with the current
>>>>> design
>>>>>>> because we don't know how to serialize a catalog or module into
>>>>>>> properties. Currently, there is no converter from instance to
>>>>>>> properties. It is a one way conversion. We can add a
>> `toProperties`
>>>>>>> method to both Catalog and Module class in the future to solve
>>> this.
>>>>>>> Solving the table environment serializability can be future work.
>>>>>>>
>>>>>>> However, I find the current proposal for the TableEnvironment
>>> methods
>>>>> is
>>>>>>> contradicting:
>>>>>>>
>>>>>>> tableEnv.loadModule(new Yyy());
>>>>>>> tableEnv.unloadModule(“Xxx”);
>>>>>>>
>>>>>>> The loading is specified programmatically whereas the unloading
>>>>> requires
>>>>>>> a string that is not specified in the module itself. But is
>> defined
>>>> in
>>>>>>> the factory according to the current design.
>>>>>>>
>>>>>>> SQL does it more consistently. There, the name `xxx` is used when
>>>>>>> loading and unloading the module:
>>>>>>>
>>>>>>> LOAD MODULE 'xxx' [WITH ('prop'='myProp', ...)]
>>>>>>> UNLOAD MODULE 'xxx’
>>>>>>>
>>>>>>> How about:
>>>>>>>
>>>>>>> tableEnv.loadModule("xxx", new Yyy());
>>>>>>> tableEnv.unloadModule(“xxx”);
>>>>>>>
>>>>>>> This would be similar to the catalog interfaces. The name is not
>>> part
>>>>> of
>>>>>>> the instance itself.
>>>>>>>
>>>>>>> What do you think?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Timo
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 01.10.19 21:17, Bowen Li wrote:
>>>>>>>> If something like the yaml file is the way to go and achieve
>> such
>>>>>>>> motivation, we would cover that with current design.
>>>>>>>>
>>>>>>>> On Tue, Oct 1, 2019 at 12:05 Bowen Li <[hidden email]>
>>> wrote:
>>>>>>>>> Hi Timo, Dawid,
>>>>>>>>>
>>>>>>>>> I've added the suggested SQL and related changes to
>>>> TableEnvironment
>>>>>> API
>>>>>>>>> and other classes to the google doc. Also removed "USE MODULE"
>>> and
>>>>> its
>>>>>>>>> APIs. Will update FLIP wiki once we have a consensus.
>>>>>>>>>
>>>>>>>>> W.r.t. descriptor approach, my gut feeling is similar to
>>> Dawid's.
>>>>>>> Besides,
>>>>>>>>> I feel yaml file would be a better solution to persist
>>>> serializable
>>>>>>> state
>>>>>>>>> of an environment as the file itself is in serializable format
>>>>>> already.
>>>>>>>>> Though yaml file only serves SQL CLI at this moment, we may be
>>>> able
>>>>> to
>>>>>>>>> extend its reach to Table API and allow users to load/offload
>> a
>>>>>>>>> TableEnvironment from/to yaml files, as something like
>>>>>> "TableEnvironment
>>>>>>>>> tEnv = TableEnvironment.loadFromYaml(<file_path>)" and
>>>>>>>>> "tEnv.offloadToYaml(<file_path>)" to restore and persist
>> state,
>>>> and
>>>>>> try
>>>>>>> to
>>>>>>>>> make yaml file more expressive.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Oct 1, 2019 at 6:47 AM Dawid Wysakowicz <
>>>>>> [hidden email]
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Timo, Bowen,
>>>>>>>>>>
>>>>>>>>>> Unfortunately I did not have enough time to go through all
>> the
>>>>>>>>>> suggestions in details so I can not comment on the whole
>> FLIP.
>>>>>>>>>> I just wanted to give my opinion on the "descriptor approach
>> in
>>>>>>>>>> loadModule" part. I am not sure if we need it here. We might
>> be
>>>>>>>>>> overthinking this a bit. It definitely makes sense for
>> objects
>>>> like
>>>>>>>>>> TableSource/TableSink etc. as they are logical definitions
>> that
>>>>>> nearly
>>>>>>>>>> always have to be persisted in a Catalog. I'm not sure if we
>>>> really
>>>>>>> need
>>>>>>>>>> the same for a whole session. If we need a resume session
>>>> feature,
>>>>>> the
>>>>>>>>>> way to go would probably be to keep the session in memory on
>>> the
>>>>>> server
>>>>>>>>>> side. I fear we will never be able to serialize the whole
>>> session
>>>>>>>>>> entirely (temporary objects, objects derived from DataStream
>>>> etc.)
>>>>>>>>>> I think it is ok to use instances for objects like Catalogs
>> or
>>>>>> Modules
>>>>>>>>>> and have an overlay on top of that that can create instances
>>> from
>>>>>>>>>> properties.
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>>
>>>>>>>>>> Dawid
>>>>>>>>>>
>>>>>>>>>> On 01/10/2019 11:28, Timo Walther wrote:
>>>>>>>>>>> Hi Bowen,
>>>>>>>>>>>
>>>>>>>>>>> thanks for your response.
>>>>>>>>>>>
>>>>>>>>>>> Re 2) I also don't have a better approach for this issue. It
>>> is
>>>>>>>>>>> similar to changing the general TableConfig between two
>>>>> statements.
>>>>>> It
>>>>>>>>>>> would be good to add your explanation to the design
>> document.
>>>>>>>>>>> Re 3) It would be interesting to know about which "core"
>>>> functions
>>>>>> we
>>>>>>>>>>> are actually talking about. Also for the overriding built-in
>>>>>> functions
>>>>>>>>>>> that we discussed in the other FLIP. But I'm fine with
>> leaving
>>>> it
>>>>> to
>>>>>>>>>>> the user for now. How about we just introduce loadModule(),
>>>>>>>>>>> unloadModule() methods instead of useModules()? This would
>>>> ensure
>>>>>> that
>>>>>>>>>>> users don't forget to add the core module when adding an
>>>>> additional
>>>>>>>>>>> module and they need to explicitly call
>>> "unloadModule('core')".
>>>>>>>>>>> Re 4) Every table environment feature should also be
>> designed
>>>> with
>>>>>> SQL
>>>>>>>>>>> statements in mind to verify the concept. SQL is also more
>>>> popular
>>>>>>>>>>> that Java/Scala API or YAML file. I would like to add it to
>>> 1.10
>>>>> for
>>>>>>>>>>> marking the feature as complete.
>>>>>>>>>>>
>>>>>>>>>>> SHOW MODULES -> sounds good to me, we should add a
>>>> listModules():
>>>>>>>>>>> List<String> method to table environment
>>>>>>>>>>>
>>>>>>>>>>> LOAD MODULE 'hive' [WITH ('prop'='myProp', ...)] --> we
>> should
>>>>> add a
>>>>>>>>>>> loadModule() method to table environment
>>>>>>>>>>>
>>>>>>>>>>> UNLOAD MODULE 'hive' --> we should add a unloadModule()
>> method
>>>> to
>>>>>>>>>>> table environment
>>>>>>>>>>>
>>>>>>>>>>> I would not introduce `USE MODULES 'x' 'y' 'z'` for
>> simplicity
>>>> and
>>>>>>>>>>> concise API. Users need to load the module anyway with
>>>> properties.
>>>>>>>>>>> They can also load them "in order" immediately. CREATE TABLE
>>> can
>>>>>> also
>>>>>>>>>>> not create multiple tables but only one at a time in that
>>> order.
>>>>>>>>>>> One thing that came to my mind, shall we use a descriptor
>>>> approach
>>>>>> for
>>>>>>>>>>> loadModule()? The past has shown that passing instances
>> causes
>>>>>>>>>>> problems when persisting objects. That's why we also want to
>>> get
>>>>> rid
>>>>>>>>>>> of registerTableSource. I could image that users might want
>> to
>>>>>> persist
>>>>>>>>>>> a table environment's state for later use in the future.
>> Even
>>>>> though
>>>>>>>>>>> this is future work, we should already keep such use cases
>> in
>>>> mind
>>>>>>>>>>> when adding new API methods. What do you think?
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Timo
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 30.09.19 23:17, Bowen Li wrote:
>>>>>>>>>>>> Hi Timo,
>>>>>>>>>>>>
>>>>>>>>>>>> Re 1) I agree. I renamed the title to "Extend Core Table
>>> System
>>>>>> with
>>>>>>>>>>>> Pluggable Modules" and all internal references
>>>>>>>>>>>>
>>>>>>>>>>>> Re 2) First, I'll rename the API to useModules(). The
>> design
>>>>>> doesn't
>>>>>>>>>>>> forbid
>>>>>>>>>>>> users to call useModules() multi times. Objects in modules
>>> are
>>>>>> loaded
>>>>>>>>>> on
>>>>>>>>>>>> demand instead of eagerly, so there won't be inconsistency.
>>>> Users
>>>>>>>>>>>> have to
>>>>>>>>>>>> be fully aware of the consequences of resetting modules as
>>> that
>>>>>> might
>>>>>>>>>>>> cause
>>>>>>>>>>>> that some objects can not be referenced anymore or
>> resolution
>>>>> order
>>>>>>>>>>>> of some
>>>>>>>>>>>> objects changes.
>>>>>>>>>>>>
>>>>>>>>>>>> Re 3) Yes, we'd leave that to users.
>>>>>>>>>>>>
>>>>>>>>>>>> Another approach can be to have a non-optional "Core"
>> module
>>>> for
>>>>>> all
>>>>>>>>>>>> objects that cannot be overrode like "CAST" and "AS"
>>> functions,
>>>>> and
>>>>>>>>>>>> have an
>>>>>>>>>>>> optional "ExtendedCore" module for other replaceable
>> built-in
>>>>>>> objects.
>>>>>>>>>>>> "Core" should be positioned at the 1st in module list all
>> the
>>>>> time.
>>>>>>>>>>>> I'm fine with either solution.
>>>>>>>>>>>>
>>>>>>>>>>>> Re 4) It may sound like a nice-to-have advanced feature for
>>>> 1.10,
>>>>>> but
>>>>>>>>>> we
>>>>>>>>>>>> can surely fully discuss it for the sake of feature
>>>> completeness.
>>>>>>>>>>>> Unlike other configs, the order of modules would matter in
>>>> Flink,
>>>>>> and
>>>>>>>>>> it
>>>>>>>>>>>> implies the LOAD/UNLOAD commands would not be equal in
>>>> operation
>>>>>>>>>>>> positions.
>>>>>>>>>>>> IIUYC, LOAD MODULE 'x' would be interpreted as appending x
>> to
>>>> the
>>>>>> end
>>>>>>>>>> of
>>>>>>>>>>>> module list, and UNLOAD MODULE 'x' would be interpreted as
>>>>>> removing x
>>>>>>>>>>>> from
>>>>>>>>>>>> any position in the list?
>>>>>>>>>>>>
>>>>>>>>>>>> I'm thinking of the following list of commands:
>>>>>>>>>>>>
>>>>>>>>>>>> SHOW MODULES - list modules in order
>>>>>>>>>>>> LOAD MODULE 'hive' [WITH ('prop'='myProp', ...)] - load and
>>>>> append
>>>>>>> the
>>>>>>>>>>>> module to end of the module list
>>>>>>>>>>>> UNLOAD MODULE 'hive' - remove the module from module list,
>>> and
>>>>>> other
>>>>>>>>>>>> modules remain the same relative positions
>>>>>>>>>>>> USE MODULES 'x' 'y' 'z' (wondering can parser take "'x' 'y'
>>>>> 'z'"?),
>>>>>>>>>>>> or USE
>>>>>>>>>>>> MODULES 'x,y,z' - to reorder module list completely
>>>>>>>>>>>>
>>>>>>>
>>>>>> --
>>>>>> Xuefu Zhang
>>>>>>
>>>>>> "In Honey We Trust!"
>>>>>>
>>
>> --
>> Xuefu Zhang
>>
>> "In Honey We Trust!"
>>

Jark Wu-2

Re: [DISCUSS] FLIP-68: Extend Core Table System with Modular Plugins

Hi Timo,

Thanks for the explanation, it makes sense to me.
So at least, we can have a validation to restrict one module instance per
type in the first version.

Regarding to "type" vs "kind", could we use "datatype" keyword to refer
data types exclusively in the future?
This can avoid the conflict/confusion when we use "type" here.
The concern of using "kind" is that it is inconsistent with other
descriptor properties.
We have heavily used "type" in properties, including `connector.type`,
`format.type`, `catalog.type`, etc...
Are we planning to change them all ?

Best,
Jark

On Thu, 10 Oct 2019 at 19:56, Timo Walther <[hidden email]> wrote:

> Hi Jark,
>
> we had a long offline discussion yesterday where we considered all
> options again. The reasons why we decided for the updated design that
> Bowen suggested:
>
> - Both Dawid and Xuefu argued that in the old design "kind" has binary
> meanings. I agree here.
> - Compared to other SQL concepts such as tables/functions/catalogs, a
> "name" is never part of the object itself but always specified when
> registering the object. We should have the same behavior for modules
> because of consistency reasons. It also makes the "name" explicit when
> it comes to unloading the module compared to the previous design.
> - Regarding, "How to solve the class conflict problem?" this is an
> orthogonal issue that will be fixed in future versions once we use
> Flink's new plugin architecture. If a module is parameterizable, it is
> the responsibility of the module to prevent class conflicts. If the Hive
> classes are hidden in a module classloader, it should be possible to
> load both hive121 and hive234. But in general this problem is unsolved
> for now, also Kafka tables could clash if you read from two Kafka
> clusters with different versions.
>
> Regards,
> Timo
>
>
> On 10.10.19 08:01, Jark Wu wrote:
> > Hi Xuefu,
> >
> > If there is only one instance per type, then what's the "name" used for?
> > Could we remove it and only keep "type" or "kind" to identify modules?
> >
> > Best,
> > Jark
> >
> > On Thu, 10 Oct 2019 at 11:21, Xuefu Z <[hidden email]> wrote:
> >
> >> Jark has a good point. However, I think validation logic can put in
> place
> >> to restrict one instance per type. Maybe the doc needs to be specific on
> >> this.
> >>
> >> Thanks,
> >> Xuefu
> >>
> >> On Wed, Oct 9, 2019 at 7:41 PM Jark Wu <[hidden email]> wrote:
> >>
> >>> Thanks Bowen for the updating.
> >>>
> >>> I have some different opinions on the change.
> >>> IIUC, in the previous design, the "name" is also the "id" or "type" to
> >>> identify which module to load. Which means we can only load one
> instance
> >> of
> >>> a module.
> >>> In the new design, the "name" is just an alias to the module instance,
> >> the
> >>> "kind" is used to identify modules. Which means we can load different
> >>> instances of a module.
> >>> However, what's the "name" or alias used for? Do we need to support
> >> loading
> >>> different instances of a module? From my point of view, it brings more
> >>> complexity and confusion.
> >>> For example, if we load a "hive121" which uses HiveModule with version
> >>> 1.2.1 and load a "hive234" which uses HiveModule with version 2.3.4,
> then
> >>> how to solve the class conflict problem?
> >>>
> >>> IMO, a module can only be load once in a session, so "name" maybe
> >> useless.
> >>> So my proposal is similar to the previous one, but only change "name"
> to
> >>> "kind".
> >>>
> >>> SQL:
> >>> LOAD MODULE "kind" [WITH (properties)];
> >>> UNLOAD MODULE "kind";
> >>> Table:
> >>> tEnv.loadModule("kind" [, properties]);
> >>> tEnv.unloadModule("kind");
> >>>
> >>> What do you think?
> >>>
> >>>
> >>> Best,
> >>> Jark
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> On Wed, 9 Oct 2019 at 20:38, Bowen Li <[hidden email]> wrote:
> >>>
> >>>> Thanks everyone for your review.
> >>>>
> >>>> After discussing with Timo and Dawid offline, as well as incorporating
> >>>> feedback from Xuefu and Jark on mailing list, I decided to make a few
> >>>> critical changes to the proposal.
> >>>>
> >>>> - renamed the keyword "type" to "kind". The community has plan to have
> >>>> "type" keyword in yaml/descriptor refer to data types exclusively in
> >> the
> >>>> near future. We should cater to that change in our design
> >>>> - allowed specifying names for modules to simplify and unify module
> >>>> loading/unloading syntax between programming and SQL. Here're the
> >>> proposed
> >>>> changes:
> >>>> SQL:
> >>>> LOAD MODULE "name" WITH ("kind"="xxx" [, (properties)])
> >>>> UNLOAD MODULE "name";
> >>>> Table:
> >>>> tEnv.loadModule("name", new Xxx(properties));
> >>>> tEnv.unloadModule("name");
> >>>>
> >>>> I have completely updated the google doc [1]. Please take another
> look,
> >>> and
> >>>> let me know if you have any other questions. Thanks!
> >>>>
> >>>> [1]
> >>>>
> >>>>
> >>
> https://docs.google.com/document/d/17CPMpMbPDjvM4selUVEfh_tqUK_oV0TODAUA9dfHakc/edit#
> >>>>
> >>>> On Tue, Oct 8, 2019 at 6:26 AM Jark Wu <[hidden email]> wrote:
> >>>>
> >>>>> Hi Bowen,
> >>>>>
> >>>>> Thanks for the proposal. I have two thoughts:
> >>>>>
> >>>>> 1) Regarding to "loadModule", how about
> >>>>> tableEnv.loadModule("xxx" [, propertiesMap]);
> >>>>> tableEnv.unloadModule(“xxx”);
> >>>>>
> >>>>> This makes the API similar to SQL. IMO, instance of Module is not
> >>> needed
> >>>>> and verbose as parameter.
> >>>>> And this makes it easier to load a simple module without any
> >> additional
> >>>>> properties, e.g. tEnv.loadModule("GEO"), tEnv.unloadModule("GEO")
> >>>>>
> >>>>> 2) In current design, the module interface only defines function
> >>>> metadata,
> >>>>> but no implementations.
> >>>>> I'm wondering how to call/map the implementations in runtime? Am I
> >>>> missing
> >>>>> something?
> >>>>>
> >>>>> Besides, I left some minor comments in the doc.
> >>>>>
> >>>>> Best,
> >>>>> Jark
> >>>>>
> >>>>>
> >>>>> On Sat, 5 Oct 2019 at 08:42, Xuefu Z <[hidden email]> wrote:
> >>>>>
> >>>>>> I agree with Timo that the new table APIs need to be consistent.
> >> I'd
> >>> go
> >>>>>> further that an name (or id) is needed for module definition in
> >> YAML
> >>>>> file.
> >>>>>> In the current design, name is skipped and type has binary
> >> meanings.
> >>>>>> Thanks,
> >>>>>> Xuefu
> >>>>>>
> >>>>>> On Fri, Oct 4, 2019 at 5:24 AM Timo Walther <[hidden email]>
> >>>> wrote:
> >>>>>>> Hi everyone,
> >>>>>>>
> >>>>>>> first, I was also questioning my proposal. But Bowen's proposal
> >> of
> >>>>>>> `tEnv.offloadToYaml(<file_path>)` would not work with the current
> >>>>> design
> >>>>>>> because we don't know how to serialize a catalog or module into
> >>>>>>> properties. Currently, there is no converter from instance to
> >>>>>>> properties. It is a one way conversion. We can add a
> >> `toProperties`
> >>>>>>> method to both Catalog and Module class in the future to solve
> >>> this.
> >>>>>>> Solving the table environment serializability can be future work.
> >>>>>>>
> >>>>>>> However, I find the current proposal for the TableEnvironment
> >>> methods
> >>>>> is
> >>>>>>> contradicting:
> >>>>>>>
> >>>>>>> tableEnv.loadModule(new Yyy());
> >>>>>>> tableEnv.unloadModule(“Xxx”);
> >>>>>>>
> >>>>>>> The loading is specified programmatically whereas the unloading
> >>>>> requires
> >>>>>>> a string that is not specified in the module itself. But is
> >> defined
> >>>> in
> >>>>>>> the factory according to the current design.
> >>>>>>>
> >>>>>>> SQL does it more consistently. There, the name `xxx` is used when
> >>>>>>> loading and unloading the module:
> >>>>>>>
> >>>>>>> LOAD MODULE 'xxx' [WITH ('prop'='myProp', ...)]
> >>>>>>> UNLOAD MODULE 'xxx’
> >>>>>>>
> >>>>>>> How about:
> >>>>>>>
> >>>>>>> tableEnv.loadModule("xxx", new Yyy());
> >>>>>>> tableEnv.unloadModule(“xxx”);
> >>>>>>>
> >>>>>>> This would be similar to the catalog interfaces. The name is not
> >>> part
> >>>>> of
> >>>>>>> the instance itself.
> >>>>>>>
> >>>>>>> What do you think?
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Timo
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On 01.10.19 21:17, Bowen Li wrote:
> >>>>>>>> If something like the yaml file is the way to go and achieve
> >> such
> >>>>>>>> motivation, we would cover that with current design.
> >>>>>>>>
> >>>>>>>> On Tue, Oct 1, 2019 at 12:05 Bowen Li <[hidden email]>
> >>> wrote:
> >>>>>>>>> Hi Timo, Dawid,
> >>>>>>>>>
> >>>>>>>>> I've added the suggested SQL and related changes to
> >>>> TableEnvironment
> >>>>>> API
> >>>>>>>>> and other classes to the google doc. Also removed "USE MODULE"
> >>> and
> >>>>> its
> >>>>>>>>> APIs. Will update FLIP wiki once we have a consensus.
> >>>>>>>>>
> >>>>>>>>> W.r.t. descriptor approach, my gut feeling is similar to
> >>> Dawid's.
> >>>>>>> Besides,
> >>>>>>>>> I feel yaml file would be a better solution to persist
> >>>> serializable
> >>>>>>> state
> >>>>>>>>> of an environment as the file itself is in serializable format
> >>>>>> already.
> >>>>>>>>> Though yaml file only serves SQL CLI at this moment, we may be
> >>>> able
> >>>>> to
> >>>>>>>>> extend its reach to Table API and allow users to load/offload
> >> a
> >>>>>>>>> TableEnvironment from/to yaml files, as something like
> >>>>>> "TableEnvironment
> >>>>>>>>> tEnv = TableEnvironment.loadFromYaml(<file_path>)" and
> >>>>>>>>> "tEnv.offloadToYaml(<file_path>)" to restore and persist
> >> state,
> >>>> and
> >>>>>> try
> >>>>>>> to
> >>>>>>>>> make yaml file more expressive.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Tue, Oct 1, 2019 at 6:47 AM Dawid Wysakowicz <
> >>>>>> [hidden email]
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Hi Timo, Bowen,
> >>>>>>>>>>
> >>>>>>>>>> Unfortunately I did not have enough time to go through all
> >> the
> >>>>>>>>>> suggestions in details so I can not comment on the whole
> >> FLIP.
> >>>>>>>>>> I just wanted to give my opinion on the "descriptor approach
> >> in
> >>>>>>>>>> loadModule" part. I am not sure if we need it here. We might
> >> be
> >>>>>>>>>> overthinking this a bit. It definitely makes sense for
> >> objects
> >>>> like
> >>>>>>>>>> TableSource/TableSink etc. as they are logical definitions
> >> that
> >>>>>> nearly
> >>>>>>>>>> always have to be persisted in a Catalog. I'm not sure if we
> >>>> really
> >>>>>>> need
> >>>>>>>>>> the same for a whole session. If we need a resume session
> >>>> feature,
> >>>>>> the
> >>>>>>>>>> way to go would probably be to keep the session in memory on
> >>> the
> >>>>>> server
> >>>>>>>>>> side. I fear we will never be able to serialize the whole
> >>> session
> >>>>>>>>>> entirely (temporary objects, objects derived from DataStream
> >>>> etc.)
> >>>>>>>>>> I think it is ok to use instances for objects like Catalogs
> >> or
> >>>>>> Modules
> >>>>>>>>>> and have an overlay on top of that that can create instances
> >>> from
> >>>>>>>>>> properties.
> >>>>>>>>>>
> >>>>>>>>>> Best,
> >>>>>>>>>>
> >>>>>>>>>> Dawid
> >>>>>>>>>>
> >>>>>>>>>> On 01/10/2019 11:28, Timo Walther wrote:
> >>>>>>>>>>> Hi Bowen,
> >>>>>>>>>>>
> >>>>>>>>>>> thanks for your response.
> >>>>>>>>>>>
> >>>>>>>>>>> Re 2) I also don't have a better approach for this issue. It
> >>> is
> >>>>>>>>>>> similar to changing the general TableConfig between two
> >>>>> statements.
> >>>>>> It
> >>>>>>>>>>> would be good to add your explanation to the design
> >> document.
> >>>>>>>>>>> Re 3) It would be interesting to know about which "core"
> >>>> functions
> >>>>>> we
> >>>>>>>>>>> are actually talking about. Also for the overriding built-in
> >>>>>> functions
> >>>>>>>>>>> that we discussed in the other FLIP. But I'm fine with
> >> leaving
> >>>> it
> >>>>> to
> >>>>>>>>>>> the user for now. How about we just introduce loadModule(),
> >>>>>>>>>>> unloadModule() methods instead of useModules()? This would
> >>>> ensure
> >>>>>> that
> >>>>>>>>>>> users don't forget to add the core module when adding an
> >>>>> additional
> >>>>>>>>>>> module and they need to explicitly call
> >>> "unloadModule('core')".
> >>>>>>>>>>> Re 4) Every table environment feature should also be
> >> designed
> >>>> with
> >>>>>> SQL
> >>>>>>>>>>> statements in mind to verify the concept. SQL is also more
> >>>> popular
> >>>>>>>>>>> that Java/Scala API or YAML file. I would like to add it to
> >>> 1.10
> >>>>> for
> >>>>>>>>>>> marking the feature as complete.
> >>>>>>>>>>>
> >>>>>>>>>>> SHOW MODULES -> sounds good to me, we should add a
> >>>> listModules():
> >>>>>>>>>>> List<String> method to table environment
> >>>>>>>>>>>
> >>>>>>>>>>> LOAD MODULE 'hive' [WITH ('prop'='myProp', ...)] --> we
> >> should
> >>>>> add a
> >>>>>>>>>>> loadModule() method to table environment
> >>>>>>>>>>>
> >>>>>>>>>>> UNLOAD MODULE 'hive' --> we should add a unloadModule()
> >> method
> >>>> to
> >>>>>>>>>>> table environment
> >>>>>>>>>>>
> >>>>>>>>>>> I would not introduce `USE MODULES 'x' 'y' 'z'` for
> >> simplicity
> >>>> and
> >>>>>>>>>>> concise API. Users need to load the module anyway with
> >>>> properties.
> >>>>>>>>>>> They can also load them "in order" immediately. CREATE TABLE
> >>> can
> >>>>>> also
> >>>>>>>>>>> not create multiple tables but only one at a time in that
> >>> order.
> >>>>>>>>>>> One thing that came to my mind, shall we use a descriptor
> >>>> approach
> >>>>>> for
> >>>>>>>>>>> loadModule()? The past has shown that passing instances
> >> causes
> >>>>>>>>>>> problems when persisting objects. That's why we also want to
> >>> get
> >>>>> rid
> >>>>>>>>>>> of registerTableSource. I could image that users might want
> >> to
> >>>>>> persist
> >>>>>>>>>>> a table environment's state for later use in the future.
> >> Even
> >>>>> though
> >>>>>>>>>>> this is future work, we should already keep such use cases
> >> in
> >>>> mind
> >>>>>>>>>>> when adding new API methods. What do you think?
> >>>>>>>>>>>
> >>>>>>>>>>> Regards,
> >>>>>>>>>>> Timo
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On 30.09.19 23:17, Bowen Li wrote:
> >>>>>>>>>>>> Hi Timo,
> >>>>>>>>>>>>
> >>>>>>>>>>>> Re 1) I agree. I renamed the title to "Extend Core Table
> >>> System
> >>>>>> with
> >>>>>>>>>>>> Pluggable Modules" and all internal references
> >>>>>>>>>>>>
> >>>>>>>>>>>> Re 2) First, I'll rename the API to useModules(). The
> >> design
> >>>>>> doesn't
> >>>>>>>>>>>> forbid
> >>>>>>>>>>>> users to call useModules() multi times. Objects in modules
> >>> are
> >>>>>> loaded
> >>>>>>>>>> on
> >>>>>>>>>>>> demand instead of eagerly, so there won't be inconsistency.
> >>>> Users
> >>>>>>>>>>>> have to
> >>>>>>>>>>>> be fully aware of the consequences of resetting modules as
> >>> that
> >>>>>> might
> >>>>>>>>>>>> cause
> >>>>>>>>>>>> that some objects can not be referenced anymore or
> >> resolution
> >>>>> order
> >>>>>>>>>>>> of some
> >>>>>>>>>>>> objects changes.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Re 3) Yes, we'd leave that to users.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Another approach can be to have a non-optional "Core"
> >> module
> >>>> for
> >>>>>> all
> >>>>>>>>>>>> objects that cannot be overrode like "CAST" and "AS"
> >>> functions,
> >>>>> and
> >>>>>>>>>>>> have an
> >>>>>>>>>>>> optional "ExtendedCore" module for other replaceable
> >> built-in
> >>>>>>> objects.
> >>>>>>>>>>>> "Core" should be positioned at the 1st in module list all
> >> the
> >>>>> time.
> >>>>>>>>>>>> I'm fine with either solution.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Re 4) It may sound like a nice-to-have advanced feature for
> >>>> 1.10,
> >>>>>> but
> >>>>>>>>>> we
> >>>>>>>>>>>> can surely fully discuss it for the sake of feature
> >>>> completeness.
> >>>>>>>>>>>> Unlike other configs, the order of modules would matter in
> >>>> Flink,
> >>>>>> and
> >>>>>>>>>> it
> >>>>>>>>>>>> implies the LOAD/UNLOAD commands would not be equal in
> >>>> operation
> >>>>>>>>>>>> positions.
> >>>>>>>>>>>> IIUYC, LOAD MODULE 'x' would be interpreted as appending x
> >> to
> >>>> the
> >>>>>> end
> >>>>>>>>>> of
> >>>>>>>>>>>> module list, and UNLOAD MODULE 'x' would be interpreted as
> >>>>>> removing x
> >>>>>>>>>>>> from
> >>>>>>>>>>>> any position in the list?
> >>>>>>>>>>>>
> >>>>>>>>>>>> I'm thinking of the following list of commands:
> >>>>>>>>>>>>
> >>>>>>>>>>>> SHOW MODULES - list modules in order
> >>>>>>>>>>>> LOAD MODULE 'hive' [WITH ('prop'='myProp', ...)] - load and
> >>>>> append
> >>>>>>> the
> >>>>>>>>>>>> module to end of the module list
> >>>>>>>>>>>> UNLOAD MODULE 'hive' - remove the module from module list,
> >>> and
> >>>>>> other
> >>>>>>>>>>>> modules remain the same relative positions
> >>>>>>>>>>>> USE MODULES 'x' 'y' 'z' (wondering can parser take "'x' 'y'
> >>>>> 'z'"?),
> >>>>>>>>>>>> or USE
> >>>>>>>>>>>> MODULES 'x,y,z' - to reorder module list completely
> >>>>>>>>>>>>
> >>>>>>>
> >>>>>> --
> >>>>>> Xuefu Zhang
> >>>>>>
> >>>>>> "In Honey We Trust!"
> >>>>>>
> >>
> >> --
> >> Xuefu Zhang
> >>
> >> "In Honey We Trust!"
> >>
>
>

Timo Walther-2

Re: [DISCUSS] FLIP-68: Extend Core Table System with Modular Plugins

Hi Jark,

restricting one module instance per kind sounds good to me. Modules can
implement hashCode/equals and we can perform the check you metioned. The
equals() method can determine what "same kind" means.

Correct me if I'm wrong, but we wanted to perform a big properties
rework soonish anyway, no? Then we can also change `connector.type` to
`connector.kind`?

In Calcite they distinguish between `type` and `kind`:

SqlOperator(
       String name,
       SqlKind kind,
       int leftPrecedence,
       int rightPrecedence,
       SqlReturnTypeInference returnTypeInference,
       SqlOperandTypeInference operandTypeInference,
       SqlOperandTypeChecker operandTypeChecker

In Flink 1.9, we also started with
org.apache.flink.table.functions.FunctionDefinition#getKind because type
is just heavily overloaded with different meanings. Logical type, data
type, connector type, etc.

Regards,
Timo

On 10.10.19 14:40, Jark Wu wrote:

> Hi Timo,
>
> Thanks for the explanation, it makes sense to me.
> So at least, we can have a validation to restrict one module instance per
> type in the first version.
>
> Regarding to "type" vs "kind", could we use "datatype" keyword to refer
> data types exclusively in the future?
> This can avoid the conflict/confusion when we use "type" here.
> The concern of using "kind" is that it is inconsistent with other
> descriptor properties.
> We have heavily used "type" in properties, including `connector.type`,
> `format.type`, `catalog.type`, etc...
> Are we planning to change them all ?
>
> Best,
> Jark
>
> On Thu, 10 Oct 2019 at 19:56, Timo Walther <[hidden email]> wrote:
>
>> Hi Jark,
>>
>> we had a long offline discussion yesterday where we considered all
>> options again. The reasons why we decided for the updated design that
>> Bowen suggested:
>>
>> - Both Dawid and Xuefu argued that in the old design "kind" has binary
>> meanings. I agree here.
>> - Compared to other SQL concepts such as tables/functions/catalogs, a
>> "name" is never part of the object itself but always specified when
>> registering the object. We should have the same behavior for modules
>> because of consistency reasons. It also makes the "name" explicit when
>> it comes to unloading the module compared to the previous design.
>> - Regarding, "How to solve the class conflict problem?" this is an
>> orthogonal issue that will be fixed in future versions once we use
>> Flink's new plugin architecture. If a module is parameterizable, it is
>> the responsibility of the module to prevent class conflicts. If the Hive
>> classes are hidden in a module classloader, it should be possible to
>> load both hive121 and hive234. But in general this problem is unsolved
>> for now, also Kafka tables could clash if you read from two Kafka
>> clusters with different versions.
>>
>> Regards,
>> Timo
>>
>>
>> On 10.10.19 08:01, Jark Wu wrote:
>>> Hi Xuefu,
>>>
>>> If there is only one instance per type, then what's the "name" used for?
>>> Could we remove it and only keep "type" or "kind" to identify modules?
>>>
>>> Best,
>>> Jark
>>>
>>> On Thu, 10 Oct 2019 at 11:21, Xuefu Z <[hidden email]> wrote:
>>>
>>>> Jark has a good point. However, I think validation logic can put in
>> place
>>>> to restrict one instance per type. Maybe the doc needs to be specific on
>>>> this.
>>>>
>>>> Thanks,
>>>> Xuefu
>>>>
>>>> On Wed, Oct 9, 2019 at 7:41 PM Jark Wu <[hidden email]> wrote:
>>>>
>>>>> Thanks Bowen for the updating.
>>>>>
>>>>> I have some different opinions on the change.
>>>>> IIUC, in the previous design, the "name" is also the "id" or "type" to
>>>>> identify which module to load. Which means we can only load one
>> instance
>>>> of
>>>>> a module.
>>>>> In the new design, the "name" is just an alias to the module instance,
>>>> the
>>>>> "kind" is used to identify modules. Which means we can load different
>>>>> instances of a module.
>>>>> However, what's the "name" or alias used for? Do we need to support
>>>> loading
>>>>> different instances of a module? From my point of view, it brings more
>>>>> complexity and confusion.
>>>>> For example, if we load a "hive121" which uses HiveModule with version
>>>>> 1.2.1 and load a "hive234" which uses HiveModule with version 2.3.4,
>> then
>>>>> how to solve the class conflict problem?
>>>>>
>>>>> IMO, a module can only be load once in a session, so "name" maybe
>>>> useless.
>>>>> So my proposal is similar to the previous one, but only change "name"
>> to
>>>>> "kind".
>>>>>
>>>>> SQL:
>>>>> LOAD MODULE "kind" [WITH (properties)];
>>>>> UNLOAD MODULE "kind";
>>>>> Table:
>>>>> tEnv.loadModule("kind" [, properties]);
>>>>> tEnv.unloadModule("kind");
>>>>>
>>>>> What do you think?
>>>>>
>>>>>
>>>>> Best,
>>>>> Jark
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Wed, 9 Oct 2019 at 20:38, Bowen Li <[hidden email]> wrote:
>>>>>
>>>>>> Thanks everyone for your review.
>>>>>>
>>>>>> After discussing with Timo and Dawid offline, as well as incorporating
>>>>>> feedback from Xuefu and Jark on mailing list, I decided to make a few
>>>>>> critical changes to the proposal.
>>>>>>
>>>>>> - renamed the keyword "type" to "kind". The community has plan to have
>>>>>> "type" keyword in yaml/descriptor refer to data types exclusively in
>>>> the
>>>>>> near future. We should cater to that change in our design
>>>>>> - allowed specifying names for modules to simplify and unify module
>>>>>> loading/unloading syntax between programming and SQL. Here're the
>>>>> proposed
>>>>>> changes:
>>>>>> SQL:
>>>>>> LOAD MODULE "name" WITH ("kind"="xxx" [, (properties)])
>>>>>> UNLOAD MODULE "name";
>>>>>> Table:
>>>>>> tEnv.loadModule("name", new Xxx(properties));
>>>>>> tEnv.unloadModule("name");
>>>>>>
>>>>>> I have completely updated the google doc [1]. Please take another
>> look,
>>>>> and
>>>>>> let me know if you have any other questions. Thanks!
>>>>>>
>>>>>> [1]
>>>>>>
>>>>>>
>> https://docs.google.com/document/d/17CPMpMbPDjvM4selUVEfh_tqUK_oV0TODAUA9dfHakc/edit#
>>>>>> On Tue, Oct 8, 2019 at 6:26 AM Jark Wu <[hidden email]> wrote:
>>>>>>
>>>>>>> Hi Bowen,
>>>>>>>
>>>>>>> Thanks for the proposal. I have two thoughts:
>>>>>>>
>>>>>>> 1) Regarding to "loadModule", how about
>>>>>>> tableEnv.loadModule("xxx" [, propertiesMap]);
>>>>>>> tableEnv.unloadModule(“xxx”);
>>>>>>>
>>>>>>> This makes the API similar to SQL. IMO, instance of Module is not
>>>>> needed
>>>>>>> and verbose as parameter.
>>>>>>> And this makes it easier to load a simple module without any
>>>> additional
>>>>>>> properties, e.g. tEnv.loadModule("GEO"), tEnv.unloadModule("GEO")
>>>>>>>
>>>>>>> 2) In current design, the module interface only defines function
>>>>>> metadata,
>>>>>>> but no implementations.
>>>>>>> I'm wondering how to call/map the implementations in runtime? Am I
>>>>>> missing
>>>>>>> something?
>>>>>>>
>>>>>>> Besides, I left some minor comments in the doc.
>>>>>>>
>>>>>>> Best,
>>>>>>> Jark
>>>>>>>
>>>>>>>
>>>>>>> On Sat, 5 Oct 2019 at 08:42, Xuefu Z <[hidden email]> wrote:
>>>>>>>
>>>>>>>> I agree with Timo that the new table APIs need to be consistent.
>>>> I'd
>>>>> go
>>>>>>>> further that an name (or id) is needed for module definition in
>>>> YAML
>>>>>>> file.
>>>>>>>> In the current design, name is skipped and type has binary
>>>> meanings.
>>>>>>>> Thanks,
>>>>>>>> Xuefu
>>>>>>>>
>>>>>>>> On Fri, Oct 4, 2019 at 5:24 AM Timo Walther <[hidden email]>
>>>>>> wrote:
>>>>>>>>> Hi everyone,
>>>>>>>>>
>>>>>>>>> first, I was also questioning my proposal. But Bowen's proposal
>>>> of
>>>>>>>>> `tEnv.offloadToYaml(<file_path>)` would not work with the current
>>>>>>> design
>>>>>>>>> because we don't know how to serialize a catalog or module into
>>>>>>>>> properties. Currently, there is no converter from instance to
>>>>>>>>> properties. It is a one way conversion. We can add a
>>>> `toProperties`
>>>>>>>>> method to both Catalog and Module class in the future to solve
>>>>> this.
>>>>>>>>> Solving the table environment serializability can be future work.
>>>>>>>>>
>>>>>>>>> However, I find the current proposal for the TableEnvironment
>>>>> methods
>>>>>>> is
>>>>>>>>> contradicting:
>>>>>>>>>
>>>>>>>>> tableEnv.loadModule(new Yyy());
>>>>>>>>> tableEnv.unloadModule(“Xxx”);
>>>>>>>>>
>>>>>>>>> The loading is specified programmatically whereas the unloading
>>>>>>> requires
>>>>>>>>> a string that is not specified in the module itself. But is
>>>> defined
>>>>>> in
>>>>>>>>> the factory according to the current design.
>>>>>>>>>
>>>>>>>>> SQL does it more consistently. There, the name `xxx` is used when
>>>>>>>>> loading and unloading the module:
>>>>>>>>>
>>>>>>>>> LOAD MODULE 'xxx' [WITH ('prop'='myProp', ...)]
>>>>>>>>> UNLOAD MODULE 'xxx’
>>>>>>>>>
>>>>>>>>> How about:
>>>>>>>>>
>>>>>>>>> tableEnv.loadModule("xxx", new Yyy());
>>>>>>>>> tableEnv.unloadModule(“xxx”);
>>>>>>>>>
>>>>>>>>> This would be similar to the catalog interfaces. The name is not
>>>>> part
>>>>>>> of
>>>>>>>>> the instance itself.
>>>>>>>>>
>>>>>>>>> What do you think?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Timo
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 01.10.19 21:17, Bowen Li wrote:
>>>>>>>>>> If something like the yaml file is the way to go and achieve
>>>> such
>>>>>>>>>> motivation, we would cover that with current design.
>>>>>>>>>>
>>>>>>>>>> On Tue, Oct 1, 2019 at 12:05 Bowen Li <[hidden email]>
>>>>> wrote:
>>>>>>>>>>> Hi Timo, Dawid,
>>>>>>>>>>>
>>>>>>>>>>> I've added the suggested SQL and related changes to
>>>>>> TableEnvironment
>>>>>>>> API
>>>>>>>>>>> and other classes to the google doc. Also removed "USE MODULE"
>>>>> and
>>>>>>> its
>>>>>>>>>>> APIs. Will update FLIP wiki once we have a consensus.
>>>>>>>>>>>
>>>>>>>>>>> W.r.t. descriptor approach, my gut feeling is similar to
>>>>> Dawid's.
>>>>>>>>> Besides,
>>>>>>>>>>> I feel yaml file would be a better solution to persist
>>>>>> serializable
>>>>>>>>> state
>>>>>>>>>>> of an environment as the file itself is in serializable format
>>>>>>>> already.
>>>>>>>>>>> Though yaml file only serves SQL CLI at this moment, we may be
>>>>>> able
>>>>>>> to
>>>>>>>>>>> extend its reach to Table API and allow users to load/offload
>>>> a
>>>>>>>>>>> TableEnvironment from/to yaml files, as something like
>>>>>>>> "TableEnvironment
>>>>>>>>>>> tEnv = TableEnvironment.loadFromYaml(<file_path>)" and
>>>>>>>>>>> "tEnv.offloadToYaml(<file_path>)" to restore and persist
>>>> state,
>>>>>> and
>>>>>>>> try
>>>>>>>>> to
>>>>>>>>>>> make yaml file more expressive.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Oct 1, 2019 at 6:47 AM Dawid Wysakowicz <
>>>>>>>> [hidden email]
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Timo, Bowen,
>>>>>>>>>>>>
>>>>>>>>>>>> Unfortunately I did not have enough time to go through all
>>>> the
>>>>>>>>>>>> suggestions in details so I can not comment on the whole
>>>> FLIP.
>>>>>>>>>>>> I just wanted to give my opinion on the "descriptor approach
>>>> in
>>>>>>>>>>>> loadModule" part. I am not sure if we need it here. We might
>>>> be
>>>>>>>>>>>> overthinking this a bit. It definitely makes sense for
>>>> objects
>>>>>> like
>>>>>>>>>>>> TableSource/TableSink etc. as they are logical definitions
>>>> that
>>>>>>>> nearly
>>>>>>>>>>>> always have to be persisted in a Catalog. I'm not sure if we
>>>>>> really
>>>>>>>>> need
>>>>>>>>>>>> the same for a whole session. If we need a resume session
>>>>>> feature,
>>>>>>>> the
>>>>>>>>>>>> way to go would probably be to keep the session in memory on
>>>>> the
>>>>>>>> server
>>>>>>>>>>>> side. I fear we will never be able to serialize the whole
>>>>> session
>>>>>>>>>>>> entirely (temporary objects, objects derived from DataStream
>>>>>> etc.)
>>>>>>>>>>>> I think it is ok to use instances for objects like Catalogs
>>>> or
>>>>>>>> Modules
>>>>>>>>>>>> and have an overlay on top of that that can create instances
>>>>> from
>>>>>>>>>>>> properties.
>>>>>>>>>>>>
>>>>>>>>>>>> Best,
>>>>>>>>>>>>
>>>>>>>>>>>> Dawid
>>>>>>>>>>>>
>>>>>>>>>>>> On 01/10/2019 11:28, Timo Walther wrote:
>>>>>>>>>>>>> Hi Bowen,
>>>>>>>>>>>>>
>>>>>>>>>>>>> thanks for your response.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Re 2) I also don't have a better approach for this issue. It
>>>>> is
>>>>>>>>>>>>> similar to changing the general TableConfig between two
>>>>>>> statements.
>>>>>>>> It
>>>>>>>>>>>>> would be good to add your explanation to the design
>>>> document.
>>>>>>>>>>>>> Re 3) It would be interesting to know about which "core"
>>>>>> functions
>>>>>>>> we
>>>>>>>>>>>>> are actually talking about. Also for the overriding built-in
>>>>>>>> functions
>>>>>>>>>>>>> that we discussed in the other FLIP. But I'm fine with
>>>> leaving
>>>>>> it
>>>>>>> to
>>>>>>>>>>>>> the user for now. How about we just introduce loadModule(),
>>>>>>>>>>>>> unloadModule() methods instead of useModules()? This would
>>>>>> ensure
>>>>>>>> that
>>>>>>>>>>>>> users don't forget to add the core module when adding an
>>>>>>> additional
>>>>>>>>>>>>> module and they need to explicitly call
>>>>> "unloadModule('core')".
>>>>>>>>>>>>> Re 4) Every table environment feature should also be
>>>> designed
>>>>>> with
>>>>>>>> SQL
>>>>>>>>>>>>> statements in mind to verify the concept. SQL is also more
>>>>>> popular
>>>>>>>>>>>>> that Java/Scala API or YAML file. I would like to add it to
>>>>> 1.10
>>>>>>> for
>>>>>>>>>>>>> marking the feature as complete.
>>>>>>>>>>>>>
>>>>>>>>>>>>> SHOW MODULES -> sounds good to me, we should add a
>>>>>> listModules():
>>>>>>>>>>>>> List<String> method to table environment
>>>>>>>>>>>>>
>>>>>>>>>>>>> LOAD MODULE 'hive' [WITH ('prop'='myProp', ...)] --> we
>>>> should
>>>>>>> add a
>>>>>>>>>>>>> loadModule() method to table environment
>>>>>>>>>>>>>
>>>>>>>>>>>>> UNLOAD MODULE 'hive' --> we should add a unloadModule()
>>>> method
>>>>>> to
>>>>>>>>>>>>> table environment
>>>>>>>>>>>>>
>>>>>>>>>>>>> I would not introduce `USE MODULES 'x' 'y' 'z'` for
>>>> simplicity
>>>>>> and
>>>>>>>>>>>>> concise API. Users need to load the module anyway with
>>>>>> properties.
>>>>>>>>>>>>> They can also load them "in order" immediately. CREATE TABLE
>>>>> can
>>>>>>>> also
>>>>>>>>>>>>> not create multiple tables but only one at a time in that
>>>>> order.
>>>>>>>>>>>>> One thing that came to my mind, shall we use a descriptor
>>>>>> approach
>>>>>>>> for
>>>>>>>>>>>>> loadModule()? The past has shown that passing instances
>>>> causes
>>>>>>>>>>>>> problems when persisting objects. That's why we also want to
>>>>> get
>>>>>>> rid
>>>>>>>>>>>>> of registerTableSource. I could image that users might want
>>>> to
>>>>>>>> persist
>>>>>>>>>>>>> a table environment's state for later use in the future.
>>>> Even
>>>>>>> though
>>>>>>>>>>>>> this is future work, we should already keep such use cases
>>>> in
>>>>>> mind
>>>>>>>>>>>>> when adding new API methods. What do you think?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>> Timo
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 30.09.19 23:17, Bowen Li wrote:
>>>>>>>>>>>>>> Hi Timo,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Re 1) I agree. I renamed the title to "Extend Core Table
>>>>> System
>>>>>>>> with
>>>>>>>>>>>>>> Pluggable Modules" and all internal references
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Re 2) First, I'll rename the API to useModules(). The
>>>> design
>>>>>>>> doesn't
>>>>>>>>>>>>>> forbid
>>>>>>>>>>>>>> users to call useModules() multi times. Objects in modules
>>>>> are
>>>>>>>> loaded
>>>>>>>>>>>> on
>>>>>>>>>>>>>> demand instead of eagerly, so there won't be inconsistency.
>>>>>> Users
>>>>>>>>>>>>>> have to
>>>>>>>>>>>>>> be fully aware of the consequences of resetting modules as
>>>>> that
>>>>>>>> might
>>>>>>>>>>>>>> cause
>>>>>>>>>>>>>> that some objects can not be referenced anymore or
>>>> resolution
>>>>>>> order
>>>>>>>>>>>>>> of some
>>>>>>>>>>>>>> objects changes.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Re 3) Yes, we'd leave that to users.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Another approach can be to have a non-optional "Core"
>>>> module
>>>>>> for
>>>>>>>> all
>>>>>>>>>>>>>> objects that cannot be overrode like "CAST" and "AS"
>>>>> functions,
>>>>>>> and
>>>>>>>>>>>>>> have an
>>>>>>>>>>>>>> optional "ExtendedCore" module for other replaceable
>>>> built-in
>>>>>>>>> objects.
>>>>>>>>>>>>>> "Core" should be positioned at the 1st in module list all
>>>> the
>>>>>>> time.
>>>>>>>>>>>>>> I'm fine with either solution.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Re 4) It may sound like a nice-to-have advanced feature for
>>>>>> 1.10,
>>>>>>>> but
>>>>>>>>>>>> we
>>>>>>>>>>>>>> can surely fully discuss it for the sake of feature
>>>>>> completeness.
>>>>>>>>>>>>>> Unlike other configs, the order of modules would matter in
>>>>>> Flink,
>>>>>>>> and
>>>>>>>>>>>> it
>>>>>>>>>>>>>> implies the LOAD/UNLOAD commands would not be equal in
>>>>>> operation
>>>>>>>>>>>>>> positions.
>>>>>>>>>>>>>> IIUYC, LOAD MODULE 'x' would be interpreted as appending x
>>>> to
>>>>>> the
>>>>>>>> end
>>>>>>>>>>>> of
>>>>>>>>>>>>>> module list, and UNLOAD MODULE 'x' would be interpreted as
>>>>>>>> removing x
>>>>>>>>>>>>>> from
>>>>>>>>>>>>>> any position in the list?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'm thinking of the following list of commands:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> SHOW MODULES - list modules in order
>>>>>>>>>>>>>> LOAD MODULE 'hive' [WITH ('prop'='myProp', ...)] - load and
>>>>>>> append
>>>>>>>>> the
>>>>>>>>>>>>>> module to end of the module list
>>>>>>>>>>>>>> UNLOAD MODULE 'hive' - remove the module from module list,
>>>>> and
>>>>>>>> other
>>>>>>>>>>>>>> modules remain the same relative positions
>>>>>>>>>>>>>> USE MODULES 'x' 'y' 'z' (wondering can parser take "'x' 'y'
>>>>>>> 'z'"?),
>>>>>>>>>>>>>> or USE
>>>>>>>>>>>>>> MODULES 'x,y,z' - to reorder module list completely
>>>>>>>>>>>>>>
>>>>>>>> --
>>>>>>>> Xuefu Zhang
>>>>>>>>
>>>>>>>> "In Honey We Trust!"
>>>>>>>>
>>>> --
>>>> Xuefu Zhang
>>>>
>>>> "In Honey We Trust!"
>>>>
>>

Jark Wu-2

Re: [DISCUSS] FLIP-68: Extend Core Table System with Modular Plugins

Hi Timo,

I agree that we are going to rework properties soon.
But we may come up with a better name or a better way than "kind" when the
proposal is started and more people involved.
On the other hand, reworking properties should be a compatible way.
So I think it's fine to use "type" for now (keep consistent with others)
and change them all together when refactoring in the near future.

Regards,
Jark

On Thu, 10 Oct 2019 at 21:12, Timo Walther <[hidden email]> wrote:

> Hi Jark,
>
> restricting one module instance per kind sounds good to me. Modules can
> implement hashCode/equals and we can perform the check you metioned. The
> equals() method can determine what "same kind" means.
>
> Correct me if I'm wrong, but we wanted to perform a big properties
> rework soonish anyway, no? Then we can also change `connector.type` to
> `connector.kind`?
>
> In Calcite they distinguish between `type` and `kind`:
>
> SqlOperator(
> String name,
> SqlKind kind,
> int leftPrecedence,
> int rightPrecedence,
> SqlReturnTypeInference returnTypeInference,
> SqlOperandTypeInference operandTypeInference,
> SqlOperandTypeChecker operandTypeChecker
>
> In Flink 1.9, we also started with
> org.apache.flink.table.functions.FunctionDefinition#getKind because type
> is just heavily overloaded with different meanings. Logical type, data
> type, connector type, etc.
>
> Regards,
> Timo
>
>
> On 10.10.19 14:40, Jark Wu wrote:
> > Hi Timo,
> >
> > Thanks for the explanation, it makes sense to me.
> > So at least, we can have a validation to restrict one module instance per
> > type in the first version.
> >
> > Regarding to "type" vs "kind", could we use "datatype" keyword to refer
> > data types exclusively in the future?
> > This can avoid the conflict/confusion when we use "type" here.
> > The concern of using "kind" is that it is inconsistent with other
> > descriptor properties.
> > We have heavily used "type" in properties, including `connector.type`,
> > `format.type`, `catalog.type`, etc...
> > Are we planning to change them all ?
> >
> > Best,
> > Jark
> >
> > On Thu, 10 Oct 2019 at 19:56, Timo Walther <[hidden email]> wrote:
> >
> >> Hi Jark,
> >>
> >> we had a long offline discussion yesterday where we considered all
> >> options again. The reasons why we decided for the updated design that
> >> Bowen suggested:
> >>
> >> - Both Dawid and Xuefu argued that in the old design "kind" has binary
> >> meanings. I agree here.
> >> - Compared to other SQL concepts such as tables/functions/catalogs, a
> >> "name" is never part of the object itself but always specified when
> >> registering the object. We should have the same behavior for modules
> >> because of consistency reasons. It also makes the "name" explicit when
> >> it comes to unloading the module compared to the previous design.
> >> - Regarding, "How to solve the class conflict problem?" this is an
> >> orthogonal issue that will be fixed in future versions once we use
> >> Flink's new plugin architecture. If a module is parameterizable, it is
> >> the responsibility of the module to prevent class conflicts. If the Hive
> >> classes are hidden in a module classloader, it should be possible to
> >> load both hive121 and hive234. But in general this problem is unsolved
> >> for now, also Kafka tables could clash if you read from two Kafka
> >> clusters with different versions.
> >>
> >> Regards,
> >> Timo
> >>
> >>
> >> On 10.10.19 08:01, Jark Wu wrote:
> >>> Hi Xuefu,
> >>>
> >>> If there is only one instance per type, then what's the "name" used
> for?
> >>> Could we remove it and only keep "type" or "kind" to identify modules?
> >>>
> >>> Best,
> >>> Jark
> >>>
> >>> On Thu, 10 Oct 2019 at 11:21, Xuefu Z <[hidden email]> wrote:
> >>>
> >>>> Jark has a good point. However, I think validation logic can put in
> >> place
> >>>> to restrict one instance per type. Maybe the doc needs to be specific
> on
> >>>> this.
> >>>>
> >>>> Thanks,
> >>>> Xuefu
> >>>>
> >>>> On Wed, Oct 9, 2019 at 7:41 PM Jark Wu <[hidden email]> wrote:
> >>>>
> >>>>> Thanks Bowen for the updating.
> >>>>>
> >>>>> I have some different opinions on the change.
> >>>>> IIUC, in the previous design, the "name" is also the "id" or "type"
> to
> >>>>> identify which module to load. Which means we can only load one
> >> instance
> >>>> of
> >>>>> a module.
> >>>>> In the new design, the "name" is just an alias to the module
> instance,
> >>>> the
> >>>>> "kind" is used to identify modules. Which means we can load different
> >>>>> instances of a module.
> >>>>> However, what's the "name" or alias used for? Do we need to support
> >>>> loading
> >>>>> different instances of a module? From my point of view, it brings
> more
> >>>>> complexity and confusion.
> >>>>> For example, if we load a "hive121" which uses HiveModule with
> version
> >>>>> 1.2.1 and load a "hive234" which uses HiveModule with version 2.3.4,
> >> then
> >>>>> how to solve the class conflict problem?
> >>>>>
> >>>>> IMO, a module can only be load once in a session, so "name" maybe
> >>>> useless.
> >>>>> So my proposal is similar to the previous one, but only change "name"
> >> to
> >>>>> "kind".
> >>>>>
> >>>>> SQL:
> >>>>> LOAD MODULE "kind" [WITH (properties)];
> >>>>> UNLOAD MODULE "kind";
> >>>>> Table:
> >>>>> tEnv.loadModule("kind" [, properties]);
> >>>>> tEnv.unloadModule("kind");
> >>>>>
> >>>>> What do you think?
> >>>>>
> >>>>>
> >>>>> Best,
> >>>>> Jark
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Wed, 9 Oct 2019 at 20:38, Bowen Li <[hidden email]> wrote:
> >>>>>
> >>>>>> Thanks everyone for your review.
> >>>>>>
> >>>>>> After discussing with Timo and Dawid offline, as well as
> incorporating
> >>>>>> feedback from Xuefu and Jark on mailing list, I decided to make a
> few
> >>>>>> critical changes to the proposal.
> >>>>>>
> >>>>>> - renamed the keyword "type" to "kind". The community has plan to
> have
> >>>>>> "type" keyword in yaml/descriptor refer to data types exclusively in
> >>>> the
> >>>>>> near future. We should cater to that change in our design
> >>>>>> - allowed specifying names for modules to simplify and unify module
> >>>>>> loading/unloading syntax between programming and SQL. Here're the
> >>>>> proposed
> >>>>>> changes:
> >>>>>> SQL:
> >>>>>> LOAD MODULE "name" WITH ("kind"="xxx" [, (properties)])
> >>>>>> UNLOAD MODULE "name";
> >>>>>> Table:
> >>>>>> tEnv.loadModule("name", new Xxx(properties));
> >>>>>> tEnv.unloadModule("name");
> >>>>>>
> >>>>>> I have completely updated the google doc [1]. Please take another
> >> look,
> >>>>> and
> >>>>>> let me know if you have any other questions. Thanks!
> >>>>>>
> >>>>>> [1]
> >>>>>>
> >>>>>>
> >>
> https://docs.google.com/document/d/17CPMpMbPDjvM4selUVEfh_tqUK_oV0TODAUA9dfHakc/edit#
> >>>>>> On Tue, Oct 8, 2019 at 6:26 AM Jark Wu <[hidden email]> wrote:
> >>>>>>
> >>>>>>> Hi Bowen,
> >>>>>>>
> >>>>>>> Thanks for the proposal. I have two thoughts:
> >>>>>>>
> >>>>>>> 1) Regarding to "loadModule", how about
> >>>>>>> tableEnv.loadModule("xxx" [, propertiesMap]);
> >>>>>>> tableEnv.unloadModule(“xxx”);
> >>>>>>>
> >>>>>>> This makes the API similar to SQL. IMO, instance of Module is not
> >>>>> needed
> >>>>>>> and verbose as parameter.
> >>>>>>> And this makes it easier to load a simple module without any
> >>>> additional
> >>>>>>> properties, e.g. tEnv.loadModule("GEO"), tEnv.unloadModule("GEO")
> >>>>>>>
> >>>>>>> 2) In current design, the module interface only defines function
> >>>>>> metadata,
> >>>>>>> but no implementations.
> >>>>>>> I'm wondering how to call/map the implementations in runtime? Am I
> >>>>>> missing
> >>>>>>> something?
> >>>>>>>
> >>>>>>> Besides, I left some minor comments in the doc.
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Jark
> >>>>>>>
> >>>>>>>
> >>>>>>> On Sat, 5 Oct 2019 at 08:42, Xuefu Z <[hidden email]> wrote:
> >>>>>>>
> >>>>>>>> I agree with Timo that the new table APIs need to be consistent.
> >>>> I'd
> >>>>> go
> >>>>>>>> further that an name (or id) is needed for module definition in
> >>>> YAML
> >>>>>>> file.
> >>>>>>>> In the current design, name is skipped and type has binary
> >>>> meanings.
> >>>>>>>> Thanks,
> >>>>>>>> Xuefu
> >>>>>>>>
> >>>>>>>> On Fri, Oct 4, 2019 at 5:24 AM Timo Walther <[hidden email]>
> >>>>>> wrote:
> >>>>>>>>> Hi everyone,
> >>>>>>>>>
> >>>>>>>>> first, I was also questioning my proposal. But Bowen's proposal
> >>>> of
> >>>>>>>>> `tEnv.offloadToYaml(<file_path>)` would not work with the current
> >>>>>>> design
> >>>>>>>>> because we don't know how to serialize a catalog or module into
> >>>>>>>>> properties. Currently, there is no converter from instance to
> >>>>>>>>> properties. It is a one way conversion. We can add a
> >>>> `toProperties`
> >>>>>>>>> method to both Catalog and Module class in the future to solve
> >>>>> this.
> >>>>>>>>> Solving the table environment serializability can be future work.
> >>>>>>>>>
> >>>>>>>>> However, I find the current proposal for the TableEnvironment
> >>>>> methods
> >>>>>>> is
> >>>>>>>>> contradicting:
> >>>>>>>>>
> >>>>>>>>> tableEnv.loadModule(new Yyy());
> >>>>>>>>> tableEnv.unloadModule(“Xxx”);
> >>>>>>>>>
> >>>>>>>>> The loading is specified programmatically whereas the unloading
> >>>>>>> requires
> >>>>>>>>> a string that is not specified in the module itself. But is
> >>>> defined
> >>>>>> in
> >>>>>>>>> the factory according to the current design.
> >>>>>>>>>
> >>>>>>>>> SQL does it more consistently. There, the name `xxx` is used when
> >>>>>>>>> loading and unloading the module:
> >>>>>>>>>
> >>>>>>>>> LOAD MODULE 'xxx' [WITH ('prop'='myProp', ...)]
> >>>>>>>>> UNLOAD MODULE 'xxx’
> >>>>>>>>>
> >>>>>>>>> How about:
> >>>>>>>>>
> >>>>>>>>> tableEnv.loadModule("xxx", new Yyy());
> >>>>>>>>> tableEnv.unloadModule(“xxx”);
> >>>>>>>>>
> >>>>>>>>> This would be similar to the catalog interfaces. The name is not
> >>>>> part
> >>>>>>> of
> >>>>>>>>> the instance itself.
> >>>>>>>>>
> >>>>>>>>> What do you think?
> >>>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>> Timo
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On 01.10.19 21:17, Bowen Li wrote:
> >>>>>>>>>> If something like the yaml file is the way to go and achieve
> >>>> such
> >>>>>>>>>> motivation, we would cover that with current design.
> >>>>>>>>>>
> >>>>>>>>>> On Tue, Oct 1, 2019 at 12:05 Bowen Li <[hidden email]>
> >>>>> wrote:
> >>>>>>>>>>> Hi Timo, Dawid,
> >>>>>>>>>>>
> >>>>>>>>>>> I've added the suggested SQL and related changes to
> >>>>>> TableEnvironment
> >>>>>>>> API
> >>>>>>>>>>> and other classes to the google doc. Also removed "USE MODULE"
> >>>>> and
> >>>>>>> its
> >>>>>>>>>>> APIs. Will update FLIP wiki once we have a consensus.
> >>>>>>>>>>>
> >>>>>>>>>>> W.r.t. descriptor approach, my gut feeling is similar to
> >>>>> Dawid's.
> >>>>>>>>> Besides,
> >>>>>>>>>>> I feel yaml file would be a better solution to persist
> >>>>>> serializable
> >>>>>>>>> state
> >>>>>>>>>>> of an environment as the file itself is in serializable format
> >>>>>>>> already.
> >>>>>>>>>>> Though yaml file only serves SQL CLI at this moment, we may be
> >>>>>> able
> >>>>>>> to
> >>>>>>>>>>> extend its reach to Table API and allow users to load/offload
> >>>> a
> >>>>>>>>>>> TableEnvironment from/to yaml files, as something like
> >>>>>>>> "TableEnvironment
> >>>>>>>>>>> tEnv = TableEnvironment.loadFromYaml(<file_path>)" and
> >>>>>>>>>>> "tEnv.offloadToYaml(<file_path>)" to restore and persist
> >>>> state,
> >>>>>> and
> >>>>>>>> try
> >>>>>>>>> to
> >>>>>>>>>>> make yaml file more expressive.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On Tue, Oct 1, 2019 at 6:47 AM Dawid Wysakowicz <
> >>>>>>>> [hidden email]
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Hi Timo, Bowen,
> >>>>>>>>>>>>
> >>>>>>>>>>>> Unfortunately I did not have enough time to go through all
> >>>> the
> >>>>>>>>>>>> suggestions in details so I can not comment on the whole
> >>>> FLIP.
> >>>>>>>>>>>> I just wanted to give my opinion on the "descriptor approach
> >>>> in
> >>>>>>>>>>>> loadModule" part. I am not sure if we need it here. We might
> >>>> be
> >>>>>>>>>>>> overthinking this a bit. It definitely makes sense for
> >>>> objects
> >>>>>> like
> >>>>>>>>>>>> TableSource/TableSink etc. as they are logical definitions
> >>>> that
> >>>>>>>> nearly
> >>>>>>>>>>>> always have to be persisted in a Catalog. I'm not sure if we
> >>>>>> really
> >>>>>>>>> need
> >>>>>>>>>>>> the same for a whole session. If we need a resume session
> >>>>>> feature,
> >>>>>>>> the
> >>>>>>>>>>>> way to go would probably be to keep the session in memory on
> >>>>> the
> >>>>>>>> server
> >>>>>>>>>>>> side. I fear we will never be able to serialize the whole
> >>>>> session
> >>>>>>>>>>>> entirely (temporary objects, objects derived from DataStream
> >>>>>> etc.)
> >>>>>>>>>>>> I think it is ok to use instances for objects like Catalogs
> >>>> or
> >>>>>>>> Modules
> >>>>>>>>>>>> and have an overlay on top of that that can create instances
> >>>>> from
> >>>>>>>>>>>> properties.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Best,
> >>>>>>>>>>>>
> >>>>>>>>>>>> Dawid
> >>>>>>>>>>>>
> >>>>>>>>>>>> On 01/10/2019 11:28, Timo Walther wrote:
> >>>>>>>>>>>>> Hi Bowen,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> thanks for your response.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Re 2) I also don't have a better approach for this issue. It
> >>>>> is
> >>>>>>>>>>>>> similar to changing the general TableConfig between two
> >>>>>>> statements.
> >>>>>>>> It
> >>>>>>>>>>>>> would be good to add your explanation to the design
> >>>> document.
> >>>>>>>>>>>>> Re 3) It would be interesting to know about which "core"
> >>>>>> functions
> >>>>>>>> we
> >>>>>>>>>>>>> are actually talking about. Also for the overriding built-in
> >>>>>>>> functions
> >>>>>>>>>>>>> that we discussed in the other FLIP. But I'm fine with
> >>>> leaving
> >>>>>> it
> >>>>>>> to
> >>>>>>>>>>>>> the user for now. How about we just introduce loadModule(),
> >>>>>>>>>>>>> unloadModule() methods instead of useModules()? This would
> >>>>>> ensure
> >>>>>>>> that
> >>>>>>>>>>>>> users don't forget to add the core module when adding an
> >>>>>>> additional
> >>>>>>>>>>>>> module and they need to explicitly call
> >>>>> "unloadModule('core')".
> >>>>>>>>>>>>> Re 4) Every table environment feature should also be
> >>>> designed
> >>>>>> with
> >>>>>>>> SQL
> >>>>>>>>>>>>> statements in mind to verify the concept. SQL is also more
> >>>>>> popular
> >>>>>>>>>>>>> that Java/Scala API or YAML file. I would like to add it to
> >>>>> 1.10
> >>>>>>> for
> >>>>>>>>>>>>> marking the feature as complete.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> SHOW MODULES -> sounds good to me, we should add a
> >>>>>> listModules():
> >>>>>>>>>>>>> List<String> method to table environment
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> LOAD MODULE 'hive' [WITH ('prop'='myProp', ...)] --> we
> >>>> should
> >>>>>>> add a
> >>>>>>>>>>>>> loadModule() method to table environment
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> UNLOAD MODULE 'hive' --> we should add a unloadModule()
> >>>> method
> >>>>>> to
> >>>>>>>>>>>>> table environment
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I would not introduce `USE MODULES 'x' 'y' 'z'` for
> >>>> simplicity
> >>>>>> and
> >>>>>>>>>>>>> concise API. Users need to load the module anyway with
> >>>>>> properties.
> >>>>>>>>>>>>> They can also load them "in order" immediately. CREATE TABLE
> >>>>> can
> >>>>>>>> also
> >>>>>>>>>>>>> not create multiple tables but only one at a time in that
> >>>>> order.
> >>>>>>>>>>>>> One thing that came to my mind, shall we use a descriptor
> >>>>>> approach
> >>>>>>>> for
> >>>>>>>>>>>>> loadModule()? The past has shown that passing instances
> >>>> causes
> >>>>>>>>>>>>> problems when persisting objects. That's why we also want to
> >>>>> get
> >>>>>>> rid
> >>>>>>>>>>>>> of registerTableSource. I could image that users might want
> >>>> to
> >>>>>>>> persist
> >>>>>>>>>>>>> a table environment's state for later use in the future.
> >>>> Even
> >>>>>>> though
> >>>>>>>>>>>>> this is future work, we should already keep such use cases
> >>>> in
> >>>>>> mind
> >>>>>>>>>>>>> when adding new API methods. What do you think?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Regards,
> >>>>>>>>>>>>> Timo
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On 30.09.19 23:17, Bowen Li wrote:
> >>>>>>>>>>>>>> Hi Timo,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Re 1) I agree. I renamed the title to "Extend Core Table
> >>>>> System
> >>>>>>>> with
> >>>>>>>>>>>>>> Pluggable Modules" and all internal references
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Re 2) First, I'll rename the API to useModules(). The
> >>>> design
> >>>>>>>> doesn't
> >>>>>>>>>>>>>> forbid
> >>>>>>>>>>>>>> users to call useModules() multi times. Objects in modules
> >>>>> are
> >>>>>>>> loaded
> >>>>>>>>>>>> on
> >>>>>>>>>>>>>> demand instead of eagerly, so there won't be inconsistency.
> >>>>>> Users
> >>>>>>>>>>>>>> have to
> >>>>>>>>>>>>>> be fully aware of the consequences of resetting modules as
> >>>>> that
> >>>>>>>> might
> >>>>>>>>>>>>>> cause
> >>>>>>>>>>>>>> that some objects can not be referenced anymore or
> >>>> resolution
> >>>>>>> order
> >>>>>>>>>>>>>> of some
> >>>>>>>>>>>>>> objects changes.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Re 3) Yes, we'd leave that to users.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Another approach can be to have a non-optional "Core"
> >>>> module
> >>>>>> for
> >>>>>>>> all
> >>>>>>>>>>>>>> objects that cannot be overrode like "CAST" and "AS"
> >>>>> functions,
> >>>>>>> and
> >>>>>>>>>>>>>> have an
> >>>>>>>>>>>>>> optional "ExtendedCore" module for other replaceable
> >>>> built-in
> >>>>>>>>> objects.
> >>>>>>>>>>>>>> "Core" should be positioned at the 1st in module list all
> >>>> the
> >>>>>>> time.
> >>>>>>>>>>>>>> I'm fine with either solution.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Re 4) It may sound like a nice-to-have advanced feature for
> >>>>>> 1.10,
> >>>>>>>> but
> >>>>>>>>>>>> we
> >>>>>>>>>>>>>> can surely fully discuss it for the sake of feature
> >>>>>> completeness.
> >>>>>>>>>>>>>> Unlike other configs, the order of modules would matter in
> >>>>>> Flink,
> >>>>>>>> and
> >>>>>>>>>>>> it
> >>>>>>>>>>>>>> implies the LOAD/UNLOAD commands would not be equal in
> >>>>>> operation
> >>>>>>>>>>>>>> positions.
> >>>>>>>>>>>>>> IIUYC, LOAD MODULE 'x' would be interpreted as appending x
> >>>> to
> >>>>>> the
> >>>>>>>> end
> >>>>>>>>>>>> of
> >>>>>>>>>>>>>> module list, and UNLOAD MODULE 'x' would be interpreted as
> >>>>>>>> removing x
> >>>>>>>>>>>>>> from
> >>>>>>>>>>>>>> any position in the list?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I'm thinking of the following list of commands:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> SHOW MODULES - list modules in order
> >>>>>>>>>>>>>> LOAD MODULE 'hive' [WITH ('prop'='myProp', ...)] - load and
> >>>>>>> append
> >>>>>>>>> the
> >>>>>>>>>>>>>> module to end of the module list
> >>>>>>>>>>>>>> UNLOAD MODULE 'hive' - remove the module from module list,
> >>>>> and
> >>>>>>>> other
> >>>>>>>>>>>>>> modules remain the same relative positions
> >>>>>>>>>>>>>> USE MODULES 'x' 'y' 'z' (wondering can parser take "'x' 'y'
> >>>>>>> 'z'"?),
> >>>>>>>>>>>>>> or USE
> >>>>>>>>>>>>>> MODULES 'x,y,z' - to reorder module list completely
> >>>>>>>>>>>>>>
> >>>>>>>> --
> >>>>>>>> Xuefu Zhang
> >>>>>>>>
> >>>>>>>> "In Honey We Trust!"
> >>>>>>>>
> >>>> --
> >>>> Xuefu Zhang
> >>>>
> >>>> "In Honey We Trust!"
> >>>>
> >>
>
>

Timo Walther-2

Re: [DISCUSS] FLIP-68: Extend Core Table System with Modular Plugins

I'm fine with `type` for consistency reasons for now. I hope we will fix
that when we rework the properties design.

@Bowen: could you update the wiki page? I think we could start a vote,
right?

Regards,
Timo

On 11.10.19 04:31, Jark Wu wrote:

> Hi Timo,
>
> I agree that we are going to rework properties soon.
> But we may come up with a better name or a better way than "kind" when the
> proposal is started and more people involved.
> On the other hand, reworking properties should be a compatible way.
> So I think it's fine to use "type" for now (keep consistent with others)
> and change them all together when refactoring in the near future.
>
> Regards,
> Jark
>
> On Thu, 10 Oct 2019 at 21:12, Timo Walther <[hidden email]> wrote:
>
>> Hi Jark,
>>
>> restricting one module instance per kind sounds good to me. Modules can
>> implement hashCode/equals and we can perform the check you metioned. The
>> equals() method can determine what "same kind" means.
>>
>> Correct me if I'm wrong, but we wanted to perform a big properties
>> rework soonish anyway, no? Then we can also change `connector.type` to
>> `connector.kind`?
>>
>> In Calcite they distinguish between `type` and `kind`:
>>
>> SqlOperator(
>> String name,
>> SqlKind kind,
>> int leftPrecedence,
>> int rightPrecedence,
>> SqlReturnTypeInference returnTypeInference,
>> SqlOperandTypeInference operandTypeInference,
>> SqlOperandTypeChecker operandTypeChecker
>>
>> In Flink 1.9, we also started with
>> org.apache.flink.table.functions.FunctionDefinition#getKind because type
>> is just heavily overloaded with different meanings. Logical type, data
>> type, connector type, etc.
>>
>> Regards,
>> Timo
>>
>>
>> On 10.10.19 14:40, Jark Wu wrote:
>>> Hi Timo,
>>>
>>> Thanks for the explanation, it makes sense to me.
>>> So at least, we can have a validation to restrict one module instance per
>>> type in the first version.
>>>
>>> Regarding to "type" vs "kind", could we use "datatype" keyword to refer
>>> data types exclusively in the future?
>>> This can avoid the conflict/confusion when we use "type" here.
>>> The concern of using "kind" is that it is inconsistent with other
>>> descriptor properties.
>>> We have heavily used "type" in properties, including `connector.type`,
>>> `format.type`, `catalog.type`, etc...
>>> Are we planning to change them all ?
>>>
>>> Best,
>>> Jark
>>>
>>> On Thu, 10 Oct 2019 at 19:56, Timo Walther <[hidden email]> wrote:
>>>
>>>> Hi Jark,
>>>>
>>>> we had a long offline discussion yesterday where we considered all
>>>> options again. The reasons why we decided for the updated design that
>>>> Bowen suggested:
>>>>
>>>> - Both Dawid and Xuefu argued that in the old design "kind" has binary
>>>> meanings. I agree here.
>>>> - Compared to other SQL concepts such as tables/functions/catalogs, a
>>>> "name" is never part of the object itself but always specified when
>>>> registering the object. We should have the same behavior for modules
>>>> because of consistency reasons. It also makes the "name" explicit when
>>>> it comes to unloading the module compared to the previous design.
>>>> - Regarding, "How to solve the class conflict problem?" this is an
>>>> orthogonal issue that will be fixed in future versions once we use
>>>> Flink's new plugin architecture. If a module is parameterizable, it is
>>>> the responsibility of the module to prevent class conflicts. If the Hive
>>>> classes are hidden in a module classloader, it should be possible to
>>>> load both hive121 and hive234. But in general this problem is unsolved
>>>> for now, also Kafka tables could clash if you read from two Kafka
>>>> clusters with different versions.
>>>>
>>>> Regards,
>>>> Timo
>>>>
>>>>
>>>> On 10.10.19 08:01, Jark Wu wrote:
>>>>> Hi Xuefu,
>>>>>
>>>>> If there is only one instance per type, then what's the "name" used
>> for?
>>>>> Could we remove it and only keep "type" or "kind" to identify modules?
>>>>>
>>>>> Best,
>>>>> Jark
>>>>>
>>>>> On Thu, 10 Oct 2019 at 11:21, Xuefu Z <[hidden email]> wrote:
>>>>>
>>>>>> Jark has a good point. However, I think validation logic can put in
>>>> place
>>>>>> to restrict one instance per type. Maybe the doc needs to be specific
>> on
>>>>>> this.
>>>>>>
>>>>>> Thanks,
>>>>>> Xuefu
>>>>>>
>>>>>> On Wed, Oct 9, 2019 at 7:41 PM Jark Wu <[hidden email]> wrote:
>>>>>>
>>>>>>> Thanks Bowen for the updating.
>>>>>>>
>>>>>>> I have some different opinions on the change.
>>>>>>> IIUC, in the previous design, the "name" is also the "id" or "type"
>> to
>>>>>>> identify which module to load. Which means we can only load one
>>>> instance
>>>>>> of
>>>>>>> a module.
>>>>>>> In the new design, the "name" is just an alias to the module
>> instance,
>>>>>> the
>>>>>>> "kind" is used to identify modules. Which means we can load different
>>>>>>> instances of a module.
>>>>>>> However, what's the "name" or alias used for? Do we need to support
>>>>>> loading
>>>>>>> different instances of a module? From my point of view, it brings
>> more
>>>>>>> complexity and confusion.
>>>>>>> For example, if we load a "hive121" which uses HiveModule with
>> version
>>>>>>> 1.2.1 and load a "hive234" which uses HiveModule with version 2.3.4,
>>>> then
>>>>>>> how to solve the class conflict problem?
>>>>>>>
>>>>>>> IMO, a module can only be load once in a session, so "name" maybe
>>>>>> useless.
>>>>>>> So my proposal is similar to the previous one, but only change "name"
>>>> to
>>>>>>> "kind".
>>>>>>>
>>>>>>> SQL:
>>>>>>> LOAD MODULE "kind" [WITH (properties)];
>>>>>>> UNLOAD MODULE "kind";
>>>>>>> Table:
>>>>>>> tEnv.loadModule("kind" [, properties]);
>>>>>>> tEnv.unloadModule("kind");
>>>>>>>
>>>>>>> What do you think?
>>>>>>>
>>>>>>>
>>>>>>> Best,
>>>>>>> Jark
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, 9 Oct 2019 at 20:38, Bowen Li <[hidden email]> wrote:
>>>>>>>
>>>>>>>> Thanks everyone for your review.
>>>>>>>>
>>>>>>>> After discussing with Timo and Dawid offline, as well as
>> incorporating
>>>>>>>> feedback from Xuefu and Jark on mailing list, I decided to make a
>> few
>>>>>>>> critical changes to the proposal.
>>>>>>>>
>>>>>>>> - renamed the keyword "type" to "kind". The community has plan to
>> have
>>>>>>>> "type" keyword in yaml/descriptor refer to data types exclusively in
>>>>>> the
>>>>>>>> near future. We should cater to that change in our design
>>>>>>>> - allowed specifying names for modules to simplify and unify module
>>>>>>>> loading/unloading syntax between programming and SQL. Here're the
>>>>>>> proposed
>>>>>>>> changes:
>>>>>>>> SQL:
>>>>>>>> LOAD MODULE "name" WITH ("kind"="xxx" [, (properties)])
>>>>>>>> UNLOAD MODULE "name";
>>>>>>>> Table:
>>>>>>>> tEnv.loadModule("name", new Xxx(properties));
>>>>>>>> tEnv.unloadModule("name");
>>>>>>>>
>>>>>>>> I have completely updated the google doc [1]. Please take another
>>>> look,
>>>>>>> and
>>>>>>>> let me know if you have any other questions. Thanks!
>>>>>>>>
>>>>>>>> [1]
>>>>>>>>
>>>>>>>>
>> https://docs.google.com/document/d/17CPMpMbPDjvM4selUVEfh_tqUK_oV0TODAUA9dfHakc/edit#
>>>>>>>> On Tue, Oct 8, 2019 at 6:26 AM Jark Wu <[hidden email]> wrote:
>>>>>>>>
>>>>>>>>> Hi Bowen,
>>>>>>>>>
>>>>>>>>> Thanks for the proposal. I have two thoughts:
>>>>>>>>>
>>>>>>>>> 1) Regarding to "loadModule", how about
>>>>>>>>> tableEnv.loadModule("xxx" [, propertiesMap]);
>>>>>>>>> tableEnv.unloadModule(“xxx”);
>>>>>>>>>
>>>>>>>>> This makes the API similar to SQL. IMO, instance of Module is not
>>>>>>> needed
>>>>>>>>> and verbose as parameter.
>>>>>>>>> And this makes it easier to load a simple module without any
>>>>>> additional
>>>>>>>>> properties, e.g. tEnv.loadModule("GEO"), tEnv.unloadModule("GEO")
>>>>>>>>>
>>>>>>>>> 2) In current design, the module interface only defines function
>>>>>>>> metadata,
>>>>>>>>> but no implementations.
>>>>>>>>> I'm wondering how to call/map the implementations in runtime? Am I
>>>>>>>> missing
>>>>>>>>> something?
>>>>>>>>>
>>>>>>>>> Besides, I left some minor comments in the doc.
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Jark
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Sat, 5 Oct 2019 at 08:42, Xuefu Z <[hidden email]> wrote:
>>>>>>>>>
>>>>>>>>>> I agree with Timo that the new table APIs need to be consistent.
>>>>>> I'd
>>>>>>> go
>>>>>>>>>> further that an name (or id) is needed for module definition in
>>>>>> YAML
>>>>>>>>> file.
>>>>>>>>>> In the current design, name is skipped and type has binary
>>>>>> meanings.
>>>>>>>>>> Thanks,
>>>>>>>>>> Xuefu
>>>>>>>>>>
>>>>>>>>>> On Fri, Oct 4, 2019 at 5:24 AM Timo Walther <[hidden email]>
>>>>>>>> wrote:
>>>>>>>>>>> Hi everyone,
>>>>>>>>>>>
>>>>>>>>>>> first, I was also questioning my proposal. But Bowen's proposal
>>>>>> of
>>>>>>>>>>> `tEnv.offloadToYaml(<file_path>)` would not work with the current
>>>>>>>>> design
>>>>>>>>>>> because we don't know how to serialize a catalog or module into
>>>>>>>>>>> properties. Currently, there is no converter from instance to
>>>>>>>>>>> properties. It is a one way conversion. We can add a
>>>>>> `toProperties`
>>>>>>>>>>> method to both Catalog and Module class in the future to solve
>>>>>>> this.
>>>>>>>>>>> Solving the table environment serializability can be future work.
>>>>>>>>>>>
>>>>>>>>>>> However, I find the current proposal for the TableEnvironment
>>>>>>> methods
>>>>>>>>> is
>>>>>>>>>>> contradicting:
>>>>>>>>>>>
>>>>>>>>>>> tableEnv.loadModule(new Yyy());
>>>>>>>>>>> tableEnv.unloadModule(“Xxx”);
>>>>>>>>>>>
>>>>>>>>>>> The loading is specified programmatically whereas the unloading
>>>>>>>>> requires
>>>>>>>>>>> a string that is not specified in the module itself. But is
>>>>>> defined
>>>>>>>> in
>>>>>>>>>>> the factory according to the current design.
>>>>>>>>>>>
>>>>>>>>>>> SQL does it more consistently. There, the name `xxx` is used when
>>>>>>>>>>> loading and unloading the module:
>>>>>>>>>>>
>>>>>>>>>>> LOAD MODULE 'xxx' [WITH ('prop'='myProp', ...)]
>>>>>>>>>>> UNLOAD MODULE 'xxx’
>>>>>>>>>>>
>>>>>>>>>>> How about:
>>>>>>>>>>>
>>>>>>>>>>> tableEnv.loadModule("xxx", new Yyy());
>>>>>>>>>>> tableEnv.unloadModule(“xxx”);
>>>>>>>>>>>
>>>>>>>>>>> This would be similar to the catalog interfaces. The name is not
>>>>>>> part
>>>>>>>>> of
>>>>>>>>>>> the instance itself.
>>>>>>>>>>>
>>>>>>>>>>> What do you think?
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Timo
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 01.10.19 21:17, Bowen Li wrote:
>>>>>>>>>>>> If something like the yaml file is the way to go and achieve
>>>>>> such
>>>>>>>>>>>> motivation, we would cover that with current design.
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Oct 1, 2019 at 12:05 Bowen Li <[hidden email]>
>>>>>>> wrote:
>>>>>>>>>>>>> Hi Timo, Dawid,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I've added the suggested SQL and related changes to
>>>>>>>> TableEnvironment
>>>>>>>>>> API
>>>>>>>>>>>>> and other classes to the google doc. Also removed "USE MODULE"
>>>>>>> and
>>>>>>>>> its
>>>>>>>>>>>>> APIs. Will update FLIP wiki once we have a consensus.
>>>>>>>>>>>>>
>>>>>>>>>>>>> W.r.t. descriptor approach, my gut feeling is similar to
>>>>>>> Dawid's.
>>>>>>>>>>> Besides,
>>>>>>>>>>>>> I feel yaml file would be a better solution to persist
>>>>>>>> serializable
>>>>>>>>>>> state
>>>>>>>>>>>>> of an environment as the file itself is in serializable format
>>>>>>>>>> already.
>>>>>>>>>>>>> Though yaml file only serves SQL CLI at this moment, we may be
>>>>>>>> able
>>>>>>>>> to
>>>>>>>>>>>>> extend its reach to Table API and allow users to load/offload
>>>>>> a
>>>>>>>>>>>>> TableEnvironment from/to yaml files, as something like
>>>>>>>>>> "TableEnvironment
>>>>>>>>>>>>> tEnv = TableEnvironment.loadFromYaml(<file_path>)" and
>>>>>>>>>>>>> "tEnv.offloadToYaml(<file_path>)" to restore and persist
>>>>>> state,
>>>>>>>> and
>>>>>>>>>> try
>>>>>>>>>>> to
>>>>>>>>>>>>> make yaml file more expressive.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Oct 1, 2019 at 6:47 AM Dawid Wysakowicz <
>>>>>>>>>> [hidden email]
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Timo, Bowen,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Unfortunately I did not have enough time to go through all
>>>>>> the
>>>>>>>>>>>>>> suggestions in details so I can not comment on the whole
>>>>>> FLIP.
>>>>>>>>>>>>>> I just wanted to give my opinion on the "descriptor approach
>>>>>> in
>>>>>>>>>>>>>> loadModule" part. I am not sure if we need it here. We might
>>>>>> be
>>>>>>>>>>>>>> overthinking this a bit. It definitely makes sense for
>>>>>> objects
>>>>>>>> like
>>>>>>>>>>>>>> TableSource/TableSink etc. as they are logical definitions
>>>>>> that
>>>>>>>>>> nearly
>>>>>>>>>>>>>> always have to be persisted in a Catalog. I'm not sure if we
>>>>>>>> really
>>>>>>>>>>> need
>>>>>>>>>>>>>> the same for a whole session. If we need a resume session
>>>>>>>> feature,
>>>>>>>>>> the
>>>>>>>>>>>>>> way to go would probably be to keep the session in memory on
>>>>>>> the
>>>>>>>>>> server
>>>>>>>>>>>>>> side. I fear we will never be able to serialize the whole
>>>>>>> session
>>>>>>>>>>>>>> entirely (temporary objects, objects derived from DataStream
>>>>>>>> etc.)
>>>>>>>>>>>>>> I think it is ok to use instances for objects like Catalogs
>>>>>> or
>>>>>>>>>> Modules
>>>>>>>>>>>>>> and have an overlay on top of that that can create instances
>>>>>>> from
>>>>>>>>>>>>>> properties.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Dawid
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 01/10/2019 11:28, Timo Walther wrote:
>>>>>>>>>>>>>>> Hi Bowen,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> thanks for your response.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Re 2) I also don't have a better approach for this issue. It
>>>>>>> is
>>>>>>>>>>>>>>> similar to changing the general TableConfig between two
>>>>>>>>> statements.
>>>>>>>>>> It
>>>>>>>>>>>>>>> would be good to add your explanation to the design
>>>>>> document.
>>>>>>>>>>>>>>> Re 3) It would be interesting to know about which "core"
>>>>>>>> functions
>>>>>>>>>> we
>>>>>>>>>>>>>>> are actually talking about. Also for the overriding built-in
>>>>>>>>>> functions
>>>>>>>>>>>>>>> that we discussed in the other FLIP. But I'm fine with
>>>>>> leaving
>>>>>>>> it
>>>>>>>>> to
>>>>>>>>>>>>>>> the user for now. How about we just introduce loadModule(),
>>>>>>>>>>>>>>> unloadModule() methods instead of useModules()? This would
>>>>>>>> ensure
>>>>>>>>>> that
>>>>>>>>>>>>>>> users don't forget to add the core module when adding an
>>>>>>>>> additional
>>>>>>>>>>>>>>> module and they need to explicitly call
>>>>>>> "unloadModule('core')".
>>>>>>>>>>>>>>> Re 4) Every table environment feature should also be
>>>>>> designed
>>>>>>>> with
>>>>>>>>>> SQL
>>>>>>>>>>>>>>> statements in mind to verify the concept. SQL is also more
>>>>>>>> popular
>>>>>>>>>>>>>>> that Java/Scala API or YAML file. I would like to add it to
>>>>>>> 1.10
>>>>>>>>> for
>>>>>>>>>>>>>>> marking the feature as complete.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> SHOW MODULES -> sounds good to me, we should add a
>>>>>>>> listModules():
>>>>>>>>>>>>>>> List<String> method to table environment
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> LOAD MODULE 'hive' [WITH ('prop'='myProp', ...)] --> we
>>>>>> should
>>>>>>>>> add a
>>>>>>>>>>>>>>> loadModule() method to table environment
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> UNLOAD MODULE 'hive' --> we should add a unloadModule()
>>>>>> method
>>>>>>>> to
>>>>>>>>>>>>>>> table environment
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I would not introduce `USE MODULES 'x' 'y' 'z'` for
>>>>>> simplicity
>>>>>>>> and
>>>>>>>>>>>>>>> concise API. Users need to load the module anyway with
>>>>>>>> properties.
>>>>>>>>>>>>>>> They can also load them "in order" immediately. CREATE TABLE
>>>>>>> can
>>>>>>>>>> also
>>>>>>>>>>>>>>> not create multiple tables but only one at a time in that
>>>>>>> order.
>>>>>>>>>>>>>>> One thing that came to my mind, shall we use a descriptor
>>>>>>>> approach
>>>>>>>>>> for
>>>>>>>>>>>>>>> loadModule()? The past has shown that passing instances
>>>>>> causes
>>>>>>>>>>>>>>> problems when persisting objects. That's why we also want to
>>>>>>> get
>>>>>>>>> rid
>>>>>>>>>>>>>>> of registerTableSource. I could image that users might want
>>>>>> to
>>>>>>>>>> persist
>>>>>>>>>>>>>>> a table environment's state for later use in the future.
>>>>>> Even
>>>>>>>>> though
>>>>>>>>>>>>>>> this is future work, we should already keep such use cases
>>>>>> in
>>>>>>>> mind
>>>>>>>>>>>>>>> when adding new API methods. What do you think?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>> Timo
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 30.09.19 23:17, Bowen Li wrote:
>>>>>>>>>>>>>>>> Hi Timo,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Re 1) I agree. I renamed the title to "Extend Core Table
>>>>>>> System
>>>>>>>>>> with
>>>>>>>>>>>>>>>> Pluggable Modules" and all internal references
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Re 2) First, I'll rename the API to useModules(). The
>>>>>> design
>>>>>>>>>> doesn't
>>>>>>>>>>>>>>>> forbid
>>>>>>>>>>>>>>>> users to call useModules() multi times. Objects in modules
>>>>>>> are
>>>>>>>>>> loaded
>>>>>>>>>>>>>> on
>>>>>>>>>>>>>>>> demand instead of eagerly, so there won't be inconsistency.
>>>>>>>> Users
>>>>>>>>>>>>>>>> have to
>>>>>>>>>>>>>>>> be fully aware of the consequences of resetting modules as
>>>>>>> that
>>>>>>>>>> might
>>>>>>>>>>>>>>>> cause
>>>>>>>>>>>>>>>> that some objects can not be referenced anymore or
>>>>>> resolution
>>>>>>>>> order
>>>>>>>>>>>>>>>> of some
>>>>>>>>>>>>>>>> objects changes.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Re 3) Yes, we'd leave that to users.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Another approach can be to have a non-optional "Core"
>>>>>> module
>>>>>>>> for
>>>>>>>>>> all
>>>>>>>>>>>>>>>> objects that cannot be overrode like "CAST" and "AS"
>>>>>>> functions,
>>>>>>>>> and
>>>>>>>>>>>>>>>> have an
>>>>>>>>>>>>>>>> optional "ExtendedCore" module for other replaceable
>>>>>> built-in
>>>>>>>>>>> objects.
>>>>>>>>>>>>>>>> "Core" should be positioned at the 1st in module list all
>>>>>> the
>>>>>>>>> time.
>>>>>>>>>>>>>>>> I'm fine with either solution.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Re 4) It may sound like a nice-to-have advanced feature for
>>>>>>>> 1.10,
>>>>>>>>>> but
>>>>>>>>>>>>>> we
>>>>>>>>>>>>>>>> can surely fully discuss it for the sake of feature
>>>>>>>> completeness.
>>>>>>>>>>>>>>>> Unlike other configs, the order of modules would matter in
>>>>>>>> Flink,
>>>>>>>>>> and
>>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>> implies the LOAD/UNLOAD commands would not be equal in
>>>>>>>> operation
>>>>>>>>>>>>>>>> positions.
>>>>>>>>>>>>>>>> IIUYC, LOAD MODULE 'x' would be interpreted as appending x
>>>>>> to
>>>>>>>> the
>>>>>>>>>> end
>>>>>>>>>>>>>> of
>>>>>>>>>>>>>>>> module list, and UNLOAD MODULE 'x' would be interpreted as
>>>>>>>>>> removing x
>>>>>>>>>>>>>>>> from
>>>>>>>>>>>>>>>> any position in the list?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I'm thinking of the following list of commands:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> SHOW MODULES - list modules in order
>>>>>>>>>>>>>>>> LOAD MODULE 'hive' [WITH ('prop'='myProp', ...)] - load and
>>>>>>>>> append
>>>>>>>>>>> the
>>>>>>>>>>>>>>>> module to end of the module list
>>>>>>>>>>>>>>>> UNLOAD MODULE 'hive' - remove the module from module list,
>>>>>>> and
>>>>>>>>>> other
>>>>>>>>>>>>>>>> modules remain the same relative positions
>>>>>>>>>>>>>>>> USE MODULES 'x' 'y' 'z' (wondering can parser take "'x' 'y'
>>>>>>>>> 'z'"?),
>>>>>>>>>>>>>>>> or USE
>>>>>>>>>>>>>>>> MODULES 'x,y,z' - to reorder module list completely
>>>>>>>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Xuefu Zhang
>>>>>>>>>>
>>>>>>>>>> "In Honey We Trust!"
>>>>>>>>>>
>>>>>> --
>>>>>> Xuefu Zhang
>>>>>>
>>>>>> "In Honey We Trust!"
>>>>>>
>>