[DISCUSS] FLIP-64: Support for Temporary Objects in Table module

classic Classic list List threaded Threaded
27 messages Options
12
Reply | Threaded
Open this post in threaded view
|

[DISCUSS] FLIP-64: Support for Temporary Objects in Table module

dwysakowicz

Hi all,

As part of FLIP-30 a Catalog API was introduced that enables storing table meta objects permanently. At the same time the majority of current APIs create temporary objects that cannot be serialized. We should clarify the creation of meta objects (tables, views, functions) in a unified way.

Another current problem in the API is that all the temporary objects are stored in a special built-in catalog, which is not very intuitive for many users, as they must be aware of that catalog to reference temporary objects.

Lastly, different APIs have different ways of providing object paths:

  • String path…, 
  • String path, String pathContinued…
  • String name

We should choose one approach and unify it across all APIs.

I suggest a FLIP to address the above issues.

Looking forward to your opinions.

FLIP link: https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module


signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table module

JingsongLee-2
Thanks dawid, +1 for this approach.

One concern is the removal of registerTableSink & registerTableSource
 in TableEnvironment. It has two alternatives:
1.the properties approach (DDL, descriptor).
2.from/toDataStream.

#1 can only be properties, not java states, and some Connectors
 are difficult to convert all states to properties.
#2 can contain java state. But can't use TableSource-related features,
like project & filter push down, partition support, etc..

Any idea about this?

Best,
Jingsong Lee


------------------------------------------------------------------
From:Dawid Wysakowicz <[hidden email]>
Send Time:2019年9月4日(星期三) 22:20
To:dev <[hidden email]>
Subject:[DISCUSS] FLIP-64: Support for Temporary Objects in Table module

Hi all,
As part of FLIP-30 a Catalog API was introduced that enables storing table meta objects permanently. At the same time the majority of current APIs create temporary objects that cannot be serialized. We should clarify the creation of meta objects (tables, views, functions) in a unified way.
Another current problem in the API is that all the temporary objects are stored in a special built-in catalog, which is not very intuitive for many users, as they must be aware of that catalog to reference temporary objects.
Lastly, different APIs have different ways of providing object paths:

String path…,
String path, String pathContinued…
String name
We should choose one approach and unify it across all APIs.
I suggest a FLIP to address the above issues.
Looking forward to your opinions.
FLIP link: https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module

Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table module

Xuefu Z
Thanks to Dawid for starting the discussion and writeup. It looks pretty
good to me except that I'm a little concerned about the object reference
and string parsing in the code, which seems to an anti-pattern to OOP. Have
we considered using ObjectIdenitifier with optional catalog and db parts,
esp. if we are worried about arguments of variable length or method
overloading? It's quite likely that the result of string parsing is an
ObjectIdentifier instance any way.

Having string parsing logic in the code is a little dangerous as it
duplicates part of the DDL/DML parsing, and they can easily get out of sync.

Thanks,
Xuefu

On Fri, Sep 6, 2019 at 1:57 PM JingsongLee <[hidden email]>
wrote:

> Thanks dawid, +1 for this approach.
>
> One concern is the removal of registerTableSink & registerTableSource
>  in TableEnvironment. It has two alternatives:
> 1.the properties approach (DDL, descriptor).
> 2.from/toDataStream.
>
> #1 can only be properties, not java states, and some Connectors
>  are difficult to convert all states to properties.
> #2 can contain java state. But can't use TableSource-related features,
> like project & filter push down, partition support, etc..
>
> Any idea about this?
>
> Best,
> Jingsong Lee
>
>
> ------------------------------------------------------------------
> From:Dawid Wysakowicz <[hidden email]>
> Send Time:2019年9月4日(星期三) 22:20
> To:dev <[hidden email]>
> Subject:[DISCUSS] FLIP-64: Support for Temporary Objects in Table module
>
> Hi all,
> As part of FLIP-30 a Catalog API was introduced that enables storing table
> meta objects permanently. At the same time the majority of current APIs
> create temporary objects that cannot be serialized. We should clarify the
> creation of meta objects (tables, views, functions) in a unified way.
> Another current problem in the API is that all the temporary objects are
> stored in a special built-in catalog, which is not very intuitive for many
> users, as they must be aware of that catalog to reference temporary objects.
> Lastly, different APIs have different ways of providing object paths:
>
> String path…,
> String path, String pathContinued…
> String name
> We should choose one approach and unify it across all APIs.
> I suggest a FLIP to address the above issues.
> Looking forward to your opinions.
> FLIP link:
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module
>
>

--
Xuefu Zhang

"In Honey We Trust!"
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table module

dwysakowicz

Hi all,

@Jingsong Could you elaborate a bit more what do you mean by

"some Connectors are difficult to convert all states to properties"

All the Flink provided connectors will definitely be expressible with properties (In the end you should be able to use them from DDL). I think if a TableSource is complex enough that it handles filter push down, partition support etc. should rather be made available both from DDL & java/scala code. I'm happy to reconsider adding registerTemporaryTable(String path, TableSource source) if you have some concrete examples in mind.


@Xuefu: We also considered the ObjectIdentifier (or actually introducing a new identifier representation to differentiate between resolved and unresolved identifiers) with the same concerns. We decided to suggest the string & parsing logic because of usability.

    tEnv.from("cat.db.table")

is shorter and easier to write than

    tEnv.from(Identifier.for("cat", "db", "name")

And also implicitly solves the problem what happens if a user (e.g. used to other systems) uses that API in a following manner:

    tEnv.from(Identifier.for("db.name")

I'm happy to revisit it if the general consensus is that it's better to use the OO aproach.

Best,

Dawid

On 06/09/2019 10:00, Xuefu Z wrote:
Thanks to Dawid for starting the discussion and writeup. It looks pretty
good to me except that I'm a little concerned about the object reference
and string parsing in the code, which seems to an anti-pattern to OOP. Have
we considered using ObjectIdenitifier with optional catalog and db parts,
esp. if we are worried about arguments of variable length or method
overloading? It's quite likely that the result of string parsing is an
ObjectIdentifier instance any way.

Having string parsing logic in the code is a little dangerous as it
duplicates part of the DDL/DML parsing, and they can easily get out of sync.

Thanks,
Xuefu

On Fri, Sep 6, 2019 at 1:57 PM JingsongLee [hidden email]
wrote:

Thanks dawid, +1 for this approach.

One concern is the removal of registerTableSink & registerTableSource
 in TableEnvironment. It has two alternatives:
1.the properties approach (DDL, descriptor).
2.from/toDataStream.

#1 can only be properties, not java states, and some Connectors
 are difficult to convert all states to properties.
#2 can contain java state. But can't use TableSource-related features,
like project & filter push down, partition support, etc..

Any idea about this?

Best,
Jingsong Lee


------------------------------------------------------------------
From:Dawid Wysakowicz [hidden email]
Send Time:2019年9月4日(星期三) 22:20
To:dev [hidden email]
Subject:[DISCUSS] FLIP-64: Support for Temporary Objects in Table module

Hi all,
As part of FLIP-30 a Catalog API was introduced that enables storing table
meta objects permanently. At the same time the majority of current APIs
create temporary objects that cannot be serialized. We should clarify the
creation of meta objects (tables, views, functions) in a unified way.
Another current problem in the API is that all the temporary objects are
stored in a special built-in catalog, which is not very intuitive for many
users, as they must be aware of that catalog to reference temporary objects.
Lastly, different APIs have different ways of providing object paths:

String path…,
String path, String pathContinued…
String name
We should choose one approach and unify it across all APIs.
I suggest a FLIP to address the above issues.
Looking forward to your opinions.
FLIP link:
https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module



    

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table module

JingsongLee-2
Hi dawid:

It is difficult to describe specific examples.
Sometimes users will generate some java converters through some
 Java code, or generate some Java classes through third-party
 libraries. Of course, these can be best done through properties.
But this requires additional work from users.My suggestion is to
 keep this Java instance class way that is user-friendly.

Best,
Jingsong Lee


------------------------------------------------------------------
From:Dawid Wysakowicz <[hidden email]>
Send Time:2019年9月6日(星期五) 16:21
To:dev <[hidden email]>
Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table module

Hi all,
@Jingsong Could you elaborate a bit more what do you mean by
"some Connectors are difficult to convert all states to properties"
All the Flink provided connectors will definitely be expressible with properties (In the end you should be able to use them from DDL). I think if a TableSource is complex enough that it handles filter push down, partition support etc. should rather be made available both from DDL & java/scala code. I'm happy to reconsider adding registerTemporaryTable(String path, TableSource source) if you have some concrete examples in mind.


@Xuefu: We also considered the ObjectIdentifier (or actually introducing a new identifier representation to differentiate between resolved and unresolved identifiers) with the same concerns. We decided to suggest the string & parsing logic because of usability.
    tEnv.from("cat.db.table")
is shorter and easier to write than
    tEnv.from(Identifier.for("cat", "db", "name")
And also implicitly solves the problem what happens if a user (e.g. used to other systems) uses that API in a following manner:
    tEnv.from(Identifier.for("db.name")
I'm happy to revisit it if the general consensus is that it's better to use the OO aproach.
Best,
Dawid

On 06/09/2019 10:00, Xuefu Z wrote:

Thanks to Dawid for starting the discussion and writeup. It looks pretty
good to me except that I'm a little concerned about the object reference
and string parsing in the code, which seems to an anti-pattern to OOP. Have
we considered using ObjectIdenitifier with optional catalog and db parts,
esp. if we are worried about arguments of variable length or method
overloading? It's quite likely that the result of string parsing is an
ObjectIdentifier instance any way.

Having string parsing logic in the code is a little dangerous as it
duplicates part of the DDL/DML parsing, and they can easily get out of sync.

Thanks,
Xuefu

On Fri, Sep 6, 2019 at 1:57 PM JingsongLee <[hidden email]>
wrote:


Thanks dawid, +1 for this approach.

One concern is the removal of registerTableSink & registerTableSource
 in TableEnvironment. It has two alternatives:
1.the properties approach (DDL, descriptor).
2.from/toDataStream.

#1 can only be properties, not java states, and some Connectors
 are difficult to convert all states to properties.
#2 can contain java state. But can't use TableSource-related features,
like project & filter push down, partition support, etc..

Any idea about this?

Best,
Jingsong Lee


------------------------------------------------------------------
From:Dawid Wysakowicz <[hidden email]>
Send Time:2019年9月4日(星期三) 22:20
To:dev <[hidden email]>
Subject:[DISCUSS] FLIP-64: Support for Temporary Objects in Table module

Hi all,
As part of FLIP-30 a Catalog API was introduced that enables storing table
meta objects permanently. At the same time the majority of current APIs
create temporary objects that cannot be serialized. We should clarify the
creation of meta objects (tables, views, functions) in a unified way.
Another current problem in the API is that all the temporary objects are
stored in a special built-in catalog, which is not very intuitive for many
users, as they must be aware of that catalog to reference temporary objects.
Lastly, different APIs have different ways of providing object paths:

String path…,
String path, String pathContinued…
String name
We should choose one approach and unify it across all APIs.
I suggest a FLIP to address the above issues.
Looking forward to your opinions.
FLIP link:
https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module

Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table module

Zhenghua Gao
In reply to this post by dwysakowicz
Thanks david for pushing this forward.
I have one concern about temporary objects and non-persistent catalog(e.g.,
GenericInMemoryCatalog).
In SQL, temporary objects exist at the session level. They are only visible
to the session in which they were created and are automatically dropped
when that session logs off.
So in that sense, all objects in non-persistent catalogs should be
temporary.

My concern is: How does the non-persistent catalog work with temporary
objects?

*Best Regards,*
*Zhenghua Gao*


On Wed, Sep 4, 2019 at 10:20 PM Dawid Wysakowicz <[hidden email]>
wrote:

> Hi all,
>
> As part of FLIP-30
> <https://cwiki.apache.org/confluence/display/FLINK/FLIP-30%3A+Unified+Catalog+APIs>
> a Catalog API was introduced that enables storing table meta objects
> permanently. At the same time the majority of current APIs create temporary
> objects that cannot be serialized. We should clarify the creation of meta
> objects (tables, views, functions) in a unified way.
>
> Another current problem in the API is that all the temporary objects are
> stored in a special built-in catalog, which is not very intuitive for many
> users, as they must be aware of that catalog to reference temporary objects.
>
> Lastly, different APIs have different ways of providing object paths:
>
>    - String path…,
>    - String path, String pathContinued…
>    - String name
>
> We should choose one approach and unify it across all APIs.
>
> I suggest a FLIP to address the above issues.
>
> Looking forward to your opinions.
>
> FLIP link:
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module
>
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table module

Timo Walther-2
In reply to this post by JingsongLee-2
Hi Dawid,

thanks for the design document. It fixes big concept gaps due to
historical reasons with proper support for serializability and catalog
support in mind.

I would not mind a registerTemporarySource/Sink, but the problem that I
see is that many people think that this is the recommended way of
registering a table source/sink which is not true. We should guide users
to either use connect() or DDL API which can be validated and stored in
catalog.

Also from a concept perspective, registering a source/sink does not fit
into the SQL world. SQL does not know about source/sinks but only about
tables. If the responsibility of a TableSource/TableSink is just a pure
physical data consumer/producer that is not connected to the actual
logical table schema, we would need a possibility of defining time
attributes and interpreting/converting a changelog. This should be done
by the framework with information from the DDL/connect() and not be
defined in every table source.

Regards,
Timo


On 09.09.19 14:16, JingsongLee wrote:

> Hi dawid:
>
> It is difficult to describe specific examples.
> Sometimes users will generate some java converters through some
>   Java code, or generate some Java classes through third-party
>   libraries. Of course, these can be best done through properties.
> But this requires additional work from users.My suggestion is to
>   keep this Java instance class way that is user-friendly.
>
> Best,
> Jingsong Lee
>
>
> ------------------------------------------------------------------
> From:Dawid Wysakowicz <[hidden email]>
> Send Time:2019年9月6日(星期五) 16:21
> To:dev <[hidden email]>
> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table module
>
> Hi all,
> @Jingsong Could you elaborate a bit more what do you mean by
> "some Connectors are difficult to convert all states to properties"
> All the Flink provided connectors will definitely be expressible with properties (In the end you should be able to use them from DDL). I think if a TableSource is complex enough that it handles filter push down, partition support etc. should rather be made available both from DDL & java/scala code. I'm happy to reconsider adding registerTemporaryTable(String path, TableSource source) if you have some concrete examples in mind.
>
>
> @Xuefu: We also considered the ObjectIdentifier (or actually introducing a new identifier representation to differentiate between resolved and unresolved identifiers) with the same concerns. We decided to suggest the string & parsing logic because of usability.
>      tEnv.from("cat.db.table")
> is shorter and easier to write than
>      tEnv.from(Identifier.for("cat", "db", "name")
> And also implicitly solves the problem what happens if a user (e.g. used to other systems) uses that API in a following manner:
>      tEnv.from(Identifier.for("db.name")
> I'm happy to revisit it if the general consensus is that it's better to use the OO aproach.
> Best,
> Dawid
>
> On 06/09/2019 10:00, Xuefu Z wrote:
>
> Thanks to Dawid for starting the discussion and writeup. It looks pretty
> good to me except that I'm a little concerned about the object reference
> and string parsing in the code, which seems to an anti-pattern to OOP. Have
> we considered using ObjectIdenitifier with optional catalog and db parts,
> esp. if we are worried about arguments of variable length or method
> overloading? It's quite likely that the result of string parsing is an
> ObjectIdentifier instance any way.
>
> Having string parsing logic in the code is a little dangerous as it
> duplicates part of the DDL/DML parsing, and they can easily get out of sync.
>
> Thanks,
> Xuefu
>
> On Fri, Sep 6, 2019 at 1:57 PM JingsongLee <[hidden email]>
> wrote:
>
>
> Thanks dawid, +1 for this approach.
>
> One concern is the removal of registerTableSink & registerTableSource
>   in TableEnvironment. It has two alternatives:
> 1.the properties approach (DDL, descriptor).
> 2.from/toDataStream.
>
> #1 can only be properties, not java states, and some Connectors
>   are difficult to convert all states to properties.
> #2 can contain java state. But can't use TableSource-related features,
> like project & filter push down, partition support, etc..
>
> Any idea about this?
>
> Best,
> Jingsong Lee
>
>
> ------------------------------------------------------------------
> From:Dawid Wysakowicz <[hidden email]>
> Send Time:2019年9月4日(星期三) 22:20
> To:dev <[hidden email]>
> Subject:[DISCUSS] FLIP-64: Support for Temporary Objects in Table module
>
> Hi all,
> As part of FLIP-30 a Catalog API was introduced that enables storing table
> meta objects permanently. At the same time the majority of current APIs
> create temporary objects that cannot be serialized. We should clarify the
> creation of meta objects (tables, views, functions) in a unified way.
> Another current problem in the API is that all the temporary objects are
> stored in a special built-in catalog, which is not very intuitive for many
> users, as they must be aware of that catalog to reference temporary objects.
> Lastly, different APIs have different ways of providing object paths:
>
> String path…,
> String path, String pathContinued…
> String name
> We should choose one approach and unify it across all APIs.
> I suggest a FLIP to address the above issues.
> Looking forward to your opinions.
> FLIP link:
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module
>

Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table module

JingsongLee-2
After some development and thinking, I have a general understanding.
+1 to registering a source/sink does not fit into the SQL world.
I am OK to have a deprecated registerTemporarySource/Sink to compatible with old ways.

Best,
Jingsong Lee


------------------------------------------------------------------
From:Timo Walther <[hidden email]>
Send Time:2019年9月17日(星期二) 08:00
To:dev <[hidden email]>
Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table module

Hi Dawid,

thanks for the design document. It fixes big concept gaps due to
historical reasons with proper support for serializability and catalog
support in mind.

I would not mind a registerTemporarySource/Sink, but the problem that I
see is that many people think that this is the recommended way of
registering a table source/sink which is not true. We should guide users
to either use connect() or DDL API which can be validated and stored in
catalog.

Also from a concept perspective, registering a source/sink does not fit
into the SQL world. SQL does not know about source/sinks but only about
tables. If the responsibility of a TableSource/TableSink is just a pure
physical data consumer/producer that is not connected to the actual
logical table schema, we would need a possibility of defining time
attributes and interpreting/converting a changelog. This should be done
by the framework with information from the DDL/connect() and not be
defined in every table source.

Regards,
Timo


On 09.09.19 14:16, JingsongLee wrote:

> Hi dawid:
>
> It is difficult to describe specific examples.
> Sometimes users will generate some java converters through some
>   Java code, or generate some Java classes through third-party
>   libraries. Of course, these can be best done through properties.
> But this requires additional work from users.My suggestion is to
>   keep this Java instance class way that is user-friendly.
>
> Best,
> Jingsong Lee
>
>
> ------------------------------------------------------------------
> From:Dawid Wysakowicz <[hidden email]>
> Send Time:2019年9月6日(星期五) 16:21
> To:dev <[hidden email]>
> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table module
>
> Hi all,
> @Jingsong Could you elaborate a bit more what do you mean by
> "some Connectors are difficult to convert all states to properties"
> All the Flink provided connectors will definitely be expressible with properties (In the end you should be able to use them from DDL). I think if a TableSource is complex enough that it handles filter push down, partition support etc. should rather be made available both from DDL & java/scala code. I'm happy to reconsider adding registerTemporaryTable(String path, TableSource source) if you have some concrete examples in mind.
>
>
> @Xuefu: We also considered the ObjectIdentifier (or actually introducing a new identifier representation to differentiate between resolved and unresolved identifiers) with the same concerns. We decided to suggest the string & parsing logic because of usability.
>      tEnv.from("cat.db.table")
> is shorter and easier to write than
>      tEnv.from(Identifier.for("cat", "db", "name")
> And also implicitly solves the problem what happens if a user (e.g. used to other systems) uses that API in a following manner:
>      tEnv.from(Identifier.for("db.name")
> I'm happy to revisit it if the general consensus is that it's better to use the OO aproach.
> Best,
> Dawid
>
> On 06/09/2019 10:00, Xuefu Z wrote:
>
> Thanks to Dawid for starting the discussion and writeup. It looks pretty
> good to me except that I'm a little concerned about the object reference
> and string parsing in the code, which seems to an anti-pattern to OOP. Have
> we considered using ObjectIdenitifier with optional catalog and db parts,
> esp. if we are worried about arguments of variable length or method
> overloading? It's quite likely that the result of string parsing is an
> ObjectIdentifier instance any way.
>
> Having string parsing logic in the code is a little dangerous as it
> duplicates part of the DDL/DML parsing, and they can easily get out of sync.
>
> Thanks,
> Xuefu
>
> On Fri, Sep 6, 2019 at 1:57 PM JingsongLee <[hidden email]>
> wrote:
>
>
> Thanks dawid, +1 for this approach.
>
> One concern is the removal of registerTableSink & registerTableSource
>   in TableEnvironment. It has two alternatives:
> 1.the properties approach (DDL, descriptor).
> 2.from/toDataStream.
>
> #1 can only be properties, not java states, and some Connectors
>   are difficult to convert all states to properties.
> #2 can contain java state. But can't use TableSource-related features,
> like project & filter push down, partition support, etc..
>
> Any idea about this?
>
> Best,
> Jingsong Lee
>
>
> ------------------------------------------------------------------
> From:Dawid Wysakowicz <[hidden email]>
> Send Time:2019年9月4日(星期三) 22:20
> To:dev <[hidden email]>
> Subject:[DISCUSS] FLIP-64: Support for Temporary Objects in Table module
>
> Hi all,
> As part of FLIP-30 a Catalog API was introduced that enables storing table
> meta objects permanently. At the same time the majority of current APIs
> create temporary objects that cannot be serialized. We should clarify the
> creation of meta objects (tables, views, functions) in a unified way.
> Another current problem in the API is that all the temporary objects are
> stored in a special built-in catalog, which is not very intuitive for many
> users, as they must be aware of that catalog to reference temporary objects.
> Lastly, different APIs have different ways of providing object paths:
>
> String path…,
> String path, String pathContinued…
> String name
> We should choose one approach and unify it across all APIs.
> I suggest a FLIP to address the above issues.
> Looking forward to your opinions.
> FLIP link:
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module
>
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table module

Kurt Young
Hi all,

Sorry to join this party late. Big +1 to this flip, especially for the
dropping
"registerTableSink & registerTableSource" part. These are indeed legacy
and we should try to unify them through CatalogTable after we introduce
the concept of Catalog.

From my understanding, what we can registered should all be metadata,
TableSource/TableSink should only be the one who is responsible to do
the real work, i.e. reading and writing data according to the schema and
other information like computed column, partition, .e.g.

Best,
Kurt


On Wed, Sep 18, 2019 at 5:14 PM JingsongLee <[hidden email]>
wrote:

> After some development and thinking, I have a general understanding.
> +1 to registering a source/sink does not fit into the SQL world.
> I am OK to have a deprecated registerTemporarySource/Sink to compatible
> with old ways.
>
> Best,
> Jingsong Lee
>
>
> ------------------------------------------------------------------
> From:Timo Walther <[hidden email]>
> Send Time:2019年9月17日(星期二) 08:00
> To:dev <[hidden email]>
> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table
> module
>
> Hi Dawid,
>
> thanks for the design document. It fixes big concept gaps due to
> historical reasons with proper support for serializability and catalog
> support in mind.
>
> I would not mind a registerTemporarySource/Sink, but the problem that I
> see is that many people think that this is the recommended way of
> registering a table source/sink which is not true. We should guide users
> to either use connect() or DDL API which can be validated and stored in
> catalog.
>
> Also from a concept perspective, registering a source/sink does not fit
> into the SQL world. SQL does not know about source/sinks but only about
> tables. If the responsibility of a TableSource/TableSink is just a pure
> physical data consumer/producer that is not connected to the actual
> logical table schema, we would need a possibility of defining time
> attributes and interpreting/converting a changelog. This should be done
> by the framework with information from the DDL/connect() and not be
> defined in every table source.
>
> Regards,
> Timo
>
>
> On 09.09.19 14:16, JingsongLee wrote:
> > Hi dawid:
> >
> > It is difficult to describe specific examples.
> > Sometimes users will generate some java converters through some
> >   Java code, or generate some Java classes through third-party
> >   libraries. Of course, these can be best done through properties.
> > But this requires additional work from users.My suggestion is to
> >   keep this Java instance class way that is user-friendly.
> >
> > Best,
> > Jingsong Lee
> >
> >
> > ------------------------------------------------------------------
> > From:Dawid Wysakowicz <[hidden email]>
> > Send Time:2019年9月6日(星期五) 16:21
> > To:dev <[hidden email]>
> > Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table
> module
> >
> > Hi all,
> > @Jingsong Could you elaborate a bit more what do you mean by
> > "some Connectors are difficult to convert all states to properties"
> > All the Flink provided connectors will definitely be expressible with
> properties (In the end you should be able to use them from DDL). I think if
> a TableSource is complex enough that it handles filter push down, partition
> support etc. should rather be made available both from DDL & java/scala
> code. I'm happy to reconsider adding registerTemporaryTable(String path,
> TableSource source) if you have some concrete examples in mind.
> >
> >
> > @Xuefu: We also considered the ObjectIdentifier (or actually introducing
> a new identifier representation to differentiate between resolved and
> unresolved identifiers) with the same concerns. We decided to suggest the
> string & parsing logic because of usability.
> >      tEnv.from("cat.db.table")
> > is shorter and easier to write than
> >      tEnv.from(Identifier.for("cat", "db", "name")
> > And also implicitly solves the problem what happens if a user (e.g. used
> to other systems) uses that API in a following manner:
> >      tEnv.from(Identifier.for("db.name")
> > I'm happy to revisit it if the general consensus is that it's better to
> use the OO aproach.
> > Best,
> > Dawid
> >
> > On 06/09/2019 10:00, Xuefu Z wrote:
> >
> > Thanks to Dawid for starting the discussion and writeup. It looks pretty
> > good to me except that I'm a little concerned about the object reference
> > and string parsing in the code, which seems to an anti-pattern to OOP.
> Have
> > we considered using ObjectIdenitifier with optional catalog and db parts,
> > esp. if we are worried about arguments of variable length or method
> > overloading? It's quite likely that the result of string parsing is an
> > ObjectIdentifier instance any way.
> >
> > Having string parsing logic in the code is a little dangerous as it
> > duplicates part of the DDL/DML parsing, and they can easily get out of
> sync.
> >
> > Thanks,
> > Xuefu
> >
> > On Fri, Sep 6, 2019 at 1:57 PM JingsongLee <[hidden email]
> .invalid>
> > wrote:
> >
> >
> > Thanks dawid, +1 for this approach.
> >
> > One concern is the removal of registerTableSink & registerTableSource
> >   in TableEnvironment. It has two alternatives:
> > 1.the properties approach (DDL, descriptor).
> > 2.from/toDataStream.
> >
> > #1 can only be properties, not java states, and some Connectors
> >   are difficult to convert all states to properties.
> > #2 can contain java state. But can't use TableSource-related features,
> > like project & filter push down, partition support, etc..
> >
> > Any idea about this?
> >
> > Best,
> > Jingsong Lee
> >
> >
> > ------------------------------------------------------------------
> > From:Dawid Wysakowicz <[hidden email]>
> > Send Time:2019年9月4日(星期三) 22:20
> > To:dev <[hidden email]>
> > Subject:[DISCUSS] FLIP-64: Support for Temporary Objects in Table module
> >
> > Hi all,
> > As part of FLIP-30 a Catalog API was introduced that enables storing
> table
> > meta objects permanently. At the same time the majority of current APIs
> > create temporary objects that cannot be serialized. We should clarify the
> > creation of meta objects (tables, views, functions) in a unified way.
> > Another current problem in the API is that all the temporary objects are
> > stored in a special built-in catalog, which is not very intuitive for
> many
> > users, as they must be aware of that catalog to reference temporary
> objects.
> > Lastly, different APIs have different ways of providing object paths:
> >
> > String path…,
> > String path, String pathContinued…
> > String name
> > We should choose one approach and unify it across all APIs.
> > I suggest a FLIP to address the above issues.
> > Looking forward to your opinions.
> > FLIP link:
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module
> >
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table module

JingsongLee-2
Hi dawid:
Can temporary tables achieve the same capabilities as catalog table?
like statistics: CatalogTableStatistics, CatalogColumnStatistics, PartitionStatistics
like partition support: we have added some catalog equivalent interfaces on TableSource/TableSink: getPartitions, getPartitionFieldNames
Maybe it's not a good idea to add these interfaces to TableSource/TableSink. What do you think?

Best,
Jingsong Lee


------------------------------------------------------------------
From:Kurt Young <[hidden email]>
Send Time:2019年9月18日(星期三) 17:54
To:dev <[hidden email]>
Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table module

Hi all,

Sorry to join this party late. Big +1 to this flip, especially for the
dropping
"registerTableSink & registerTableSource" part. These are indeed legacy
and we should try to unify them through CatalogTable after we introduce
the concept of Catalog.

From my understanding, what we can registered should all be metadata,
TableSource/TableSink should only be the one who is responsible to do
the real work, i.e. reading and writing data according to the schema and
other information like computed column, partition, .e.g.

Best,
Kurt


On Wed, Sep 18, 2019 at 5:14 PM JingsongLee <[hidden email]>
wrote:

> After some development and thinking, I have a general understanding.
> +1 to registering a source/sink does not fit into the SQL world.
> I am OK to have a deprecated registerTemporarySource/Sink to compatible
> with old ways.
>
> Best,
> Jingsong Lee
>
>
> ------------------------------------------------------------------
> From:Timo Walther <[hidden email]>
> Send Time:2019年9月17日(星期二) 08:00
> To:dev <[hidden email]>
> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table
> module
>
> Hi Dawid,
>
> thanks for the design document. It fixes big concept gaps due to
> historical reasons with proper support for serializability and catalog
> support in mind.
>
> I would not mind a registerTemporarySource/Sink, but the problem that I
> see is that many people think that this is the recommended way of
> registering a table source/sink which is not true. We should guide users
> to either use connect() or DDL API which can be validated and stored in
> catalog.
>
> Also from a concept perspective, registering a source/sink does not fit
> into the SQL world. SQL does not know about source/sinks but only about
> tables. If the responsibility of a TableSource/TableSink is just a pure
> physical data consumer/producer that is not connected to the actual
> logical table schema, we would need a possibility of defining time
> attributes and interpreting/converting a changelog. This should be done
> by the framework with information from the DDL/connect() and not be
> defined in every table source.
>
> Regards,
> Timo
>
>
> On 09.09.19 14:16, JingsongLee wrote:
> > Hi dawid:
> >
> > It is difficult to describe specific examples.
> > Sometimes users will generate some java converters through some
> >   Java code, or generate some Java classes through third-party
> >   libraries. Of course, these can be best done through properties.
> > But this requires additional work from users.My suggestion is to
> >   keep this Java instance class way that is user-friendly.
> >
> > Best,
> > Jingsong Lee
> >
> >
> > ------------------------------------------------------------------
> > From:Dawid Wysakowicz <[hidden email]>
> > Send Time:2019年9月6日(星期五) 16:21
> > To:dev <[hidden email]>
> > Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table
> module
> >
> > Hi all,
> > @Jingsong Could you elaborate a bit more what do you mean by
> > "some Connectors are difficult to convert all states to properties"
> > All the Flink provided connectors will definitely be expressible with
> properties (In the end you should be able to use them from DDL). I think if
> a TableSource is complex enough that it handles filter push down, partition
> support etc. should rather be made available both from DDL & java/scala
> code. I'm happy to reconsider adding registerTemporaryTable(String path,
> TableSource source) if you have some concrete examples in mind.
> >
> >
> > @Xuefu: We also considered the ObjectIdentifier (or actually introducing
> a new identifier representation to differentiate between resolved and
> unresolved identifiers) with the same concerns. We decided to suggest the
> string & parsing logic because of usability.
> >      tEnv.from("cat.db.table")
> > is shorter and easier to write than
> >      tEnv.from(Identifier.for("cat", "db", "name")
> > And also implicitly solves the problem what happens if a user (e.g. used
> to other systems) uses that API in a following manner:
> >      tEnv.from(Identifier.for("db.name")
> > I'm happy to revisit it if the general consensus is that it's better to
> use the OO aproach.
> > Best,
> > Dawid
> >
> > On 06/09/2019 10:00, Xuefu Z wrote:
> >
> > Thanks to Dawid for starting the discussion and writeup. It looks pretty
> > good to me except that I'm a little concerned about the object reference
> > and string parsing in the code, which seems to an anti-pattern to OOP.
> Have
> > we considered using ObjectIdenitifier with optional catalog and db parts,
> > esp. if we are worried about arguments of variable length or method
> > overloading? It's quite likely that the result of string parsing is an
> > ObjectIdentifier instance any way.
> >
> > Having string parsing logic in the code is a little dangerous as it
> > duplicates part of the DDL/DML parsing, and they can easily get out of
> sync.
> >
> > Thanks,
> > Xuefu
> >
> > On Fri, Sep 6, 2019 at 1:57 PM JingsongLee <[hidden email]
> .invalid>
> > wrote:
> >
> >
> > Thanks dawid, +1 for this approach.
> >
> > One concern is the removal of registerTableSink & registerTableSource
> >   in TableEnvironment. It has two alternatives:
> > 1.the properties approach (DDL, descriptor).
> > 2.from/toDataStream.
> >
> > #1 can only be properties, not java states, and some Connectors
> >   are difficult to convert all states to properties.
> > #2 can contain java state. But can't use TableSource-related features,
> > like project & filter push down, partition support, etc..
> >
> > Any idea about this?
> >
> > Best,
> > Jingsong Lee
> >
> >
> > ------------------------------------------------------------------
> > From:Dawid Wysakowicz <[hidden email]>
> > Send Time:2019年9月4日(星期三) 22:20
> > To:dev <[hidden email]>
> > Subject:[DISCUSS] FLIP-64: Support for Temporary Objects in Table module
> >
> > Hi all,
> > As part of FLIP-30 a Catalog API was introduced that enables storing
> table
> > meta objects permanently. At the same time the majority of current APIs
> > create temporary objects that cannot be serialized. We should clarify the
> > creation of meta objects (tables, views, functions) in a unified way.
> > Another current problem in the API is that all the temporary objects are
> > stored in a special built-in catalog, which is not very intuitive for
> many
> > users, as they must be aware of that catalog to reference temporary
> objects.
> > Lastly, different APIs have different ways of providing object paths:
> >
> > String path…,
> > String path, String pathContinued…
> > String name
> > We should choose one approach and unify it across all APIs.
> > I suggest a FLIP to address the above issues.
> > Looking forward to your opinions.
> > FLIP link:
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module
> >

Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table module

Dawid Wysakowicz
Hi JingsongLee,
From my understanding they can. Underneath they will be CatalogTables. The
difference is the lifetime of the tables. Plus some of the user facing
interfaces cannot be persisted e.g. datastream. Therefore we must have a
separate methods for that. In the end the temporary tables are held in
memory as CatalogTables.
Best,
Dawid

On Thu, 19 Sep 2019, 10:08 JingsongLee, <[hidden email]>
wrote:

> Hi dawid:
> Can temporary tables achieve the same capabilities as catalog table?
> like statistics: CatalogTableStatistics, CatalogColumnStatistics,
> PartitionStatistics
> like partition support: we have added some catalog equivalent interfaces
> on TableSource/TableSink: getPartitions, getPartitionFieldNames
> Maybe it's not a good idea to add these interfaces to
> TableSource/TableSink. What do you think?
>
> Best,
> Jingsong Lee
>
>
> ------------------------------------------------------------------
> From:Kurt Young <[hidden email]>
> Send Time:2019年9月18日(星期三) 17:54
> To:dev <[hidden email]>
> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table
> module
>
> Hi all,
>
> Sorry to join this party late. Big +1 to this flip, especially for the
> dropping
> "registerTableSink & registerTableSource" part. These are indeed legacy
> and we should try to unify them through CatalogTable after we introduce
> the concept of Catalog.
>
> From my understanding, what we can registered should all be metadata,
> TableSource/TableSink should only be the one who is responsible to do
> the real work, i.e. reading and writing data according to the schema and
> other information like computed column, partition, .e.g.
>
> Best,
> Kurt
>
>
> On Wed, Sep 18, 2019 at 5:14 PM JingsongLee <[hidden email]
> .invalid>
> wrote:
>
> > After some development and thinking, I have a general understanding.
> > +1 to registering a source/sink does not fit into the SQL world.
> > I am OK to have a deprecated registerTemporarySource/Sink to compatible
> > with old ways.
> >
> > Best,
> > Jingsong Lee
> >
> >
> > ------------------------------------------------------------------
> > From:Timo Walther <[hidden email]>
> > Send Time:2019年9月17日(星期二) 08:00
> > To:dev <[hidden email]>
> > Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table
> > module
> >
> > Hi Dawid,
> >
> > thanks for the design document. It fixes big concept gaps due to
> > historical reasons with proper support for serializability and catalog
> > support in mind.
> >
> > I would not mind a registerTemporarySource/Sink, but the problem that I
> > see is that many people think that this is the recommended way of
> > registering a table source/sink which is not true. We should guide users
> > to either use connect() or DDL API which can be validated and stored in
> > catalog.
> >
> > Also from a concept perspective, registering a source/sink does not fit
> > into the SQL world. SQL does not know about source/sinks but only about
> > tables. If the responsibility of a TableSource/TableSink is just a pure
> > physical data consumer/producer that is not connected to the actual
> > logical table schema, we would need a possibility of defining time
> > attributes and interpreting/converting a changelog. This should be done
> > by the framework with information from the DDL/connect() and not be
> > defined in every table source.
> >
> > Regards,
> > Timo
> >
> >
> > On 09.09.19 14:16, JingsongLee wrote:
> > > Hi dawid:
> > >
> > > It is difficult to describe specific examples.
> > > Sometimes users will generate some java converters through some
> > >   Java code, or generate some Java classes through third-party
> > >   libraries. Of course, these can be best done through properties.
> > > But this requires additional work from users.My suggestion is to
> > >   keep this Java instance class way that is user-friendly.
> > >
> > > Best,
> > > Jingsong Lee
> > >
> > >
> > > ------------------------------------------------------------------
> > > From:Dawid Wysakowicz <[hidden email]>
> > > Send Time:2019年9月6日(星期五) 16:21
> > > To:dev <[hidden email]>
> > > Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table
> > module
> > >
> > > Hi all,
> > > @Jingsong Could you elaborate a bit more what do you mean by
> > > "some Connectors are difficult to convert all states to properties"
> > > All the Flink provided connectors will definitely be expressible with
> > properties (In the end you should be able to use them from DDL). I think
> if
> > a TableSource is complex enough that it handles filter push down,
> partition
> > support etc. should rather be made available both from DDL & java/scala
> > code. I'm happy to reconsider adding registerTemporaryTable(String path,
> > TableSource source) if you have some concrete examples in mind.
> > >
> > >
> > > @Xuefu: We also considered the ObjectIdentifier (or actually
> introducing
> > a new identifier representation to differentiate between resolved and
> > unresolved identifiers) with the same concerns. We decided to suggest the
> > string & parsing logic because of usability.
> > >      tEnv.from("cat.db.table")
> > > is shorter and easier to write than
> > >      tEnv.from(Identifier.for("cat", "db", "name")
> > > And also implicitly solves the problem what happens if a user (e.g.
> used
> > to other systems) uses that API in a following manner:
> > >      tEnv.from(Identifier.for("db.name")
> > > I'm happy to revisit it if the general consensus is that it's better to
> > use the OO aproach.
> > > Best,
> > > Dawid
> > >
> > > On 06/09/2019 10:00, Xuefu Z wrote:
> > >
> > > Thanks to Dawid for starting the discussion and writeup. It looks
> pretty
> > > good to me except that I'm a little concerned about the object
> reference
> > > and string parsing in the code, which seems to an anti-pattern to OOP.
> > Have
> > > we considered using ObjectIdenitifier with optional catalog and db
> parts,
> > > esp. if we are worried about arguments of variable length or method
> > > overloading? It's quite likely that the result of string parsing is an
> > > ObjectIdentifier instance any way.
> > >
> > > Having string parsing logic in the code is a little dangerous as it
> > > duplicates part of the DDL/DML parsing, and they can easily get out of
> > sync.
> > >
> > > Thanks,
> > > Xuefu
> > >
> > > On Fri, Sep 6, 2019 at 1:57 PM JingsongLee <[hidden email]
> > .invalid>
> > > wrote:
> > >
> > >
> > > Thanks dawid, +1 for this approach.
> > >
> > > One concern is the removal of registerTableSink & registerTableSource
> > >   in TableEnvironment. It has two alternatives:
> > > 1.the properties approach (DDL, descriptor).
> > > 2.from/toDataStream.
> > >
> > > #1 can only be properties, not java states, and some Connectors
> > >   are difficult to convert all states to properties.
> > > #2 can contain java state. But can't use TableSource-related features,
> > > like project & filter push down, partition support, etc..
> > >
> > > Any idea about this?
> > >
> > > Best,
> > > Jingsong Lee
> > >
> > >
> > > ------------------------------------------------------------------
> > > From:Dawid Wysakowicz <[hidden email]>
> > > Send Time:2019年9月4日(星期三) 22:20
> > > To:dev <[hidden email]>
> > > Subject:[DISCUSS] FLIP-64: Support for Temporary Objects in Table
> module
> > >
> > > Hi all,
> > > As part of FLIP-30 a Catalog API was introduced that enables storing
> > table
> > > meta objects permanently. At the same time the majority of current APIs
> > > create temporary objects that cannot be serialized. We should clarify
> the
> > > creation of meta objects (tables, views, functions) in a unified way.
> > > Another current problem in the API is that all the temporary objects
> are
> > > stored in a special built-in catalog, which is not very intuitive for
> > many
> > > users, as they must be aware of that catalog to reference temporary
> > objects.
> > > Lastly, different APIs have different ways of providing object paths:
> > >
> > > String path…,
> > > String path, String pathContinued…
> > > String name
> > > We should choose one approach and unify it across all APIs.
> > > I suggest a FLIP to address the above issues.
> > > Looking forward to your opinions.
> > > FLIP link:
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module
> > >
>
>
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table module

Jark Wu-2
Thanks Dawid for the design doc.

In general, I’m +1 to the FLIP.


+1 to the single-string and parse way to express object path.

+1 to deprecate registerTableSink & registerTableSource.
But I would suggest to provide an easy way to register a custom source/sink before we drop them (this is another story).
Currently, it’s not easy to implement a custom connector descriptor.

Best,
Jark


> 在 2019年9月19日,11:37,Dawid Wysakowicz <[hidden email]> 写道:
>
> Hi JingsongLee,
> From my understanding they can. Underneath they will be CatalogTables. The
> difference is the lifetime of the tables. Plus some of the user facing
> interfaces cannot be persisted e.g. datastream. Therefore we must have a
> separate methods for that. In the end the temporary tables are held in
> memory as CatalogTables.
> Best,
> Dawid
>
> On Thu, 19 Sep 2019, 10:08 JingsongLee, <[hidden email]>
> wrote:
>
>> Hi dawid:
>> Can temporary tables achieve the same capabilities as catalog table?
>> like statistics: CatalogTableStatistics, CatalogColumnStatistics,
>> PartitionStatistics
>> like partition support: we have added some catalog equivalent interfaces
>> on TableSource/TableSink: getPartitions, getPartitionFieldNames
>> Maybe it's not a good idea to add these interfaces to
>> TableSource/TableSink. What do you think?
>>
>> Best,
>> Jingsong Lee
>>
>>
>> ------------------------------------------------------------------
>> From:Kurt Young <[hidden email]>
>> Send Time:2019年9月18日(星期三) 17:54
>> To:dev <[hidden email]>
>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table
>> module
>>
>> Hi all,
>>
>> Sorry to join this party late. Big +1 to this flip, especially for the
>> dropping
>> "registerTableSink & registerTableSource" part. These are indeed legacy
>> and we should try to unify them through CatalogTable after we introduce
>> the concept of Catalog.
>>
>> From my understanding, what we can registered should all be metadata,
>> TableSource/TableSink should only be the one who is responsible to do
>> the real work, i.e. reading and writing data according to the schema and
>> other information like computed column, partition, .e.g.
>>
>> Best,
>> Kurt
>>
>>
>> On Wed, Sep 18, 2019 at 5:14 PM JingsongLee <[hidden email]
>> .invalid>
>> wrote:
>>
>>> After some development and thinking, I have a general understanding.
>>> +1 to registering a source/sink does not fit into the SQL world.
>>> I am OK to have a deprecated registerTemporarySource/Sink to compatible
>>> with old ways.
>>>
>>> Best,
>>> Jingsong Lee
>>>
>>>
>>> ------------------------------------------------------------------
>>> From:Timo Walther <[hidden email]>
>>> Send Time:2019年9月17日(星期二) 08:00
>>> To:dev <[hidden email]>
>>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table
>>> module
>>>
>>> Hi Dawid,
>>>
>>> thanks for the design document. It fixes big concept gaps due to
>>> historical reasons with proper support for serializability and catalog
>>> support in mind.
>>>
>>> I would not mind a registerTemporarySource/Sink, but the problem that I
>>> see is that many people think that this is the recommended way of
>>> registering a table source/sink which is not true. We should guide users
>>> to either use connect() or DDL API which can be validated and stored in
>>> catalog.
>>>
>>> Also from a concept perspective, registering a source/sink does not fit
>>> into the SQL world. SQL does not know about source/sinks but only about
>>> tables. If the responsibility of a TableSource/TableSink is just a pure
>>> physical data consumer/producer that is not connected to the actual
>>> logical table schema, we would need a possibility of defining time
>>> attributes and interpreting/converting a changelog. This should be done
>>> by the framework with information from the DDL/connect() and not be
>>> defined in every table source.
>>>
>>> Regards,
>>> Timo
>>>
>>>
>>> On 09.09.19 14:16, JingsongLee wrote:
>>>> Hi dawid:
>>>>
>>>> It is difficult to describe specific examples.
>>>> Sometimes users will generate some java converters through some
>>>>  Java code, or generate some Java classes through third-party
>>>>  libraries. Of course, these can be best done through properties.
>>>> But this requires additional work from users.My suggestion is to
>>>>  keep this Java instance class way that is user-friendly.
>>>>
>>>> Best,
>>>> Jingsong Lee
>>>>
>>>>
>>>> ------------------------------------------------------------------
>>>> From:Dawid Wysakowicz <[hidden email]>
>>>> Send Time:2019年9月6日(星期五) 16:21
>>>> To:dev <[hidden email]>
>>>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table
>>> module
>>>>
>>>> Hi all,
>>>> @Jingsong Could you elaborate a bit more what do you mean by
>>>> "some Connectors are difficult to convert all states to properties"
>>>> All the Flink provided connectors will definitely be expressible with
>>> properties (In the end you should be able to use them from DDL). I think
>> if
>>> a TableSource is complex enough that it handles filter push down,
>> partition
>>> support etc. should rather be made available both from DDL & java/scala
>>> code. I'm happy to reconsider adding registerTemporaryTable(String path,
>>> TableSource source) if you have some concrete examples in mind.
>>>>
>>>>
>>>> @Xuefu: We also considered the ObjectIdentifier (or actually
>> introducing
>>> a new identifier representation to differentiate between resolved and
>>> unresolved identifiers) with the same concerns. We decided to suggest the
>>> string & parsing logic because of usability.
>>>>     tEnv.from("cat.db.table")
>>>> is shorter and easier to write than
>>>>     tEnv.from(Identifier.for("cat", "db", "name")
>>>> And also implicitly solves the problem what happens if a user (e.g.
>> used
>>> to other systems) uses that API in a following manner:
>>>>     tEnv.from(Identifier.for("db.name")
>>>> I'm happy to revisit it if the general consensus is that it's better to
>>> use the OO aproach.
>>>> Best,
>>>> Dawid
>>>>
>>>> On 06/09/2019 10:00, Xuefu Z wrote:
>>>>
>>>> Thanks to Dawid for starting the discussion and writeup. It looks
>> pretty
>>>> good to me except that I'm a little concerned about the object
>> reference
>>>> and string parsing in the code, which seems to an anti-pattern to OOP.
>>> Have
>>>> we considered using ObjectIdenitifier with optional catalog and db
>> parts,
>>>> esp. if we are worried about arguments of variable length or method
>>>> overloading? It's quite likely that the result of string parsing is an
>>>> ObjectIdentifier instance any way.
>>>>
>>>> Having string parsing logic in the code is a little dangerous as it
>>>> duplicates part of the DDL/DML parsing, and they can easily get out of
>>> sync.
>>>>
>>>> Thanks,
>>>> Xuefu
>>>>
>>>> On Fri, Sep 6, 2019 at 1:57 PM JingsongLee <[hidden email]
>>> .invalid>
>>>> wrote:
>>>>
>>>>
>>>> Thanks dawid, +1 for this approach.
>>>>
>>>> One concern is the removal of registerTableSink & registerTableSource
>>>>  in TableEnvironment. It has two alternatives:
>>>> 1.the properties approach (DDL, descriptor).
>>>> 2.from/toDataStream.
>>>>
>>>> #1 can only be properties, not java states, and some Connectors
>>>>  are difficult to convert all states to properties.
>>>> #2 can contain java state. But can't use TableSource-related features,
>>>> like project & filter push down, partition support, etc..
>>>>
>>>> Any idea about this?
>>>>
>>>> Best,
>>>> Jingsong Lee
>>>>
>>>>
>>>> ------------------------------------------------------------------
>>>> From:Dawid Wysakowicz <[hidden email]>
>>>> Send Time:2019年9月4日(星期三) 22:20
>>>> To:dev <[hidden email]>
>>>> Subject:[DISCUSS] FLIP-64: Support for Temporary Objects in Table
>> module
>>>>
>>>> Hi all,
>>>> As part of FLIP-30 a Catalog API was introduced that enables storing
>>> table
>>>> meta objects permanently. At the same time the majority of current APIs
>>>> create temporary objects that cannot be serialized. We should clarify
>> the
>>>> creation of meta objects (tables, views, functions) in a unified way.
>>>> Another current problem in the API is that all the temporary objects
>> are
>>>> stored in a special built-in catalog, which is not very intuitive for
>>> many
>>>> users, as they must be aware of that catalog to reference temporary
>>> objects.
>>>> Lastly, different APIs have different ways of providing object paths:
>>>>
>>>> String path…,
>>>> String path, String pathContinued…
>>>> String name
>>>> We should choose one approach and unify it across all APIs.
>>>> I suggest a FLIP to address the above issues.
>>>> Looking forward to your opinions.
>>>> FLIP link:
>>>>
>>>
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module
>>>>
>>
>>

Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table module

Kurt Young
IIUC it's good to see that both serializable (tables description from DDL)
and unserializable (tables with DataStream underneath) tables are treated
unify with CatalogTable.

Can I also assume functions that either come from a function class (from
DDL)
or function objects (newed by user) will also treated unify with
CatalogFunction?

This will greatly simplify and unify current API level concepts and design.

And it seems only one thing left, how do we deal with
ConnectTableDescriptor?
It's actually very similar with serializable CatalogTable, both carry some
text
properties which even are the same. Is there any chance we can further unify
this to CatalogTable?

object
Best,
Kurt


On Thu, Sep 19, 2019 at 3:13 PM Jark Wu <[hidden email]> wrote:

> Thanks Dawid for the design doc.
>
> In general, I’m +1 to the FLIP.
>
>
> +1 to the single-string and parse way to express object path.
>
> +1 to deprecate registerTableSink & registerTableSource.
> But I would suggest to provide an easy way to register a custom
> source/sink before we drop them (this is another story).
> Currently, it’s not easy to implement a custom connector descriptor.
>
> Best,
> Jark
>
>
> > 在 2019年9月19日,11:37,Dawid Wysakowicz <[hidden email]> 写道:
> >
> > Hi JingsongLee,
> > From my understanding they can. Underneath they will be CatalogTables.
> The
> > difference is the lifetime of the tables. Plus some of the user facing
> > interfaces cannot be persisted e.g. datastream. Therefore we must have a
> > separate methods for that. In the end the temporary tables are held in
> > memory as CatalogTables.
> > Best,
> > Dawid
> >
> > On Thu, 19 Sep 2019, 10:08 JingsongLee, <[hidden email]
> .invalid>
> > wrote:
> >
> >> Hi dawid:
> >> Can temporary tables achieve the same capabilities as catalog table?
> >> like statistics: CatalogTableStatistics, CatalogColumnStatistics,
> >> PartitionStatistics
> >> like partition support: we have added some catalog equivalent interfaces
> >> on TableSource/TableSink: getPartitions, getPartitionFieldNames
> >> Maybe it's not a good idea to add these interfaces to
> >> TableSource/TableSink. What do you think?
> >>
> >> Best,
> >> Jingsong Lee
> >>
> >>
> >> ------------------------------------------------------------------
> >> From:Kurt Young <[hidden email]>
> >> Send Time:2019年9月18日(星期三) 17:54
> >> To:dev <[hidden email]>
> >> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table
> >> module
> >>
> >> Hi all,
> >>
> >> Sorry to join this party late. Big +1 to this flip, especially for the
> >> dropping
> >> "registerTableSink & registerTableSource" part. These are indeed legacy
> >> and we should try to unify them through CatalogTable after we introduce
> >> the concept of Catalog.
> >>
> >> From my understanding, what we can registered should all be metadata,
> >> TableSource/TableSink should only be the one who is responsible to do
> >> the real work, i.e. reading and writing data according to the schema and
> >> other information like computed column, partition, .e.g.
> >>
> >> Best,
> >> Kurt
> >>
> >>
> >> On Wed, Sep 18, 2019 at 5:14 PM JingsongLee <[hidden email]
> >> .invalid>
> >> wrote:
> >>
> >>> After some development and thinking, I have a general understanding.
> >>> +1 to registering a source/sink does not fit into the SQL world.
> >>> I am OK to have a deprecated registerTemporarySource/Sink to compatible
> >>> with old ways.
> >>>
> >>> Best,
> >>> Jingsong Lee
> >>>
> >>>
> >>> ------------------------------------------------------------------
> >>> From:Timo Walther <[hidden email]>
> >>> Send Time:2019年9月17日(星期二) 08:00
> >>> To:dev <[hidden email]>
> >>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table
> >>> module
> >>>
> >>> Hi Dawid,
> >>>
> >>> thanks for the design document. It fixes big concept gaps due to
> >>> historical reasons with proper support for serializability and catalog
> >>> support in mind.
> >>>
> >>> I would not mind a registerTemporarySource/Sink, but the problem that I
> >>> see is that many people think that this is the recommended way of
> >>> registering a table source/sink which is not true. We should guide
> users
> >>> to either use connect() or DDL API which can be validated and stored in
> >>> catalog.
> >>>
> >>> Also from a concept perspective, registering a source/sink does not fit
> >>> into the SQL world. SQL does not know about source/sinks but only about
> >>> tables. If the responsibility of a TableSource/TableSink is just a pure
> >>> physical data consumer/producer that is not connected to the actual
> >>> logical table schema, we would need a possibility of defining time
> >>> attributes and interpreting/converting a changelog. This should be done
> >>> by the framework with information from the DDL/connect() and not be
> >>> defined in every table source.
> >>>
> >>> Regards,
> >>> Timo
> >>>
> >>>
> >>> On 09.09.19 14:16, JingsongLee wrote:
> >>>> Hi dawid:
> >>>>
> >>>> It is difficult to describe specific examples.
> >>>> Sometimes users will generate some java converters through some
> >>>>  Java code, or generate some Java classes through third-party
> >>>>  libraries. Of course, these can be best done through properties.
> >>>> But this requires additional work from users.My suggestion is to
> >>>>  keep this Java instance class way that is user-friendly.
> >>>>
> >>>> Best,
> >>>> Jingsong Lee
> >>>>
> >>>>
> >>>> ------------------------------------------------------------------
> >>>> From:Dawid Wysakowicz <[hidden email]>
> >>>> Send Time:2019年9月6日(星期五) 16:21
> >>>> To:dev <[hidden email]>
> >>>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table
> >>> module
> >>>>
> >>>> Hi all,
> >>>> @Jingsong Could you elaborate a bit more what do you mean by
> >>>> "some Connectors are difficult to convert all states to properties"
> >>>> All the Flink provided connectors will definitely be expressible with
> >>> properties (In the end you should be able to use them from DDL). I
> think
> >> if
> >>> a TableSource is complex enough that it handles filter push down,
> >> partition
> >>> support etc. should rather be made available both from DDL & java/scala
> >>> code. I'm happy to reconsider adding registerTemporaryTable(String
> path,
> >>> TableSource source) if you have some concrete examples in mind.
> >>>>
> >>>>
> >>>> @Xuefu: We also considered the ObjectIdentifier (or actually
> >> introducing
> >>> a new identifier representation to differentiate between resolved and
> >>> unresolved identifiers) with the same concerns. We decided to suggest
> the
> >>> string & parsing logic because of usability.
> >>>>     tEnv.from("cat.db.table")
> >>>> is shorter and easier to write than
> >>>>     tEnv.from(Identifier.for("cat", "db", "name")
> >>>> And also implicitly solves the problem what happens if a user (e.g.
> >> used
> >>> to other systems) uses that API in a following manner:
> >>>>     tEnv.from(Identifier.for("db.name")
> >>>> I'm happy to revisit it if the general consensus is that it's better
> to
> >>> use the OO aproach.
> >>>> Best,
> >>>> Dawid
> >>>>
> >>>> On 06/09/2019 10:00, Xuefu Z wrote:
> >>>>
> >>>> Thanks to Dawid for starting the discussion and writeup. It looks
> >> pretty
> >>>> good to me except that I'm a little concerned about the object
> >> reference
> >>>> and string parsing in the code, which seems to an anti-pattern to OOP.
> >>> Have
> >>>> we considered using ObjectIdenitifier with optional catalog and db
> >> parts,
> >>>> esp. if we are worried about arguments of variable length or method
> >>>> overloading? It's quite likely that the result of string parsing is an
> >>>> ObjectIdentifier instance any way.
> >>>>
> >>>> Having string parsing logic in the code is a little dangerous as it
> >>>> duplicates part of the DDL/DML parsing, and they can easily get out of
> >>> sync.
> >>>>
> >>>> Thanks,
> >>>> Xuefu
> >>>>
> >>>> On Fri, Sep 6, 2019 at 1:57 PM JingsongLee <[hidden email]
> >>> .invalid>
> >>>> wrote:
> >>>>
> >>>>
> >>>> Thanks dawid, +1 for this approach.
> >>>>
> >>>> One concern is the removal of registerTableSink & registerTableSource
> >>>>  in TableEnvironment. It has two alternatives:
> >>>> 1.the properties approach (DDL, descriptor).
> >>>> 2.from/toDataStream.
> >>>>
> >>>> #1 can only be properties, not java states, and some Connectors
> >>>>  are difficult to convert all states to properties.
> >>>> #2 can contain java state. But can't use TableSource-related features,
> >>>> like project & filter push down, partition support, etc..
> >>>>
> >>>> Any idea about this?
> >>>>
> >>>> Best,
> >>>> Jingsong Lee
> >>>>
> >>>>
> >>>> ------------------------------------------------------------------
> >>>> From:Dawid Wysakowicz <[hidden email]>
> >>>> Send Time:2019年9月4日(星期三) 22:20
> >>>> To:dev <[hidden email]>
> >>>> Subject:[DISCUSS] FLIP-64: Support for Temporary Objects in Table
> >> module
> >>>>
> >>>> Hi all,
> >>>> As part of FLIP-30 a Catalog API was introduced that enables storing
> >>> table
> >>>> meta objects permanently. At the same time the majority of current
> APIs
> >>>> create temporary objects that cannot be serialized. We should clarify
> >> the
> >>>> creation of meta objects (tables, views, functions) in a unified way.
> >>>> Another current problem in the API is that all the temporary objects
> >> are
> >>>> stored in a special built-in catalog, which is not very intuitive for
> >>> many
> >>>> users, as they must be aware of that catalog to reference temporary
> >>> objects.
> >>>> Lastly, different APIs have different ways of providing object paths:
> >>>>
> >>>> String path…,
> >>>> String path, String pathContinued…
> >>>> String name
> >>>> We should choose one approach and unify it across all APIs.
> >>>> I suggest a FLIP to address the above issues.
> >>>> Looking forward to your opinions.
> >>>> FLIP link:
> >>>>
> >>>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module
> >>>>
> >>
> >>
>
>
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table module

dwysakowicz
Sorry for a very delayed response.

@Kurt Yes, this is the goal to have a function created like new
Function(...) also be wrapped into CatalogFunction. This would have to
be though a temporary function as we cannot represent that as a set of
properties. Similar to the createTemporaryView(DataStream stream).

As for the ConnectTableDescriptor I agree this is very similar to
CatalogTable. I am not sure though if we should get rid of it. In the
end I see it as a builder for a CatalogTable, which is a slightly more
internal API, but we might revisit that some time in the future if we
find that it makes more sense.

@All I updated the FLIP page with some more details from the outcome of
the discussions around FLIP-57. Please take a look. I would like to
start a vote on this FLIP as soon as the vote on FLIP-57 goes through.

Best,

Dawid


On 19/09/2019 09:24, Kurt Young wrote:

> IIUC it's good to see that both serializable (tables description from DDL)
> and unserializable (tables with DataStream underneath) tables are treated
> unify with CatalogTable.
>
> Can I also assume functions that either come from a function class (from
> DDL)
> or function objects (newed by user) will also treated unify with
> CatalogFunction?
>
> This will greatly simplify and unify current API level concepts and design.
>
> And it seems only one thing left, how do we deal with
> ConnectTableDescriptor?
> It's actually very similar with serializable CatalogTable, both carry some
> text
> properties which even are the same. Is there any chance we can further unify
> this to CatalogTable?
>
> object
> Best,
> Kurt
>
>
> On Thu, Sep 19, 2019 at 3:13 PM Jark Wu <[hidden email]> wrote:
>
>> Thanks Dawid for the design doc.
>>
>> In general, I’m +1 to the FLIP.
>>
>>
>> +1 to the single-string and parse way to express object path.
>>
>> +1 to deprecate registerTableSink & registerTableSource.
>> But I would suggest to provide an easy way to register a custom
>> source/sink before we drop them (this is another story).
>> Currently, it’s not easy to implement a custom connector descriptor.
>>
>> Best,
>> Jark
>>
>>
>>> 在 2019年9月19日,11:37,Dawid Wysakowicz <[hidden email]> 写道:
>>>
>>> Hi JingsongLee,
>>> From my understanding they can. Underneath they will be CatalogTables.
>> The
>>> difference is the lifetime of the tables. Plus some of the user facing
>>> interfaces cannot be persisted e.g. datastream. Therefore we must have a
>>> separate methods for that. In the end the temporary tables are held in
>>> memory as CatalogTables.
>>> Best,
>>> Dawid
>>>
>>> On Thu, 19 Sep 2019, 10:08 JingsongLee, <[hidden email]
>> .invalid>
>>> wrote:
>>>
>>>> Hi dawid:
>>>> Can temporary tables achieve the same capabilities as catalog table?
>>>> like statistics: CatalogTableStatistics, CatalogColumnStatistics,
>>>> PartitionStatistics
>>>> like partition support: we have added some catalog equivalent interfaces
>>>> on TableSource/TableSink: getPartitions, getPartitionFieldNames
>>>> Maybe it's not a good idea to add these interfaces to
>>>> TableSource/TableSink. What do you think?
>>>>
>>>> Best,
>>>> Jingsong Lee
>>>>
>>>>
>>>> ------------------------------------------------------------------
>>>> From:Kurt Young <[hidden email]>
>>>> Send Time:2019年9月18日(星期三) 17:54
>>>> To:dev <[hidden email]>
>>>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table
>>>> module
>>>>
>>>> Hi all,
>>>>
>>>> Sorry to join this party late. Big +1 to this flip, especially for the
>>>> dropping
>>>> "registerTableSink & registerTableSource" part. These are indeed legacy
>>>> and we should try to unify them through CatalogTable after we introduce
>>>> the concept of Catalog.
>>>>
>>>> From my understanding, what we can registered should all be metadata,
>>>> TableSource/TableSink should only be the one who is responsible to do
>>>> the real work, i.e. reading and writing data according to the schema and
>>>> other information like computed column, partition, .e.g.
>>>>
>>>> Best,
>>>> Kurt
>>>>
>>>>
>>>> On Wed, Sep 18, 2019 at 5:14 PM JingsongLee <[hidden email]
>>>> .invalid>
>>>> wrote:
>>>>
>>>>> After some development and thinking, I have a general understanding.
>>>>> +1 to registering a source/sink does not fit into the SQL world.
>>>>> I am OK to have a deprecated registerTemporarySource/Sink to compatible
>>>>> with old ways.
>>>>>
>>>>> Best,
>>>>> Jingsong Lee
>>>>>
>>>>>
>>>>> ------------------------------------------------------------------
>>>>> From:Timo Walther <[hidden email]>
>>>>> Send Time:2019年9月17日(星期二) 08:00
>>>>> To:dev <[hidden email]>
>>>>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table
>>>>> module
>>>>>
>>>>> Hi Dawid,
>>>>>
>>>>> thanks for the design document. It fixes big concept gaps due to
>>>>> historical reasons with proper support for serializability and catalog
>>>>> support in mind.
>>>>>
>>>>> I would not mind a registerTemporarySource/Sink, but the problem that I
>>>>> see is that many people think that this is the recommended way of
>>>>> registering a table source/sink which is not true. We should guide
>> users
>>>>> to either use connect() or DDL API which can be validated and stored in
>>>>> catalog.
>>>>>
>>>>> Also from a concept perspective, registering a source/sink does not fit
>>>>> into the SQL world. SQL does not know about source/sinks but only about
>>>>> tables. If the responsibility of a TableSource/TableSink is just a pure
>>>>> physical data consumer/producer that is not connected to the actual
>>>>> logical table schema, we would need a possibility of defining time
>>>>> attributes and interpreting/converting a changelog. This should be done
>>>>> by the framework with information from the DDL/connect() and not be
>>>>> defined in every table source.
>>>>>
>>>>> Regards,
>>>>> Timo
>>>>>
>>>>>
>>>>> On 09.09.19 14:16, JingsongLee wrote:
>>>>>> Hi dawid:
>>>>>>
>>>>>> It is difficult to describe specific examples.
>>>>>> Sometimes users will generate some java converters through some
>>>>>>  Java code, or generate some Java classes through third-party
>>>>>>  libraries. Of course, these can be best done through properties.
>>>>>> But this requires additional work from users.My suggestion is to
>>>>>>  keep this Java instance class way that is user-friendly.
>>>>>>
>>>>>> Best,
>>>>>> Jingsong Lee
>>>>>>
>>>>>>
>>>>>> ------------------------------------------------------------------
>>>>>> From:Dawid Wysakowicz <[hidden email]>
>>>>>> Send Time:2019年9月6日(星期五) 16:21
>>>>>> To:dev <[hidden email]>
>>>>>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table
>>>>> module
>>>>>> Hi all,
>>>>>> @Jingsong Could you elaborate a bit more what do you mean by
>>>>>> "some Connectors are difficult to convert all states to properties"
>>>>>> All the Flink provided connectors will definitely be expressible with
>>>>> properties (In the end you should be able to use them from DDL). I
>> think
>>>> if
>>>>> a TableSource is complex enough that it handles filter push down,
>>>> partition
>>>>> support etc. should rather be made available both from DDL & java/scala
>>>>> code. I'm happy to reconsider adding registerTemporaryTable(String
>> path,
>>>>> TableSource source) if you have some concrete examples in mind.
>>>>>>
>>>>>> @Xuefu: We also considered the ObjectIdentifier (or actually
>>>> introducing
>>>>> a new identifier representation to differentiate between resolved and
>>>>> unresolved identifiers) with the same concerns. We decided to suggest
>> the
>>>>> string & parsing logic because of usability.
>>>>>>     tEnv.from("cat.db.table")
>>>>>> is shorter and easier to write than
>>>>>>     tEnv.from(Identifier.for("cat", "db", "name")
>>>>>> And also implicitly solves the problem what happens if a user (e.g.
>>>> used
>>>>> to other systems) uses that API in a following manner:
>>>>>>     tEnv.from(Identifier.for("db.name")
>>>>>> I'm happy to revisit it if the general consensus is that it's better
>> to
>>>>> use the OO aproach.
>>>>>> Best,
>>>>>> Dawid
>>>>>>
>>>>>> On 06/09/2019 10:00, Xuefu Z wrote:
>>>>>>
>>>>>> Thanks to Dawid for starting the discussion and writeup. It looks
>>>> pretty
>>>>>> good to me except that I'm a little concerned about the object
>>>> reference
>>>>>> and string parsing in the code, which seems to an anti-pattern to OOP.
>>>>> Have
>>>>>> we considered using ObjectIdenitifier with optional catalog and db
>>>> parts,
>>>>>> esp. if we are worried about arguments of variable length or method
>>>>>> overloading? It's quite likely that the result of string parsing is an
>>>>>> ObjectIdentifier instance any way.
>>>>>>
>>>>>> Having string parsing logic in the code is a little dangerous as it
>>>>>> duplicates part of the DDL/DML parsing, and they can easily get out of
>>>>> sync.
>>>>>> Thanks,
>>>>>> Xuefu
>>>>>>
>>>>>> On Fri, Sep 6, 2019 at 1:57 PM JingsongLee <[hidden email]
>>>>> .invalid>
>>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> Thanks dawid, +1 for this approach.
>>>>>>
>>>>>> One concern is the removal of registerTableSink & registerTableSource
>>>>>>  in TableEnvironment. It has two alternatives:
>>>>>> 1.the properties approach (DDL, descriptor).
>>>>>> 2.from/toDataStream.
>>>>>>
>>>>>> #1 can only be properties, not java states, and some Connectors
>>>>>>  are difficult to convert all states to properties.
>>>>>> #2 can contain java state. But can't use TableSource-related features,
>>>>>> like project & filter push down, partition support, etc..
>>>>>>
>>>>>> Any idea about this?
>>>>>>
>>>>>> Best,
>>>>>> Jingsong Lee
>>>>>>
>>>>>>
>>>>>> ------------------------------------------------------------------
>>>>>> From:Dawid Wysakowicz <[hidden email]>
>>>>>> Send Time:2019年9月4日(星期三) 22:20
>>>>>> To:dev <[hidden email]>
>>>>>> Subject:[DISCUSS] FLIP-64: Support for Temporary Objects in Table
>>>> module
>>>>>> Hi all,
>>>>>> As part of FLIP-30 a Catalog API was introduced that enables storing
>>>>> table
>>>>>> meta objects permanently. At the same time the majority of current
>> APIs
>>>>>> create temporary objects that cannot be serialized. We should clarify
>>>> the
>>>>>> creation of meta objects (tables, views, functions) in a unified way.
>>>>>> Another current problem in the API is that all the temporary objects
>>>> are
>>>>>> stored in a special built-in catalog, which is not very intuitive for
>>>>> many
>>>>>> users, as they must be aware of that catalog to reference temporary
>>>>> objects.
>>>>>> Lastly, different APIs have different ways of providing object paths:
>>>>>>
>>>>>> String path…,
>>>>>> String path, String pathContinued…
>>>>>> String name
>>>>>> We should choose one approach and unify it across all APIs.
>>>>>> I suggest a FLIP to address the above issues.
>>>>>> Looking forward to your opinions.
>>>>>> FLIP link:
>>>>>>
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module
>>>>
>>


signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table module

bowen.li
Hi Dawid,

+1 for proposed changes

On Wed, Oct 9, 2019 at 12:15 PM Dawid Wysakowicz <[hidden email]>
wrote:

> Sorry for a very delayed response.
>
> @Kurt Yes, this is the goal to have a function created like new
> Function(...) also be wrapped into CatalogFunction. This would have to
> be though a temporary function as we cannot represent that as a set of
> properties. Similar to the createTemporaryView(DataStream stream).
>
> As for the ConnectTableDescriptor I agree this is very similar to
> CatalogTable. I am not sure though if we should get rid of it. In the
> end I see it as a builder for a CatalogTable, which is a slightly more
> internal API, but we might revisit that some time in the future if we
> find that it makes more sense.
>
> @All I updated the FLIP page with some more details from the outcome of
> the discussions around FLIP-57. Please take a look. I would like to
> start a vote on this FLIP as soon as the vote on FLIP-57 goes through.
>
> Best,
>
> Dawid
>
>
> On 19/09/2019 09:24, Kurt Young wrote:
> > IIUC it's good to see that both serializable (tables description from
> DDL)
> > and unserializable (tables with DataStream underneath) tables are treated
> > unify with CatalogTable.
> >
> > Can I also assume functions that either come from a function class (from
> > DDL)
> > or function objects (newed by user) will also treated unify with
> > CatalogFunction?
> >
> > This will greatly simplify and unify current API level concepts and
> design.
> >
> > And it seems only one thing left, how do we deal with
> > ConnectTableDescriptor?
> > It's actually very similar with serializable CatalogTable, both carry
> some
> > text
> > properties which even are the same. Is there any chance we can further
> unify
> > this to CatalogTable?
> >
> > object
> > Best,
> > Kurt
> >
> >
> > On Thu, Sep 19, 2019 at 3:13 PM Jark Wu <[hidden email]> wrote:
> >
> >> Thanks Dawid for the design doc.
> >>
> >> In general, I’m +1 to the FLIP.
> >>
> >>
> >> +1 to the single-string and parse way to express object path.
> >>
> >> +1 to deprecate registerTableSink & registerTableSource.
> >> But I would suggest to provide an easy way to register a custom
> >> source/sink before we drop them (this is another story).
> >> Currently, it’s not easy to implement a custom connector descriptor.
> >>
> >> Best,
> >> Jark
> >>
> >>
> >>> 在 2019年9月19日,11:37,Dawid Wysakowicz <[hidden email]> 写道:
> >>>
> >>> Hi JingsongLee,
> >>> From my understanding they can. Underneath they will be CatalogTables.
> >> The
> >>> difference is the lifetime of the tables. Plus some of the user facing
> >>> interfaces cannot be persisted e.g. datastream. Therefore we must have
> a
> >>> separate methods for that. In the end the temporary tables are held in
> >>> memory as CatalogTables.
> >>> Best,
> >>> Dawid
> >>>
> >>> On Thu, 19 Sep 2019, 10:08 JingsongLee, <[hidden email]
> >> .invalid>
> >>> wrote:
> >>>
> >>>> Hi dawid:
> >>>> Can temporary tables achieve the same capabilities as catalog table?
> >>>> like statistics: CatalogTableStatistics, CatalogColumnStatistics,
> >>>> PartitionStatistics
> >>>> like partition support: we have added some catalog equivalent
> interfaces
> >>>> on TableSource/TableSink: getPartitions, getPartitionFieldNames
> >>>> Maybe it's not a good idea to add these interfaces to
> >>>> TableSource/TableSink. What do you think?
> >>>>
> >>>> Best,
> >>>> Jingsong Lee
> >>>>
> >>>>
> >>>> ------------------------------------------------------------------
> >>>> From:Kurt Young <[hidden email]>
> >>>> Send Time:2019年9月18日(星期三) 17:54
> >>>> To:dev <[hidden email]>
> >>>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table
> >>>> module
> >>>>
> >>>> Hi all,
> >>>>
> >>>> Sorry to join this party late. Big +1 to this flip, especially for the
> >>>> dropping
> >>>> "registerTableSink & registerTableSource" part. These are indeed
> legacy
> >>>> and we should try to unify them through CatalogTable after we
> introduce
> >>>> the concept of Catalog.
> >>>>
> >>>> From my understanding, what we can registered should all be metadata,
> >>>> TableSource/TableSink should only be the one who is responsible to do
> >>>> the real work, i.e. reading and writing data according to the schema
> and
> >>>> other information like computed column, partition, .e.g.
> >>>>
> >>>> Best,
> >>>> Kurt
> >>>>
> >>>>
> >>>> On Wed, Sep 18, 2019 at 5:14 PM JingsongLee <[hidden email]
> >>>> .invalid>
> >>>> wrote:
> >>>>
> >>>>> After some development and thinking, I have a general understanding.
> >>>>> +1 to registering a source/sink does not fit into the SQL world.
> >>>>> I am OK to have a deprecated registerTemporarySource/Sink to
> compatible
> >>>>> with old ways.
> >>>>>
> >>>>> Best,
> >>>>> Jingsong Lee
> >>>>>
> >>>>>
> >>>>> ------------------------------------------------------------------
> >>>>> From:Timo Walther <[hidden email]>
> >>>>> Send Time:2019年9月17日(星期二) 08:00
> >>>>> To:dev <[hidden email]>
> >>>>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table
> >>>>> module
> >>>>>
> >>>>> Hi Dawid,
> >>>>>
> >>>>> thanks for the design document. It fixes big concept gaps due to
> >>>>> historical reasons with proper support for serializability and
> catalog
> >>>>> support in mind.
> >>>>>
> >>>>> I would not mind a registerTemporarySource/Sink, but the problem
> that I
> >>>>> see is that many people think that this is the recommended way of
> >>>>> registering a table source/sink which is not true. We should guide
> >> users
> >>>>> to either use connect() or DDL API which can be validated and stored
> in
> >>>>> catalog.
> >>>>>
> >>>>> Also from a concept perspective, registering a source/sink does not
> fit
> >>>>> into the SQL world. SQL does not know about source/sinks but only
> about
> >>>>> tables. If the responsibility of a TableSource/TableSink is just a
> pure
> >>>>> physical data consumer/producer that is not connected to the actual
> >>>>> logical table schema, we would need a possibility of defining time
> >>>>> attributes and interpreting/converting a changelog. This should be
> done
> >>>>> by the framework with information from the DDL/connect() and not be
> >>>>> defined in every table source.
> >>>>>
> >>>>> Regards,
> >>>>> Timo
> >>>>>
> >>>>>
> >>>>> On 09.09.19 14:16, JingsongLee wrote:
> >>>>>> Hi dawid:
> >>>>>>
> >>>>>> It is difficult to describe specific examples.
> >>>>>> Sometimes users will generate some java converters through some
> >>>>>>  Java code, or generate some Java classes through third-party
> >>>>>>  libraries. Of course, these can be best done through properties.
> >>>>>> But this requires additional work from users.My suggestion is to
> >>>>>>  keep this Java instance class way that is user-friendly.
> >>>>>>
> >>>>>> Best,
> >>>>>> Jingsong Lee
> >>>>>>
> >>>>>>
> >>>>>> ------------------------------------------------------------------
> >>>>>> From:Dawid Wysakowicz <[hidden email]>
> >>>>>> Send Time:2019年9月6日(星期五) 16:21
> >>>>>> To:dev <[hidden email]>
> >>>>>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in
> Table
> >>>>> module
> >>>>>> Hi all,
> >>>>>> @Jingsong Could you elaborate a bit more what do you mean by
> >>>>>> "some Connectors are difficult to convert all states to properties"
> >>>>>> All the Flink provided connectors will definitely be expressible
> with
> >>>>> properties (In the end you should be able to use them from DDL). I
> >> think
> >>>> if
> >>>>> a TableSource is complex enough that it handles filter push down,
> >>>> partition
> >>>>> support etc. should rather be made available both from DDL &
> java/scala
> >>>>> code. I'm happy to reconsider adding registerTemporaryTable(String
> >> path,
> >>>>> TableSource source) if you have some concrete examples in mind.
> >>>>>>
> >>>>>> @Xuefu: We also considered the ObjectIdentifier (or actually
> >>>> introducing
> >>>>> a new identifier representation to differentiate between resolved and
> >>>>> unresolved identifiers) with the same concerns. We decided to suggest
> >> the
> >>>>> string & parsing logic because of usability.
> >>>>>>     tEnv.from("cat.db.table")
> >>>>>> is shorter and easier to write than
> >>>>>>     tEnv.from(Identifier.for("cat", "db", "name")
> >>>>>> And also implicitly solves the problem what happens if a user (e.g.
> >>>> used
> >>>>> to other systems) uses that API in a following manner:
> >>>>>>     tEnv.from(Identifier.for("db.name")
> >>>>>> I'm happy to revisit it if the general consensus is that it's better
> >> to
> >>>>> use the OO aproach.
> >>>>>> Best,
> >>>>>> Dawid
> >>>>>>
> >>>>>> On 06/09/2019 10:00, Xuefu Z wrote:
> >>>>>>
> >>>>>> Thanks to Dawid for starting the discussion and writeup. It looks
> >>>> pretty
> >>>>>> good to me except that I'm a little concerned about the object
> >>>> reference
> >>>>>> and string parsing in the code, which seems to an anti-pattern to
> OOP.
> >>>>> Have
> >>>>>> we considered using ObjectIdenitifier with optional catalog and db
> >>>> parts,
> >>>>>> esp. if we are worried about arguments of variable length or method
> >>>>>> overloading? It's quite likely that the result of string parsing is
> an
> >>>>>> ObjectIdentifier instance any way.
> >>>>>>
> >>>>>> Having string parsing logic in the code is a little dangerous as it
> >>>>>> duplicates part of the DDL/DML parsing, and they can easily get out
> of
> >>>>> sync.
> >>>>>> Thanks,
> >>>>>> Xuefu
> >>>>>>
> >>>>>> On Fri, Sep 6, 2019 at 1:57 PM JingsongLee <[hidden email]
> >>>>> .invalid>
> >>>>>> wrote:
> >>>>>>
> >>>>>>
> >>>>>> Thanks dawid, +1 for this approach.
> >>>>>>
> >>>>>> One concern is the removal of registerTableSink &
> registerTableSource
> >>>>>>  in TableEnvironment. It has two alternatives:
> >>>>>> 1.the properties approach (DDL, descriptor).
> >>>>>> 2.from/toDataStream.
> >>>>>>
> >>>>>> #1 can only be properties, not java states, and some Connectors
> >>>>>>  are difficult to convert all states to properties.
> >>>>>> #2 can contain java state. But can't use TableSource-related
> features,
> >>>>>> like project & filter push down, partition support, etc..
> >>>>>>
> >>>>>> Any idea about this?
> >>>>>>
> >>>>>> Best,
> >>>>>> Jingsong Lee
> >>>>>>
> >>>>>>
> >>>>>> ------------------------------------------------------------------
> >>>>>> From:Dawid Wysakowicz <[hidden email]>
> >>>>>> Send Time:2019年9月4日(星期三) 22:20
> >>>>>> To:dev <[hidden email]>
> >>>>>> Subject:[DISCUSS] FLIP-64: Support for Temporary Objects in Table
> >>>> module
> >>>>>> Hi all,
> >>>>>> As part of FLIP-30 a Catalog API was introduced that enables storing
> >>>>> table
> >>>>>> meta objects permanently. At the same time the majority of current
> >> APIs
> >>>>>> create temporary objects that cannot be serialized. We should
> clarify
> >>>> the
> >>>>>> creation of meta objects (tables, views, functions) in a unified
> way.
> >>>>>> Another current problem in the API is that all the temporary objects
> >>>> are
> >>>>>> stored in a special built-in catalog, which is not very intuitive
> for
> >>>>> many
> >>>>>> users, as they must be aware of that catalog to reference temporary
> >>>>> objects.
> >>>>>> Lastly, different APIs have different ways of providing object
> paths:
> >>>>>>
> >>>>>> String path…,
> >>>>>> String path, String pathContinued…
> >>>>>> String name
> >>>>>> We should choose one approach and unify it across all APIs.
> >>>>>> I suggest a FLIP to address the above issues.
> >>>>>> Looking forward to your opinions.
> >>>>>> FLIP link:
> >>>>>>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module
> >>>>
> >>
>
>
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table module

Timo Walther-2
Hi everyone,

thanks for the great discussion with a good outcome.

I have one last comment about:

void createTemporaryFunction(String path, UserDefinedFunction function);

We should encourage users to register a class instead of an instance. We
should enforce a default constructor in the future. If we need metadata
or type inference, the planner can simply instantiate the class and call
the corresponding getters. If people would like to parameterize a
function, they can use global job parameters instead - which are
available in the open() method.

I suggest:

void createTemporaryFunction(String path, Class<? extends
UserDefinedFunction> functionClass);

This is also closer to the SQL DDL that is about to be implemented in:
https://issues.apache.org/jira/browse/FLINK-7151

Thanks,
Timo


On 09.10.19 17:07, Bowen Li wrote:

> Hi Dawid,
>
> +1 for proposed changes
>
> On Wed, Oct 9, 2019 at 12:15 PM Dawid Wysakowicz <[hidden email]>
> wrote:
>
>> Sorry for a very delayed response.
>>
>> @Kurt Yes, this is the goal to have a function created like new
>> Function(...) also be wrapped into CatalogFunction. This would have to
>> be though a temporary function as we cannot represent that as a set of
>> properties. Similar to the createTemporaryView(DataStream stream).
>>
>> As for the ConnectTableDescriptor I agree this is very similar to
>> CatalogTable. I am not sure though if we should get rid of it. In the
>> end I see it as a builder for a CatalogTable, which is a slightly more
>> internal API, but we might revisit that some time in the future if we
>> find that it makes more sense.
>>
>> @All I updated the FLIP page with some more details from the outcome of
>> the discussions around FLIP-57. Please take a look. I would like to
>> start a vote on this FLIP as soon as the vote on FLIP-57 goes through.
>>
>> Best,
>>
>> Dawid
>>
>>
>> On 19/09/2019 09:24, Kurt Young wrote:
>>> IIUC it's good to see that both serializable (tables description from
>> DDL)
>>> and unserializable (tables with DataStream underneath) tables are treated
>>> unify with CatalogTable.
>>>
>>> Can I also assume functions that either come from a function class (from
>>> DDL)
>>> or function objects (newed by user) will also treated unify with
>>> CatalogFunction?
>>>
>>> This will greatly simplify and unify current API level concepts and
>> design.
>>> And it seems only one thing left, how do we deal with
>>> ConnectTableDescriptor?
>>> It's actually very similar with serializable CatalogTable, both carry
>> some
>>> text
>>> properties which even are the same. Is there any chance we can further
>> unify
>>> this to CatalogTable?
>>>
>>> object
>>> Best,
>>> Kurt
>>>
>>>
>>> On Thu, Sep 19, 2019 at 3:13 PM Jark Wu <[hidden email]> wrote:
>>>
>>>> Thanks Dawid for the design doc.
>>>>
>>>> In general, I’m +1 to the FLIP.
>>>>
>>>>
>>>> +1 to the single-string and parse way to express object path.
>>>>
>>>> +1 to deprecate registerTableSink & registerTableSource.
>>>> But I would suggest to provide an easy way to register a custom
>>>> source/sink before we drop them (this is another story).
>>>> Currently, it’s not easy to implement a custom connector descriptor.
>>>>
>>>> Best,
>>>> Jark
>>>>
>>>>
>>>>> 在 2019年9月19日,11:37,Dawid Wysakowicz <[hidden email]> 写道:
>>>>>
>>>>> Hi JingsongLee,
>>>>>  From my understanding they can. Underneath they will be CatalogTables.
>>>> The
>>>>> difference is the lifetime of the tables. Plus some of the user facing
>>>>> interfaces cannot be persisted e.g. datastream. Therefore we must have
>> a
>>>>> separate methods for that. In the end the temporary tables are held in
>>>>> memory as CatalogTables.
>>>>> Best,
>>>>> Dawid
>>>>>
>>>>> On Thu, 19 Sep 2019, 10:08 JingsongLee, <[hidden email]
>>>> .invalid>
>>>>> wrote:
>>>>>
>>>>>> Hi dawid:
>>>>>> Can temporary tables achieve the same capabilities as catalog table?
>>>>>> like statistics: CatalogTableStatistics, CatalogColumnStatistics,
>>>>>> PartitionStatistics
>>>>>> like partition support: we have added some catalog equivalent
>> interfaces
>>>>>> on TableSource/TableSink: getPartitions, getPartitionFieldNames
>>>>>> Maybe it's not a good idea to add these interfaces to
>>>>>> TableSource/TableSink. What do you think?
>>>>>>
>>>>>> Best,
>>>>>> Jingsong Lee
>>>>>>
>>>>>>
>>>>>> ------------------------------------------------------------------
>>>>>> From:Kurt Young <[hidden email]>
>>>>>> Send Time:2019年9月18日(星期三) 17:54
>>>>>> To:dev <[hidden email]>
>>>>>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table
>>>>>> module
>>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> Sorry to join this party late. Big +1 to this flip, especially for the
>>>>>> dropping
>>>>>> "registerTableSink & registerTableSource" part. These are indeed
>> legacy
>>>>>> and we should try to unify them through CatalogTable after we
>> introduce
>>>>>> the concept of Catalog.
>>>>>>
>>>>>>  From my understanding, what we can registered should all be metadata,
>>>>>> TableSource/TableSink should only be the one who is responsible to do
>>>>>> the real work, i.e. reading and writing data according to the schema
>> and
>>>>>> other information like computed column, partition, .e.g.
>>>>>>
>>>>>> Best,
>>>>>> Kurt
>>>>>>
>>>>>>
>>>>>> On Wed, Sep 18, 2019 at 5:14 PM JingsongLee <[hidden email]
>>>>>> .invalid>
>>>>>> wrote:
>>>>>>
>>>>>>> After some development and thinking, I have a general understanding.
>>>>>>> +1 to registering a source/sink does not fit into the SQL world.
>>>>>>> I am OK to have a deprecated registerTemporarySource/Sink to
>> compatible
>>>>>>> with old ways.
>>>>>>>
>>>>>>> Best,
>>>>>>> Jingsong Lee
>>>>>>>
>>>>>>>
>>>>>>> ------------------------------------------------------------------
>>>>>>> From:Timo Walther <[hidden email]>
>>>>>>> Send Time:2019年9月17日(星期二) 08:00
>>>>>>> To:dev <[hidden email]>
>>>>>>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table
>>>>>>> module
>>>>>>>
>>>>>>> Hi Dawid,
>>>>>>>
>>>>>>> thanks for the design document. It fixes big concept gaps due to
>>>>>>> historical reasons with proper support for serializability and
>> catalog
>>>>>>> support in mind.
>>>>>>>
>>>>>>> I would not mind a registerTemporarySource/Sink, but the problem
>> that I
>>>>>>> see is that many people think that this is the recommended way of
>>>>>>> registering a table source/sink which is not true. We should guide
>>>> users
>>>>>>> to either use connect() or DDL API which can be validated and stored
>> in
>>>>>>> catalog.
>>>>>>>
>>>>>>> Also from a concept perspective, registering a source/sink does not
>> fit
>>>>>>> into the SQL world. SQL does not know about source/sinks but only
>> about
>>>>>>> tables. If the responsibility of a TableSource/TableSink is just a
>> pure
>>>>>>> physical data consumer/producer that is not connected to the actual
>>>>>>> logical table schema, we would need a possibility of defining time
>>>>>>> attributes and interpreting/converting a changelog. This should be
>> done
>>>>>>> by the framework with information from the DDL/connect() and not be
>>>>>>> defined in every table source.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Timo
>>>>>>>
>>>>>>>
>>>>>>> On 09.09.19 14:16, JingsongLee wrote:
>>>>>>>> Hi dawid:
>>>>>>>>
>>>>>>>> It is difficult to describe specific examples.
>>>>>>>> Sometimes users will generate some java converters through some
>>>>>>>>   Java code, or generate some Java classes through third-party
>>>>>>>>   libraries. Of course, these can be best done through properties.
>>>>>>>> But this requires additional work from users.My suggestion is to
>>>>>>>>   keep this Java instance class way that is user-friendly.
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Jingsong Lee
>>>>>>>>
>>>>>>>>
>>>>>>>> ------------------------------------------------------------------
>>>>>>>> From:Dawid Wysakowicz <[hidden email]>
>>>>>>>> Send Time:2019年9月6日(星期五) 16:21
>>>>>>>> To:dev <[hidden email]>
>>>>>>>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in
>> Table
>>>>>>> module
>>>>>>>> Hi all,
>>>>>>>> @Jingsong Could you elaborate a bit more what do you mean by
>>>>>>>> "some Connectors are difficult to convert all states to properties"
>>>>>>>> All the Flink provided connectors will definitely be expressible
>> with
>>>>>>> properties (In the end you should be able to use them from DDL). I
>>>> think
>>>>>> if
>>>>>>> a TableSource is complex enough that it handles filter push down,
>>>>>> partition
>>>>>>> support etc. should rather be made available both from DDL &
>> java/scala
>>>>>>> code. I'm happy to reconsider adding registerTemporaryTable(String
>>>> path,
>>>>>>> TableSource source) if you have some concrete examples in mind.
>>>>>>>> @Xuefu: We also considered the ObjectIdentifier (or actually
>>>>>> introducing
>>>>>>> a new identifier representation to differentiate between resolved and
>>>>>>> unresolved identifiers) with the same concerns. We decided to suggest
>>>> the
>>>>>>> string & parsing logic because of usability.
>>>>>>>>      tEnv.from("cat.db.table")
>>>>>>>> is shorter and easier to write than
>>>>>>>>      tEnv.from(Identifier.for("cat", "db", "name")
>>>>>>>> And also implicitly solves the problem what happens if a user (e.g.
>>>>>> used
>>>>>>> to other systems) uses that API in a following manner:
>>>>>>>>      tEnv.from(Identifier.for("db.name")
>>>>>>>> I'm happy to revisit it if the general consensus is that it's better
>>>> to
>>>>>>> use the OO aproach.
>>>>>>>> Best,
>>>>>>>> Dawid
>>>>>>>>
>>>>>>>> On 06/09/2019 10:00, Xuefu Z wrote:
>>>>>>>>
>>>>>>>> Thanks to Dawid for starting the discussion and writeup. It looks
>>>>>> pretty
>>>>>>>> good to me except that I'm a little concerned about the object
>>>>>> reference
>>>>>>>> and string parsing in the code, which seems to an anti-pattern to
>> OOP.
>>>>>>> Have
>>>>>>>> we considered using ObjectIdenitifier with optional catalog and db
>>>>>> parts,
>>>>>>>> esp. if we are worried about arguments of variable length or method
>>>>>>>> overloading? It's quite likely that the result of string parsing is
>> an
>>>>>>>> ObjectIdentifier instance any way.
>>>>>>>>
>>>>>>>> Having string parsing logic in the code is a little dangerous as it
>>>>>>>> duplicates part of the DDL/DML parsing, and they can easily get out
>> of
>>>>>>> sync.
>>>>>>>> Thanks,
>>>>>>>> Xuefu
>>>>>>>>
>>>>>>>> On Fri, Sep 6, 2019 at 1:57 PM JingsongLee <[hidden email]
>>>>>>> .invalid>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks dawid, +1 for this approach.
>>>>>>>>
>>>>>>>> One concern is the removal of registerTableSink &
>> registerTableSource
>>>>>>>>   in TableEnvironment. It has two alternatives:
>>>>>>>> 1.the properties approach (DDL, descriptor).
>>>>>>>> 2.from/toDataStream.
>>>>>>>>
>>>>>>>> #1 can only be properties, not java states, and some Connectors
>>>>>>>>   are difficult to convert all states to properties.
>>>>>>>> #2 can contain java state. But can't use TableSource-related
>> features,
>>>>>>>> like project & filter push down, partition support, etc..
>>>>>>>>
>>>>>>>> Any idea about this?
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Jingsong Lee
>>>>>>>>
>>>>>>>>
>>>>>>>> ------------------------------------------------------------------
>>>>>>>> From:Dawid Wysakowicz <[hidden email]>
>>>>>>>> Send Time:2019年9月4日(星期三) 22:20
>>>>>>>> To:dev <[hidden email]>
>>>>>>>> Subject:[DISCUSS] FLIP-64: Support for Temporary Objects in Table
>>>>>> module
>>>>>>>> Hi all,
>>>>>>>> As part of FLIP-30 a Catalog API was introduced that enables storing
>>>>>>> table
>>>>>>>> meta objects permanently. At the same time the majority of current
>>>> APIs
>>>>>>>> create temporary objects that cannot be serialized. We should
>> clarify
>>>>>> the
>>>>>>>> creation of meta objects (tables, views, functions) in a unified
>> way.
>>>>>>>> Another current problem in the API is that all the temporary objects
>>>>>> are
>>>>>>>> stored in a special built-in catalog, which is not very intuitive
>> for
>>>>>>> many
>>>>>>>> users, as they must be aware of that catalog to reference temporary
>>>>>>> objects.
>>>>>>>> Lastly, different APIs have different ways of providing object
>> paths:
>>>>>>>> String path…,
>>>>>>>> String path, String pathContinued…
>>>>>>>> String name
>>>>>>>> We should choose one approach and unify it across all APIs.
>>>>>>>> I suggest a FLIP to address the above issues.
>>>>>>>> Looking forward to your opinions.
>>>>>>>> FLIP link:
>>>>>>>>
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module
>>

Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table module

Kurt Young
Regarding to ConnectTableDescriptor, if in the end it becomes a builder
of CatalogTable, I would be ok with that. But it doesn't look like a builder
now, it's more like another form of TableSource/TableSink.

Best,
Kurt


On Thu, Oct 10, 2019 at 3:44 PM Timo Walther <[hidden email]> wrote:

> Hi everyone,
>
> thanks for the great discussion with a good outcome.
>
> I have one last comment about:
>
> void createTemporaryFunction(String path, UserDefinedFunction function);
>
> We should encourage users to register a class instead of an instance. We
> should enforce a default constructor in the future. If we need metadata
> or type inference, the planner can simply instantiate the class and call
> the corresponding getters. If people would like to parameterize a
> function, they can use global job parameters instead - which are
> available in the open() method.
>
> I suggest:
>
> void createTemporaryFunction(String path, Class<? extends
> UserDefinedFunction> functionClass);
>
> This is also closer to the SQL DDL that is about to be implemented in:
> https://issues.apache.org/jira/browse/FLINK-7151
>
> Thanks,
> Timo
>
>
> On 09.10.19 17:07, Bowen Li wrote:
> > Hi Dawid,
> >
> > +1 for proposed changes
> >
> > On Wed, Oct 9, 2019 at 12:15 PM Dawid Wysakowicz <[hidden email]
> >
> > wrote:
> >
> >> Sorry for a very delayed response.
> >>
> >> @Kurt Yes, this is the goal to have a function created like new
> >> Function(...) also be wrapped into CatalogFunction. This would have to
> >> be though a temporary function as we cannot represent that as a set of
> >> properties. Similar to the createTemporaryView(DataStream stream).
> >>
> >> As for the ConnectTableDescriptor I agree this is very similar to
> >> CatalogTable. I am not sure though if we should get rid of it. In the
> >> end I see it as a builder for a CatalogTable, which is a slightly more
> >> internal API, but we might revisit that some time in the future if we
> >> find that it makes more sense.
> >>
> >> @All I updated the FLIP page with some more details from the outcome of
> >> the discussions around FLIP-57. Please take a look. I would like to
> >> start a vote on this FLIP as soon as the vote on FLIP-57 goes through.
> >>
> >> Best,
> >>
> >> Dawid
> >>
> >>
> >> On 19/09/2019 09:24, Kurt Young wrote:
> >>> IIUC it's good to see that both serializable (tables description from
> >> DDL)
> >>> and unserializable (tables with DataStream underneath) tables are
> treated
> >>> unify with CatalogTable.
> >>>
> >>> Can I also assume functions that either come from a function class
> (from
> >>> DDL)
> >>> or function objects (newed by user) will also treated unify with
> >>> CatalogFunction?
> >>>
> >>> This will greatly simplify and unify current API level concepts and
> >> design.
> >>> And it seems only one thing left, how do we deal with
> >>> ConnectTableDescriptor?
> >>> It's actually very similar with serializable CatalogTable, both carry
> >> some
> >>> text
> >>> properties which even are the same. Is there any chance we can further
> >> unify
> >>> this to CatalogTable?
> >>>
> >>> object
> >>> Best,
> >>> Kurt
> >>>
> >>>
> >>> On Thu, Sep 19, 2019 at 3:13 PM Jark Wu <[hidden email]> wrote:
> >>>
> >>>> Thanks Dawid for the design doc.
> >>>>
> >>>> In general, I’m +1 to the FLIP.
> >>>>
> >>>>
> >>>> +1 to the single-string and parse way to express object path.
> >>>>
> >>>> +1 to deprecate registerTableSink & registerTableSource.
> >>>> But I would suggest to provide an easy way to register a custom
> >>>> source/sink before we drop them (this is another story).
> >>>> Currently, it’s not easy to implement a custom connector descriptor.
> >>>>
> >>>> Best,
> >>>> Jark
> >>>>
> >>>>
> >>>>> 在 2019年9月19日,11:37,Dawid Wysakowicz <[hidden email]> 写道:
> >>>>>
> >>>>> Hi JingsongLee,
> >>>>>  From my understanding they can. Underneath they will be
> CatalogTables.
> >>>> The
> >>>>> difference is the lifetime of the tables. Plus some of the user
> facing
> >>>>> interfaces cannot be persisted e.g. datastream. Therefore we must
> have
> >> a
> >>>>> separate methods for that. In the end the temporary tables are held
> in
> >>>>> memory as CatalogTables.
> >>>>> Best,
> >>>>> Dawid
> >>>>>
> >>>>> On Thu, 19 Sep 2019, 10:08 JingsongLee, <[hidden email]
> >>>> .invalid>
> >>>>> wrote:
> >>>>>
> >>>>>> Hi dawid:
> >>>>>> Can temporary tables achieve the same capabilities as catalog table?
> >>>>>> like statistics: CatalogTableStatistics, CatalogColumnStatistics,
> >>>>>> PartitionStatistics
> >>>>>> like partition support: we have added some catalog equivalent
> >> interfaces
> >>>>>> on TableSource/TableSink: getPartitions, getPartitionFieldNames
> >>>>>> Maybe it's not a good idea to add these interfaces to
> >>>>>> TableSource/TableSink. What do you think?
> >>>>>>
> >>>>>> Best,
> >>>>>> Jingsong Lee
> >>>>>>
> >>>>>>
> >>>>>> ------------------------------------------------------------------
> >>>>>> From:Kurt Young <[hidden email]>
> >>>>>> Send Time:2019年9月18日(星期三) 17:54
> >>>>>> To:dev <[hidden email]>
> >>>>>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in
> Table
> >>>>>> module
> >>>>>>
> >>>>>> Hi all,
> >>>>>>
> >>>>>> Sorry to join this party late. Big +1 to this flip, especially for
> the
> >>>>>> dropping
> >>>>>> "registerTableSink & registerTableSource" part. These are indeed
> >> legacy
> >>>>>> and we should try to unify them through CatalogTable after we
> >> introduce
> >>>>>> the concept of Catalog.
> >>>>>>
> >>>>>>  From my understanding, what we can registered should all be
> metadata,
> >>>>>> TableSource/TableSink should only be the one who is responsible to
> do
> >>>>>> the real work, i.e. reading and writing data according to the schema
> >> and
> >>>>>> other information like computed column, partition, .e.g.
> >>>>>>
> >>>>>> Best,
> >>>>>> Kurt
> >>>>>>
> >>>>>>
> >>>>>> On Wed, Sep 18, 2019 at 5:14 PM JingsongLee <
> [hidden email]
> >>>>>> .invalid>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> After some development and thinking, I have a general
> understanding.
> >>>>>>> +1 to registering a source/sink does not fit into the SQL world.
> >>>>>>> I am OK to have a deprecated registerTemporarySource/Sink to
> >> compatible
> >>>>>>> with old ways.
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Jingsong Lee
> >>>>>>>
> >>>>>>>
> >>>>>>> ------------------------------------------------------------------
> >>>>>>> From:Timo Walther <[hidden email]>
> >>>>>>> Send Time:2019年9月17日(星期二) 08:00
> >>>>>>> To:dev <[hidden email]>
> >>>>>>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in
> Table
> >>>>>>> module
> >>>>>>>
> >>>>>>> Hi Dawid,
> >>>>>>>
> >>>>>>> thanks for the design document. It fixes big concept gaps due to
> >>>>>>> historical reasons with proper support for serializability and
> >> catalog
> >>>>>>> support in mind.
> >>>>>>>
> >>>>>>> I would not mind a registerTemporarySource/Sink, but the problem
> >> that I
> >>>>>>> see is that many people think that this is the recommended way of
> >>>>>>> registering a table source/sink which is not true. We should guide
> >>>> users
> >>>>>>> to either use connect() or DDL API which can be validated and
> stored
> >> in
> >>>>>>> catalog.
> >>>>>>>
> >>>>>>> Also from a concept perspective, registering a source/sink does not
> >> fit
> >>>>>>> into the SQL world. SQL does not know about source/sinks but only
> >> about
> >>>>>>> tables. If the responsibility of a TableSource/TableSink is just a
> >> pure
> >>>>>>> physical data consumer/producer that is not connected to the actual
> >>>>>>> logical table schema, we would need a possibility of defining time
> >>>>>>> attributes and interpreting/converting a changelog. This should be
> >> done
> >>>>>>> by the framework with information from the DDL/connect() and not be
> >>>>>>> defined in every table source.
> >>>>>>>
> >>>>>>> Regards,
> >>>>>>> Timo
> >>>>>>>
> >>>>>>>
> >>>>>>> On 09.09.19 14:16, JingsongLee wrote:
> >>>>>>>> Hi dawid:
> >>>>>>>>
> >>>>>>>> It is difficult to describe specific examples.
> >>>>>>>> Sometimes users will generate some java converters through some
> >>>>>>>>   Java code, or generate some Java classes through third-party
> >>>>>>>>   libraries. Of course, these can be best done through properties.
> >>>>>>>> But this requires additional work from users.My suggestion is to
> >>>>>>>>   keep this Java instance class way that is user-friendly.
> >>>>>>>>
> >>>>>>>> Best,
> >>>>>>>> Jingsong Lee
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> ------------------------------------------------------------------
> >>>>>>>> From:Dawid Wysakowicz <[hidden email]>
> >>>>>>>> Send Time:2019年9月6日(星期五) 16:21
> >>>>>>>> To:dev <[hidden email]>
> >>>>>>>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in
> >> Table
> >>>>>>> module
> >>>>>>>> Hi all,
> >>>>>>>> @Jingsong Could you elaborate a bit more what do you mean by
> >>>>>>>> "some Connectors are difficult to convert all states to
> properties"
> >>>>>>>> All the Flink provided connectors will definitely be expressible
> >> with
> >>>>>>> properties (In the end you should be able to use them from DDL). I
> >>>> think
> >>>>>> if
> >>>>>>> a TableSource is complex enough that it handles filter push down,
> >>>>>> partition
> >>>>>>> support etc. should rather be made available both from DDL &
> >> java/scala
> >>>>>>> code. I'm happy to reconsider adding registerTemporaryTable(String
> >>>> path,
> >>>>>>> TableSource source) if you have some concrete examples in mind.
> >>>>>>>> @Xuefu: We also considered the ObjectIdentifier (or actually
> >>>>>> introducing
> >>>>>>> a new identifier representation to differentiate between resolved
> and
> >>>>>>> unresolved identifiers) with the same concerns. We decided to
> suggest
> >>>> the
> >>>>>>> string & parsing logic because of usability.
> >>>>>>>>      tEnv.from("cat.db.table")
> >>>>>>>> is shorter and easier to write than
> >>>>>>>>      tEnv.from(Identifier.for("cat", "db", "name")
> >>>>>>>> And also implicitly solves the problem what happens if a user
> (e.g.
> >>>>>> used
> >>>>>>> to other systems) uses that API in a following manner:
> >>>>>>>>      tEnv.from(Identifier.for("db.name")
> >>>>>>>> I'm happy to revisit it if the general consensus is that it's
> better
> >>>> to
> >>>>>>> use the OO aproach.
> >>>>>>>> Best,
> >>>>>>>> Dawid
> >>>>>>>>
> >>>>>>>> On 06/09/2019 10:00, Xuefu Z wrote:
> >>>>>>>>
> >>>>>>>> Thanks to Dawid for starting the discussion and writeup. It looks
> >>>>>> pretty
> >>>>>>>> good to me except that I'm a little concerned about the object
> >>>>>> reference
> >>>>>>>> and string parsing in the code, which seems to an anti-pattern to
> >> OOP.
> >>>>>>> Have
> >>>>>>>> we considered using ObjectIdenitifier with optional catalog and db
> >>>>>> parts,
> >>>>>>>> esp. if we are worried about arguments of variable length or
> method
> >>>>>>>> overloading? It's quite likely that the result of string parsing
> is
> >> an
> >>>>>>>> ObjectIdentifier instance any way.
> >>>>>>>>
> >>>>>>>> Having string parsing logic in the code is a little dangerous as
> it
> >>>>>>>> duplicates part of the DDL/DML parsing, and they can easily get
> out
> >> of
> >>>>>>> sync.
> >>>>>>>> Thanks,
> >>>>>>>> Xuefu
> >>>>>>>>
> >>>>>>>> On Fri, Sep 6, 2019 at 1:57 PM JingsongLee <
> [hidden email]
> >>>>>>> .invalid>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Thanks dawid, +1 for this approach.
> >>>>>>>>
> >>>>>>>> One concern is the removal of registerTableSink &
> >> registerTableSource
> >>>>>>>>   in TableEnvironment. It has two alternatives:
> >>>>>>>> 1.the properties approach (DDL, descriptor).
> >>>>>>>> 2.from/toDataStream.
> >>>>>>>>
> >>>>>>>> #1 can only be properties, not java states, and some Connectors
> >>>>>>>>   are difficult to convert all states to properties.
> >>>>>>>> #2 can contain java state. But can't use TableSource-related
> >> features,
> >>>>>>>> like project & filter push down, partition support, etc..
> >>>>>>>>
> >>>>>>>> Any idea about this?
> >>>>>>>>
> >>>>>>>> Best,
> >>>>>>>> Jingsong Lee
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> ------------------------------------------------------------------
> >>>>>>>> From:Dawid Wysakowicz <[hidden email]>
> >>>>>>>> Send Time:2019年9月4日(星期三) 22:20
> >>>>>>>> To:dev <[hidden email]>
> >>>>>>>> Subject:[DISCUSS] FLIP-64: Support for Temporary Objects in Table
> >>>>>> module
> >>>>>>>> Hi all,
> >>>>>>>> As part of FLIP-30 a Catalog API was introduced that enables
> storing
> >>>>>>> table
> >>>>>>>> meta objects permanently. At the same time the majority of current
> >>>> APIs
> >>>>>>>> create temporary objects that cannot be serialized. We should
> >> clarify
> >>>>>> the
> >>>>>>>> creation of meta objects (tables, views, functions) in a unified
> >> way.
> >>>>>>>> Another current problem in the API is that all the temporary
> objects
> >>>>>> are
> >>>>>>>> stored in a special built-in catalog, which is not very intuitive
> >> for
> >>>>>>> many
> >>>>>>>> users, as they must be aware of that catalog to reference
> temporary
> >>>>>>> objects.
> >>>>>>>> Lastly, different APIs have different ways of providing object
> >> paths:
> >>>>>>>> String path…,
> >>>>>>>> String path, String pathContinued…
> >>>>>>>> String name
> >>>>>>>> We should choose one approach and unify it across all APIs.
> >>>>>>>> I suggest a FLIP to address the above issues.
> >>>>>>>> Looking forward to your opinions.
> >>>>>>>> FLIP link:
> >>>>>>>>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module
> >>
>
>
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table module

Timo Walther-2
The purpose of ConnectTableDescriptor was to have a programmatic way of
expressing `CREATE TABLE` with JavaDocs and IDE support. But I agree
that it needs some rework in the future, once we have finalized the DDL
to ensure that both concepts are in sync.

Regards,
Timo


On 10.10.19 16:08, Kurt Young wrote:

> Regarding to ConnectTableDescriptor, if in the end it becomes a builder
> of CatalogTable, I would be ok with that. But it doesn't look like a builder
> now, it's more like another form of TableSource/TableSink.
>
> Best,
> Kurt
>
>
> On Thu, Oct 10, 2019 at 3:44 PM Timo Walther <[hidden email]> wrote:
>
>> Hi everyone,
>>
>> thanks for the great discussion with a good outcome.
>>
>> I have one last comment about:
>>
>> void createTemporaryFunction(String path, UserDefinedFunction function);
>>
>> We should encourage users to register a class instead of an instance. We
>> should enforce a default constructor in the future. If we need metadata
>> or type inference, the planner can simply instantiate the class and call
>> the corresponding getters. If people would like to parameterize a
>> function, they can use global job parameters instead - which are
>> available in the open() method.
>>
>> I suggest:
>>
>> void createTemporaryFunction(String path, Class<? extends
>> UserDefinedFunction> functionClass);
>>
>> This is also closer to the SQL DDL that is about to be implemented in:
>> https://issues.apache.org/jira/browse/FLINK-7151
>>
>> Thanks,
>> Timo
>>
>>
>> On 09.10.19 17:07, Bowen Li wrote:
>>> Hi Dawid,
>>>
>>> +1 for proposed changes
>>>
>>> On Wed, Oct 9, 2019 at 12:15 PM Dawid Wysakowicz <[hidden email]
>>>
>>> wrote:
>>>
>>>> Sorry for a very delayed response.
>>>>
>>>> @Kurt Yes, this is the goal to have a function created like new
>>>> Function(...) also be wrapped into CatalogFunction. This would have to
>>>> be though a temporary function as we cannot represent that as a set of
>>>> properties. Similar to the createTemporaryView(DataStream stream).
>>>>
>>>> As for the ConnectTableDescriptor I agree this is very similar to
>>>> CatalogTable. I am not sure though if we should get rid of it. In the
>>>> end I see it as a builder for a CatalogTable, which is a slightly more
>>>> internal API, but we might revisit that some time in the future if we
>>>> find that it makes more sense.
>>>>
>>>> @All I updated the FLIP page with some more details from the outcome of
>>>> the discussions around FLIP-57. Please take a look. I would like to
>>>> start a vote on this FLIP as soon as the vote on FLIP-57 goes through.
>>>>
>>>> Best,
>>>>
>>>> Dawid
>>>>
>>>>
>>>> On 19/09/2019 09:24, Kurt Young wrote:
>>>>> IIUC it's good to see that both serializable (tables description from
>>>> DDL)
>>>>> and unserializable (tables with DataStream underneath) tables are
>> treated
>>>>> unify with CatalogTable.
>>>>>
>>>>> Can I also assume functions that either come from a function class
>> (from
>>>>> DDL)
>>>>> or function objects (newed by user) will also treated unify with
>>>>> CatalogFunction?
>>>>>
>>>>> This will greatly simplify and unify current API level concepts and
>>>> design.
>>>>> And it seems only one thing left, how do we deal with
>>>>> ConnectTableDescriptor?
>>>>> It's actually very similar with serializable CatalogTable, both carry
>>>> some
>>>>> text
>>>>> properties which even are the same. Is there any chance we can further
>>>> unify
>>>>> this to CatalogTable?
>>>>>
>>>>> object
>>>>> Best,
>>>>> Kurt
>>>>>
>>>>>
>>>>> On Thu, Sep 19, 2019 at 3:13 PM Jark Wu <[hidden email]> wrote:
>>>>>
>>>>>> Thanks Dawid for the design doc.
>>>>>>
>>>>>> In general, I’m +1 to the FLIP.
>>>>>>
>>>>>>
>>>>>> +1 to the single-string and parse way to express object path.
>>>>>>
>>>>>> +1 to deprecate registerTableSink & registerTableSource.
>>>>>> But I would suggest to provide an easy way to register a custom
>>>>>> source/sink before we drop them (this is another story).
>>>>>> Currently, it’s not easy to implement a custom connector descriptor.
>>>>>>
>>>>>> Best,
>>>>>> Jark
>>>>>>
>>>>>>
>>>>>>> 在 2019年9月19日,11:37,Dawid Wysakowicz <[hidden email]> 写道:
>>>>>>>
>>>>>>> Hi JingsongLee,
>>>>>>>   From my understanding they can. Underneath they will be
>> CatalogTables.
>>>>>> The
>>>>>>> difference is the lifetime of the tables. Plus some of the user
>> facing
>>>>>>> interfaces cannot be persisted e.g. datastream. Therefore we must
>> have
>>>> a
>>>>>>> separate methods for that. In the end the temporary tables are held
>> in
>>>>>>> memory as CatalogTables.
>>>>>>> Best,
>>>>>>> Dawid
>>>>>>>
>>>>>>> On Thu, 19 Sep 2019, 10:08 JingsongLee, <[hidden email]
>>>>>> .invalid>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi dawid:
>>>>>>>> Can temporary tables achieve the same capabilities as catalog table?
>>>>>>>> like statistics: CatalogTableStatistics, CatalogColumnStatistics,
>>>>>>>> PartitionStatistics
>>>>>>>> like partition support: we have added some catalog equivalent
>>>> interfaces
>>>>>>>> on TableSource/TableSink: getPartitions, getPartitionFieldNames
>>>>>>>> Maybe it's not a good idea to add these interfaces to
>>>>>>>> TableSource/TableSink. What do you think?
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Jingsong Lee
>>>>>>>>
>>>>>>>>
>>>>>>>> ------------------------------------------------------------------
>>>>>>>> From:Kurt Young <[hidden email]>
>>>>>>>> Send Time:2019年9月18日(星期三) 17:54
>>>>>>>> To:dev <[hidden email]>
>>>>>>>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in
>> Table
>>>>>>>> module
>>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> Sorry to join this party late. Big +1 to this flip, especially for
>> the
>>>>>>>> dropping
>>>>>>>> "registerTableSink & registerTableSource" part. These are indeed
>>>> legacy
>>>>>>>> and we should try to unify them through CatalogTable after we
>>>> introduce
>>>>>>>> the concept of Catalog.
>>>>>>>>
>>>>>>>>   From my understanding, what we can registered should all be
>> metadata,
>>>>>>>> TableSource/TableSink should only be the one who is responsible to
>> do
>>>>>>>> the real work, i.e. reading and writing data according to the schema
>>>> and
>>>>>>>> other information like computed column, partition, .e.g.
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Kurt
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Sep 18, 2019 at 5:14 PM JingsongLee <
>> [hidden email]
>>>>>>>> .invalid>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> After some development and thinking, I have a general
>> understanding.
>>>>>>>>> +1 to registering a source/sink does not fit into the SQL world.
>>>>>>>>> I am OK to have a deprecated registerTemporarySource/Sink to
>>>> compatible
>>>>>>>>> with old ways.
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Jingsong Lee
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ------------------------------------------------------------------
>>>>>>>>> From:Timo Walther <[hidden email]>
>>>>>>>>> Send Time:2019年9月17日(星期二) 08:00
>>>>>>>>> To:dev <[hidden email]>
>>>>>>>>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in
>> Table
>>>>>>>>> module
>>>>>>>>>
>>>>>>>>> Hi Dawid,
>>>>>>>>>
>>>>>>>>> thanks for the design document. It fixes big concept gaps due to
>>>>>>>>> historical reasons with proper support for serializability and
>>>> catalog
>>>>>>>>> support in mind.
>>>>>>>>>
>>>>>>>>> I would not mind a registerTemporarySource/Sink, but the problem
>>>> that I
>>>>>>>>> see is that many people think that this is the recommended way of
>>>>>>>>> registering a table source/sink which is not true. We should guide
>>>>>> users
>>>>>>>>> to either use connect() or DDL API which can be validated and
>> stored
>>>> in
>>>>>>>>> catalog.
>>>>>>>>>
>>>>>>>>> Also from a concept perspective, registering a source/sink does not
>>>> fit
>>>>>>>>> into the SQL world. SQL does not know about source/sinks but only
>>>> about
>>>>>>>>> tables. If the responsibility of a TableSource/TableSink is just a
>>>> pure
>>>>>>>>> physical data consumer/producer that is not connected to the actual
>>>>>>>>> logical table schema, we would need a possibility of defining time
>>>>>>>>> attributes and interpreting/converting a changelog. This should be
>>>> done
>>>>>>>>> by the framework with information from the DDL/connect() and not be
>>>>>>>>> defined in every table source.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Timo
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 09.09.19 14:16, JingsongLee wrote:
>>>>>>>>>> Hi dawid:
>>>>>>>>>>
>>>>>>>>>> It is difficult to describe specific examples.
>>>>>>>>>> Sometimes users will generate some java converters through some
>>>>>>>>>>    Java code, or generate some Java classes through third-party
>>>>>>>>>>    libraries. Of course, these can be best done through properties.
>>>>>>>>>> But this requires additional work from users.My suggestion is to
>>>>>>>>>>    keep this Java instance class way that is user-friendly.
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Jingsong Lee
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ------------------------------------------------------------------
>>>>>>>>>> From:Dawid Wysakowicz <[hidden email]>
>>>>>>>>>> Send Time:2019年9月6日(星期五) 16:21
>>>>>>>>>> To:dev <[hidden email]>
>>>>>>>>>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in
>>>> Table
>>>>>>>>> module
>>>>>>>>>> Hi all,
>>>>>>>>>> @Jingsong Could you elaborate a bit more what do you mean by
>>>>>>>>>> "some Connectors are difficult to convert all states to
>> properties"
>>>>>>>>>> All the Flink provided connectors will definitely be expressible
>>>> with
>>>>>>>>> properties (In the end you should be able to use them from DDL). I
>>>>>> think
>>>>>>>> if
>>>>>>>>> a TableSource is complex enough that it handles filter push down,
>>>>>>>> partition
>>>>>>>>> support etc. should rather be made available both from DDL &
>>>> java/scala
>>>>>>>>> code. I'm happy to reconsider adding registerTemporaryTable(String
>>>>>> path,
>>>>>>>>> TableSource source) if you have some concrete examples in mind.
>>>>>>>>>> @Xuefu: We also considered the ObjectIdentifier (or actually
>>>>>>>> introducing
>>>>>>>>> a new identifier representation to differentiate between resolved
>> and
>>>>>>>>> unresolved identifiers) with the same concerns. We decided to
>> suggest
>>>>>> the
>>>>>>>>> string & parsing logic because of usability.
>>>>>>>>>>       tEnv.from("cat.db.table")
>>>>>>>>>> is shorter and easier to write than
>>>>>>>>>>       tEnv.from(Identifier.for("cat", "db", "name")
>>>>>>>>>> And also implicitly solves the problem what happens if a user
>> (e.g.
>>>>>>>> used
>>>>>>>>> to other systems) uses that API in a following manner:
>>>>>>>>>>       tEnv.from(Identifier.for("db.name")
>>>>>>>>>> I'm happy to revisit it if the general consensus is that it's
>> better
>>>>>> to
>>>>>>>>> use the OO aproach.
>>>>>>>>>> Best,
>>>>>>>>>> Dawid
>>>>>>>>>>
>>>>>>>>>> On 06/09/2019 10:00, Xuefu Z wrote:
>>>>>>>>>>
>>>>>>>>>> Thanks to Dawid for starting the discussion and writeup. It looks
>>>>>>>> pretty
>>>>>>>>>> good to me except that I'm a little concerned about the object
>>>>>>>> reference
>>>>>>>>>> and string parsing in the code, which seems to an anti-pattern to
>>>> OOP.
>>>>>>>>> Have
>>>>>>>>>> we considered using ObjectIdenitifier with optional catalog and db
>>>>>>>> parts,
>>>>>>>>>> esp. if we are worried about arguments of variable length or
>> method
>>>>>>>>>> overloading? It's quite likely that the result of string parsing
>> is
>>>> an
>>>>>>>>>> ObjectIdentifier instance any way.
>>>>>>>>>>
>>>>>>>>>> Having string parsing logic in the code is a little dangerous as
>> it
>>>>>>>>>> duplicates part of the DDL/DML parsing, and they can easily get
>> out
>>>> of
>>>>>>>>> sync.
>>>>>>>>>> Thanks,
>>>>>>>>>> Xuefu
>>>>>>>>>>
>>>>>>>>>> On Fri, Sep 6, 2019 at 1:57 PM JingsongLee <
>> [hidden email]
>>>>>>>>> .invalid>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks dawid, +1 for this approach.
>>>>>>>>>>
>>>>>>>>>> One concern is the removal of registerTableSink &
>>>> registerTableSource
>>>>>>>>>>    in TableEnvironment. It has two alternatives:
>>>>>>>>>> 1.the properties approach (DDL, descriptor).
>>>>>>>>>> 2.from/toDataStream.
>>>>>>>>>>
>>>>>>>>>> #1 can only be properties, not java states, and some Connectors
>>>>>>>>>>    are difficult to convert all states to properties.
>>>>>>>>>> #2 can contain java state. But can't use TableSource-related
>>>> features,
>>>>>>>>>> like project & filter push down, partition support, etc..
>>>>>>>>>>
>>>>>>>>>> Any idea about this?
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Jingsong Lee
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ------------------------------------------------------------------
>>>>>>>>>> From:Dawid Wysakowicz <[hidden email]>
>>>>>>>>>> Send Time:2019年9月4日(星期三) 22:20
>>>>>>>>>> To:dev <[hidden email]>
>>>>>>>>>> Subject:[DISCUSS] FLIP-64: Support for Temporary Objects in Table
>>>>>>>> module
>>>>>>>>>> Hi all,
>>>>>>>>>> As part of FLIP-30 a Catalog API was introduced that enables
>> storing
>>>>>>>>> table
>>>>>>>>>> meta objects permanently. At the same time the majority of current
>>>>>> APIs
>>>>>>>>>> create temporary objects that cannot be serialized. We should
>>>> clarify
>>>>>>>> the
>>>>>>>>>> creation of meta objects (tables, views, functions) in a unified
>>>> way.
>>>>>>>>>> Another current problem in the API is that all the temporary
>> objects
>>>>>>>> are
>>>>>>>>>> stored in a special built-in catalog, which is not very intuitive
>>>> for
>>>>>>>>> many
>>>>>>>>>> users, as they must be aware of that catalog to reference
>> temporary
>>>>>>>>> objects.
>>>>>>>>>> Lastly, different APIs have different ways of providing object
>>>> paths:
>>>>>>>>>> String path…,
>>>>>>>>>> String path, String pathContinued…
>>>>>>>>>> String name
>>>>>>>>>> We should choose one approach and unify it across all APIs.
>>>>>>>>>> I suggest a FLIP to address the above issues.
>>>>>>>>>> Looking forward to your opinions.
>>>>>>>>>> FLIP link:
>>>>>>>>>>
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module
>>

Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table module

Kurt Young
Thanks for the clarification Timo, that's sounds good to me. Let's continue
to discuss other things.

Best,
Kurt


On Thu, Oct 10, 2019 at 4:14 PM Timo Walther <[hidden email]> wrote:

> The purpose of ConnectTableDescriptor was to have a programmatic way of
> expressing `CREATE TABLE` with JavaDocs and IDE support. But I agree
> that it needs some rework in the future, once we have finalized the DDL
> to ensure that both concepts are in sync.
>
> Regards,
> Timo
>
>
> On 10.10.19 16:08, Kurt Young wrote:
> > Regarding to ConnectTableDescriptor, if in the end it becomes a builder
> > of CatalogTable, I would be ok with that. But it doesn't look like a
> builder
> > now, it's more like another form of TableSource/TableSink.
> >
> > Best,
> > Kurt
> >
> >
> > On Thu, Oct 10, 2019 at 3:44 PM Timo Walther <[hidden email]> wrote:
> >
> >> Hi everyone,
> >>
> >> thanks for the great discussion with a good outcome.
> >>
> >> I have one last comment about:
> >>
> >> void createTemporaryFunction(String path, UserDefinedFunction function);
> >>
> >> We should encourage users to register a class instead of an instance. We
> >> should enforce a default constructor in the future. If we need metadata
> >> or type inference, the planner can simply instantiate the class and call
> >> the corresponding getters. If people would like to parameterize a
> >> function, they can use global job parameters instead - which are
> >> available in the open() method.
> >>
> >> I suggest:
> >>
> >> void createTemporaryFunction(String path, Class<? extends
> >> UserDefinedFunction> functionClass);
> >>
> >> This is also closer to the SQL DDL that is about to be implemented in:
> >> https://issues.apache.org/jira/browse/FLINK-7151
> >>
> >> Thanks,
> >> Timo
> >>
> >>
> >> On 09.10.19 17:07, Bowen Li wrote:
> >>> Hi Dawid,
> >>>
> >>> +1 for proposed changes
> >>>
> >>> On Wed, Oct 9, 2019 at 12:15 PM Dawid Wysakowicz <
> [hidden email]
> >>>
> >>> wrote:
> >>>
> >>>> Sorry for a very delayed response.
> >>>>
> >>>> @Kurt Yes, this is the goal to have a function created like new
> >>>> Function(...) also be wrapped into CatalogFunction. This would have to
> >>>> be though a temporary function as we cannot represent that as a set of
> >>>> properties. Similar to the createTemporaryView(DataStream stream).
> >>>>
> >>>> As for the ConnectTableDescriptor I agree this is very similar to
> >>>> CatalogTable. I am not sure though if we should get rid of it. In the
> >>>> end I see it as a builder for a CatalogTable, which is a slightly more
> >>>> internal API, but we might revisit that some time in the future if we
> >>>> find that it makes more sense.
> >>>>
> >>>> @All I updated the FLIP page with some more details from the outcome
> of
> >>>> the discussions around FLIP-57. Please take a look. I would like to
> >>>> start a vote on this FLIP as soon as the vote on FLIP-57 goes through.
> >>>>
> >>>> Best,
> >>>>
> >>>> Dawid
> >>>>
> >>>>
> >>>> On 19/09/2019 09:24, Kurt Young wrote:
> >>>>> IIUC it's good to see that both serializable (tables description from
> >>>> DDL)
> >>>>> and unserializable (tables with DataStream underneath) tables are
> >> treated
> >>>>> unify with CatalogTable.
> >>>>>
> >>>>> Can I also assume functions that either come from a function class
> >> (from
> >>>>> DDL)
> >>>>> or function objects (newed by user) will also treated unify with
> >>>>> CatalogFunction?
> >>>>>
> >>>>> This will greatly simplify and unify current API level concepts and
> >>>> design.
> >>>>> And it seems only one thing left, how do we deal with
> >>>>> ConnectTableDescriptor?
> >>>>> It's actually very similar with serializable CatalogTable, both carry
> >>>> some
> >>>>> text
> >>>>> properties which even are the same. Is there any chance we can
> further
> >>>> unify
> >>>>> this to CatalogTable?
> >>>>>
> >>>>> object
> >>>>> Best,
> >>>>> Kurt
> >>>>>
> >>>>>
> >>>>> On Thu, Sep 19, 2019 at 3:13 PM Jark Wu <[hidden email]> wrote:
> >>>>>
> >>>>>> Thanks Dawid for the design doc.
> >>>>>>
> >>>>>> In general, I’m +1 to the FLIP.
> >>>>>>
> >>>>>>
> >>>>>> +1 to the single-string and parse way to express object path.
> >>>>>>
> >>>>>> +1 to deprecate registerTableSink & registerTableSource.
> >>>>>> But I would suggest to provide an easy way to register a custom
> >>>>>> source/sink before we drop them (this is another story).
> >>>>>> Currently, it’s not easy to implement a custom connector descriptor.
> >>>>>>
> >>>>>> Best,
> >>>>>> Jark
> >>>>>>
> >>>>>>
> >>>>>>> 在 2019年9月19日,11:37,Dawid Wysakowicz <[hidden email]>
> 写道:
> >>>>>>>
> >>>>>>> Hi JingsongLee,
> >>>>>>>   From my understanding they can. Underneath they will be
> >> CatalogTables.
> >>>>>> The
> >>>>>>> difference is the lifetime of the tables. Plus some of the user
> >> facing
> >>>>>>> interfaces cannot be persisted e.g. datastream. Therefore we must
> >> have
> >>>> a
> >>>>>>> separate methods for that. In the end the temporary tables are held
> >> in
> >>>>>>> memory as CatalogTables.
> >>>>>>> Best,
> >>>>>>> Dawid
> >>>>>>>
> >>>>>>> On Thu, 19 Sep 2019, 10:08 JingsongLee, <[hidden email]
> >>>>>> .invalid>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> Hi dawid:
> >>>>>>>> Can temporary tables achieve the same capabilities as catalog
> table?
> >>>>>>>> like statistics: CatalogTableStatistics, CatalogColumnStatistics,
> >>>>>>>> PartitionStatistics
> >>>>>>>> like partition support: we have added some catalog equivalent
> >>>> interfaces
> >>>>>>>> on TableSource/TableSink: getPartitions, getPartitionFieldNames
> >>>>>>>> Maybe it's not a good idea to add these interfaces to
> >>>>>>>> TableSource/TableSink. What do you think?
> >>>>>>>>
> >>>>>>>> Best,
> >>>>>>>> Jingsong Lee
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> ------------------------------------------------------------------
> >>>>>>>> From:Kurt Young <[hidden email]>
> >>>>>>>> Send Time:2019年9月18日(星期三) 17:54
> >>>>>>>> To:dev <[hidden email]>
> >>>>>>>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in
> >> Table
> >>>>>>>> module
> >>>>>>>>
> >>>>>>>> Hi all,
> >>>>>>>>
> >>>>>>>> Sorry to join this party late. Big +1 to this flip, especially for
> >> the
> >>>>>>>> dropping
> >>>>>>>> "registerTableSink & registerTableSource" part. These are indeed
> >>>> legacy
> >>>>>>>> and we should try to unify them through CatalogTable after we
> >>>> introduce
> >>>>>>>> the concept of Catalog.
> >>>>>>>>
> >>>>>>>>   From my understanding, what we can registered should all be
> >> metadata,
> >>>>>>>> TableSource/TableSink should only be the one who is responsible to
> >> do
> >>>>>>>> the real work, i.e. reading and writing data according to the
> schema
> >>>> and
> >>>>>>>> other information like computed column, partition, .e.g.
> >>>>>>>>
> >>>>>>>> Best,
> >>>>>>>> Kurt
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Wed, Sep 18, 2019 at 5:14 PM JingsongLee <
> >> [hidden email]
> >>>>>>>> .invalid>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> After some development and thinking, I have a general
> >> understanding.
> >>>>>>>>> +1 to registering a source/sink does not fit into the SQL world.
> >>>>>>>>> I am OK to have a deprecated registerTemporarySource/Sink to
> >>>> compatible
> >>>>>>>>> with old ways.
> >>>>>>>>>
> >>>>>>>>> Best,
> >>>>>>>>> Jingsong Lee
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> ------------------------------------------------------------------
> >>>>>>>>> From:Timo Walther <[hidden email]>
> >>>>>>>>> Send Time:2019年9月17日(星期二) 08:00
> >>>>>>>>> To:dev <[hidden email]>
> >>>>>>>>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in
> >> Table
> >>>>>>>>> module
> >>>>>>>>>
> >>>>>>>>> Hi Dawid,
> >>>>>>>>>
> >>>>>>>>> thanks for the design document. It fixes big concept gaps due to
> >>>>>>>>> historical reasons with proper support for serializability and
> >>>> catalog
> >>>>>>>>> support in mind.
> >>>>>>>>>
> >>>>>>>>> I would not mind a registerTemporarySource/Sink, but the problem
> >>>> that I
> >>>>>>>>> see is that many people think that this is the recommended way of
> >>>>>>>>> registering a table source/sink which is not true. We should
> guide
> >>>>>> users
> >>>>>>>>> to either use connect() or DDL API which can be validated and
> >> stored
> >>>> in
> >>>>>>>>> catalog.
> >>>>>>>>>
> >>>>>>>>> Also from a concept perspective, registering a source/sink does
> not
> >>>> fit
> >>>>>>>>> into the SQL world. SQL does not know about source/sinks but only
> >>>> about
> >>>>>>>>> tables. If the responsibility of a TableSource/TableSink is just
> a
> >>>> pure
> >>>>>>>>> physical data consumer/producer that is not connected to the
> actual
> >>>>>>>>> logical table schema, we would need a possibility of defining
> time
> >>>>>>>>> attributes and interpreting/converting a changelog. This should
> be
> >>>> done
> >>>>>>>>> by the framework with information from the DDL/connect() and not
> be
> >>>>>>>>> defined in every table source.
> >>>>>>>>>
> >>>>>>>>> Regards,
> >>>>>>>>> Timo
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On 09.09.19 14:16, JingsongLee wrote:
> >>>>>>>>>> Hi dawid:
> >>>>>>>>>>
> >>>>>>>>>> It is difficult to describe specific examples.
> >>>>>>>>>> Sometimes users will generate some java converters through some
> >>>>>>>>>>    Java code, or generate some Java classes through third-party
> >>>>>>>>>>    libraries. Of course, these can be best done through
> properties.
> >>>>>>>>>> But this requires additional work from users.My suggestion is to
> >>>>>>>>>>    keep this Java instance class way that is user-friendly.
> >>>>>>>>>>
> >>>>>>>>>> Best,
> >>>>>>>>>> Jingsong Lee
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> ------------------------------------------------------------------
> >>>>>>>>>> From:Dawid Wysakowicz <[hidden email]>
> >>>>>>>>>> Send Time:2019年9月6日(星期五) 16:21
> >>>>>>>>>> To:dev <[hidden email]>
> >>>>>>>>>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in
> >>>> Table
> >>>>>>>>> module
> >>>>>>>>>> Hi all,
> >>>>>>>>>> @Jingsong Could you elaborate a bit more what do you mean by
> >>>>>>>>>> "some Connectors are difficult to convert all states to
> >> properties"
> >>>>>>>>>> All the Flink provided connectors will definitely be expressible
> >>>> with
> >>>>>>>>> properties (In the end you should be able to use them from DDL).
> I
> >>>>>> think
> >>>>>>>> if
> >>>>>>>>> a TableSource is complex enough that it handles filter push down,
> >>>>>>>> partition
> >>>>>>>>> support etc. should rather be made available both from DDL &
> >>>> java/scala
> >>>>>>>>> code. I'm happy to reconsider adding
> registerTemporaryTable(String
> >>>>>> path,
> >>>>>>>>> TableSource source) if you have some concrete examples in mind.
> >>>>>>>>>> @Xuefu: We also considered the ObjectIdentifier (or actually
> >>>>>>>> introducing
> >>>>>>>>> a new identifier representation to differentiate between resolved
> >> and
> >>>>>>>>> unresolved identifiers) with the same concerns. We decided to
> >> suggest
> >>>>>> the
> >>>>>>>>> string & parsing logic because of usability.
> >>>>>>>>>>       tEnv.from("cat.db.table")
> >>>>>>>>>> is shorter and easier to write than
> >>>>>>>>>>       tEnv.from(Identifier.for("cat", "db", "name")
> >>>>>>>>>> And also implicitly solves the problem what happens if a user
> >> (e.g.
> >>>>>>>> used
> >>>>>>>>> to other systems) uses that API in a following manner:
> >>>>>>>>>>       tEnv.from(Identifier.for("db.name")
> >>>>>>>>>> I'm happy to revisit it if the general consensus is that it's
> >> better
> >>>>>> to
> >>>>>>>>> use the OO aproach.
> >>>>>>>>>> Best,
> >>>>>>>>>> Dawid
> >>>>>>>>>>
> >>>>>>>>>> On 06/09/2019 10:00, Xuefu Z wrote:
> >>>>>>>>>>
> >>>>>>>>>> Thanks to Dawid for starting the discussion and writeup. It
> looks
> >>>>>>>> pretty
> >>>>>>>>>> good to me except that I'm a little concerned about the object
> >>>>>>>> reference
> >>>>>>>>>> and string parsing in the code, which seems to an anti-pattern
> to
> >>>> OOP.
> >>>>>>>>> Have
> >>>>>>>>>> we considered using ObjectIdenitifier with optional catalog and
> db
> >>>>>>>> parts,
> >>>>>>>>>> esp. if we are worried about arguments of variable length or
> >> method
> >>>>>>>>>> overloading? It's quite likely that the result of string parsing
> >> is
> >>>> an
> >>>>>>>>>> ObjectIdentifier instance any way.
> >>>>>>>>>>
> >>>>>>>>>> Having string parsing logic in the code is a little dangerous as
> >> it
> >>>>>>>>>> duplicates part of the DDL/DML parsing, and they can easily get
> >> out
> >>>> of
> >>>>>>>>> sync.
> >>>>>>>>>> Thanks,
> >>>>>>>>>> Xuefu
> >>>>>>>>>>
> >>>>>>>>>> On Fri, Sep 6, 2019 at 1:57 PM JingsongLee <
> >> [hidden email]
> >>>>>>>>> .invalid>
> >>>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Thanks dawid, +1 for this approach.
> >>>>>>>>>>
> >>>>>>>>>> One concern is the removal of registerTableSink &
> >>>> registerTableSource
> >>>>>>>>>>    in TableEnvironment. It has two alternatives:
> >>>>>>>>>> 1.the properties approach (DDL, descriptor).
> >>>>>>>>>> 2.from/toDataStream.
> >>>>>>>>>>
> >>>>>>>>>> #1 can only be properties, not java states, and some Connectors
> >>>>>>>>>>    are difficult to convert all states to properties.
> >>>>>>>>>> #2 can contain java state. But can't use TableSource-related
> >>>> features,
> >>>>>>>>>> like project & filter push down, partition support, etc..
> >>>>>>>>>>
> >>>>>>>>>> Any idea about this?
> >>>>>>>>>>
> >>>>>>>>>> Best,
> >>>>>>>>>> Jingsong Lee
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> ------------------------------------------------------------------
> >>>>>>>>>> From:Dawid Wysakowicz <[hidden email]>
> >>>>>>>>>> Send Time:2019年9月4日(星期三) 22:20
> >>>>>>>>>> To:dev <[hidden email]>
> >>>>>>>>>> Subject:[DISCUSS] FLIP-64: Support for Temporary Objects in
> Table
> >>>>>>>> module
> >>>>>>>>>> Hi all,
> >>>>>>>>>> As part of FLIP-30 a Catalog API was introduced that enables
> >> storing
> >>>>>>>>> table
> >>>>>>>>>> meta objects permanently. At the same time the majority of
> current
> >>>>>> APIs
> >>>>>>>>>> create temporary objects that cannot be serialized. We should
> >>>> clarify
> >>>>>>>> the
> >>>>>>>>>> creation of meta objects (tables, views, functions) in a unified
> >>>> way.
> >>>>>>>>>> Another current problem in the API is that all the temporary
> >> objects
> >>>>>>>> are
> >>>>>>>>>> stored in a special built-in catalog, which is not very
> intuitive
> >>>> for
> >>>>>>>>> many
> >>>>>>>>>> users, as they must be aware of that catalog to reference
> >> temporary
> >>>>>>>>> objects.
> >>>>>>>>>> Lastly, different APIs have different ways of providing object
> >>>> paths:
> >>>>>>>>>> String path…,
> >>>>>>>>>> String path, String pathContinued…
> >>>>>>>>>> String name
> >>>>>>>>>> We should choose one approach and unify it across all APIs.
> >>>>>>>>>> I suggest a FLIP to address the above issues.
> >>>>>>>>>> Looking forward to your opinions.
> >>>>>>>>>> FLIP link:
> >>>>>>>>>>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module
> >>
>
>
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table module

dwysakowicz

Thanks for the comments.

@Timo I think your suggestion makes sense. I changed the signature to

void createTemporaryFunction(String path, Class<? extends UserDefinedFunction> functionClass);

This will mean that after we make QueryOperations string serializable and we expose the type inference in functions we will have only a single object that cannot be persisted in a catalog and thus have only a "Temporary" method:

void createTemporaryView(String path, DataStream stream);

I updated the FLIP page accordingly. I also listed explicitly methods that we need to deprecate and added a note that the implementation of the methods concerning UserDefinedFunctions has to be postponed until we have type inference in place.

As the voting period for FLIP-57 ended, I will start the VOTE thread for FLIP-64 tomorrow morning, unless there are some objections until then.

Best,

Dawid


On 10/10/2019 16:16, Kurt Young wrote:
Thanks for the clarification Timo, that's sounds good to me. Let's continue
to discuss other things.

Best,
Kurt


On Thu, Oct 10, 2019 at 4:14 PM Timo Walther [hidden email] wrote:

The purpose of ConnectTableDescriptor was to have a programmatic way of
expressing `CREATE TABLE` with JavaDocs and IDE support. But I agree
that it needs some rework in the future, once we have finalized the DDL
to ensure that both concepts are in sync.

Regards,
Timo


On 10.10.19 16:08, Kurt Young wrote:
Regarding to ConnectTableDescriptor, if in the end it becomes a builder
of CatalogTable, I would be ok with that. But it doesn't look like a
builder
now, it's more like another form of TableSource/TableSink.

Best,
Kurt


On Thu, Oct 10, 2019 at 3:44 PM Timo Walther [hidden email] wrote:

Hi everyone,

thanks for the great discussion with a good outcome.

I have one last comment about:

void createTemporaryFunction(String path, UserDefinedFunction function);

We should encourage users to register a class instead of an instance. We
should enforce a default constructor in the future. If we need metadata
or type inference, the planner can simply instantiate the class and call
the corresponding getters. If people would like to parameterize a
function, they can use global job parameters instead - which are
available in the open() method.

I suggest:

void createTemporaryFunction(String path, Class<? extends
UserDefinedFunction> functionClass);

This is also closer to the SQL DDL that is about to be implemented in:
https://issues.apache.org/jira/browse/FLINK-7151

Thanks,
Timo


On 09.10.19 17:07, Bowen Li wrote:
Hi Dawid,

+1 for proposed changes

On Wed, Oct 9, 2019 at 12:15 PM Dawid Wysakowicz <
[hidden email]
wrote:

Sorry for a very delayed response.

@Kurt Yes, this is the goal to have a function created like new
Function(...) also be wrapped into CatalogFunction. This would have to
be though a temporary function as we cannot represent that as a set of
properties. Similar to the createTemporaryView(DataStream stream).

As for the ConnectTableDescriptor I agree this is very similar to
CatalogTable. I am not sure though if we should get rid of it. In the
end I see it as a builder for a CatalogTable, which is a slightly more
internal API, but we might revisit that some time in the future if we
find that it makes more sense.

@All I updated the FLIP page with some more details from the outcome
of
the discussions around FLIP-57. Please take a look. I would like to
start a vote on this FLIP as soon as the vote on FLIP-57 goes through.

Best,

Dawid


On 19/09/2019 09:24, Kurt Young wrote:
IIUC it's good to see that both serializable (tables description from
DDL)
and unserializable (tables with DataStream underneath) tables are
treated
unify with CatalogTable.

Can I also assume functions that either come from a function class
(from
DDL)
or function objects (newed by user) will also treated unify with
CatalogFunction?

This will greatly simplify and unify current API level concepts and
design.
And it seems only one thing left, how do we deal with
ConnectTableDescriptor?
It's actually very similar with serializable CatalogTable, both carry
some
text
properties which even are the same. Is there any chance we can
further
unify
this to CatalogTable?

object
Best,
Kurt


On Thu, Sep 19, 2019 at 3:13 PM Jark Wu [hidden email] wrote:

Thanks Dawid for the design doc.

In general, I’m +1 to the FLIP.


+1 to the single-string and parse way to express object path.

+1 to deprecate registerTableSink & registerTableSource.
But I would suggest to provide an easy way to register a custom
source/sink before we drop them (this is another story).
Currently, it’s not easy to implement a custom connector descriptor.

Best,
Jark


在 2019年9月19日,11:37,Dawid Wysakowicz [hidden email]
写道:
Hi JingsongLee,
  From my understanding they can. Underneath they will be
CatalogTables.
The
difference is the lifetime of the tables. Plus some of the user
facing
interfaces cannot be persisted e.g. datastream. Therefore we must
have
a
separate methods for that. In the end the temporary tables are held
in
memory as CatalogTables.
Best,
Dawid

On Thu, 19 Sep 2019, 10:08 JingsongLee, <[hidden email]
.invalid>
wrote:

Hi dawid:
Can temporary tables achieve the same capabilities as catalog
table?
like statistics: CatalogTableStatistics, CatalogColumnStatistics,
PartitionStatistics
like partition support: we have added some catalog equivalent
interfaces
on TableSource/TableSink: getPartitions, getPartitionFieldNames
Maybe it's not a good idea to add these interfaces to
TableSource/TableSink. What do you think?

Best,
Jingsong Lee


------------------------------------------------------------------
From:Kurt Young [hidden email]
Send Time:2019年9月18日(星期三) 17:54
To:dev [hidden email]
Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in
Table
module

Hi all,

Sorry to join this party late. Big +1 to this flip, especially for
the
dropping
"registerTableSink & registerTableSource" part. These are indeed
legacy
and we should try to unify them through CatalogTable after we
introduce
the concept of Catalog.

  From my understanding, what we can registered should all be
metadata,
TableSource/TableSink should only be the one who is responsible to
do
the real work, i.e. reading and writing data according to the
schema
and
other information like computed column, partition, .e.g.

Best,
Kurt


On Wed, Sep 18, 2019 at 5:14 PM JingsongLee <
[hidden email]
.invalid>
wrote:

After some development and thinking, I have a general
understanding.
+1 to registering a source/sink does not fit into the SQL world.
I am OK to have a deprecated registerTemporarySource/Sink to
compatible
with old ways.

Best,
Jingsong Lee



------------------------------------------------------------------
From:Timo Walther [hidden email]
Send Time:2019年9月17日(星期二) 08:00
To:dev [hidden email]
Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in
Table
module

Hi Dawid,

thanks for the design document. It fixes big concept gaps due to
historical reasons with proper support for serializability and
catalog
support in mind.

I would not mind a registerTemporarySource/Sink, but the problem
that I
see is that many people think that this is the recommended way of
registering a table source/sink which is not true. We should
guide
users
to either use connect() or DDL API which can be validated and
stored
in
catalog.

Also from a concept perspective, registering a source/sink does
not
fit
into the SQL world. SQL does not know about source/sinks but only
about
tables. If the responsibility of a TableSource/TableSink is just
a
pure
physical data consumer/producer that is not connected to the
actual
logical table schema, we would need a possibility of defining
time
attributes and interpreting/converting a changelog. This should
be
done
by the framework with information from the DDL/connect() and not
be
defined in every table source.

Regards,
Timo


On 09.09.19 14:16, JingsongLee wrote:
Hi dawid:

It is difficult to describe specific examples.
Sometimes users will generate some java converters through some
   Java code, or generate some Java classes through third-party
   libraries. Of course, these can be best done through
properties.
But this requires additional work from users.My suggestion is to
   keep this Java instance class way that is user-friendly.

Best,
Jingsong Lee



------------------------------------------------------------------
From:Dawid Wysakowicz [hidden email]
Send Time:2019年9月6日(星期五) 16:21
To:dev [hidden email]
Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in
Table
module
Hi all,
@Jingsong Could you elaborate a bit more what do you mean by
"some Connectors are difficult to convert all states to
properties"
All the Flink provided connectors will definitely be expressible
with
properties (In the end you should be able to use them from DDL).
I
think
if
a TableSource is complex enough that it handles filter push down,
partition
support etc. should rather be made available both from DDL &
java/scala
code. I'm happy to reconsider adding
registerTemporaryTable(String
path,
TableSource source) if you have some concrete examples in mind.
@Xuefu: We also considered the ObjectIdentifier (or actually
introducing
a new identifier representation to differentiate between resolved
and
unresolved identifiers) with the same concerns. We decided to
suggest
the
string & parsing logic because of usability.
      tEnv.from("cat.db.table")
is shorter and easier to write than
      tEnv.from(Identifier.for("cat", "db", "name")
And also implicitly solves the problem what happens if a user
(e.g.
used
to other systems) uses that API in a following manner:
      tEnv.from(Identifier.for("db.name")
I'm happy to revisit it if the general consensus is that it's
better
to
use the OO aproach.
Best,
Dawid

On 06/09/2019 10:00, Xuefu Z wrote:

Thanks to Dawid for starting the discussion and writeup. It
looks
pretty
good to me except that I'm a little concerned about the object
reference
and string parsing in the code, which seems to an anti-pattern
to
OOP.
Have
we considered using ObjectIdenitifier with optional catalog and
db
parts,
esp. if we are worried about arguments of variable length or
method
overloading? It's quite likely that the result of string parsing
is
an
ObjectIdentifier instance any way.

Having string parsing logic in the code is a little dangerous as
it
duplicates part of the DDL/DML parsing, and they can easily get
out
of
sync.
Thanks,
Xuefu

On Fri, Sep 6, 2019 at 1:57 PM JingsongLee <
[hidden email]
.invalid>
wrote:


Thanks dawid, +1 for this approach.

One concern is the removal of registerTableSink &
registerTableSource
   in TableEnvironment. It has two alternatives:
1.the properties approach (DDL, descriptor).
2.from/toDataStream.

#1 can only be properties, not java states, and some Connectors
   are difficult to convert all states to properties.
#2 can contain java state. But can't use TableSource-related
features,
like project & filter push down, partition support, etc..

Any idea about this?

Best,
Jingsong Lee



------------------------------------------------------------------
From:Dawid Wysakowicz [hidden email]
Send Time:2019年9月4日(星期三) 22:20
To:dev [hidden email]
Subject:[DISCUSS] FLIP-64: Support for Temporary Objects in
Table
module
Hi all,
As part of FLIP-30 a Catalog API was introduced that enables
storing
table
meta objects permanently. At the same time the majority of
current
APIs
create temporary objects that cannot be serialized. We should
clarify
the
creation of meta objects (tables, views, functions) in a unified
way.
Another current problem in the API is that all the temporary
objects
are
stored in a special built-in catalog, which is not very
intuitive
for
many
users, as they must be aware of that catalog to reference
temporary
objects.
Lastly, different APIs have different ways of providing object
paths:
String path…,
String path, String pathContinued…
String name
We should choose one approach and unify it across all APIs.
I suggest a FLIP to address the above issues.
Looking forward to your opinions.
FLIP link:


          
https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module

          


    

signature.asc (849 bytes) Download Attachment
12