[Discussion] FLIP-79 Flink Function DDL Support


Peter Huang
Dear Community,

FLIP-79 Flink Function DDL Support
<https://docs.google.com/document/d/16kkHlis80s61ifnIahCj-0IEdy5NJ1z-vGEJd_JuLog/edit#>

This proposal aims to support function DDL, taking into account SQL
syntax, language compliance, and advanced external UDF library registration.
Flink DDL was first proposed and discussed in the design
<https://docs.google.com/document/d/1TTP-GCC8wSsibJaSUyFZ_5NBAHYEB1FVmPpP7RgDGBA/edit#heading=h.wpsqidkaaoil>
[1] by Shuyi Chen and Timo. Because that initial discussion focused mainly on
tables, types, and views, FLIP-69 [2] extends it with a more detailed discussion
of DDL for catalogs, databases, and functions. Originally, function DDL was
within the scope of FLIP-69. After some discussion
<https://issues.apache.org/jira/browse/FLINK-7151> with the community, we
found that there are several ongoing efforts, such as FLIP-64 [3], FLIP-65
[4], and FLIP-78 [5]. Because they will directly affect the SQL syntax of
function DDL, this proposal describes the problem in light of that existing
work and makes sure the design aligns with the API changes for temporary
objects and with type inference for UDFs defined in different languages.

The FLIP outlines the requirements from related work and proposes a SQL
syntax to meet those requirements. The corresponding implementation is also
discussed. Please kindly review and give feedback.


Best Regards
Peter Huang

Re: [Discussion] FLIP-79 Flink Function DDL Support

bowen.li
Hi Zhenqiu,

Thanks for taking on this effort!

A couple of questions:
- Though this FLIP is about function DDL, can we also think about how the
created functions map to CatalogFunction and see whether we need to
modify the CatalogFunction interface? Syntax changes need to be backed by the
backend.
- Can we define a clearer, smaller scope targeting Flink 1.10 among all
the proposed changes? The current overall scope seems quite wide, and
it may be unrealistic to get everything into a single release, or even a
couple. However, I believe the most common user story can be something as
simple as "being able to create and persist a Java class-based UDF and use
it later in queries", which would add great value for most Flink users and
is achievable in 1.10.

Bowen

On Sun, Oct 13, 2019 at 10:46 PM Peter Huang <[hidden email]>
wrote:

> Dear Community,
>
> FLIP-79 Flink Function DDL Support
> <
> https://docs.google.com/document/d/16kkHlis80s61ifnIahCj-0IEdy5NJ1z-vGEJd_JuLog/edit#
> >
>
> This proposal aims to support function DDL with the consideration of SQL
> syntax, language compliance, and advanced external UDF lib registration.
> The Flink DDL is initialized and discussed in the design
> <
> https://docs.google.com/document/d/1TTP-GCC8wSsibJaSUyFZ_5NBAHYEB1FVmPpP7RgDGBA/edit#heading=h.wpsqidkaaoil
> >
> [1] by Shuyi Chen and Timo. As the initial discussion mainly focused on the
> table, type and view. FLIP-69 [2] extend it with a more detailed discussion
> of DDL for catalog, database, and function. Original the function DDL was
> under the scope of FLIP-69. After some discussion
> <https://issues.apache.org/jira/browse/FLINK-7151> with the community, we
> found that there are several ongoing efforts, such as FLIP-64 [3], FLIP-65
> [4], and FLIP-78 [5]. As they will directly impact the SQL syntax of
> function DDL, the proposal wants to describe the problem clearly with the
> consideration of existing works and make sure the design aligns with
> efforts of API change of temporary objects and type inference for UDF
> defined by different languages.
>
> The FlLIP outlines the requirements from related works, and propose a SQL
> syntax to meet those requirements. The corresponding implementation is also
> discussed. Please kindly review and give feedback.
>
>
> Best Regards
> Peter Huang
>

Re: [Discussion] FLIP-79 Flink Function DDL Support

Xuefu Z
Thanks to Peter for the proposal!

I left some comments in the Google doc. Besides what Bowen pointed out, it is
unclear to me from the document how things work end to end. For instance,
a SQL DDL-like function definition is mentioned, but just having a DDL
for it doesn't explain how it's supported functionally. I think it would be
better to clarify what work is expected now and what is left for the
future.

Thanks,
Xuefu


On Tue, Oct 15, 2019 at 11:05 AM Bowen Li <[hidden email]> wrote:

> Hi Zhenqiu,
>
> Thanks for taking on this effort!
>
> A couple questions:
> - Though this FLIP is about function DDL, can we also think about how the
> created functions can be mapped to CatalogFunction and see if we need to
> modify CatalogFunction interface? Syntax changes need to be backed by the
> backend.
> - Can we define a clearer, smaller scope targeting for Flink 1.10 among all
> the proposed changes? The current overall scope seems to be quite wide, and
> it may be unrealistic to get everything in a single release, or even a
> couple. However, I believe the most common user story can be something as
> simple as "being able to create and persist a java class-based udf and use
> it later in queries", which will add great value for most Flink users and
> is achievable in 1.10.
>
> Bowen


--
Xuefu Zhang

"In Honey We Trust!"

Re: [Discussion] FLIP-79 Flink Function DDL Support

Peter Huang
In reply to this post by bowen.li
Hi Bowen,

Thanks for your kind feedback. This FLIP proposes a function DDL
syntax that is compatible with all of the other related work.
I agree that it covers a few too many cases. In terms of execution,
I will focus in this cycle mainly on creating and persisting a Java
class-based UDF and using it later in queries.
Because we are considering language, remote resources, and properties,
CatalogFunction will have to change accordingly. That change is missing
from the current FLIP. I will add more details about it to the doc.
We can discuss further based on that.
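
To make the CatalogFunction point concrete, here is a minimal sketch in Python of what a catalog entry carrying language, remote resources, and properties might look like. `FunctionDescriptor`, its field names, and `in_first_cycle_scope` are hypothetical and are not Flink's actual CatalogFunction interface; treating "no remote resources" as part of the first-cycle scope is also an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class FunctionDescriptor:
    """Hypothetical catalog entry for a function; not Flink's CatalogFunction."""
    class_name: str                       # fully qualified class (or module path)
    language: str = "JAVA"                # assumed values: JAVA, SCALA, PYTHON
    resource_uris: Tuple[str, ...] = ()   # remote resources such as JAR locations
    properties: Dict[str, str] = field(default_factory=dict)


def in_first_cycle_scope(fn: FunctionDescriptor) -> bool:
    """True for a Java class-based UDF that needs no remote resources (assumed scope)."""
    return fn.language.upper() == "JAVA" and not fn.resource_uris
```

Keeping language, resources, and properties as plain fields would let a catalog persist them even before the runtime can act on all of them.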

Best Regards
Peter Huang





On Tue, Oct 15, 2019 at 11:04 AM Bowen Li <[hidden email]> wrote:

> Hi Zhenqiu,
>
> Thanks for taking on this effort!
>
> A couple questions:
> - Though this FLIP is about function DDL, can we also think about how the
> created functions can be mapped to CatalogFunction and see if we need to
> modify CatalogFunction interface? Syntax changes need to be backed by the
> backend.
> - Can we define a clearer, smaller scope targeting for Flink 1.10 among all
> the proposed changes? The current overall scope seems to be quite wide, and
> it may be unrealistic to get everything in a single release, or even a
> couple. However, I believe the most common user story can be something as
> simple as "being able to create and persist a java class-based udf and use
> it later in queries", which will add great value for most Flink users and
> is achievable in 1.10.
>
> Bowen

Re: [Discussion] FLIP-79 Flink Function DDL Support

Peter Huang
In reply to this post by Xuefu Z
Hi Xuefu,

Thank you for the feedback. I think you are pointing out a concern similar
to Bowen's. Let me describe in the implementation section
how the catalog function and function factory will be changed.
Then we can discuss in more detail.


Best Regards
Peter Huang

On Tue, Oct 15, 2019 at 4:18 PM Xuefu Z <[hidden email]> wrote:

> Thanks to Peter for the proposal!
>
> I left some comments in the google doc. Besides what Bowen pointed out, I'm
> unclear about how things  work end to end from the document. For instance,
> SQL DDL-like function definition is mentioned. I guess just having a DDL
> for it doesn't explain how it's supported functionally. I think it's better
> to have some clarification on what is expected work and what's for the
> future.
>
> Thanks,
> Xuefu

Re: [Discussion] FLIP-79 Flink Function DDL Support

Terry Wang
Hi Peter,

Sorry for the late reply. Thanks for your efforts on this; I just looked through your design.
I left some comments in the doc about the alter function section and the function catalog interface.
IMO, the overall design is OK, and we can discuss some of the details further.
I also think it's necessary to have this awesome feature in the 1.10 release, even if limited to basic functionality (of course, it would be better to have everything :) ).

Best,
Terry Wang



> On Oct 16, 2019, at 14:19, Peter Huang <[hidden email]> wrote:
>
> Hi Xuefu,
>
> Thank you for the feedback. I think you are pointing out a similar concern
> with Bowen. Let me describe
> how the catalog function and function factory will be changed in the
> implementation section.
> Then, we can have more discussion in detail.
>
>
> Best Regards
> Peter Huang

Re: [Discussion] FLIP-79 Flink Function DDL Support

Timo Walther-2
Hi Peter,

Thanks for your proposal. I left some comments in the FLIP document. I
agree with Terry that we can have an MVP in Flink 1.10, but we should already
discuss the bigger picture, as a DDL string cannot be changed easily once
released.

In particular, we should discuss how resources for a function are loaded.
If they are simply added to the JobGraph, they are available to all
functions and could potentially interfere with each other, right?

Thanks,
Timo



On 24.10.19 05:32, Terry Wang wrote:

> Hi Peter,
>
> Sorry late to reply. Thanks for your efforts on this and I just looked through your design.
> I left some comments in the doc about alter function section and  function catalog interface.
> IMO, the overall design is ok and we can discuss further more about some details.
> I also think it’s necessary to have this awesome feature limit to basic function (of course better to have all :) ) in 1.10 release.
>
> Best,
> Terry Wang

Re: [Discussion] FLIP-79 Flink Function DDL Support

Jingsong Li
Hi Peter,

Thanks for your proposal. What I care about most is whether it
can cover the needs of Hive.
Hive's CREATE FUNCTION syntax:

CREATE FUNCTION [db_name.]function_name AS class_name
  [USING JAR|FILE|ARCHIVE 'file_uri' [, JAR|FILE|ARCHIVE 'file_uri'] ];

Hive supports a list of resources, each of type JAR, FILE, or ARCHIVE. Maybe we
need users to tell us exactly what kind each resource is, so that we can
decide whether to add it to the ClassLoader or process it some other way?

+1 to discussing the internal implementation, as Timo said, for example:
- How to load resources for a function (how to deal with JAR/FILE/ARCHIVE).
- How to pass properties to a function.
- How do Python UDFs work? Hive uses the TRANSFORM command to run shell and
Python scripts. It would be better if we could make clear how to do this.
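
As a rough illustration of the resource question, the sketch below parses the part of a Hive-style statement that follows USING and separates JAR entries from the rest. It is only a sketch: `parse_using_clause`, `split_by_handling`, and the rule "JARs go to the class loader, FILE and ARCHIVE entries are shipped separately" are assumptions made for illustration, not Hive's or Flink's implementation.

```python
import re
from enum import Enum
from typing import List, NamedTuple, Tuple


class ResourceKind(Enum):
    JAR = "JAR"
    FILE = "FILE"
    ARCHIVE = "ARCHIVE"


class Resource(NamedTuple):
    kind: ResourceKind
    uri: str


# One entry such as: JAR 'hdfs:///udfs/a.jar'
_RESOURCE = re.compile(r"\s*(JAR|FILE|ARCHIVE)\s+'([^']*)'\s*", re.IGNORECASE)


def parse_using_clause(clause: str) -> List[Resource]:
    """Parse a comma-separated list of <kind> '<uri>' entries."""
    resources: List[Resource] = []
    pos = 0
    while True:
        match = _RESOURCE.match(clause, pos)
        if match is None:
            raise ValueError(f"expected JAR|FILE|ARCHIVE 'uri' at offset {pos}")
        resources.append(Resource(ResourceKind(match.group(1).upper()), match.group(2)))
        pos = match.end()
        if pos == len(clause):
            return resources
        if clause[pos] != ",":
            raise ValueError(f"expected ',' at offset {pos}")
        pos += 1


def split_by_handling(resources: List[Resource]) -> Tuple[List[str], List[str]]:
    """Assumed policy: JARs join the classpath, everything else is shipped as-is."""
    classpath = [r.uri for r in resources if r.kind is ResourceKind.JAR]
    shipped = [r.uri for r in resources if r.kind is not ResourceKind.JAR]
    return classpath, shipped
```

Because each entry states its kind explicitly, the decision about class loading does not have to be guessed from the file extension.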

Hope to get your reply~

Best,
Jingsong Lee

On Thu, Oct 24, 2019 at 5:14 PM Timo Walther <[hidden email]> wrote:

> Hi Peter,
>
> thanks for your proposal. I left some comments in the FLIP document. I
> agree with Terry that we can have a MVP in Flink 1.10 but should already
> discuss the bigger picture as a DDL string cannot be changed easily once
> released.
>
> In particular we should discuss how resources for function are loaded.
> If they are simply added to the JobGraph they are available to all
> functions and could potentially interfere with each other, right?
>
> Thanks,
> Timo

Re: [Discussion] FLIP-79 Flink Function DDL Support

Peter Huang
In reply to this post by Timo Walther-2
Hi Timo,

Thanks for the feedback. I replied in the doc and adjusted the design
accordingly. Regarding the class loading concern,
I think we need to distinguish between function class loading for temporary
and permanent functions.

1) For a permanent function, we can add it to the job graph so that we don't
need to load it multiple times for different sessions.
2) For a temporary function, we can register the function with a session key
and use different class loaders in the RuntimeContext implementation.
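
The distinction above can be modeled roughly as follows. This is a sketch of the lookup only: plain Python callables stand in for UDF classes loaded by separate class loaders, `FunctionRegistry` and its methods are hypothetical names, and letting a session's temporary function shadow a permanent one of the same name is an assumption rather than something decided in this thread.

```python
from typing import Callable, Dict, Tuple


class FunctionRegistry:
    """Hypothetical lookup table; callables stand in for loaded UDF classes."""

    def __init__(self) -> None:
        self._permanent: Dict[str, Callable] = {}
        self._temporary: Dict[Tuple[str, str], Callable] = {}

    def register_permanent(self, name: str, fn: Callable) -> None:
        # Shared by every session, so it is stored once.
        self._permanent[name] = fn

    def register_temporary(self, session_key: str, name: str, fn: Callable) -> None:
        # Keyed by session, so two sessions can reuse a name without clashing.
        self._temporary[(session_key, name)] = fn

    def resolve(self, session_key: str, name: str) -> Callable:
        # Assumption: a session's temporary function shadows a permanent one.
        key = (session_key, name)
        if key in self._temporary:
            return self._temporary[key]
        return self._permanent[name]  # raises KeyError if the name is unknown
```

Keying temporary entries by session is what keeps one session's function from being visible to, or replaced by, another session.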

I added more description in the doc. Please review it again.


Best Regards
Peter Huang




On Thu, Oct 24, 2019 at 2:14 AM Timo Walther <[hidden email]> wrote:

> Hi Peter,
>
> thanks for your proposal. I left some comments in the FLIP document. I
> agree with Terry that we can have a MVP in Flink 1.10 but should already
> discuss the bigger picture as a DDL string cannot be changed easily once
> released.
>
> In particular we should discuss how resources for function are loaded.
> If they are simply added to the JobGraph they are available to all
> functions and could potentially interfere with each other, right?
>
> Thanks,
> Timo
>

Re: [Discussion] FLIP-79 Flink Function DDL Support

bowen.li
Hi all,

Besides all the good questions raised above, we all seem to agree on having an
MVP for Flink 1.10: "to support users to create and persist a java
class-based udf that's already in classpath (no extra resource loading),
and use it later in queries".

IIUC, to achieve that in 1.10, the following are currently the core
issues/blockers that we should figure out and solve as our **highest
priority**:

- What's the syntax to distinguish function language (java, scala, python,
etc)? We only need to implement the java one in 1.10 but have to settle
on the long-term solution.
- How to persist function language in the backend catalog? As a k-v pair in
the properties map, or as a dedicated field?
- Do we really need to allow users to set a properties map for a udf? What
needs to be stored there? What is it used for?
- Should a catalog impl be able to decide whether it can take a properties
map (if we decide to have one), and which language of a udf it can persist?
   - E.g. Hive metastore, which backs Flink's HiveCatalog, cannot take a
properties map and is only able to persist java udfs [1], unless we do
something hacky to it
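
To make the second question concrete, here is a minimal sketch of the two persistence options. Every name in it (FunctionLanguage, FunctionEntry, the "function.language" key) is hypothetical and chosen only for illustration; it is not Flink's CatalogFunction API. The dedicated field is what the rest of the code reads, while toProperties/fromProperties show how a backend that only offers a k-v store could still round-trip the language.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical names throughout; this is not Flink's CatalogFunction API.
enum FunctionLanguage { JAVA, SCALA, PYTHON }

final class FunctionEntry {
    // Assumed key name, used only when the language has to live in a k-v store.
    static final String LANGUAGE_KEY = "function.language";

    final String className;
    final FunctionLanguage language;       // dedicated field
    final Map<String, String> properties;  // user-defined properties only

    FunctionEntry(String className, FunctionLanguage language, Map<String, String> properties) {
        this.className = className;
        this.language = language;
        this.properties = new HashMap<>(properties);
    }

    // k-v option: flatten the language into the map for a backend without a dedicated column.
    Map<String, String> toProperties() {
        Map<String, String> out = new HashMap<>(properties);
        out.put(LANGUAGE_KEY, language.name());
        return out;
    }

    // Reverse direction: pull the language back out, defaulting to JAVA when the key is absent.
    static FunctionEntry fromProperties(String className, Map<String, String> stored) {
        Map<String, String> rest = new HashMap<>(stored);
        String lang = rest.remove(LANGUAGE_KEY);
        FunctionLanguage language = lang == null ? FunctionLanguage.JAVA : FunctionLanguage.valueOf(lang);
        return new FunctionEntry(className, language, rest);
    }
}
```

With this shape, the k-v encoding stays an implementation detail of a particular catalog rather than something every reader of the entry has to know about.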

I feel these questions are essential to Flink functions in the long run and,
most importantly, they are also the minimum scope for Flink 1.10. Aspects
like resource loading security or compatibility with Hive syntax are
important too; however, if we focus on them now, we may not be able to get
the MVP out in time.

[1]
- https://hive.apache.org/javadocs/r3.1.2/api/org/apache/hadoop/hive/metastore/api/Function.html
- https://hive.apache.org/javadocs/r3.1.2/api/org/apache/hadoop/hive/metastore/api/FunctionType.html



On Sun, Oct 27, 2019 at 8:22 PM Peter Huang <[hidden email]>
wrote:

> Hi Timo,
>
> Thanks for the feedback. I replied and adjust the design accordingly. For
> the concern of class loading.
> I think we need to distinguish the function class loading for Temporary and
> Permanent function.
>
> 1) For Permanent function, we can add it to the job graph so that we don't
> need to load it multiple times for the different sessions.
> 2) For Temporary function, we can register function with a session key, and
> use different class loaders in RuntimeContext implementation.
>
> I added more description in the doc. Please review it again.
>
>
> Best Regards
> Peter Huang
>
>
>
>

Re: [Discussion] FLIP-79 Flink Function DDL Support

Peter Huang
In reply to this post by Jingsong Li
Hi Jingsong,

Thanks for the input. The Flink function DDL definitely needs to align with
HQL, so I updated the doc accordingly:
CREATE FUNCTION [db_name.]function_name AS class_name
  [USING JAR|FILE|ARCHIVE 'file_uri' [, JAR|FILE|ARCHIVE 'file_uri'] ];
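
As an illustration of how a statement with several resources looks under this syntax, here is a small sketch that assembles one. The CreateFunctionDdl helper and the example names and URIs are made up for this mail; quoting the class name in single quotes is an assumption, and the helper does no escaping.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, only meant to show the shape of the statement.
final class CreateFunctionDdl {
    enum ResourceType { JAR, FILE, ARCHIVE }

    static final class Resource {
        final ResourceType type;
        final String uri;

        Resource(ResourceType type, String uri) {
            this.type = type;
            this.uri = uri;
        }
    }

    // Renders CREATE FUNCTION [db.]name AS 'class' [USING <type> '<uri>' [, ...]];
    // No escaping of quotes is attempted in this sketch.
    static String render(String dbName, String functionName, String className, List<Resource> resources) {
        StringBuilder sql = new StringBuilder("CREATE FUNCTION ");
        if (dbName != null) {
            sql.append(dbName).append('.');
        }
        sql.append(functionName).append(" AS '").append(className).append('\'');
        if (!resources.isEmpty()) {
            List<String> parts = new ArrayList<>();
            for (Resource r : resources) {
                // Each resource carries its own type keyword, as in the Hive syntax.
                parts.add(r.type + " '" + r.uri + "'");
            }
            sql.append(" USING ").append(String.join(", ", parts));
        }
        return sql.append(';').toString();
    }
}
```

Because every resource names its own type, the engine can decide per resource whether it belongs on the class path or needs other handling.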

For your other questions below:

1) How to load resources for a function (how to deal with jar/file/archive).
Let's consider the jar first. For file and archive, I will
do more study on the Hive side.
The basic idea for loading a jar without dependency conflicts is to use
separate class loaders for different sessions.
I updated the doc with the interface change required to achieve the goal.

2) How to pass properties to a function.
It can be a setProperties function in the UDF interface or a constructor
that takes a Map of parameters.
As Bowen commented on the doc, I think we probably just need to let
users provide such a constructor if they want to use properties in DDL.

3) How does a Python udf work?
It is not in the scope of this FLIP. I think FLIP-78 will provide the
runtime support; we just need to bridge the DDL with its runtime
interface.
This part does need to be added, but probably in the next phase after
the MVP is done.
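
The idea in 1) can be sketched roughly as follows. SessionClassLoaders is a hypothetical registry written for this mail, not an existing Flink class, and the interface change described in the doc may look different. Each session key gets its own URLClassLoader, so jars registered by one session are not visible to another.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry: one class loader per session key.
final class SessionClassLoaders {
    private final Map<String, URLClassLoader> loaders = new ConcurrentHashMap<>();

    // Returns the loader for the session, creating it on first use with the given jar URLs.
    // Later calls for the same session return the existing loader and ignore jarUrls.
    URLClassLoader forSession(String sessionKey, URL... jarUrls) {
        return loaders.computeIfAbsent(
                sessionKey,
                key -> new URLClassLoader(jarUrls, SessionClassLoaders.class.getClassLoader()));
    }

    // Resolves a function class through the loader of the given session only.
    Class<?> loadFunctionClass(String sessionKey, String className) {
        URLClassLoader loader = loaders.get(sessionKey);
        if (loader == null) {
            throw new IllegalStateException("No resources registered for session " + sessionKey);
        }
        try {
            return Class.forName(className, true, loader);
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("Cannot load " + className + " for session " + sessionKey, e);
        }
    }
}
```

Since the loaders delegate to their parent first, classes already on the class path are still shared; only the jars added per session are isolated.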


Best Regards
Peter Huang







On Thu, Oct 24, 2019 at 11:07 PM Jingsong Li <[hidden email]> wrote:

> Hi Peter,
>
> Thanks for your proposal. The first thing I care about most is whether it
> can cover the needs of hive.
> Hive create function:
>
> CREATE FUNCTION [db_name.]function_name AS class_name
>   [USING JAR|FILE|ARCHIVE 'file_uri' [, JAR|FILE|ARCHIVE 'file_uri'] ];
>
> Hive support a list of resources, and support jar/file/archive, Maybe we
> need users to tell us exactly what kind of resources are. So we can see
> whether to add it to the ClassLoader or other processing?
>
> +1 for the internal implementation as timo said, like:
> - how to load resources for function. (How to deal with jar/file/archive)
> - how to pass properties to function.
> - How does python udf work? Hive use Transform command to run shell and
> python. It would be better if we could make clear how to do.
>
> Hope to get your reply~
>
> Best,
> Jingsong Lee
>
> --
> Best, Jingsong Lee
>

Re: [Discussion] FLIP-79 Flink Function DDL Support

Peter Huang
In reply to this post by bowen.li
Hi Bowen,

I can't agree more that we should first reach an agreement on the DDL syntax
and focus on the MVP in the current phase.

1) What's the syntax to distinguish function language?
Currently, there are two opinions:

   - USING 'python .....'
   - [LANGUAGE JVM|PYTHON] USING JAR '...'

As we need to support multiple resources as HQL does, we shouldn't repeat the
language symbol as a suffix of each resource.
I would prefer option two, but I am definitely open to more comments.

2) How to persist function language in the backend catalog? As a k-v pair in
the properties map, or as a dedicated field?
Even though the language type is also a property, I think a separate field in
CatalogFunction is a cleaner solution.

3) Do we really need to allow users to set a properties map for a udf? What
needs to be stored there? What is it used for?

I am considering a type of use case that uses UDFs for realtime inference.
The model is nested in the udf as a resource, but there are
multiple customizable parameters. In this case, users can use properties
to define those parameters.
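
A rough sketch of that use case, assuming the "constructor with a Map" convention mentioned earlier in the thread: UdfInstantiator, ScoreFunction and the "threshold" property are hypothetical names for illustration, not part of any Flink interface. The instantiator uses the Map constructor only when properties were given in the DDL and falls back to the no-arg constructor otherwise.

```java
import java.lang.reflect.Constructor;
import java.util.Collections;
import java.util.Map;

// Hypothetical instantiation logic for a function created with properties.
final class UdfInstantiator {
    static Object instantiate(Class<?> udfClass, Map<String, String> properties) {
        try {
            if (!properties.isEmpty()) {
                // The user opted into properties, so the class must offer a Map constructor.
                Constructor<?> withProps = udfClass.getDeclaredConstructor(Map.class);
                return withProps.newInstance(properties);
            }
            return udfClass.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot instantiate " + udfClass.getName(), e);
        }
    }
}

// Hypothetical inference-style function whose parameter comes from the DDL properties.
final class ScoreFunction {
    private final double threshold;

    ScoreFunction() {
        this(Collections.emptyMap());
    }

    ScoreFunction(Map<String, String> properties) {
        // "threshold" is an assumed property name; 0.5 is an arbitrary default.
        this.threshold = Double.parseDouble(properties.getOrDefault("threshold", "0.5"));
    }

    boolean eval(double score) {
        return score >= threshold;
    }
}
```

Functions that do not need properties keep working unchanged, because nothing requires them to declare the extra constructor.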

I only have answers to these three questions. For the questions about the
catalog implementation, I hope we can collect more feedback from the community.


Best Regards
Peter Huang






On Tue, Oct 29, 2019 at 11:31 AM Bowen Li <[hidden email]> wrote:

> Hi all,
>
> Besides all the good questions raised above, we seem all agree to have a
> MVP for Flink 1.10, "to support users to create and persist a java
> class-based udf that's already in classpath (no extra resource loading),
> and use it later in queries".
>
> IIUIC, to achieve that in 1.10, the following are currently the core
> issues/blockers we should figure out, and solve them as our **highest
> priority**:
>
> - what's the syntax to distinguish function language (java, scala, python,
> etc)? we only need to implement the java one in 1.10 but have to settle
> down the long term solution
> - how to persist function language in backend catalog? as a k-v pair in
> properties map, or a dedicate field?
> - do we really need to allow users set a properties map for udf? what needs
> to be stored there? what are they used for?
> - should a catalog impl be able to decide whether it can take a properties
> map (if we decide to have one), and which language of a udf it can persist?
>    - E.g. Hive metastore, which backs Flink's HiveCatalog, cannot take a
> properties map and is only able to persist java udf [1], unless we do
> something hacky to it
>
> I feel these questions are essential to Flink functions in the long run,
> but most importantly, are also the minimum scope for Flink 1.10. Aspects
> like resource loading security or compatibility with Hive syntax are
> important too, however if we focus on them now, we may not be able to get
> the MVP out in time.
>
> [1]
> -
>
> https://hive.apache.org/javadocs/r3.1.2/api/org/apache/hadoop/hive/metastore/api/Function.html
> -
>
> https://hive.apache.org/javadocs/r3.1.2/api/org/apache/hadoop/hive/metastore/api/FunctionType.html
>
>
>

Re: [Discussion] FLIP-79 Flink Function DDL Support

Terry Wang
Hi Peter,

I’d like to share some thoughts from my side:
1. What's the syntax to distinguish function language?
        +1 for using `[LANGUAGE JVM|PYTHON] USING JAR`
2. How to persist function language in the backend catalog?
        +1 for a separate field in CatalogFunction. As for specific backends, we may persist it case by case. Special cases include how HiveCatalog stores the kind of CatalogFunction.
3. Do we really need to allow users to set a properties map for a udf?
        There are certainly use cases that require passing external arguments to a udf, but that need can also be met by passing arguments to `eval` when calling the udf in SQL.
        IMO, there is not much need to support setting a properties map for a udf.
4. Should a catalog implementation be able to decide whether it can take a properties map, and which language of a udf it can persist?
        IMO, it’s necessary for a catalog implementation to provide such information. But for the Flink 1.10 MVP goal, we can just skip this part.
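
For point 4, one possible shape of "a catalog providing such information" is sketched below. UdfLanguage, FunctionCapabilities and CapabilityCheck are hypothetical names invented for this mail and do not exist in Flink; the Java-only, no-properties catalog mirrors how Bowen described the Hive metastore earlier in the thread.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical names; nothing here is an existing Flink interface.
enum UdfLanguage { JAVA, SCALA, PYTHON }

interface FunctionCapabilities {
    Set<UdfLanguage> supportedLanguages();

    boolean supportsProperties();
}

final class CapabilityCheck {
    // Rejects a CREATE FUNCTION request that the target catalog could not persist.
    static void validate(FunctionCapabilities catalog, UdfLanguage language, Map<String, String> properties) {
        if (!catalog.supportedLanguages().contains(language)) {
            throw new UnsupportedOperationException("Catalog cannot persist " + language + " functions");
        }
        if (!properties.isEmpty() && !catalog.supportsProperties()) {
            throw new UnsupportedOperationException("Catalog cannot persist function properties");
        }
    }

    // A catalog that persists only Java functions and no properties map.
    static FunctionCapabilities javaOnlyNoProperties() {
        return new FunctionCapabilities() {
            @Override
            public Set<UdfLanguage> supportedLanguages() {
                return EnumSet.of(UdfLanguage.JAVA);
            }

            @Override
            public boolean supportsProperties() {
                return false;
            }
        };
    }
}
```

A check like this would let the DDL fail early with a clear message instead of silently dropping information the backend cannot store.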



Best,
Terry Wang



> On Oct 30, 2019, at 13:52, Peter Huang <[hidden email]> wrote:
>
> Hi Bowen,
>
> I can't agree more about we first have an agreement on the DDL syntax and
> focus on the MVP in the current phase.
>
> 1) what's the syntax to distinguish function language
> Currently, there are two opinions:
>
>   - USING 'python .....'
>   - [LANGUAGE JVM|PYTHON] USING JAR '...'
>
> As we need to support multiple resources as HQL, we shouldn't repeat the
> language symbol as a suffix of each resource.
> I would prefer option two, but definitely open to more comments.
>
> 2) How to persist function language in backend catalog? as a k-v pair in
> properties map, or a dedicate field?
> Even though language type is also a property, I think a separate field in
> CatalogFunction is a more clean solution.
>
> 3) do we really need to allow users set a properties map for udf? what needs
> to be stored there? what are they used for?
>
> I am considering a type of use case that use UDFS for realtime inference.
> The model is nested in the udf as a resource. But there are
> multiple parameters are customizable. In this way, user can use properties
> to define those parameters.
>
> I only have answers to these questions. For questions about the catalog
> implementation, I hope we can collect more feedback from the community.
>
>
> Best Regards
> Peter Huang
>
>
>
>
>
> Best Regards
> Peter Huang
>
>>>> On 24.10.19 05:32, Terry Wang wrote:
>>>>> Hi Peter,
>>>>>
>>>>> Sorry late to reply. Thanks for your efforts on this and I just
>> looked
>>>> through your design.
>>>>> I left some comments in the doc about alter function section and
>>>> function catalog interface.
>>>>> IMO, the overall design is ok and we can discuss further more about
>>> some
>>>> details.
>>>>> I also think it’s necessary to have this awesome feature limit to
>> basic
>>>> function (of course better to have all :) ) in 1.10 release.
>>>>>
>>>>> Best,
>>>>> Terry Wang
>>>>>
>>>>>
>>>>>
>>>>>> 2019年10月16日 14:19,Peter Huang <[hidden email]> 写道:
>>>>>>
>>>>>> Hi Xuefu,
>>>>>>
>>>>>> Thank you for the feedback. I think you are pointing out a similar
>>>> concern
>>>>>> with Bowen. Let me describe
>>>>>> how the catalog function and function factory will be changed in the
>>>>>> implementation section.
>>>>>> Then, we can have more discussion in detail.
>>>>>>
>>>>>>
>>>>>> Best Regards
>>>>>> Peter Huang
>>>>>>
>>>>>> On Tue, Oct 15, 2019 at 4:18 PM Xuefu Z <[hidden email]> wrote:
>>>>>>
>>>>>>> Thanks to Peter for the proposal!
>>>>>>>
>>>>>>> I left some comments in the google doc. Besides what Bowen pointed
>>>> out, I'm
>>>>>>> unclear about how things  work end to end from the document. For
>>>> instance,
>>>>>>> SQL DDL-like function definition is mentioned. I guess just having
>> a
>>>> DDL
>>>>>>> for it doesn't explain how it's supported functionally. I think
>> it's
>>>> better
>>>>>>> to have some clarification on what is expected work and what's for
>>> the
>>>>>>> future.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Xuefu
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Oct 15, 2019 at 11:05 AM Bowen Li <[hidden email]>
>>> wrote:
>>>>>>>
>>>>>>>> Hi Zhenqiu,
>>>>>>>>
>>>>>>>> Thanks for taking on this effort!
>>>>>>>>
>>>>>>>> A couple questions:
>>>>>>>> - Though this FLIP is about function DDL, can we also think about
>>> how
>>>> the
>>>>>>>> created functions can be mapped to CatalogFunction and see if we
>>> need
>>>> to
>>>>>>>> modify CatalogFunction interface? Syntax changes need to be backed
>>> by
>>>> the
>>>>>>>> backend.
>>>>>>>> - Can we define a clearer, smaller scope targeting for Flink 1.10
>>>> among
>>>>>>> all
>>>>>>>> the proposed changes? The current overall scope seems to be quite
>>>> wide,
>>>>>>> and
>>>>>>>> it may be unrealistic to get everything in a single release, or
>>> even a
>>>>>>>> couple. However, I believe the most common user story can be
>>>> something as
>>>>>>>> simple as "being able to create and persist a java class-based udf
>>> and
>>>>>>> use
>>>>>>>> it later in queries", which will add great value for most Flink
>>> users
>>>> and
>>>>>>>> is achievable in 1.10.
>>>>>>>>
>>>>>>>> Bowen
>>>>>>>>
>>>>>>>> On Sun, Oct 13, 2019 at 10:46 PM Peter Huang <
>>>> [hidden email]
>>>>>>>>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Dear Community,
>>>>>>>>>
>>>>>>>>> FLIP-79 Flink Function DDL Support
>>>>>>>>> <
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>
>>>
>> https://docs.google.com/document/d/16kkHlis80s61ifnIahCj-0IEdy5NJ1z-vGEJd_JuLog/edit#
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> This proposal aims to support function DDL with the consideration
>>> of
>>>>>>> SQL
>>>>>>>>> syntax, language compliance, and advanced external UDF lib
>>>>>>> registration.
>>>>>>>>> The Flink DDL is initialized and discussed in the design
>>>>>>>>> <
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>
>>>
>> https://docs.google.com/document/d/1TTP-GCC8wSsibJaSUyFZ_5NBAHYEB1FVmPpP7RgDGBA/edit#heading=h.wpsqidkaaoil
>>>>>>>>>>
>>>>>>>>> [1] by Shuyi Chen and Timo. As the initial discussion mainly
>>> focused
>>>> on
>>>>>>>> the
>>>>>>>>> table, type and view. FLIP-69 [2] extend it with a more detailed
>>>>>>>> discussion
>>>>>>>>> of DDL for catalog, database, and function. Original the function
>>> DDL
>>>>>>> was
>>>>>>>>> under the scope of FLIP-69. After some discussion
>>>>>>>>> <https://issues.apache.org/jira/browse/FLINK-7151> with the
>>>> community,
>>>>>>>> we
>>>>>>>>> found that there are several ongoing efforts, such as FLIP-64
>> [3],
>>>>>>>> FLIP-65
>>>>>>>>> [4], and FLIP-78 [5]. As they will directly impact the SQL syntax
>>> of
>>>>>>>>> function DDL, the proposal wants to describe the problem clearly
>>> with
>>>>>>> the
>>>>>>>>> consideration of existing works and make sure the design aligns
>>> with
>>>>>>>>> efforts of API change of temporary objects and type inference for
>>> UDF
>>>>>>>>> defined by different languages.
>>>>>>>>>
>>>>>>>>> The FlLIP outlines the requirements from related works, and
>>> propose a
>>>>>>> SQL
>>>>>>>>> syntax to meet those requirements. The corresponding
>> implementation
>>>> is
>>>>>>>> also
>>>>>>>>> discussed. Please kindly review and give feedback.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Best Regards
>>>>>>>>> Peter Huang
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Xuefu Zhang
>>>>>>>
>>>>>>> "In Honey We Trust!"
>>>>>>>
>>>>
>>>>
>>>
>>

Re: [Discussion] FLIP-79 Flink Function DDL Support

Peter Huang
Hi Terry,

Thanks for the quick response. We are on the same page. As for the
properties in the function DDL, let's see whether other people have such a
need.
I will start the vote on the design in 24 hours.


Best Regards
Peter Huang







On Thu, Oct 31, 2019 at 3:18 AM Terry Wang <[hidden email]> wrote:

> Hi Peter,
>
> I’d like to share some thoughts from my side:
> 1. What's the syntax to distinguish the function language?
>         +1 for using `[LANGUAGE JVM|PYTHON] USING JAR`
> 2. How to persist the function language in the backend catalog?
>         +1 for a separate field in CatalogFunction. As for specific
> backends, we may persist it case by case. A special case is how
> HiveCatalog stores the kind of CatalogFunction.
> 3. Do we really need to allow users to set a properties map for a UDF?
>     There are certainly use cases that require passing external arguments
> to a UDF, but that need can also be met by passing arguments to `eval` when
> calling the UDF in SQL.
> IMO, there is not much need to support setting a properties map for a UDF.
>
> 4. Should a catalog implementation be able to decide whether it can take a
> properties map, and which UDF languages it can persist?
> IMO, it’s necessary for a catalog implementation to provide such
> information. But for the main goal of Flink 1.10, we can just skip this part.
>
>
>
> Best,
> Terry Wang

Re: [Discussion] FLIP-79 Flink Function DDL Support

bowen.li
Re 1) I'd prefer the [LANGUAGE JVM|PYTHON|...] syntax. It's also adopted by
Postgres [1] and MySQL [2].

"USING 'python .....'" seems to need extra parsing of the content inside the
single quotes, which is not ideal.
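
To illustrate the parsing point, here is a minimal sketch, assuming a
simplified statement of the form CREATE FUNCTION <name> AS '<class>'
[LANGUAGE <word>]. The LanguageClause helper and its regex are hypothetical
and are not Flink's actual parser; they only show that a dedicated keyword
yields the language as its own token, with no second pass over a quoted
string.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Flink: extracts the language from a
// simplified CREATE FUNCTION statement that uses a LANGUAGE keyword.
final class LanguageClause {
    // Matches: CREATE FUNCTION <name> AS '<class>' [LANGUAGE <word>] [;]
    private static final Pattern DDL = Pattern.compile(
        "^\\s*CREATE\\s+FUNCTION\\s+(\\S+)\\s+AS\\s+'([^']*)'"
            + "(?:\\s+LANGUAGE\\s+(\\w+))?\\s*;?\\s*$",
        Pattern.CASE_INSENSITIVE);

    // Returns the upper-cased language keyword, or empty if the clause is absent.
    static Optional<String> languageOf(String ddl) {
        Matcher m = DDL.matcher(ddl);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unsupported statement: " + ddl);
        }
        // Group 3 is null when the optional LANGUAGE clause is missing.
        return Optional.ofNullable(m.group(3)).map(s -> s.toUpperCase(Locale.ROOT));
    }
}
```

With this sketch, a statement ending in LANGUAGE python yields "PYTHON",
and a statement without the clause yields an empty Optional, so the caller
can apply whatever default the design settles on.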

Re 2) I agree.

Besides, the doc proposes that the new field be a string. I think it would be
better as an enum, say LanguageType. JAVA and PYTHON can be the only values
available for now.
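
As a rough sketch of the enum idea, assuming the two values mentioned above:
the names LanguageType, fromKeyword and SimpleCatalogFunction are
hypothetical and only illustrate a dedicated language field. They are not
the actual CatalogFunction interface.

```java
import java.util.Locale;

// Hypothetical enum for illustration; the FLIP doc may settle on other names.
enum LanguageType {
    JAVA,
    PYTHON;

    // Maps a DDL keyword such as "python" to the enum, rejecting unknown values.
    static LanguageType fromKeyword(String keyword) {
        try {
            return valueOf(keyword.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            throw new IllegalArgumentException(
                "Unsupported function language: " + keyword, e);
        }
    }
}

// Simplified stand-in for a catalog function that keeps the language in
// its own field rather than in a properties map.
final class SimpleCatalogFunction {
    private final String className;
    private final LanguageType language;

    SimpleCatalogFunction(String className, LanguageType language) {
        this.className = className;
        this.language = language;
    }

    String getClassName() {
        return className;
    }

    LanguageType getLanguage() {
        return language;
    }
}
```

Compared with a free-form string, an enum lets unsupported languages fail
early, and a catalog implementation can switch on the value when deciding
what it is able to persist.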

Re 3) I think we can re-evaluate the situation when requirements come up, and
remove properties for now. After all, the interface can evolve. Please
update the doc to remove the properties field.


W.r.t. voting, can we have a dedicated section for Flink 1.10 that includes
everything we have reached consensus on so far? I think we are only going to
vote on that section, rather than the full FLIP-79, right?

[1] https://www.postgresql.org/docs/9.5/sql-createfunction.html
[2] https://dev.mysql.com/doc/refman/8.0/en/create-procedure.html


On Thu, Oct 31, 2019 at 10:30 PM Peter Huang <[hidden email]>
wrote:

> Hi Terry,
>
> Thanks for the quick response. We are on the same page. As for the
> properties in the function DDL, let's see whether other people have such a
> need.
> I will start the vote on the design in 24 hours.
>
>
> Best Regards
> Peter Huang

Re: [Discussion] FLIP-79 Flink Function DDL Support

Peter Huang
Hi Bowen,

I revised the doc according to our existing agreement. In the
implementation section, the TODO items are split into two parts.
For now, we just want a basic implementation for the Flink 1.10
release. Please take another look.

Yes, the vote is only for the Flink 1.10 section.


Best Regards
Peter Huang




On Fri, Nov 1, 2019 at 2:22 PM Bowen Li <[hidden email]> wrote:

> Re 1) I'd prefer the syntax [LANGUAGE JVM|PYTHON|...]. It's also adopted by
> Postgres [1] and MySQL [2].
>
> "USING 'python .....' " seems to need extra parsing of the content in single
> quotes, which is not ideal.
>
> Re 2) I agree.
>
> Besides, the doc proposes that the new field be a string. I think it had
> better be an enum, say LanguageType. JAVA and PYTHON can be the only values
> available for now.
>
> Re 3) I think we can re-evaluate the situation when requirements come, and
> can remove properties for now. After all, the interface can evolve. Please
> update the doc to remove the properties field.
>
>
> W.r.t. voting, can we have a dedicated section for Flink 1.10 that includes
> all the outcomes we have reached consensus on so far? I think we are only
> going to vote on that section, rather than the full FLIP-79, right?
>
> [1] https://www.postgresql.org/docs/9.5/sql-createfunction.html
> [2] https://dev.mysql.com/doc/refman/8.0/en/create-procedure.html
>
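
The enum suggestion in Re 2) can be sketched in a few lines of Java. This is only an illustration under stated assumptions: `FunctionLanguage`, `fromSql`, and `FunctionEntry` are hypothetical names invented here, not Flink's CatalogFunction API, and the value set simply mirrors the "JAVA and PYTHON for now" remark.

```java
import java.util.Locale;
import java.util.Objects;

/** Hypothetical enum for the function language; not Flink's actual API. */
enum FunctionLanguage {
    JAVA,
    PYTHON;

    /** Parses the token that follows a LANGUAGE keyword, ignoring case. */
    static FunctionLanguage fromSql(String token) {
        Objects.requireNonNull(token, "token");
        try {
            return FunctionLanguage.valueOf(token.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            throw new IllegalArgumentException("Unsupported function language: " + token, e);
        }
    }
}

/** Hypothetical catalog entry that keeps the language in a dedicated field. */
final class FunctionEntry {
    private final String className;
    private final FunctionLanguage language;

    FunctionEntry(String className, FunctionLanguage language) {
        this.className = Objects.requireNonNull(className, "className");
        this.language = Objects.requireNonNull(language, "language");
    }

    String getClassName() {
        return className;
    }

    FunctionLanguage getLanguage() {
        return language;
    }
}
```

With an enum, an unknown language fails at parse time instead of being stored as an arbitrary string, which is the main argument for it over a plain string field.
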
>
> On Thu, Oct 31, 2019 at 10:30 PM Peter Huang <[hidden email]> wrote:
>
> > Hi Terry,
> >
> > Thanks for the quick response. We are on the same page. For the
> > properties of function DDL, let's see whether there is such a need from
> > other people.
> > I will start voting on the design in 24 hours.
> >
> >
> > Best Regards
> > Peter Huang
> >
> >
> >
> >
> >
> >
> >
> > On Thu, Oct 31, 2019 at 3:18 AM Terry Wang <[hidden email]> wrote:
> >
> > > Hi Peter,
> > >
> > > I’d like to share some thoughts from my side:
> > > 1. What's the syntax to distinguish function language?
> > >         +1 for using `[LANGUAGE JVM|PYTHON] USING JAR`
> > > 2. How to persist function language in backend catalog?
> > >         +1 for a separate field in CatalogFunction. But as to the specific
> > > backend, we may persist it case by case. Special cases include how
> > > HiveCatalog stores the kind of CatalogFunction.
> > > 3. Do we really need to allow users to set a properties map for a udf?
> > >     There are use cases requiring passing external arguments to a udf for
> > > sure, but the need can also be met by passing arguments to `eval` when
> > > calling the udf in SQL.
> > > IMO, there is not much need to support setting a properties map for a udf.
> > >
> > > 4. Should a catalog implementation be able to decide whether it can take
> > > a properties map, and which language of a udf it can persist?
> > > IMO, it’s necessary for a catalog implementation to provide such
> > > information. But for the main goal of Flink 1.10, we can just skip this part.
> > >
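
The trade-off in point 3 can be made concrete with a small Java sketch. It is an illustration only: `ScoreAbove` and the `threshold` property key are hypothetical, and the class is plain Java rather than a subclass of Flink's function base classes, so that it stays self-contained.

```java
import java.util.Map;

/** Hypothetical UDF-like class contrasting the two ways to pass a parameter. */
final class ScoreAbove {
    private final double configuredThreshold;

    /** Style A: the parameter comes from a properties map fixed at creation time. */
    ScoreAbove(Map<String, String> properties) {
        this.configuredThreshold =
                Double.parseDouble(properties.getOrDefault("threshold", "0.5"));
    }

    /** Uses the threshold configured through the properties map. */
    boolean eval(double score) {
        return score > configuredThreshold;
    }

    /** Style B: the parameter is passed as an extra argument on every call. */
    boolean eval(double score, double threshold) {
        return score > threshold;
    }
}
```

Style B needs no catalog support at all, which is the point made above; style A keeps queries shorter but requires the catalog to persist the map.
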
> > >
> > >
> > > Best,
> > > Terry Wang
> > >
> > >
> > >
> > > > On Oct 30, 2019, at 13:52, Peter Huang <[hidden email]> wrote:
> > > >
> > > > Hi Bowen,
> > > >
> > > > I fully agree that we should first reach agreement on the DDL syntax
> > > > and focus on the MVP in the current phase.
> > > >
> > > > 1) What's the syntax to distinguish function language?
> > > > Currently, there are two opinions:
> > > >
> > > >   - USING 'python .....'
> > > >   - [LANGUAGE JVM|PYTHON] USING JAR '...'
> > > >
> > > > As we need to support multiple resources, as HQL does, we shouldn't
> > > > repeat the language symbol as a suffix of each resource.
> > > > I would prefer option two, but I'm definitely open to more comments.
> > > >
> > > > 2) How to persist function language in backend catalog? As a k-v pair
> > > > in properties map, or a dedicated field?
> > > > Even though language type is also a property, I think a separate field
> > > > in CatalogFunction is a cleaner solution.
> > > >
> > > > 3) Do we really need to allow users to set a properties map for a udf?
> > > > What needs to be stored there? What are they used for?
> > > >
> > > > I am considering a type of use case that uses UDFs for realtime
> > > > inference. The model is nested in the udf as a resource, but there are
> > > > multiple customizable parameters. In this way, users can use
> > > > properties to define those parameters.
> > > >
> > > > I only have answers to these questions. For questions about the
> > > > catalog implementation, I hope we can collect more feedback from the
> > > > community.
> > > >
> > > >
> > > > Best Regards
> > > > Peter Huang
> > > >
> > > > On Tue, Oct 29, 2019 at 11:31 AM Bowen Li <[hidden email]> wrote:
> > > >
> > > >> Hi all,
> > > >>
> > > >> Besides all the good questions raised above, we all seem to agree to
> > > >> have an MVP for Flink 1.10, "to support users to create and persist a
> > > >> java class-based udf that's already in classpath (no extra resource
> > > >> loading), and use it later in queries".
> > > >>
> > > >> IIUC, to achieve that in 1.10, the following are currently the core
> > > >> issues/blockers we should figure out, and solve them as our **highest
> > > >> priority**:
> > > >>
> > > >> - what's the syntax to distinguish function language (java, scala,
> > > >> python, etc)? we only need to implement the java one in 1.10 but have
> > > >> to settle on the long-term solution
> > > >> - how to persist function language in backend catalog? as a k-v pair
> > > >> in properties map, or a dedicated field?
> > > >> - do we really need to allow users to set a properties map for a udf?
> > > >> what needs to be stored there? what are they used for?
> > > >> - should a catalog impl be able to decide whether it can take a
> > > >> properties map (if we decide to have one), and which language of a udf
> > > >> it can persist?
> > > >>   - E.g. Hive metastore, which backs Flink's HiveCatalog, cannot take a
> > > >> properties map and is only able to persist java udf [1], unless we do
> > > >> something hacky to it
> > > >>
> > > >> I feel these questions are essential to Flink functions in the long
> > > >> run, but most importantly, are also the minimum scope for Flink 1.10.
> > > >> Aspects like resource loading security or compatibility with Hive
> > > >> syntax are important too; however, if we focus on them now, we may not
> > > >> be able to get the MVP out in time.
> > > >>
> > > >> [1]
> > > >> - https://hive.apache.org/javadocs/r3.1.2/api/org/apache/hadoop/hive/metastore/api/Function.html
> > > >> - https://hive.apache.org/javadocs/r3.1.2/api/org/apache/hadoop/hive/metastore/api/FunctionType.html
> > > >>
> > > >>
> > > >>
> > > >> On Sun, Oct 27, 2019 at 8:22 PM Peter Huang <[hidden email]> wrote:
> > > >>
> > > >>> Hi Timo,
> > > >>>
> > > >>> Thanks for the feedback. I replied and adjusted the design
> > > >>> accordingly. Regarding the class loading concern, I think we need to
> > > >>> distinguish function class loading for Temporary and Permanent
> > > >>> functions.
> > > >>>
> > > >>> 1) For a Permanent function, we can add it to the job graph so that we
> > > >>> don't need to load it multiple times for the different sessions.
> > > >>> 2) For a Temporary function, we can register the function with a
> > > >>> session key, and use different class loaders in the RuntimeContext
> > > >>> implementation.
> > > >>>
> > > >>> I added more description in the doc. Please review it again.
> > > >>>
> > > >>>
> > > >>> Best Regards
> > > >>> Peter Huang
> > > >>>
> > > >>>
> > > >>>
> > > >>>
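
The per-session class loader idea for temporary functions could look roughly like the following Java sketch. It rests on assumptions: `SessionClassLoaders` and its methods are hypothetical names, and nothing here reflects how Flink's RuntimeContext is actually implemented; it only shows one isolated `URLClassLoader` per session key.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical registry that keeps one class loader per session key. */
final class SessionClassLoaders {
    private final Map<String, URLClassLoader> loaders = new ConcurrentHashMap<>();
    private final ClassLoader parent;

    SessionClassLoaders(ClassLoader parent) {
        this.parent = parent;
    }

    /** Returns the loader for a session, creating it from the resources on first use. */
    ClassLoader forSession(String sessionKey, URL[] resources) {
        return loaders.computeIfAbsent(
                sessionKey, key -> new URLClassLoader(resources, parent));
    }

    /** Closes and forgets the loader once the session ends. */
    void release(String sessionKey) {
        URLClassLoader loader = loaders.remove(sessionKey);
        if (loader == null) {
            return;
        }
        try {
            loader.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because each session resolves function classes through its own loader, two sessions can register different versions of the same class name without interfering with each other, which is the concern raised about adding everything to one JobGraph.
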
> > > >>> On Thu, Oct 24, 2019 at 2:14 AM Timo Walther <[hidden email]> wrote:
> > > >>>
> > > >>>> Hi Peter,
> > > >>>>
> > > >>>> thanks for your proposal. I left some comments in the FLIP document.
> > > >>>> I agree with Terry that we can have an MVP in Flink 1.10 but should
> > > >>>> already discuss the bigger picture, as a DDL string cannot be changed
> > > >>>> easily once released.
> > > >>>>
> > > >>>> In particular, we should discuss how resources for functions are
> > > >>>> loaded. If they are simply added to the JobGraph, they are available
> > > >>>> to all functions and could potentially interfere with each other,
> > > >>>> right?
> > > >>>>
> > > >>>> Thanks,
> > > >>>> Timo
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> On 24.10.19 05:32, Terry Wang wrote:
> > > >>>>> Hi Peter,
> > > >>>>>
> > > >>>>> Sorry for the late reply. Thanks for your efforts on this; I just
> > > >>>>> looked through your design.
> > > >>>>> I left some comments in the doc about the alter function section and
> > > >>>>> the function catalog interface.
> > > >>>>> IMO, the overall design is OK and we can discuss some details
> > > >>>>> further.
> > > >>>>> I also think it’s necessary to limit this awesome feature to basic
> > > >>>>> functions (of course better to have all :) ) in the 1.10 release.
> > > >>>>>
> > > >>>>> Best,
> > > >>>>> Terry Wang
> > > >>>>>
> > > >>>>>
> > > >>>>>
> > > >>>>>> On Oct 16, 2019, at 14:19, Peter Huang <[hidden email]> wrote:
> > > >>>>>>
> > > >>>>>> Hi Xuefu,
> > > >>>>>>
> > > >>>>>> Thank you for the feedback. I think you are pointing out a concern
> > > >>>>>> similar to Bowen's. Let me describe how the catalog function and
> > > >>>>>> function factory will be changed in the implementation section.
> > > >>>>>> Then we can discuss in more detail.
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> Best Regards
> > > >>>>>> Peter Huang
> > > >>>>>>
> > > >>>>>> On Tue, Oct 15, 2019 at 4:18 PM Xuefu Z <[hidden email]> wrote:
> > > >>>>>>
> > > >>>>>>> Thanks to Peter for the proposal!
> > > >>>>>>>
> > > >>>>>>> I left some comments in the google doc. Besides what Bowen pointed
> > > >>>>>>> out, I'm unclear about how things work end to end from the
> > > >>>>>>> document. For instance, SQL DDL-like function definition is
> > > >>>>>>> mentioned. I guess just having a DDL for it doesn't explain how
> > > >>>>>>> it's supported functionally. I think it's better to have some
> > > >>>>>>> clarification on what work is expected now and what's for the
> > > >>>>>>> future.
> > > >>>>>>>
> > > >>>>>>> Thanks,
> > > >>>>>>> Xuefu
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>> On Tue, Oct 15, 2019 at 11:05 AM Bowen Li <[hidden email]> wrote:
> > > >>>>>>>
> > > >>>>>>>> Hi Zhenqiu,
> > > >>>>>>>>
> > > >>>>>>>> Thanks for taking on this effort!
> > > >>>>>>>>
> > > >>>>>>>> A couple questions:
> > > >>>>>>>> - Though this FLIP is about function DDL, can we also think about
> > > >>>>>>>> how the created functions can be mapped to CatalogFunction and see
> > > >>>>>>>> if we need to modify CatalogFunction interface? Syntax changes
> > > >>>>>>>> need to be backed by the backend.
> > > >>>>>>>> - Can we define a clearer, smaller scope targeting Flink 1.10
> > > >>>>>>>> among all the proposed changes? The current overall scope seems
> > > >>>>>>>> to be quite wide, and it may be unrealistic to get everything in
> > > >>>>>>>> a single release, or even a couple. However, I believe the most
> > > >>>>>>>> common user story can be something as simple as "being able to
> > > >>>>>>>> create and persist a java class-based udf and use it later in
> > > >>>>>>>> queries", which will add great value for most Flink users and is
> > > >>>>>>>> achievable in 1.10.
> > > >>>>>>>>
> > > >>>>>>>> Bowen
> > > >>>>>>>>
> > > >>>>>>>> On Sun, Oct 13, 2019 at 10:46 PM Peter Huang <[hidden email]> wrote:
> > > >>>>>>>>
> > > >>>>>>>>> Dear Community,
> > > >>>>>>>>>
> > > >>>>>>>>> FLIP-79 Flink Function DDL Support
> > > >>>>>>>>> <https://docs.google.com/document/d/16kkHlis80s61ifnIahCj-0IEdy5NJ1z-vGEJd_JuLog/edit#>
> > > >>>>>>>>>
> > > >>>>>>>>> This proposal aims to support function DDL with the consideration of
> > > >>>>>>>>> SQL syntax, language compliance, and advanced external UDF lib
> > > >>>>>>>>> registration. The Flink DDL was initiated and discussed in the design
> > > >>>>>>>>> <https://docs.google.com/document/d/1TTP-GCC8wSsibJaSUyFZ_5NBAHYEB1FVmPpP7RgDGBA/edit#heading=h.wpsqidkaaoil>
> > > >>>>>>>>> [1] by Shuyi Chen and Timo. The initial discussion mainly focused on
> > > >>>>>>>>> tables, types, and views; FLIP-69 [2] extends it with a more detailed
> > > >>>>>>>>> discussion of DDL for catalogs, databases, and functions. Originally,
> > > >>>>>>>>> the function DDL was under the scope of FLIP-69. After some discussion
> > > >>>>>>>>> <https://issues.apache.org/jira/browse/FLINK-7151> with the community,
> > > >>>>>>>>> we found that there are several ongoing efforts, such as FLIP-64 [3],
> > > >>>>>>>>> FLIP-65 [4], and FLIP-78 [5]. As they will directly impact the SQL
> > > >>>>>>>>> syntax of function DDL, the proposal aims to describe the problem
> > > >>>>>>>>> clearly in light of existing work and to make sure the design aligns
> > > >>>>>>>>> with the efforts on API changes for temporary objects and on type
> > > >>>>>>>>> inference for UDFs defined in different languages.
> > > >>>>>>>>>
> > > >>>>>>>>> The FLIP outlines the requirements from related work and proposes a
> > > >>>>>>>>> SQL syntax to meet those requirements. The corresponding
> > > >>>>>>>>> implementation is also discussed. Please kindly review and give
> > > >>>>>>>>> feedback.
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>> Best Regards
> > > >>>>>>>>> Peter Huang
> > > >>>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>> --
> > > >>>>>>> Xuefu Zhang
> > > >>>>>>>
> > > >>>>>>> "In Honey We Trust!"
> > > >>>>>>>
> > > >>>>
> > > >>>>
> > > >>>
> > > >>
> > >
> > >
> >
>