Hi everyone,
FLIP-39 [1] rebuilds the Flink ML pipeline on top of the Table API, which moves Flink ML a step further. Based on it, users can develop their ML jobs, and more and more machine learning platforms are providing ML services.

However, the problem now is that the jars of flink-ml-api and flink-ml-lib only exist on the Maven repo. Whenever users want to submit ML jobs, they have to depend on the ML modules and package a fat jar. This is inconvenient, especially for machine learning platforms on which nearly all jobs depend on the Flink ML modules and therefore have to package a fat jar.

Given this, it would be better to include the jars of flink-ml-api and flink-ml-lib in the `opt` folder, so that users can use the jars directly from the binary release. For example, users can move the jars into the `lib` folder or use -j to upload them. (Currently, -j only supports uploading one jar; supporting multiple jars for -j can be covered in a separate discussion.)

Putting the jars in the `opt` folder instead of the `lib` folder is because the ML jars are still optional for the Flink project by default.

What do you think? Any feedback is welcome!

Best,

Hequn

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-39+Flink+ML+pipeline+and+ML+libs
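[Editor's note: to make the dependency concrete, here is a rough sketch of what such an ML job might look like when built on the FLIP-39 API. It assumes the Pipeline/Estimator/Transformer interfaces that FLIP-39 describes for flink-ml-api; the stage classes (MyScaler, MyClassifier), the table DDL, and the environment setup are hypothetical placeholders rather than verified classes or options.]

```java
// Hedged sketch of a FLIP-39-style ML job; MyScaler/MyClassifier are hypothetical
// stages and the source DDL / environment setup are only illustrative.
import org.apache.flink.ml.api.core.Pipeline;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;

public class MlPipelineJob {

    public static void main(String[] args) throws Exception {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().useBlinkPlanner().inBatchMode().build());

        // Register a training table; connector options are omitted for brevity.
        tEnv.sqlUpdate(
                "CREATE TABLE train_data (f0 DOUBLE, f1 DOUBLE, label DOUBLE) WITH (/* connector options */)");
        Table trainData = tEnv.sqlQuery("SELECT * FROM train_data");

        // Chain the (hypothetical) stages into one pipeline: fit() trains the
        // estimators, transform() applies the fitted pipeline to the input table.
        Pipeline pipeline = new Pipeline();
        pipeline.appendStage(new MyScaler());
        pipeline.appendStage(new MyClassifier());

        Table predictions = pipeline.fit(tEnv, trainData).transform(tEnv, trainData);
        predictions.printSchema();
    }
}
```

With flink-ml-api and flink-ml-lib available under opt/, a job like this would no longer have to shade those modules into a fat jar; they could simply be moved into lib/ or shipped alongside the job jar.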
Thanks for bringing up this discussion, Hequn!
+1 for including `flink-ml-api` and `flink-ml-lib` in opt.

BTW: I think it would be great to bring up a discussion about uploading multiple jars at the same time, as PyFlink jobs could also benefit from that improvement.

Best,
Jincheng
Hi Jincheng,
Thanks a lot for your feedback! Yes, I agree with you. There are cases where multiple jars need to be uploaded. I will prepare another discussion later, maybe with a simple design doc.

Best, Hequn
Hi everyone,
Thanks for the feedback. As there are no objections, I've opened a JIRA issue (FLINK-15847 [1]) to track this. The implementation details can be discussed in the issue or in the following PR.

Best,
Hequn

[1] https://issues.apache.org/jira/browse/FLINK-15847
Thank you for pushing this forward, @Hequn Cheng <[hidden email]>!
Hi @Becket Qin <[hidden email]>, do you have any concerns about this?

Best,
Jincheng
Thanks for bringing up the discussion, Hequn.
+1 on adding `flink-ml-api` and `flink-ml-lib` into opt. This would make it much easier for users to try out some simple ML tasks.

Thanks,

Jiangjie (Becket) Qin
Thank you all for your feedback and suggestions!
Best, Hequn
An alternative solution would be to offer the flink-ml libraries as
optional dependencies on the download page, similar to how we offer the different SQL formats and Hadoop releases [1].

[1] https://flink.apache.org/downloads.html

Cheers,
Till
Hi Till,
Thanks a lot for your suggestion. Offering the flink-ml libraries as optional dependencies on the download page is a good idea and would make the dist smaller.

But I also have some concerns about it, e.g., the download page currently only includes the latest 3 releases, so we would need to find a way to support more versions. On the other hand, the flink-ml libraries are currently very small (about 246 KB), so they would not have much impact on the size of the dist.

What do you think?

Best,
Hequn
Thanks for the suggestion, Till.
I am curious how we usually decide whether to put jars into the opt folder.

Technically speaking, it seems that `flink-ml-api` should go into the opt directory because it is actually an API rather than a library, just like CEP and Table.

`flink-ml-lib` seems to be on the border. On one hand, it is a library. On the other hand, unlike the SQL formats and Hadoop, whose major code lives outside of Flink, the algorithm code lives inside Flink, so `flink-ml-lib` is more like the built-in SQL UDFs. It therefore seems fine to either put it in the opt folder or offer it on the downloads page.

From the user experience perspective, it might be better to have both `flink-ml-lib` and `flink-ml-api` in the opt folder so users needn't go to two places for the required dependencies.

Thanks,

Jiangjie (Becket) Qin
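[Editor's note: a minimal, hedged sketch of the api/lib split Becket describes. flink-ml-api defines the pipeline interfaces, while flink-ml-lib is where concrete algorithms implementing them live. The ColumnScaler class below is hypothetical, and the Transformer/Params signatures follow the FLIP-39 design rather than a verified release.]

```java
// Hedged sketch: the Transformer interface is assumed to come from flink-ml-api
// (per the FLIP-39 design), while a concrete algorithm such as this hypothetical
// ColumnScaler would ship in flink-ml-lib.
import org.apache.flink.ml.api.core.Transformer;
import org.apache.flink.ml.api.misc.param.Params;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;

public class ColumnScaler implements Transformer<ColumnScaler> {

    private final Params params = new Params();

    @Override
    public Params getParams() {
        return params;
    }

    @Override
    public Table transform(TableEnvironment tEnv, Table input) {
        // A real algorithm would rescale the configured columns here;
        // the pass-through keeps the sketch short.
        return input;
    }
}
```

Because the concrete algorithms compile against the interfaces in flink-ml-api, a user job typically needs both jars, which is the user-experience argument for shipping them together.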
I think there is no such rule that APIs automatically go into opt/ and
"libraries" do not. The contents of opt/ have mainly grown over time without following a strict rule.

I think the decisive factor for what goes into Flink's binary distribution should be how core it is to Flink. Of course, another important consideration is which use cases Flink should promote "out of the box" (I'm not sure that is actually true for content shipped in opt/, because you still have to move it to lib).

For example, Gelly is something I would rather see as an optional component than ship with every Flink binary distribution.

Cheers,
Till
Around a year ago I started a discussion
<http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/DISCUSS-Towards-a-leaner-flink-dist-tp25615.html> on reducing the number of jars we ship with the distribution.

While there was no definitive conclusion, there was a shared sentiment that APIs should be shipped with the distribution.
Hi everyone,
Thank you all for the great input!

I think what we all agree on is that we should try to make a leaner flink-dist. However, we may also need to make some compromises for the sake of user experience, so that users don't need to download dependencies from different places. Otherwise, we could move all the jars in the current opt folder to the download page.

The lack of clear rules for guiding such compromises makes things more complicated now. I would agree that the decisive factor for what goes into Flink's binary distribution should be how core it is to Flink. Meanwhile, it seems better to treat Flink APIs as core to Flink. Not only is that a clear rule that is easy to follow, but in most cases an API is significant enough to deserve inclusion in the dist.

Given this, it might make sense to put flink-ml-api and flink-ml-lib into opt.
What do you think?

Best,
Hequn
I would not object, given that it is rather small at the moment. However, I
also think that we should have a plan for how to handle the ever-growing Flink ecosystem and how to make it easily accessible to our users. E.g., one far-fetched idea could be something like a configuration script which downloads the required components for the user. But this definitely deserves a separate discussion and does not really belong here.

Cheers,
Till
>> >> Thanks, >> >> Jiangjie (Becket) Qin >> >> On Tue, Feb 4, 2020 at 2:32 PM Hequn Cheng <[hidden email]> <[hidden email]> wrote: >> >> >> Hi Till, >> >> Thanks a lot for your suggestion. It's a good idea to offer the flink-ml >> libraries as optional dependencies on the download page which can make >> >> the >> >> dist smaller. >> >> But I also have some concerns for it, e.g., the download page now only >> includes the latest 3 releases. We may need to find ways to support more >> versions. >> On the other hand, the size of the flink-ml libraries now is very >> small(about 246K), so it would not bring much impact on the size of dist. >> >> What do you think? >> >> Best, >> Hequn >> >> On Mon, Feb 3, 2020 at 6:24 PM Till Rohrmann <[hidden email]> <[hidden email]> >> >> wrote: >> >> An alternative solution would be to offer the flink-ml libraries as >> optional dependencies on the download page. Similar to how we offer the >> different SQL formats and Hadoop releases [1]. >> >> [1] https://flink.apache.org/downloads.html >> >> Cheers, >> Till >> >> On Mon, Feb 3, 2020 at 10:19 AM Hequn Cheng <[hidden email]> <[hidden email]> wrote: >> >> >> Thank you all for your feedback and suggestions! >> >> Best, Hequn >> >> On Mon, Feb 3, 2020 at 5:07 PM Becket Qin <[hidden email]> <[hidden email]> >> >> wrote: >> >> Thanks for bringing up the discussion, Hequn. >> >> +1 on adding `flink-ml-api` and `flink-ml-lib` into opt. This would >> >> make >> >> it much easier for the users to try out some simple ml tasks. >> >> Thanks, >> >> Jiangjie (Becket) Qin >> >> On Mon, Feb 3, 2020 at 4:34 PM jincheng sun < >> >> [hidden email] >> >> wrote: >> >> >> Thank you for pushing forward @Hequn Cheng <[hidden email]> <[hidden email]> ! >> >> Hi @Becket Qin <[hidden email]> <[hidden email]> , Do you have any concerns >> >> on >> >> this ? >> >> Best, >> Jincheng >> >> Hequn Cheng <[hidden email]> <[hidden email]> 于2020年2月3日周一 下午2:09写道: >> >> >> Hi everyone, >> >> Thanks for the feedback. As there are no objections, I've opened a >> >> JIRA >> >> issue(FLINK-15847[1]) to address this issue. >> The implementation details can be discussed in the issue or in the >> following PR. >> >> Best, >> Hequn >> >> [1] https://issues.apache.org/jira/browse/FLINK-15847 >> >> On Wed, Jan 8, 2020 at 9:15 PM Hequn Cheng <[hidden email]> <[hidden email]> >> >> wrote: >> >> Hi Jincheng, >> >> Thanks a lot for your feedback! >> Yes, I agree with you. There are cases that multi jars need to >> >> be >> >> uploaded. I will prepare another discussion later. Maybe with a >> >> simple >> >> design doc. >> >> Best, Hequn >> >> On Wed, Jan 8, 2020 at 3:06 PM jincheng sun < >> >> [hidden email]> >> >> wrote: >> >> >> Thanks for bring up this discussion Hequn! >> >> +1 for include `flink-ml-api` and `flink-ml-lib` in opt. >> >> BTW: I think would be great if bring up a discussion for upload >> >> multiple >> >> Jars at the same time. as PyFlink JOB also can have the benefit >> >> if >> >> we >> >> do >> >> that improvement. >> >> Best, >> Jincheng >> >> >> Hequn Cheng <[hidden email]> <[hidden email]> 于2020年1月8日周三 上午11:50写道: >> >> >> Hi everyone, >> >> FLIP-39[1] rebuilds Flink ML pipeline on top of TableAPI >> >> which >> >> moves >> >> Flink >> >> ML a step further. Base on it, users can develop their ML >> >> jobs >> >> and >> >> more >> >> and >> >> more machine learning platforms are providing ML services. >> >> However, the problem now is the jars of flink-ml-api and >> >> flink-ml-lib >> >> are >> >> only exist on maven repo. 
Whenever users want to submit ML >> >> jobs, >> >> they >> >> can >> >> only depend on the ml modules and package a fat jar. This >> >> would be >> >> inconvenient especially for the machine learning platforms on >> >> which >> >> nearly >> >> all jobs depend on Flink ML modules and have to package a fat >> >> jar. >> >> Given this, it would be better to include jars of >> >> flink-ml-api >> >> and >> >> flink-ml-lib in the `opt` folder, so that users can directly >> >> use >> >> the >> >> jars >> >> with the binary release. For example, users can move the jars >> >> into >> >> the >> >> `lib` folder or use -j to upload the jars. (Currently, -j >> >> only >> >> support >> >> upload one jar. Supporting multi jars for -j can be discussed >> >> in >> >> another >> >> discussion.) >> >> Putting the jars in the `opt` folder instead of the `lib` >> >> folder >> >> is >> >> because >> >> currently, the ml jars are still optional for the Flink >> >> project by >> >> default. >> >> What do you think? Welcome any feedback! >> >> Best, >> >> Hequn >> >> [1] >> >> >> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-39+Flink+ML+pipeline+and+ML+libs >> >> >> |
I have another concern which may not be closely related to this thread.
Since Flink doesn't include all the necessary jars, I think it is critical for Flink to display a meaningful error message when a class is missing. For example, here is the error message I get when I use Kafka but forget to include flink-json. To be honest, this kind of error message is hard for new users to understand.

Reason: No factory implements 'org.apache.flink.table.factories.DeserializationSchemaFactory'.

The following properties are requested:
connector.properties.bootstrap.servers=localhost:9092
connector.properties.group.id=testGroup
connector.properties.zookeeper.connect=localhost:2181
connector.startup-mode=earliest-offset
connector.topic=generated.events
connector.type=kafka
connector.version=universal
format.type=json
schema.0.data-type=VARCHAR(2147483647)
schema.0.name=status
schema.1.data-type=VARCHAR(2147483647)
schema.1.name=direction
schema.2.data-type=BIGINT
schema.2.name=event_ts
update-mode=append

The following factories have been considered:
org.apache.flink.table.catalog.hive.factories.HiveCatalogFactory
org.apache.flink.table.module.hive.HiveModuleFactory
org.apache.flink.table.module.CoreModuleFactory
org.apache.flink.table.catalog.GenericInMemoryCatalogFactory
org.apache.flink.table.sources.CsvBatchTableSourceFactory
org.apache.flink.table.sources.CsvAppendTableSourceFactory
org.apache.flink.table.sinks.CsvBatchTableSinkFactory
org.apache.flink.table.sinks.CsvAppendTableSinkFactory
org.apache.flink.table.planner.delegation.BlinkPlannerFactory
org.apache.flink.table.planner.delegation.BlinkExecutorFactory
org.apache.flink.table.planner.StreamPlannerFactory
org.apache.flink.table.executor.StreamExecutorFactory
org.apache.flink.streaming.connectors.kafka.KafkaTableSourceSinkFactory

at org.apache.flink.table.factories.TableFactoryService.filterByFactoryClass(TableFactoryService.java:238)
at org.apache.flink.table.factories.TableFactoryService.filter(TableFactoryService.java:185)
at org.apache.flink.table.factories.TableFactoryService.findSingleInternal(TableFactoryService.java:143)
at org.apache.flink.table.factories.TableFactoryService.find(TableFactoryService.java:113)
at org.apache.flink.streaming.connectors.kafka.KafkaTableSourceSinkFactoryBase.getDeserializationSchema(KafkaTableSourceSinkFactoryBase.java:277)
at org.apache.flink.streaming.connectors.kafka.KafkaTableSourceSinkFactoryBase.createStreamTableSource(KafkaTableSourceSinkFactoryBase.java:161)
at org.apache.flink.table.factories.StreamTableSourceFactory.createTableSource(StreamTableSourceFactory.java:49)
at org.apache.flink.table.factories.TableFactoryUtil.findAndCreateTableSource(TableFactoryUtil.java:53)
... 36 more
--
Best Regards

Jeff Zhang |
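For context on how a job ends up with the properties listed above, here is a rough, illustrative sketch of such a table declaration using the Flink 1.10-style DDL keys taken from the error output. The surrounding Java code, class name and query are assumptions for illustration only, not taken from Jeff's actual job.

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.java.StreamTableEnvironment;

public class KafkaJsonSketch {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(
                env, EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build());

        // Declares a Kafka table with a JSON format, mirroring the properties in the error above.
        tEnv.sqlUpdate(
                "CREATE TABLE generated_events (\n"
                + "  status STRING,\n"
                + "  direction STRING,\n"
                + "  event_ts BIGINT\n"
                + ") WITH (\n"
                + "  'connector.type' = 'kafka',\n"
                + "  'connector.version' = 'universal',\n"
                + "  'connector.topic' = 'generated.events',\n"
                + "  'connector.startup-mode' = 'earliest-offset',\n"
                + "  'connector.properties.bootstrap.servers' = 'localhost:9092',\n"
                + "  'connector.properties.group.id' = 'testGroup',\n"
                + "  'connector.properties.zookeeper.connect' = 'localhost:2181',\n"
                + "  'format.type' = 'json',\n"
                + "  'update-mode' = 'append'\n"
                + ")");

        // Without a flink-json jar on the classpath (lib/ or uploaded with the job),
        // using the table is typically where the "No factory implements ...
        // DeserializationSchemaFactory" error surfaces.
        Table result = tEnv.sqlQuery("SELECT status, direction, event_ts FROM generated_events");
    }
}

With a flink-json jar matching the Flink version added to lib (or uploaded alongside the job), the same query would be expected to resolve the JSON format factory.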
Hi,
@Till Rohrmann <[hidden email]> Thanks for the great input. I agree with you that we should have a long-term plan for this. It definitely deserves another discussion. @Jeff Zhang <[hidden email]> Thanks for your report and ideas. It's a good idea to improve the error messages. Do we already have a JIRA for it? If not, maybe we can create one. Thank you again for your feedback and suggestions. I will go on with the PR. Thanks! Best, Hequn
|
CC @Xu Yang <[hidden email]>
Thanks for starting the discussion @Hequn Cheng <[hidden email]>, and sorry for joining late. I've mainly helped merge the code in flink-ml-api and flink-ml-lib over the past several months. IMO flink-ml-api is an extension on top of the Table API, and I agree that it should be treated as part of the "core" core. However, given that there are multiple PRs still under review [1], would it be a better idea to come up with a long-term plan first before making the decision to move it to /opt now? -- Rong [1] https://github.com/apache/flink/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Aopen+label%3Acomponent%3DLibrary%2FMachineLearning+
|
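For readers who haven't looked at the module Rong mentions above, a minimal, illustrative sketch of the flink-ml-api style of pipeline on top of the Table API follows. It assumes the FLIP-39 interfaces (Pipeline, Transformer, Params) as currently shipped in flink-ml-api; MyScaler is a hypothetical no-op stage used only to show the Table-in/Table-out contract, not an actual library class.

import org.apache.flink.ml.api.core.Pipeline;
import org.apache.flink.ml.api.core.Transformer;
import org.apache.flink.ml.api.misc.param.Params;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;

// Hypothetical stage: a Transformer is essentially a Table -> Table function with parameters.
class MyScaler implements Transformer<MyScaler> {
    private final Params params = new Params();

    @Override
    public Params getParams() {
        return params;
    }

    @Override
    public Table transform(TableEnvironment tEnv, Table input) {
        // A real implementation would derive a new Table here, e.g. scale numeric columns.
        return input;
    }
}

public class PipelineSketch {
    // Chains stages over whatever input Table the caller provides.
    public static Table run(TableEnvironment tEnv, Table input) {
        Pipeline pipeline = new Pipeline().appendStage(new MyScaler());
        return pipeline.transform(tEnv, input);
    }
}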
Hi Rong,
Thanks a lot for joining the discussion! It would be great if we could have a long-term plan. My intention is to provide a way for users to add the Flink ML dependencies, either through the opt folder or the download page. This will become more and more important as Flink ML improves: as you said, there are multiple PRs under review, and I'm also going to add support for the Python Pipeline API soon [1]. Meanwhile, it also makes sense to include the API in opt, so it would probably not break the long-term plan. Even if we find something wrong in the future, we can easily revisit this instead of blocking the improvement for users. What do you think? Best, Hequn [1] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Support-Python-ML-Pipeline-API-td37291.html
|
Yes, I think the argument is fairly valid - we can always adjust the API in
the future; in fact, most of the APIs are labeled @PublicEvolving at the moment.
I was only trying to note, for others when voting, that the interfaces in
flink-ml-api might still change in the near future.

In fact, I am always +1 on moving flink-ml-api to /opt :-)

Regarding the Python ML API: sorry for not noticing it earlier, as I haven't
given it a deep look yet. Will do very soon!
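For context, the interfaces in question are roughly of the following shape.
This is only a condensed sketch in the spirit of FLIP-39, not the actual
flink-ml-api sources, so names, generics and extra methods may differ:

    // Condensed, non-authoritative sketch of FLIP-39 style pipeline interfaces.
    import org.apache.flink.annotation.PublicEvolving;
    import org.apache.flink.table.api.Table;
    import org.apache.flink.table.api.TableEnvironment;

    import java.io.Serializable;

    // Base type for everything that can be chained into a Pipeline (simplified here).
    interface PipelineStage<T extends PipelineStage<T>> extends Serializable {}

    @PublicEvolving
    interface Transformer<T extends Transformer<T>> extends PipelineStage<T> {
        // Applies the transformation to the input table and returns the result table.
        Table transform(TableEnvironment tEnv, Table input);
    }

    @PublicEvolving
    interface Model<M extends Model<M>> extends Transformer<M> {}

    @PublicEvolving
    interface Estimator<E extends Estimator<E, M>, M extends Model<M>>
            extends PipelineStage<E> {
        // Trains on the input table and returns a Model that can transform new data.
        M fit(TableEnvironment tEnv, Table input);
    }

So a user job really only needs these small interfaces (plus flink-ml-lib for
the concrete algorithms) on its classpath, which is exactly what shipping the
jars in opt/ makes convenient.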
--
Rong
|
Hi Rong,
That's great! Looking forward to your feedback.

Thanks,
Hequn
|