[DISCUSS] FLIP-39: Flink ML pipeline and ML libs


Re: [DISCUSS] FLIP-39: Flink ML pipeline and ML libs

Shuiqiang Chen
Hi Robert,

Thank you for the reminder! I have added the wiki page [1] for this FLIP.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-39+Flink+ML+pipeline+and+ML+libs

Robert Metzger <[hidden email]> wrote on Wed, Aug 14, 2019 at 5:56 PM:

> It seems that this FLIP doesn't have a Wiki page yet [1], even though it is
> already partially implemented [2].
> We should try to stick more to the FLIP process to manage the project more
> efficiently.
>
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals
> [2] https://issues.apache.org/jira/browse/FLINK-12470
>
> On Mon, Jun 17, 2019 at 12:27 PM Gen Luo <[hidden email]> wrote:
>
> > Hi all,
> >
> > In the review of the PR for FLINK-12473, there were a few comments regarding
> > pipeline exportation. We would like to start a follow-up discussion to
> > address some related comments.
> >
> > Currently, the FLIP-39 proposal gives users a way to persist a pipeline in
> > JSON format, but it does not specify how users can export a pipeline for
> > serving purposes. We summarized some thoughts on this in the following doc.
> >
> > https://docs.google.com/document/d/1B84b-1CvOXtwWQ6_tQyiaHwnSeiRqh-V96Or8uHqCp8/edit?usp=sharing
> >
> > After we reach consensus on the pipeline exportation, we will add a
> > corresponding section in FLIP-39.
> >
> >
> > Shaoxuan Wang <[hidden email]> wrote on Wed, Jun 5, 2019 at 8:47 AM:
> >
> > > Stavros,
> > > They share a similar logical concept, but the implementation details are
> > > quite different. It is hard to migrate the interface across different
> > > implementations. The built-in algorithms are a useful legacy that we will
> > > consider migrating to the new API (but still with different
> > > implementations).
> > > BTW, the new API has already been merged via FLINK-12473.
> > >
> > > Thanks,
> > > Shaoxuan
> > >
> > >
> > >
> > > On Mon, Jun 3, 2019 at 6:08 PM Stavros Kontopoulos <[hidden email]> wrote:
> > >
> > > > Hi,
> > > >
> > > > Some portion of the code could be migrated to the new Table API, no?
> > > > I am saying that because the new API design is based on scikit-learn and
> > > > the old one was also inspired by it.
> > > >
> > > > Best,
> > > > Stavros
> > > > On Wed, May 22, 2019 at 1:24 PM Shaoxuan Wang <[hidden email]> wrote:
> > > >
> > > > > Another consensus (from the offline discussion) is that we will
> > > > > delete/deprecate flink-libraries/flink-ml. I have started a survey and
> > > > > discussion [1] on the dev and user mailing lists to collect feedback.
> > > > > Depending on the replies, we will decide whether to delete it in Flink 1.9
> > > > > or deprecate and delete it in the next release after 1.9.
> > > > >
> > > > > [1]
> > > > > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/SURVEY-Usage-of-flink-ml-and-DISCUSS-Delete-flink-ml-td29057.html
> > > > >
> > > > > Regards,
> > > > > Shaoxuan
> > > > >
> > > > >
> > > > > On Tue, May 21, 2019 at 9:22 PM Gen Luo <[hidden email]> wrote:
> > > > >
> > > > > > Yes, this is our conclusion. I'd like to add only one point: registering
> > > > > > user-defined aggregators is also needed, which is currently provided by
> > > > > > 'bridge' and will eventually be merged into the Table API. It's the same
> > > > > > with collect().
> > > > > >
> > > > > > I will add a TableEnvironment argument in Estimator.fit() and
> > > > > > Transformer.transform() to get rid of the dependency on
> > > > > > flink-table-planner. This will be committed soon.
> > > > > >
> > > > > > Aljoscha Krettek <[hidden email]> wrote on Tue, May 21, 2019 at 7:31 PM:
> > > > > >
> > > > > > > We discussed this in private and came to the conclusion that we should
> > > > > > > (for now) have the dependency on flink-table-api-xxx-bridge because we
> > > > > > > need access to the collect() method, which is not yet available in the
> > > > > > > Table API. Once that is available the code can be refactored, but for now
> > > > > > > we want to unblock work on this new module.
> > > > > > >
> > > > > > > We also agreed that we don’t need a direct dependency on
> > > > > > > flink-table-planner.
> > > > > > >
> > > > > > > I hope I summarised our discussion correctly.
> > > > > > >
> > > > > > > > On 17. May 2019, at 12:20, Gen Luo <[hidden email]> wrote:
> > > > > > > >
> > > > > > > > Thanks for your reply.
> > > > > > > >
> > > > > > > > For the first question, it's not strictly necessary. But I prefer not to
> > > > > > > > have a TableEnvironment argument in Estimator.fit() or
> > > > > > > > Transformer.transform(), since it is not part of the machine learning
> > > > > > > > concept and may make our API less clean and pretty than those of other
> > > > > > > > systems. I would like a way to do this other than introducing
> > > > > > > > flink-table-planner. If that is impossible or strongly opposed, I can
> > > > > > > > make the concession and add the argument.
> > > > > > > >
> > > > > > > > Other than that, the "flink-table-api-xxx-bridge" modules are still
> > > > > > > > needed. A very common case is that an algorithm needs to guarantee that
> > > > > > > > it is running under a BatchTableEnvironment, which makes it possible to
> > > > > > > > collect the result of each iteration. A typical algorithm like this is
> > > > > > > > ALS. As of Flink 1.8, this can only be achieved by converting the Table
> > > > > > > > to a DataSet and then calling DataSet.collect(), which is available in
> > > > > > > > flink-table-api-xxx-bridge. Besides, registering a UDAGG also depends on
> > > > > > > > it.
> > > > > > > >
> > > > > > > > In conclusion, "planner" can be removed from the dependencies, but
> > > > > > > > introducing the "bridge" modules is inevitable. Whether and how to
> > > > > > > > acquire a TableEnvironment from a Table can be discussed.
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
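
For readers following the thread, the API shape discussed above (an explicit TableEnvironment argument on Estimator.fit() and Transformer.transform()) might look roughly like the sketch below. This is an illustration only and rests on assumptions: TableEnvironment and Table are tiny stand-in classes, not Flink's real types, and MeanCenterer, MeanCenterModel and FitTransformSketch are hypothetical names invented for the example. The signatures actually merged via FLINK-12473 may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-in for Flink's TableEnvironment (assumption, not the real class).
final class TableEnvironment {
    final String name;
    TableEnvironment(String name) { this.name = name; }
}

// Stand-in for Flink's Table: just a single column of doubles.
final class Table {
    final List<Double> rows;
    Table(List<Double> rows) { this.rows = rows; }
}

// Transformer with the environment passed explicitly, as proposed in the thread.
interface Transformer {
    Table transform(TableEnvironment tEnv, Table input);
}

// Estimator that trains on a table and returns a fitted Transformer.
interface Estimator<M extends Transformer> {
    M fit(TableEnvironment tEnv, Table input);
}

// Hypothetical estimator: learns the column mean.
final class MeanCenterer implements Estimator<MeanCenterModel> {
    @Override
    public MeanCenterModel fit(TableEnvironment tEnv, Table input) {
        double sum = 0.0;
        for (double v : input.rows) {
            sum += v;
        }
        double mean = input.rows.isEmpty() ? 0.0 : sum / input.rows.size();
        return new MeanCenterModel(mean);
    }
}

// Hypothetical model: subtracts the learned mean from every value.
final class MeanCenterModel implements Transformer {
    final double mean;
    MeanCenterModel(double mean) { this.mean = mean; }

    @Override
    public Table transform(TableEnvironment tEnv, Table input) {
        List<Double> out = new ArrayList<>();
        for (double v : input.rows) {
            out.add(v - mean);
        }
        return new Table(out);
    }
}

final class FitTransformSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv = new TableEnvironment("batch");
        Table data = new Table(Arrays.asList(1.0, 2.0, 3.0));
        MeanCenterModel model = new MeanCenterer().fit(tEnv, data);
        System.out.println(model.transform(tEnv, data).rows);
    }
}
```

The trade-off raised in the thread is visible here: the extra tEnv parameter is not a machine learning concept and makes the signatures less clean, but it lets the ML module avoid depending on flink-table-planner to obtain an environment.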