Hello all,
I've updated the original Gdoc [1] to include a table with coordinators and people interested in contributing to the specific projects. With these latest additions we have many people willing to contribute to the online learning library, and two people who have shown interest in at least one of the other projects. Feel free to reassign yourself if you feel like it; these are all indications of intention anyway, not commitments (except for the coordinators).

I don't think I'll have the time to set up the online learning doc this week; if anyone would like to jump ahead and do that, feel free. Gabor has already started it for the "fast-batch" project, and Stavros has started with the model serving project as well :)

@Ventura: I would love to see the design principles and abstractions you have created for that project; let us know if there is anything you can share now.

Regards,
Theodore

[1] https://docs.google.com/document/d/1afQbvZBTV15qF3vobVWUjxQc49h3Ud06MIRhahtJ6dw/edit?usp=sharing

On Mon, Mar 20, 2017 at 3:56 PM, Gábor Hermann <[hidden email]> wrote:
> Hi all,
>
> @Theodore:
> +1 for the CTR use-case. Thanks for the suggestion!
>
> @Katherin:
> +1 for reflecting the choices made here and contributor commitment in Gdoc.
>
> @Tao, @Ventura:
> It's great to hear you have been working on ML on Flink :)
> I hope we can all aggregate our efforts somehow. It would be best if you
> could contribute some of your work.
>
> I've started putting together a Gdoc specifically for *Offline/incremental
> learning on Streaming API*:
> https://docs.google.com/document/d/18BqoFTQ0dPkbyO-PWBMMpW5Nl0pjobSubnWpW0_r8yA/
> Right now you can comment/give suggestions there. I'd like to start a
> separate mailing list discussion as soon as there are enough contributors
> volunteering for this direction. For now, I'm trying to reflect the
> relevant part of the discussion here and the initial Gdoc [1].
>
> [1] https://docs.google.com/document/d/1afQbvZBTV15qF3vobVWUjxQc49h3Ud06MIRhahtJ6dw/
>
> Cheers,
> Gabor
>
> On 2017-03-20 14:27, Ventura Del Monte wrote:
>
>> Hello everyone,
>>
>> Here at DFKI, we are currently working on a project that involves
>> developing open-source Online Machine Learning algorithms on top of Flink.
>> So far, we have simple moments, sampling (e.g. simple reservoir sampling)
>> and sketches (e.g. Frequent Directions) built on top of scikit-like
>> abstractions and Flink's DataStream/KeyedStream.
>> Moreover, we have a few industrial use cases and we are going to validate
>> our implementation using real industrial data.
>> We plan to implement more advanced algorithms in the future, as well as
>> share our results with you and contribute, in case you are interested.
>>
>> Best,
>> Ventura
>>
>> This message, per D. Lgs. n. 196/2003 (Privacy Code), may contain
>> confidential and/or privileged information. If you are not the addressee
>> or authorized to receive this for the addressee, you must not use, copy,
>> disclose or take any action based on this message or any information
>> herein. If you have received this message in error, please advise the
>> sender immediately by reply e-mail and delete this message. Thank you for
>> your cooperation.
>>
>> On Mon, Mar 20, 2017 at 12:26 PM, Tao Meng <[hidden email]> wrote:
>>
>>> Hi All,
>>>
>>> Sorry for joining this discussion late.
>>> My graduation thesis is about an online learning system. I will build it
>>> on Flink in the next three months.
>>>
>>> I'd like to contribute on:
>>> - Online learning
>>>
>>> On Mon, Mar 20, 2017 at 6:51 PM Katherin Eri <[hidden email]> wrote:
>>>
>>> Hello, Theodore
>>>
>>> Could you please move the vectors of development and their prioritized
>>> positions from *## Executive summary* to the Google doc?
>>>
>>> Could you please also create a table in the Google doc representing the
>>> selected directions and the persons who would like to drive or
>>> participate in each topic, in order to make this process transparent for
>>> the community and sum up the current state of contributor commitment?
>>>
>>> There we could simply sign ourselves up for a topic.
>>>
>>> And +1 for the CTR prediction case.
>>>
>>> On Sun, Mar 19, 2017 at 16:49, Theodore Vasiloudis <[hidden email]> wrote:
>>>
>>>> Hello Stavros,
>>>>
>>>> The way I thought we'd do it is that each shepherd would be responsible
>>>> for organizing the project: that includes setting up a Google doc,
>>>> sending an email to the dev list to inform the wider community, and if
>>>> possible, personally contacting the people who expressed interest in
>>>> the project.
>>>>
>>>> Would you be willing to lead that effort for the model serving project?
>>>>
>>>> Regards,
>>>> Theodore
>>>>
>>>> --
>>>> Sent from a mobile device. May contain autocorrect errors.
>>>>
>>>> On Mar 19, 2017 3:49 AM, "Stavros Kontopoulos" <[hidden email]> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I agree about the TensorFlow integration; it seems to be important
>>>>> from what I hear.
>>>>> Should we sign up somewhere for the working groups (Gdocs)?
>>>>> I would like to start helping with the model serving feature.
>>>>>
>>>>> Best Regards,
>>>>> Stavros
>>>>>
>>>>> On Fri, Mar 17, 2017 at 10:34 PM, Gábor Hermann <[hidden email]> wrote:
>>>>>
>>>>>> Hi Chen,
>>>>>>
>>>>>> Thanks for the input! :)
>>>>>>
>>>>>> There is already a project [1] for using TensorFlow models in Flink,
>>>>>> and Theodore has suggested contacting the author, Eron Wright, for
>>>>>> the model serving direction.
>>>>>> >>>>>> >>>>>> [1] http://sf.flink-forward.org/kb_sessions/introducing-flink- >>>>>> >>>>> tensorflow/ >>>>> >>>>>> Cheers, >>>>>> Gabor >>>>>> >>>>>> >>>>>> On 2017-03-17 19:41, Chen Qin wrote: >>>>>> >>>>>> [1]http://sf.flink-forward.org/kb_sessions/introducing-flink-te >>>>>>> nsorflow/ >>>>>>> >>>>>>> >>>>>> -- >>> >>> *Yours faithfully, * >>> >>> *Kate Eri.* >>> >>> > |
Hi all,
I added a document <https://docs.google.com/document/d/1CjWL9aLxPrKytKxUF5c3ohs0ickp0fdEXPsPYPEywsE/edit?usp=sharing> for the model serving effort. It is in draft mode; feel free to drop a comment. By the way, I have contacted Eron Wright and he is willing to help in that direction; I will keep everyone updated.

Best,
Stavros
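The model-serving pattern being discussed, one stream of control messages deploying new model versions while a data stream keeps scoring events against whatever version is current, can be sketched in plain Java. This is only an illustration with invented names, not the API from the draft document; the `AtomicReference` stands in for what would be operator state fed by a broadcast/control stream in an actual Flink job.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Function;

public class ServedModel {
    // Holds the currently deployed model; in Flink this would be operator
    // state updated from a control/broadcast stream rather than an atomic.
    private final AtomicReference<Function<double[], Double>> current =
            new AtomicReference<>(x -> 0.0); // placeholder until first deploy

    /** Swap in a new model version (the "control stream" side). */
    public void deploy(Function<double[], Double> model) {
        current.set(model);
    }

    /** Score one event against the current model (the "data stream" side). */
    public double serve(double[] features) {
        return current.get().apply(features);
    }
}
```

The point of the sketch is that scoring never blocks on model updates: events flow through `serve` continuously, and a deploy simply replaces the function they are applied to.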
Thank you.
I vote for:
1) Offline learning with the batch API
2) Low-latency prediction serving -> Online learning

In detail:
1) Without ML, Flink can never become the de-facto streaming engine.

2) Flink is part of a production ecosystem, and production systems require
ML support.
   a. Offline training should be supported, because most ML algorithms are
      designed for batch training.
   b. The model lifecycle should be supported:
      ETL + transformation + training + scoring + quality monitoring in production.

I understand that the batch world is full of competitors; however, training
in batch with fast execution online can be very useful and can give Flink an
edge. Online learning is also desirable, but with a lower priority.

We migrated from Spark to Flink and we love Flink, but in the absence of good
ML support we may have to move back to Spark.
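The "train in batch, execute fast online" combination described above can be sketched without any Flink API. The class and method names below are invented for illustration: a logistic model starts from weights produced by an offline (batch) training job and is then nudged per labeled event with a single SGD step, so low-latency scoring and incremental refinement share one piece of state.

```java
import java.util.Arrays;

public class BatchThenOnlineModel {
    private final double[] w;   // weights handed over by the offline training job
    private final double lr;    // online learning rate

    public BatchThenOnlineModel(double[] batchWeights, double lr) {
        this.w = Arrays.copyOf(batchWeights, batchWeights.length);
        this.lr = lr;
    }

    /** Low-latency scoring for one event, e.g. a CTR estimate. */
    public double predict(double[] x) {
        double z = 0.0;
        for (int i = 0; i < w.length; i++) z += w[i] * x[i];
        return 1.0 / (1.0 + Math.exp(-z));   // logistic function
    }

    /** One online SGD step when the true label for an event arrives. */
    public void update(double[] x, double label) {
        double err = predict(x) - label;     // gradient of log-loss w.r.t. z
        for (int i = 0; i < w.length; i++) w[i] -= lr * err * x[i];
    }
}
```

In a Flink job this state would presumably live inside a stateful streaming operator, with periodic batch retraining results resetting the weights through a second input; that wiring is omitted here.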
Hi, all
I prefer online learning to offline training. Offline learning is the basic
capability, but I think for ML in Flink the more obvious advantage is stream
processing.

One suggestion is that FlinkML could define an API to support TensorFlow,
MXNet, and other parameter-server frameworks, similar to Apache Beam's model.
There will be more and more ML frameworks; we can't support all of them, but
we can define a common API to adapt them.

Maybe we can start to discuss how to implement such an ML framework, along
with its design doc and API design.

Best regards,
Jinkui Shi
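The "common API to adapt" suggestion can be sketched as a small framework-agnostic interface plus a registry. All names here are invented for illustration and are not an actual FlinkML, TensorFlow, or MXNet interface; the idea is that each framework ships one thin adapter, and Flink operators only ever see the common interface.

```java
import java.util.HashMap;
import java.util.Map;

public class FrameworkAdapters {

    /** Framework-agnostic handle to a model; one adapter per framework. */
    public interface ModelAdapter {
        // A real adapter would delegate to TensorFlow, MXNet, a parameter
        // server, etc.; the signature is all the caller depends on.
        double[] score(double[] input);
    }

    private final Map<String, ModelAdapter> adapters = new HashMap<>();

    public void register(String framework, ModelAdapter adapter) {
        adapters.put(framework, adapter);
    }

    public double[] score(String framework, double[] input) {
        ModelAdapter a = adapters.get(framework);
        if (a == null) throw new IllegalArgumentException("no adapter for " + framework);
        return a.score(input);
    }
}
```

New frameworks would then be supported by registering another adapter rather than by changing any operator code, which is the decoupling the suggestion above is after.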