Flink ML recommender system API


Flink ML recommender system API

Gábor Hermann
Hey all,

We've been working on improvements to recommendation in Flink ML,
and some API design questions have come up. Our plans in short:

- Extend ALS to work on implicit feedback datasets [1]
- DSGD implementation for matrix factorization [2]
- Ranking prediction based on a matrix factorization model [3]
- Evaluations for recommenders (precision, recall, nDCG) [4]


First, we've seen that an evaluation framework has been implemented (in
a not yet merged PR [5]), but evaluations of recommenders would not fit
into this framework. This is basically because recommender evaluations,
instead of comparing real numbers or fixed-size vectors, compare top
lists of possibly different, arbitrarily large sizes. The details are
described in FLINK-4713 [4]. I see three possible solutions for this:

- we either rework the evaluation framework proposed in [5] to allow
inputs suitable for recommender evaluations,
- or fit the recommender evaluations into the framework in a somewhat
unnatural form, with possibly bad performance implications,
- or do not fit recommender evaluations into the framework at all.

I would prefer reworking the evaluation framework, but it's up for
discussion. It also depends on whether the PR will be merged soon or
not. Theodore, what are your thoughts on this, as the author of the
eval framework?
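
To make the mismatch concrete, here is a minimal local sketch of two of
the metrics in question (plain Scala, binary relevance assumed; not the
proposed API). The point is that both inputs are per-user top lists of
arbitrary length:

    // precision@k: fraction of the top-k recommended items that are relevant
    def precisionAtK(ranking: Seq[Int], relevant: Set[Int], k: Int): Double =
      ranking.take(k).count(relevant.contains).toDouble / k

    // nDCG@k: discounted gain of the top list, normalized by the ideal ordering
    def ndcgAtK(ranking: Seq[Int], relevant: Set[Int], k: Int): Double = {
      def log2(x: Double) = math.log(x) / math.log(2)
      val dcg = ranking.take(k).zipWithIndex.collect {
        case (item, pos) if relevant(item) => 1.0 / log2(pos + 2)
      }.sum
      val idcg = (0 until math.min(k, relevant.size)).map(i => 1.0 / log2(i + 2)).sum
      if (idcg == 0.0) 0.0 else dcg / idcg
    }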


Second, the choice of evaluation form also affects how we should
represent the ranking predictions. We could choose a flat form (i.e.
DataSet[(Int,Int,Int)]) or represent the rankings in an array (i.e.
DataSet[(Int,Array[Int])]). See details in [4]. The flat form would
allow the system to work in a distributed fashion, so I'd go with that
representation, but it's also up for discussion.
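
For illustration, a rough sketch of the two forms and a conversion
between them (the field order (user, rank, item) is an assumption here,
not the final API):

    import org.apache.flink.api.scala._
    import org.apache.flink.util.Collector

    val env = ExecutionEnvironment.getExecutionEnvironment

    // Flat form: one record per (user, rank, item); a user's top list is
    // spread across records, so it never has to fit into a single object.
    val flat: DataSet[(Int, Int, Int)] =
      env.fromElements((1, 1, 42), (1, 2, 7), (2, 1, 7))

    // Array form: the whole top list is materialized per user, so its
    // size is bounded by what fits into one record on one machine.
    val arrays: DataSet[(Int, Array[Int])] =
      flat.groupBy(0).reduceGroup {
        (recs: Iterator[(Int, Int, Int)], out: Collector[(Int, Array[Int])]) =>
          val buffered = recs.toArray
          out.collect((buffered.head._1, buffered.sortBy(_._2).map(_._3)))
      }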


Last, ALS and DSGD are two different algorithms for training the same
matrix factorization model, but in the current API this is not really
visible to the user. Training an ALS model modifies the ALS object and
puts a matrix factorization model inside it. We could do the same with
DSGD and have a common abstraction (say a superclass
MatrixFactorization). However, in my opinion, it might be more
straightforward if ALS.fit returned a different object (say a
MatrixFactorizationModel akin to Spark's [6]) containing the DataSets
representing the factors. With this approach, we could avoid checking
at runtime whether a model has been trained or not, and force the user
at compile time to only call predict on models that have already been
trained.
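
A rough sketch of what I mean (names borrowed from Spark; none of this
exists in Flink ML today, and the bodies are elided):

    import org.apache.flink.api.scala.DataSet

    // Hypothetical: the trained model is its own type holding the factors.
    class MatrixFactorizationModel(
        val userFactors: DataSet[(Int, Array[Double])],
        val itemFactors: DataSet[(Int, Array[Double])]) {
      // predict only exists on a trained model, so "predict before fit"
      // is not even expressible
      def predict(userItemPairs: DataSet[(Int, Int)]): DataSet[(Int, Int, Double)] = ???
    }

    object ALS {
      // fit returns the trained model instead of mutating the ALS instance
      def fit(ratings: DataSet[(Int, Int, Double)]): MatrixFactorizationModel = ???
    }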

Of course, this could also be applied to other models in Flink ML, and
it would be an API-breaking change. Was there any reason to pick the
current training API design instead of the more "typesafe" one? I am
certain that we should keep the ML API consistent, so we should either
change the training API of all models, or leave them all as they are.
That said, I don't think it would take much effort to modify the API.
We could also keep and deprecate the current fit method to avoid
breaking the API. What do you think about this? If there are no
objections, I'm happy to open a JIRA and start working on it.


[1] https://github.com/apache/flink/pull/2542
[2] http://dx.doi.org/10.1145/2020408.2020426
[3] https://issues.apache.org/jira/browse/FLINK-4712
[4] https://issues.apache.org/jira/browse/FLINK-4713
[5] https://github.com/apache/flink/pull/1849
[6] https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala#L315

Cheers,
Gabor



Re: Flink ML recommender system API

Till Rohrmann
Hi Gabor,

thanks for getting involved in Flink's ML library. Always good to have
people working on it :-)

Some thoughts concerning the points you've raised inline:

On Tue, Oct 4, 2016 at 12:47 PM, Gábor Hermann <[hidden email]>
wrote:

> Hey all,
>
> We've been working on improvements to recommendation in Flink ML, and
> some API design questions have come up. Our plans in short:
>
> - Extend ALS to work on implicit feedback datasets [1]
> - DSGD implementation for matrix factorization [2]
>
Have you looked at GradientDescent? Maybe it already does what you want
to implement, or it can be adapted to do so.


> - Ranking prediction based on a matrix factorization model [3]
> - Evaluations for recommenders (precision, recall, nDCG) [4]

>
>
> First, we've seen that an evaluation framework has been implemented (in a
> not yet merged PR [5]), but evaluations of recommenders would not fit into
> this framework. This is basically because recommender evaluations, instead
> of comparing real numbers or fixed-size vectors, compare top lists of
> possibly different, arbitrarily large sizes. The details are described in
> FLINK-4713 [4]. I see three possible solutions for this:
>
> - we either rework the evaluation framework proposed in [5] to allow
> inputs suitable for recommender evaluations,
> - or fit the recommender evaluations into the framework in a somewhat
> unnatural form, with possibly bad performance implications,
> - or do not fit recommender evaluations into the framework at all.
>
> I would prefer reworking the evaluation framework, but it's up for
> discussion. It also depends on whether the PR will be merged soon or not.
> Theodore, what are your thoughts on this, as the author of the eval
> framework?
>
It would be great if the evaluation framework in the PR could be
adapted to also support evaluating recommenders, provided there is no
fundamental reason against it. My gut feeling is that it should not be
impossible.


>
> Second, the choice of evaluation form also affects how we should
> represent the ranking predictions. We could choose a flat form (i.e.
> DataSet[(Int,Int,Int)]) or represent the rankings in an array (i.e.
> DataSet[(Int,Array[Int])]). See details in [4]. The flat form would allow
> the system to work in a distributed fashion, so I'd go with that
> representation, but it's also up for discussion.
>
It would be great to keep scalability in mind. Thus, I would go with the
more scalable version.


>
>
> Last, ALS and DSGD are two different algorithms for training the same
> matrix factorization model, but in the current API this is not really
> visible to the user. Training an ALS model modifies the ALS object and
> puts a matrix factorization model inside it. We could do the same with
> DSGD and have a common abstraction (say a superclass MatrixFactorization).
> However, in my opinion, it might be more straightforward if ALS.fit
> returned a different object (say a MatrixFactorizationModel akin to
> Spark's [6]) containing the DataSets representing the factors. With this
> approach, we could avoid checking at runtime whether a model has been
> trained or not, and force the user at compile time to only call predict
> on models that have already been trained.
>
I'm not sure whether the latter approach plays well with the
pipelining. One always has to keep in mind that one's abstraction
should also work in an ML pipeline.

I think it would be good to implement a ScoreMatrixFactorizationRecommender
and a RankingMatrixFactorizationRecommender which both work on a
MatrixFactorizationModel. This model can then either be computed by ALS or
DSGD. This could be controlled by a configuration parameter of the
recommenders.
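
Roughly like this (a sketch only; all the names are placeholders and
the bodies are elided):

    import org.apache.flink.api.scala.DataSet

    sealed trait Solver
    case object AlsSolver  extends Solver
    case object DsgdSolver extends Solver

    // Placeholder for the shared model: the user and item factor DataSets.
    case class MatrixFactorizationModel(
        userFactors: DataSet[(Int, Array[Double])],
        itemFactors: DataSet[(Int, Array[Double])])

    class RankingMatrixFactorizationRecommender(var solver: Solver = AlsSolver) {
      private var model: Option[MatrixFactorizationModel] = None

      def fit(ratings: DataSet[(Int, Int, Double)]): Unit = {
        // The configuration parameter picks the training algorithm; both
        // algorithms produce the same kind of model.
        model = Some(solver match {
          case AlsSolver  => ???  // run ALS
          case DsgdSolver => ???  // run DSGD
        })
      }

      // Emits (user, rank, item) top lists computed from the stored model.
      def predictRankings(users: DataSet[Int], k: Int): DataSet[(Int, Int, Int)] = ???
    }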


> Of course, this could also be applied to other models in Flink ML, and it
> would be an API-breaking change. Was there any reason to pick the current
> training API design instead of the more "typesafe" one? I am certain that
> we should keep the ML API consistent, so we should either change the
> training API of all models, or leave them all as they are. That said, I
> don't think it would take much effort to modify the API. We could also
> keep and deprecate the current fit method to avoid breaking the API. What
> do you think about this? If there are no objections, I'm happy to open a
> JIRA and start working on it.
>
What do you mean with more "typesafe"? I don't see how returning the
trained model from the fit method gives you more type safety.

The reason why fit does not return a model is that not every estimator
necessarily has a model it trains (see the PolynomialFeatures
extractor). Furthermore, when creating pipelines, you basically also
have to create chained models. This is doable, no question, but you can
also retrieve the models from the modelful estimators as currently
implemented. Moreover, the model itself is rarely useful without the
respective prediction algorithm.
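
For reference, the current usage pattern looks roughly like this
(simplified from the FlinkML ALS example):

    import org.apache.flink.api.scala._
    import org.apache.flink.ml.recommendation.ALS

    val env = ExecutionEnvironment.getExecutionEnvironment
    val ratings: DataSet[(Int, Int, Double)] = env.fromElements((1, 10, 5.0))
    val toScore: DataSet[(Int, Int)] = env.fromElements((1, 11))

    // fit stores the factorization inside the ALS instance...
    val als = ALS().setNumFactors(10).setIterations(10)
    als.fit(ratings)
    // ...and predict reads it back out. Calling predict before fit fails
    // at runtime rather than at compile time.
    val predictions = als.predict(toScore)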

But if you need to change the API, then we can try to figure out how we can
do this without breaking the pipelining mechanism.
Cheers,
Till

Re: Flink ML recommender system API

Theodore Vasiloudis
Hello all,

Thanks for starting this discussion, Gabor; you bring up a lot of
interesting points.

In terms of the evaluation framework, I would also favor reworking it
in order to support recommendation models. We can either merge the
current PR and use it as a basis, or open a new one.

For the form of the evaluation, I'm also in favor of the flat form,
which allows us to have a scalable system.

For the development of DSGD, I would recommend taking into
consideration the current limitations of SGD in FlinkML in terms of
sampling within iterations. Due to the nature of iterations at the time
we developed SGD, it was not possible to select samples from the
dataset while iterating over the weights DataSet. That means the SGD
implementation in FlinkML is actually plain GD. I recommend reviewing
the discussion in [2] (the issue is at [3]) and perhaps opening a new
thread, as that is a major issue on its own and a blocker for many ML
algorithms that modify more than one DataSet per iteration.
Would that be an issue for DSGD as well?

Finally, regarding the decision of having the model as a separate
object from the algorithm: just to add another perspective to what Till
said, there are also implications in terms of model export/import
(especially with distributed models), which is probably made easier by
decoupling training from the algorithm object.

This is essentially a design decision with pros and cons on both
sides. Our decision was mostly influenced by the considerations that
Till already mentioned about pipelines. The same approach was taken in
scikit-learn, and I encourage you to read their reasoning on page 5 of
[1], a document we generally kept in mind when developing FlinkML.

That being said, I'm open to re-evaluating this decision; the issue at
hand is, as Till mentioned, that the pipelining mechanism would
probably have to be reworked in order to allow for this change.

[1] "API design for machine learning software: experiences from the
scikit-learn project" https://arxiv.org/abs/1309.0238
[2] https://issues.apache.org/jira/browse/FLINK-1807
[3] https://issues.apache.org/jira/browse/FLINK-2396


Re: Flink ML recommender system API

Gábor Hermann
Thank you both for your detailed replies.

I think we all agree on extending the evaluation framework to handle
recommendation models and on choosing the scalable form of ranking, so
we'll do it that way. For now, we will build on Theodore's PR.

Thanks for explaining the reasons behind the design decision of not
having a separate object for the trained model. I hadn't thought about
the implications for pipelines, so I think we should keep the current
design and align our new algorithms with it. Of course, we can bring up
a discussion later and reconsider this design, but I see that it's a
separate issue.

> I think it would be good to implement a ScoreMatrixFactorizationRecommender
> and a RankingMatrixFactorizationRecommender which both work on a
> MatrixFactorizationModel. This model can then either be computed by ALS or
> DSGD. This could be controlled by a configuration parameter of the
> recommenders.
Do you mean having two different predictors, i.e.
Predictor[ScoreMatrixFactorizationRecommender] and
Predictor[RankingMatrixFactorizationRecommender]?
If I understand correctly, there should be one common class
MatrixFactorizationModel instead of distinct ALS and DSGD classes, with
a configuration parameter deciding which algorithm to use for training?

I like this idea, as both trainers would require almost the same
configuration. AFAIK there would be an additional 'LearningRate'
parameter for DSGD, but apart from that, the two configs are the same.
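
In FlinkML's parameter style, that could look roughly like this (a
sketch; the names and default values are assumptions):

    import org.apache.flink.ml.common.Parameter

    // DSGD-only: the step size of the gradient updates.
    case object LearningRate extends Parameter[Double] {
      val defaultValue: Option[Double] = Some(0.01)
    }

    // Shared: selects the training algorithm; a plain string keeps the
    // sketch simple.
    case object TrainingAlgorithm extends Parameter[String] {
      val defaultValue: Option[String] = Some("ALS")
    }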

> What do you mean with more "typesafe"? I don't see how returning the
> trained model from the fit method gives you more type safety.
I probably used the wrong word here. I simply meant that with a
separate type for the trained model, the type system ensures that a
trained model cannot be trained again, while an untrained model cannot
be used for prediction.

Regarding the DSGD algorithm: I think it uses a different sampling
mechanism, so we cannot reuse the simple SGD solver. However, we will
make sure not to write duplicate code for the same problem. We had also
noticed, independently of DSGD, that the SGD solver is in reality a GD
solver, but I had not found the related issues and discussion, so
pointing me to them was really useful, thanks!

Cheers,
Gabor

Re: Flink ML recommender system API

Gábor Hermann
Hi all,

We have managed to fit the ranking recommendation evaluation into the
evaluation framework proposed by Theodore (FLINK-2157). One main
problem remains: we have two different predictor traits (Predictor,
RankingPredictor) without a common superclass, and that might be
problematic.
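
To illustrate the problem (simplified signatures, not the actual trait
definitions):

    import org.apache.flink.api.scala.DataSet

    // Predictor scores individual (user, item) pairs...
    trait Predictor[Self] {
      def predict(test: DataSet[(Int, Int)]): DataSet[(Int, Int, Double)]
    }

    // ...while RankingPredictor emits variable-length top lists per user.
    trait RankingPredictor[Self] {
      def predictRankings(users: DataSet[Int], k: Int): DataSet[(Int, Int, Int)]
    }

    // Without a common superclass, generic code (e.g. the evaluation
    // framework or a pipeline stage) cannot accept "any predictor" and
    // has to special-case the two traits.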

Please see the details at the issue:
https://issues.apache.org/jira/browse/FLINK-4713

Could you give feedback on whether we are moving in the right direction
or not? Thanks!

Cheers,
Gabor


Re: Flink ML recommender system API

Theodore Vasiloudis
Hello Gabor,

For this type of issue (design decisions), what we've done in the past
with FlinkML is to open a PR marked with the WIP tag and take the
discussion there, making it easier for people to check out the code and
get a feel for the advantages/disadvantages of different approaches.

Could you do that for this issue?

Regards,
Theodore


Re: Flink ML recommender system API

Gábor Hermann
Hello Theodore,

Thanks for your reply.

Of course. I would have done that in the first place, but I had seen
the contribution guidelines advising against WIP PRs:

"No WIP pull requests. We consider pull requests as requests to merge
the referenced code as is into the current stable master branch.
Therefore, a pull request should not be “work in progress”. Open a pull
request if you are confident that it can be merged into the current
master branch without problems. If you rather want comments on your
code, post a link to your working branch."

I don't know the rationale behind this, as a WIP PR seems, most of the
time, to be a convenient way to share half-finished code. Maybe I'll
open a discussion about this.

I'll open a WIP PR for this after cleaning our code a bit.

Cheers,
Gabor

>>