|
Hi, I’m implementing k-nearest-neighbors classification based on the flink-ml structure.

In a recent commit (7a7a2940 [1]), the pipeline was restructured by splitting the predict operation into a single-element case and a data set case. In the data set case, the parameter map is passed as a method parameter, but in the single-element case there is no way to access the parameter map.

In k-nearest-neighbors classification, however, the predict method needs to know k in order to select the top k values.

How can I solve this problem?

Regards,
Chiwan Park

[1] https://github.com/apache/flink/commit/7a7a294033ef99c596e59f670e2e4ae9262f5c5f |
|
Hi Chiwan,
when you use the single element predict operation, you always have to implement the `getModel` method. There you have access to the resulting parameters and even to the instance to which the `PredictOperation` belongs. Within this `getModel` method you can initialize all the information you need for the `predict` operation.

You can take a look at the `StandardScalerTransformOperation` [1], where the mean and the std are set in the `getModel` method.

Cheers,
Till

[1] https://github.com/apache/flink/blob/master/flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/preprocessing/StandardScaler.scala#L197

On Sun, Jun 28, 2015 at 1:49 PM, Chiwan Park <[hidden email]> wrote:
> How can I solve this problem? |
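A minimal sketch of the idea described above, assuming the training set is small enough to travel inside a single model value. It does not use the real flink-ml types: plain Scala collections stand in for `DataSet`, a `Map` stands in for the parameter map, and `KnnModel`, `Params`, and `squaredDistance` are hypothetical names. The point is only that `getModel` copies k into the model, so `predict` can read it without seeing the parameters.

```scala
// Simplified stand-ins, not the flink-ml API: all names here are hypothetical.
object SingleElementPredictSketch {
  // Stand-in for the parameter map; only an integer "k" is modelled.
  type Params = Map[String, Int]

  // The model bundles everything predict needs, including k.
  final case class KnnModel(k: Int, training: Seq[(Vector[Double], String)])

  // Plays the role of getModel: the only place where the parameters are visible.
  def getModel(training: Seq[(Vector[Double], String)], params: Params): KnnModel =
    KnnModel(params.getOrElse("k", 1), training)

  def squaredDistance(a: Vector[Double], b: Vector[Double]): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  // Plays the role of predict: it sees one test element and the model, no parameters.
  def predict(value: Vector[Double], model: KnnModel): String =
    model.training
      .sortBy { case (point, _) => squaredDistance(point, value) }
      .take(model.k)                                // the k nearest neighbours
      .groupBy { case (_, label) => label }
      .maxBy { case (_, votes) => votes.size }      // majority vote (ties are arbitrary)
      ._1
}
```

Because k is stored in the model, the prediction step stays a pure function of one test element and the model.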
|
Thank you Till.
I have another question. Can I use a DataSet object as the Model? In KNN, we need the DataSet given in the fit operation.

But when I set the Model type parameter of PredictOperation to DataSet, the return type of the getModel method becomes DataSet[DataSet]. I’m confused by this.

Any advice would be much appreciated.

Regards,
Chiwan Park

> On Jun 29, 2015, at 4:43 PM, Till Rohrmann <[hidden email]> wrote:
>
> when you use the single element predict operation, you always have to
> implement the `getModel` method. |
|
Hi Chiwan,
at the moment the single element PredictOperation only supports non-distributed models. This means that it expects the model to be a single element DataSet which can be broadcast to the predict mappers.

If you need more flexibility, you can either extend the PredictOperation interface or simply use the PredictDataSetOperation, where you have full control over the data flow you execute.

Cheers,
Till

On Mon, Jun 29, 2015 at 12:16 PM, Chiwan Park <[hidden email]> wrote:
> I have another question. Can I use a DataSet object as Model? |
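To illustrate what controlling the whole data flow can look like for KNN, here is a rough sketch under the same simplifications: plain Scala collections replace `DataSet`, a `Map` replaces the parameter map, and `predictDataSet`, `Point`, and `squaredDistance` are hypothetical names, not the flink-ml signatures. The data-set-level predict receives the parameters and the full test set together, so k is read directly and the training data never has to fit into a single model element.

```scala
// Simplified stand-ins, not the flink-ml API: all names here are hypothetical.
object DataSetPredictSketch {
  type Point = Vector[Double]

  def squaredDistance(a: Point, b: Point): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  // Plays the role of a data-set-level predict: parameters and the whole
  // test set are both available, so the full data flow is defined here.
  def predictDataSet(
      training: Seq[(Point, String)],
      params: Map[String, Int],
      input: Seq[Point]): Seq[(Point, String)] = {
    val k = params.getOrElse("k", 1)

    // Pair every test point with every training point
    // (comparable to a cross in a distributed data flow).
    val crossed = for {
      test <- input
      (point, label) <- training
    } yield (test, label, squaredDistance(test, point))

    // Group by test point, keep the k closest candidates, take a majority vote.
    // Note: identical test points collapse into one group in this sketch.
    crossed
      .groupBy { case (test, _, _) => test }
      .toSeq
      .map { case (test, candidates) =>
        val topK = candidates.sortBy(_._3).take(k)
        val winner = topK.groupBy(_._2).maxBy(_._2.size)._1
        (test, winner)
      }
  }
}
```

The order of the returned pairs is not defined, since it follows the grouping; callers that need a lookup can convert the result with `toMap`.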
|
Thanks Till :)
I reimplemented my code using PredictDataSetOperation.

Regards,
Chiwan Park

> On Jun 29, 2015, at 7:41 PM, Till Rohrmann <[hidden email]> wrote:
>
> If you need more flexibility, you can either extend the PredictOperation
> interface or you simply use the PredictDataSetOperation, where you have
> full control over what data flow you execute. |
