FlinkML - Evaluate function should manage LabeledVector


Thomas FOURNIER
Hi,

I'd like to improve the SVM evaluate function so that it can use LabeledVector
(and not only Vector).
Currently, the tests do the following (data is a
DataSet[LabeledVector]):

val test = data.map(l => (l.vector, l.label))
svm.evaluate(test)

We would like to do:
svm.evaluate(data)


Adding this "new" code:

implicit def predictLabeledPoint[T <: LabeledVector] = {
  new PredictOperation  ...
}

gives me a PredictOperation that should be usable with
defaultEvaluateDataSetOperation,
with the correct signature (i.e. with T <: LabeledVector and not T <: Vector).

Nonetheless, tests are failing:


it should "predict with LabeledDataPoint" in {

  val env = ExecutionEnvironment.getExecutionEnvironment

  val svm = SVM().
    setBlocks(env.getParallelism).
    setIterations(100).
    setLocalIterations(100).
    setRegularization(0.002).
    setStepsize(0.1).
    setSeed(0)

  val trainingDS = env.fromCollection(Classification.trainingData)
  svm.fit(trainingDS)
  val predictionPairs = svm.evaluate(trainingDS)

  ....
}

There is no PredictOperation defined for
org.apache.flink.ml.classification.SVM which takes a
DataSet[org.apache.flink.ml.common.LabeledVector] as input.
java.lang.RuntimeException: There is no PredictOperation defined for
org.apache.flink.ml.classification.SVM which takes a
DataSet[org.apache.flink.ml.common.LabeledVector] as input.



Thanks

Regards
Thomas

Re: FlinkML - Evaluate function should manage LabeledVector

Theodore Vasiloudis
Hello Thomas,

Since you are calling evaluate here, you should be creating an
EvaluateDataSetOperation that works with LabeledVector; I see you are
creating a new PredictOperation instead.
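
A note on why the test compiles but then fails at runtime: the wording of the error suggests that a catch-all fallback implicit was picked because no specific operation matched DataSet[LabeledVector]. That is an assumption about the mechanism, not something confirmed in this thread. The sketch below shows the general low-priority-fallback pattern in plain Scala; `EvalOp`, `LowPriority`, `pairOp`, and `evaluate` are hypothetical stand-ins, not FlinkML's actual definitions.

```scala
object FallbackSketch {
  // Hypothetical, simplified type class standing in for an evaluate operation.
  trait EvalOp[In] { def run(in: In): String }

  trait LowPriority {
    // Catch-all instance: it matches any input type, so the call site always
    // compiles, but it fails as soon as the operation is actually invoked.
    implicit def fallback[In]: EvalOp[In] = new EvalOp[In] {
      def run(in: In): String =
        throw new RuntimeException("There is no operation defined for this input type.")
    }
  }

  object EvalOp extends LowPriority {
    // Specific instance: preferred over the inherited fallback when it matches.
    implicit val pairOp: EvalOp[(Int, Double)] = new EvalOp[(Int, Double)] {
      def run(in: (Int, Double)): String = "evaluated pair"
    }
  }

  def evaluate[In](in: In)(implicit op: EvalOp[In]): String = op.run(in)

  def main(args: Array[String]): Unit = {
    println(evaluate((1, 2.0)))                            // uses pairOp
    println(scala.util.Try(evaluate("other")).isFailure)   // only the fallback matches
  }
}
```

Under this reading, providing an operation of the wrong kind (a PredictOperation when evaluate looks for an evaluate operation) would leave only the fallback to be resolved, which matches the advice above.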

On Wed, Oct 19, 2016 at 3:05 PM, Thomas FOURNIER <[hidden email]> wrote:

> Adding this "new" code:
>
> implicit def predictLabeledPoint[T <: LabeledVector] = {
>  new PredictOperation  ...
> }

Re: FlinkML - Evaluate function should manage LabeledVector

Thomas FOURNIER
Hi,

Two questions:

1- I was thinking of doing this:

implicit def evaluateLabeledVector[T <: LabeledVector] = {
  new EvaluateDataSetOperation[SVM, T, Double]() {
    override def evaluateDataSet(
        instance: SVM,
        evaluateParameters: ParameterMap,
        testing: DataSet[T]): DataSet[(Double, Double)] = {
      val predictor = ...
      testing.map(l => (l.label, predictor.predict(l.vector)))
    }
  }
}

How can I access my predictor object (predictor has type
PredictOperation[SVM, DenseVector, T, Double])?

2- My first idea was to develop a predictOperation[T <: LabeledVector]
so that I could use the implicit def defaultEvaluateDataSetOperation
to get an EvaluateDataSetOperation object. Is that also valid or not?

Thanks
Regards

Thomas
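
On question 1, one common way for an implicit operation to reach another operation is to request it as an implicit parameter of the implicit def itself. The following is a minimal sketch of that pattern in plain Scala, assuming simplified stand-in types; `Labeled`, `PredictOp`, `EvalOp`, `Linear`, and `evaluate` are hypothetical names and do not reproduce the FlinkML API.

```scala
object DerivedEvaluateSketch {
  // Hypothetical stand-in for LabeledVector, so the sketch needs no Flink dependency.
  final case class Labeled(label: Double, vector: Vector[Double])

  // Simplified stand-ins for a predict operation and an evaluate operation.
  trait PredictOp[Model] {
    def predict(model: Model, features: Vector[Double]): Double
  }
  trait EvalOp[Model, In] {
    def evaluate(model: Model, data: Seq[In]): Seq[(Double, Double)]
  }

  object EvalOp {
    // The evaluate operation asks for the predict operation as an implicit
    // parameter; that is how it gets hold of the "predictor".
    implicit def labeledEval[Model](
        implicit predictOp: PredictOp[Model]): EvalOp[Model, Labeled] =
      new EvalOp[Model, Labeled] {
        def evaluate(model: Model, data: Seq[Labeled]): Seq[(Double, Double)] =
          data.map(l => (l.label, predictOp.predict(model, l.vector)))
      }
  }

  // Toy model: predicts the sign of the dot product with its weights.
  final case class Linear(weights: Vector[Double])
  object Linear {
    implicit val predictOp: PredictOp[Linear] = new PredictOp[Linear] {
      def predict(model: Linear, features: Vector[Double]): Double =
        math.signum(model.weights.zip(features).map { case (w, x) => w * x }.sum)
    }
  }

  def evaluate[Model, In](model: Model, data: Seq[In])(
      implicit op: EvalOp[Model, In]): Seq[(Double, Double)] =
    op.evaluate(model, data)

  def main(args: Array[String]): Unit = {
    val data = Seq(Labeled(1.0, Vector(1.0, 2.0)), Labeled(-1.0, Vector(-3.0, 0.5)))
    // Each pair is (true label, predicted label).
    println(evaluate(Linear(Vector(1.0, 1.0)), data))
  }
}
```

The caller passes labeled data directly; the compiler assembles the evaluate operation from whichever predict operation is in implicit scope for the model type.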







2016-10-19 16:26 GMT+02:00 Theodore Vasiloudis <[hidden email]>:

> Hello Thomas,
>
> Since you are calling evaluate here, you should be creating an
> EvaluateDataSetOperation that works with LabeledVector; I see you are
> creating a new PredictOperation instead.

Re: FlinkML - Evaluate function should manage LabeledVector

Theodore Vasiloudis
I think this might be problematic with the current way we define the
predict operations because they require that both the Testing and
PredictionValue types are available.

Here's what I had to do to get it to work (in ml/pipeline/Predictor.scala):

import org.apache.flink.ml.math.{Vector => FlinkVector}

implicit def labeledVectorEvaluateDataSetOperation[
    Instance <: Estimator[Instance],
    Model,
    FlinkVector,
    Double](
    implicit predictOperation: PredictOperation[Instance, Model, FlinkVector, Double],
    testingTypeInformation: TypeInformation[FlinkVector],
    predictionValueTypeInformation: TypeInformation[Double])
  : EvaluateDataSetOperation[Instance, LabeledVector, Double] = {
  new EvaluateDataSetOperation[Instance, LabeledVector, Double] {
    override def evaluateDataSet(
        instance: Instance,
        evaluateParameters: ParameterMap,
        testing: DataSet[LabeledVector])
      : DataSet[(Double, Double)] = {
      val resultingParameters = instance.parameters ++ evaluateParameters
      val model = predictOperation.getModel(instance, resultingParameters)

      implicit val resultTypeInformation = createTypeInformation[(FlinkVector, Double)]

      // Pair each true label with the prediction for the corresponding vector.
      testing.mapWithBcVariable(model) {
        (element, model) => {
          (element.label.asInstanceOf[Double],
            predictOperation.predict(element.vector.asInstanceOf[FlinkVector], model))
        }
      }
    }
  }
}

I'm not a fan of casting objects, but the compiler complains here otherwise.

Maybe someone has some input as to why the casting is necessary here, given
that the underlying types are correct? Probably has to do with some type
erasure I'm not seeing here.

--Theo
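
One possible explanation for the casts, offered as a reading of the snippet above rather than a confirmed diagnosis: FlinkVector and Double are listed in the method's type parameter list, so inside the method they name fresh type parameters that shadow the imported alias and scala.Double. The compiler then cannot prove that `element.label` has the type parameter's type. A minimal sketch of the effect in plain Scala follows; `Labeled`, `labelShadowed`, and `labelPlain` are hypothetical names used only for illustration.

```scala
object TypeParamShadowing {
  // Hypothetical stand-in for LabeledVector, simplified so the sketch
  // compiles without Flink on the classpath.
  final case class Labeled(label: Double, vector: Vector[Double])

  // `Double` in the brackets is a NEW type parameter that shadows
  // scala.Double inside this method. `l.label` is a scala.Double, which the
  // compiler cannot relate to the type parameter, so a cast is required.
  def labelShadowed[Double](l: Labeled): Double =
    l.label.asInstanceOf[Double]

  // Without the extra type parameter, `Double` means scala.Double again
  // and no cast is needed.
  def labelPlain(l: Labeled): Double = l.label

  def main(args: Array[String]): Unit = {
    val l = Labeled(1.0, Vector(0.5, 2.0))
    println(labelPlain(l))
    println(labelShadowed[scala.Double](l))
  }
}
```

If this reading is right, dropping FlinkVector and Double from the type parameter list, so that they refer to the concrete types, might remove the need for asInstanceOf; that has not been verified against the FlinkML code base.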

On Wed, Oct 19, 2016 at 10:30 PM, Thomas FOURNIER <[hidden email]> wrote:

> How can I access my predictor object (predictor has type
> PredictOperation[SVM, DenseVector, T, Double])?

Re: FlinkML - Evaluate function should manage LabeledVector

Thomas FOURNIER
OK, thanks.

I'm going to create a specific JIRA issue for this. OK?

2016-10-20 12:54 GMT+02:00 Theodore Vasiloudis <[hidden email]>:

> I think this might be problematic with the current way we define the
> predict operations because they require that both the Testing and
> PredictionValue types are available.

Re: FlinkML - Evaluate function should manage LabeledVector

Thomas FOURNIER
Done here: FLINK-4865 <https://issues.apache.org/jira/browse/FLINK-4865>

2016-10-20 14:07 GMT+02:00 Thomas FOURNIER <[hidden email]>:

> OK, thanks.
>
> I'm going to create a specific JIRA issue for this. OK?