Flink ML Vector and DenseVector

Hilmi Yildirim
Hi,
the Vector and DenseVector implementations of Flink ML only allow Double
values. But there are cases where the values are not Doubles, e.g. in
NLP. Does it make sense to make the implementations generic, i.e.
Vector[T] and DenseVector[T]?

Best Regards,
Hilmi

--
==================================================================
Hilmi Yildirim, M.Sc.
Researcher

DFKI GmbH
Intelligente Analytik für Massendaten
DFKI Projektbüro Berlin
Alt-Moabit 91c
D-10559 Berlin
Phone: +49 30 23895 1814

E-Mail: [hidden email]

-------------------------------------------------------------
Deutsches Forschungszentrum fuer Kuenstliche Intelligenz GmbH
Firmensitz: Trippstadter Strasse 122, D-67663 Kaiserslautern

Geschaeftsfuehrung:
Prof. Dr. Dr. h.c. mult. Wolfgang Wahlster (Vorsitzender)
Dr. Walter Olthoff

Vorsitzender des Aufsichtsrats:
Prof. Dr. h.c. Hans A. Aukes

Amtsgericht Kaiserslautern, HRB 2313
-------------------------------------------------------------


Re: Flink ML Vector and DenseVector

Chiwan Park-2
Hi Hilmi,

In NLP, which types are used for vector values? I think we can cover the typical cases using double values.

> On Jan 18, 2016, at 9:19 PM, Hilmi Yildirim <[hidden email]> wrote:
>
> Hi,
> the Vector and DenseVector implementations of Flink ML only allow Double values. But there are cases where the values are not Doubles, e.g. in NLP. Does it make sense to make the implementations generic, i.e. Vector[T] and DenseVector[T]?
>
> Best Regards,
> Hilmi
>

Regards,
Chiwan Park


Re: Flink ML Vector and DenseVector

Hilmi Yildirim
Hi,
As I explained in a previous e-mail, I need a LabeledVector where
the label is also a vector. After we discussed this issue, I created a
new class named LabeledSequenceVector with the labels as a Vector. In my
use case, I want to train a POS tagger, so the "vector" is a
vector of strings and the "labels" are also a vector of strings. If I use
the Flink Vector/DenseVector implementation, the vector can only
hold Double values, but I need String values.

Best Regards,
Hilmi

On 18.01.2016 at 13:33, Chiwan Park wrote:

> Hi Hilmi,
>
> In NLP, which types are used for vector values? I think we can cover typical case using double values.
>
> Regards,
> Chiwan Park
>

Re: Flink ML Vector and DenseVector

Till Rohrmann
Hi Hilmi,

I think in your case it makes sense to define a custom vector of strings.
The easiest implementation could be an Array[String] or List[String].
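
As a rough sketch of this suggestion, a sentence and its labels could be kept
as two plain arrays of strings. The name TaggedSentence and its fields are
hypothetical and are not part of FlinkML; the tags are only example values.

```scala
// Hypothetical container for a POS-tagging example; not a FlinkML class.
final case class TaggedSentence(tokens: Array[String], tags: Array[String]) {
  // Each token needs exactly one tag.
  require(tokens.length == tags.length, "tokens and tags must have the same length")

  // Pair every token with its tag, e.g. as input for feature extraction.
  def pairs: Seq[(String, String)] = tokens.zip(tags).toSeq
}

object TaggedSentenceExample {
  val sentence: TaggedSentence =
    TaggedSentence(Array("Flink", "is", "fast"), Array("NNP", "VBZ", "JJ"))
}
```

Note that a case class with Array fields compares those fields by reference,
so a List[String] may be the better choice if value equality matters.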

The reason why it does not make so much sense to make Vector and DenseVector
generic is that these types are algebraic data types. How would you define
algebraic operations such as scalar product, outer product, multiplication,
etc. on a vector of strings? Then you would have to provide different
implementations for the different type parameters.
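
To illustrate the point, here is a minimal sketch of what a generic vector
would require. GenericVector is a hypothetical type, not FlinkML's Vector; it
assumes Scala's standard Numeric type class as the way to restrict the scalar
product to number-like element types.

```scala
// Hypothetical generic vector; FlinkML's own Vector only holds Double values.
final case class GenericVector[T](values: IndexedSeq[T]) {
  // The scalar product needs + and * on T, expressed here by
  // requiring an implicit Numeric[T] instance.
  def dot(other: GenericVector[T])(implicit num: Numeric[T]): T = {
    require(values.length == other.values.length, "dimension mismatch")
    values.zip(other.values)
      .map { case (a, b) => num.times(a, b) }
      .foldLeft(num.zero)(num.plus)
  }
}

object GenericVectorExample {
  val a = GenericVector(IndexedSeq(1.0, 2.0, 3.0))
  val b = GenericVector(IndexedSeq(4.0, 5.0, 6.0))
  val product: Double = a.dot(b) // 1*4 + 2*5 + 3*6 = 32.0

  val words = GenericVector(IndexedSeq("a", "b"))
  // words.dot(words) // would not compile: there is no Numeric[String]
}
```

With String elements the container still works, but every algebraic operation
is unavailable, which is why a plain Array[String] is the simpler choice.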

Cheers,
Till


On Mon, Jan 18, 2016 at 1:40 PM, Hilmi Yildirim <[hidden email]>
wrote:

> Hi,
> how I explained it in a previous E-Mail, I need a LabeledVector where the
> label is also a vector. After we discussed this issue, I created a new
> class named LabeledSequenceVector with the labels as a Vector. In my use
> case, I want to train a POS-Tagger system, so the "vector" is a vector of
> strings and the "labels" is also a vector of strings. If I use the Flink
> Vector/DenseVector implementation then the vector does only have double
> values but I need String values.
>
> Best Regards,
> Hilmi

Re: Flink ML Vector and DenseVector

Theodore Vasiloudis
I agree with Till: the data types are different here, so you need a custom
string vector.

The Vector abstraction in FlinkML is designed with numerical vectors in
mind.

On Mon, Jan 18, 2016 at 2:33 PM, Till Rohrmann <[hidden email]> wrote:

> Hi Hilmi,
>
> I think in your case it makes sense to define a custom vector of strings.
> The easiest implementation could be an Array[String] or List[String].
>
> The reason why it does not make so much sense to make Vector and
> DenseVector
> generic is that these types are algebraic data types. How would you define
> algebraic operations such as scalar product, outer product, multiplication,
> etc. on a vector of strings? Then you would have to provide different
> implementations for the different type parameters.
>
> Cheers,
> Till

Re: Flink ML Vector and DenseVector

Hilmi Yildirim-2
Ok. In this case I will use an Array instead.

On 18.01.2016 at 14:56, Theodore Vasiloudis wrote:

> I agree with Till, the data types are different here so you need a custom
> string vector.
>
> The Vector abstraction in FlinkML is designed with numerical vectors in
> mind.
>


Re: Flink ML Vector and DenseVector

Chiwan Park-2
How about mapping each string to a number? Maybe you can do that with a custom Transformer.
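
A minimal sketch of such a mapping in plain Scala, assuming an index built from the training tokens. The StringIndexer object and its methods are hypothetical helpers; wiring them into a FlinkML Transformer is not shown here.

```scala
// Hypothetical helper, independent of FlinkML's Transformer API.
object StringIndexer {
  // Assign each distinct string an index in order of first appearance.
  def buildIndex(tokens: Seq[String]): Map[String, Int] =
    tokens.distinct.zipWithIndex.toMap

  // Replace each string by its index as a Double; unknown strings become -1.0.
  def encode(tokens: Seq[String], index: Map[String, Int]): Array[Double] =
    tokens.map(t => index.getOrElse(t, -1).toDouble).toArray
}

object StringIndexerExample {
  val index: Map[String, Int] =
    StringIndexer.buildIndex(Seq("the", "cat", "saw", "the", "dog"))

  // "the" -> 0.0, "dog" -> 3.0, "barked" is unknown -> -1.0
  val encoded: Array[Double] =
    StringIndexer.encode(Seq("the", "dog", "barked"), index)
}
```

The resulting Array[Double] could then be wrapped in a numerical vector. The indices are only identifiers, so arithmetic on them is not meaningful without a further encoding step such as one-hot encoding.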

> On Jan 19, 2016, at 12:02 AM, Hilmi Yildirim <[hidden email]> wrote:
>
> Ok. In this case I will use an Array instead.
>

Regards,
Chiwan Park