Hi,
the Vector and DenseVector implementations of Flink ML only allow Double values. But there are cases where the values are not Doubles, e.g. in NLP. Does it make sense to make the implementations generic, i.e. Vector[T] and DenseVector[T]?

Best Regards,
Hilmi

--
Hilmi Yildirim, M.Sc.
Researcher
DFKI GmbH
Intelligente Analytik für Massendaten
DFKI Projektbüro Berlin
Alt-Moabit 91c
D-10559 Berlin
Phone: +49 30 23895 1814
E-Mail: [hidden email]

Deutsches Forschungszentrum fuer Kuenstliche Intelligenz GmbH
Firmensitz: Trippstadter Strasse 122, D-67663 Kaiserslautern
Geschaeftsfuehrung: Prof. Dr. Dr. h.c. mult. Wolfgang Wahlster (Vorsitzender), Dr. Walter Olthoff
Vorsitzender des Aufsichtsrats: Prof. Dr. h.c. Hans A. Aukes
Amtsgericht Kaiserslautern, HRB 2313
Hi Hilmi,
Which types are used for vector values in NLP? I think we can cover the typical cases using Double values.

Regards,
Chiwan Park
Hi,
as I explained in a previous e-mail, I need a LabeledVector where the label is also a vector. After we discussed this issue, I created a new class named LabeledSequenceVector that holds the labels as a Vector. In my use case, I want to train a POS tagger, so the "vector" is a vector of strings and the "labels" are also a vector of strings. If I use the Flink Vector/DenseVector implementation, the vector can only hold Double values, but I need String values.

Best Regards,
Hilmi
Hi Hilmi,
I think in your case it makes sense to define a custom vector of strings. The easiest implementation could be an Array[String] or List[String].

The reason why it does not make much sense to make Vector and DenseVector generic is that these types represent mathematical vectors with algebraic operations. How would you define operations such as the scalar product, outer product, multiplication, etc. on a vector of strings? You would have to provide different implementations for the different type parameters.

Cheers,
Till
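To illustrate the point about algebraic operations, here is a minimal Scala sketch. It is not how FlinkML defines its vectors; the object `GenericDot` and the function `dot` are hypothetical names used only for this example. It shows that a generic scalar product needs a `Numeric[T]` instance from the Scala standard library, which exists for `Double` and `Int` but not for `String`.

```scala
object GenericDot {
  // Hypothetical generic scalar product: it only compiles for element
  // types T that have a scala.math.Numeric instance.
  def dot[T](a: Seq[T], b: Seq[T])(implicit num: Numeric[T]): T = {
    require(a.length == b.length, "vectors must have the same length")
    a.zip(b)
      .map { case (x, y) => num.times(x, y) }
      .foldLeft(num.zero)((acc, p) => num.plus(acc, p))
  }
}

// Works for numeric element types:
//   GenericDot.dot(Seq(1.0, 2.0), Seq(3.0, 4.0))
// Does not compile, because there is no Numeric[String]:
//   GenericDot.dot(Seq("a"), Seq("b"))
```

A generic Vector[T] would have to carry such a constraint on every algebraic method, or leave those methods undefined for non-numeric element types.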
I agree with Till: the data types are different here, so you need a custom string vector.

The Vector abstraction in FlinkML is designed with numerical vectors in mind.
Ok. In this case I will use an Array instead.
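An Array-based container for this use case could look roughly like the sketch below. The class name `TaggedSentence`, its fields, and the example tags are assumptions made for illustration; this is neither the LabeledSequenceVector class mentioned earlier in the thread nor part of FlinkML.

```scala
// Hypothetical container for one POS-tagging training example.
// Tokens and labels are plain Array[String] values, as suggested in the thread.
final case class TaggedSentence(tokens: Array[String], labels: Array[String]) {
  require(tokens.length == labels.length, "each token needs exactly one label")

  // Pair every token with its tag, e.g. as input for feature extraction.
  def pairs: Array[(String, String)] = tokens.zip(labels)
}

object TaggedSentenceExample {
  // Illustrative data only.
  val example: TaggedSentence =
    TaggedSentence(Array("Flink", "is", "fast"), Array("NNP", "VBZ", "JJ"))
}
```

Note that a case class with Array fields compares those fields by reference, so two instances with equal contents are not `==`; a List[String] would avoid that if structural equality matters.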
How about mapping each string to a number? Maybe you can do this with a custom Transformer.

Regards,
Chiwan Park
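The string-to-number mapping could be sketched in plain Scala as follows. The object `StringIndexing`, its two functions, and the choice of -1.0 for unknown strings are assumptions for illustration; wrapping the logic in a FlinkML Transformer is deliberately left out here, since that API is not shown in this thread.

```scala
object StringIndexing {
  // Assign each distinct string an index in order of first appearance.
  def buildIndex(values: Seq[String]): Map[String, Int] =
    values.distinct.zipWithIndex.toMap

  // Replace each string by its index as a Double.
  // Strings missing from the index are encoded as -1.0 (an arbitrary choice).
  def encode(values: Seq[String], index: Map[String, Int]): Array[Double] =
    values.map(v => index.getOrElse(v, -1).toDouble).toArray
}
```

The resulting Doubles are only identifiers: arithmetic on them, such as a scalar product, has no linguistic meaning, so this encoding mainly helps when a numeric container is required.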