Move Row, RowInputFormat to core package

Move Row, RowInputFormat to core package

Anton Solovev
Hello,



In Scala, case classes can hold a large number of fields, which is really helpful for reading wide CSV files, but they are only used in the Table API.

What about this issue (https://issues.apache.org/jira/browse/FLINK-2186): should we use the Table API in the machine learning library?

To solve the issue, #readCsvFile could generate a RowInputFormat.

For convenience, I added another constructor to RowTypeInfo (https://github.com/apache/flink/compare/master...tonycox:FLINK-2186-x).

What do you think about adding some Scala support and moving Row to Flink core?
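
For context, a minimal sketch (in Java) of the kind of usage this proposal aims at. The class name RowCsvInputFormat, its constructor, and the Row/RowTypeInfo wiring are illustrative assumptions, not an API that existed at the time of this thread:

    import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
    import org.apache.flink.api.common.typeinfo.TypeInformation;
    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.core.fs.Path;

    // Imports for Row, RowTypeInfo and RowCsvInputFormat are omitted on purpose:
    // their target package is exactly what is under discussion in this thread.
    public class WideCsvToRow {
        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

            // A wide schema: more columns than Flink's fixed-arity Tuple types cover.
            TypeInformation<?>[] fieldTypes = {
                BasicTypeInfo.STRING_TYPE_INFO,
                BasicTypeInfo.INT_TYPE_INFO,
                BasicTypeInfo.DOUBLE_TYPE_INFO
                // ... dozens more column types
            };

            // Hypothetical Row-producing CSV input format.
            RowCsvInputFormat csvFormat =
                new RowCsvInputFormat(new Path("/data/wide.csv"), fieldTypes);

            DataSet<Row> rows = env.createInput(csvFormat, new RowTypeInfo(fieldTypes));
            rows.first(5).print();
        }
    }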

Re: Move Row, RowInputFormat to core package

Timo Walther-2
Hi Anton,

I would also support the idea of moving Row and RowTypeInfo to Flink core. I think there are many real-world use cases where a variable-length record that supports null values is required. However, I think those classes need to be reworked first. They should not depend on Scala-related things.

RowTypeInfo should not inherit from CaseClassTypeInfo; the current solution with the dummy field names is a hack anyway. Row should not inherit from Scala classes.

Regards,
Timo
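
To make the "no Scala dependency" point concrete, here is a minimal sketch (not Flink's actual class) of what a Scala-free Row could look like in core: a plain Java class holding an array of nullable fields, with no Product or other Scala supertype:

    import java.io.Serializable;
    import java.util.Arrays;

    // Illustrative only: a record of arbitrary arity that supports null values
    // and does not inherit from any Scala class.
    public final class Row implements Serializable {

        private final Object[] fields;

        public Row(int arity) {
            this.fields = new Object[arity];
        }

        public int getArity() {
            return fields.length;
        }

        public Object getField(int pos) {
            return fields[pos];          // may be null
        }

        public void setField(int pos, Object value) {
            fields[pos] = value;         // null values are allowed
        }

        @Override
        public String toString() {
            return Arrays.toString(fields);
        }
    }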


Re: Move Row, RowInputFormat to core package

Flavio Pompermaier
Fully agree with Timo :)


RE: Move Row, RowInputFormat to core package

Anton Solovev
I agree that we should improve RowTypeInfo, but why not keep it in Scala?
In the case of FLINK-2186, the fact that "Row" is a "Product" is indeed what enables the wide-column support.
As an example, I tried moving "Row" to the flink-scala module:
(https://github.com/apache/flink/compare/master...tonycox:FLINK-2186-x)
(https://travis-ci.org/tonycox/flink/builds/178846355)


Re: Move Row, RowInputFormat to core package

Aljoscha Krettek-2
If we move it to core, we have to untangle it from Scala, as Timo said. The reason is that we would like to remove Scala from any user-facing API Maven packages, and if we had it in core, everyone would have to suffix Maven packages with the Scala version.
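
As a concrete illustration of that packaging convention (the artifact names below follow the existing Flink naming scheme; the snippet is only an example of the status quo, not a change proposal):

    <!-- Pure-Java modules need no Scala suffix: -->
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-core</artifactId>
      <version>1.1.3</version>
    </dependency>

    <!-- Modules that expose Scala in their API carry the Scala version suffix: -->
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-scala_2.11</artifactId>
      <version>1.1.3</version>
    </dependency>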


RE: Move Row, RowInputFormat to core package

Anton Solovev
What do you think about moving "Row" not into the core module, but into the Scala module?


Re: Move Row, RowInputFormat to core package

Timo Walther-2
There is no good reason why only Scala developers should have access to
this class. If we also want to read CSV files into Row objects, Row must
be in core.




--
Kind Regards

Timo Walther

Follow me: @twalthr
https://www.linkedin.com/in/twalthr