Adding flink-scala as a dependency to flink-streaming-core

classic Classic list List threaded Threaded
10 messages Options
Reply | Threaded
Open this post in threaded view
|

Adding flink-scala as a dependency to flink-streaming-core

Gábor Gévay
Hello,

I would like to ask if it would be OK if I added flink-scala as a
dependency to flink-streaming-core. An alternative would be to move
the Scala typeutils to flink-core (to where the Java typeutils are).
Why I need this:

While I am implementing the fast median calculation for windows as
part of my Google Summer of Code project, I am refactoring the way
sum, min, max, etc. are accessing the user-specified field
(https://github.com/apache/flink/pull/684). Currently both the logic
of their aggregators are duplicated for the different kinds of types
(tuple, pojo, array, Scala case class, simple), and also the field
access logic is duplicated across the different aggregators. In my
GSoC project I will implement some further methods (avg, variance,
etc.) that take the same kind of parameters as sum, min, etc., so it
will be neccassary to have the field access logic centralized (this is
the FieldAccessor class in the PR). It would be convenient if this
could also handle Scala case classes, for which CaseClassTypeInfo is
needed which is currently in flink-scala.

Best regards,
Gabor
Reply | Threaded
Open this post in threaded view
|

Re: Adding flink-scala as a dependency to flink-streaming-core

Till Rohrmann
I see the reason why you want to add flink-scala as a dependency to
flink-streaming-core. However, it does not feel right to add an API package
to a core package IMHO.

But I noticed that flink-streaming-core also depends on flink-java. Which
seems odd to me as well. I'm not a streaming expert and thus cannot tell
much about the reasons why a core package has a dependency on an API
package but for me this looks more like an indicator for a necessary
restructuring of our packages. Maybe someone working on the streaming parts
can chime in and shed some light on the required dependencies.

Cheers,
Till

On Wed, Jun 10, 2015 at 2:13 PM Gábor Gévay <[hidden email]> wrote:

> Hello,
>
> I would like to ask if it would be OK if I added flink-scala as a
> dependency to flink-streaming-core. An alternative would be to move
> the Scala typeutils to flink-core (to where the Java typeutils are).
> Why I need this:
>
> While I am implementing the fast median calculation for windows as
> part of my Google Summer of Code project, I am refactoring the way
> sum, min, max, etc. are accessing the user-specified field
> (https://github.com/apache/flink/pull/684). Currently both the logic
> of their aggregators are duplicated for the different kinds of types
> (tuple, pojo, array, Scala case class, simple), and also the field
> access logic is duplicated across the different aggregators. In my
> GSoC project I will implement some further methods (avg, variance,
> etc.) that take the same kind of parameters as sum, min, etc., so it
> will be neccassary to have the field access logic centralized (this is
> the FieldAccessor class in the PR). It would be convenient if this
> could also handle Scala case classes, for which CaseClassTypeInfo is
> needed which is currently in flink-scala.
>
> Best regards,
> Gabor
>
Reply | Threaded
Open this post in threaded view
|

Re: Adding flink-scala as a dependency to flink-streaming-core

Till Rohrmann
Btw: I noticed that all streaming modules depend on flink-core,
flink-runtime, flink-clients and flink-java. Is there a particular reason
why the streaming connectors depend on flink-clients and flink-java?

On Wed, Jun 10, 2015 at 3:41 PM Till Rohrmann <[hidden email]> wrote:

> I see the reason why you want to add flink-scala as a dependency to
> flink-streaming-core. However, it does not feel right to add an API package
> to a core package IMHO.
>
> But I noticed that flink-streaming-core also depends on flink-java. Which
> seems odd to me as well. I'm not a streaming expert and thus cannot tell
> much about the reasons why a core package has a dependency on an API
> package but for me this looks more like an indicator for a necessary
> restructuring of our packages. Maybe someone working on the streaming parts
> can chime in and shed some light on the required dependencies.
>
> Cheers,
> Till
>
>
> On Wed, Jun 10, 2015 at 2:13 PM Gábor Gévay <[hidden email]> wrote:
>
>> Hello,
>>
>> I would like to ask if it would be OK if I added flink-scala as a
>> dependency to flink-streaming-core. An alternative would be to move
>> the Scala typeutils to flink-core (to where the Java typeutils are).
>> Why I need this:
>>
>> While I am implementing the fast median calculation for windows as
>> part of my Google Summer of Code project, I am refactoring the way
>> sum, min, max, etc. are accessing the user-specified field
>> (https://github.com/apache/flink/pull/684). Currently both the logic
>> of their aggregators are duplicated for the different kinds of types
>> (tuple, pojo, array, Scala case class, simple), and also the field
>> access logic is duplicated across the different aggregators. In my
>> GSoC project I will implement some further methods (avg, variance,
>> etc.) that take the same kind of parameters as sum, min, etc., so it
>> will be neccassary to have the field access logic centralized (this is
>> the FieldAccessor class in the PR). It would be convenient if this
>> could also handle Scala case classes, for which CaseClassTypeInfo is
>> needed which is currently in flink-scala.
>>
>> Best regards,
>> Gabor
>>
>
Reply | Threaded
Open this post in threaded view
|

Re: Adding flink-scala as a dependency to flink-streaming-core

Márton Balassi
As for the current streaming dependencies: flink-java is not needed at all,
flink-clients is only needed in specific cases. Thanks for spotting it, I
am cleaning it.

As for Gabor's problem: maybe we should move the CaseClassTypeInfo to
flink-core then given Till's comment. Any opinions on that?

On Wed, Jun 10, 2015 at 3:46 PM, Till Rohrmann <[hidden email]> wrote:

> Btw: I noticed that all streaming modules depend on flink-core,
> flink-runtime, flink-clients and flink-java. Is there a particular reason
> why the streaming connectors depend on flink-clients and flink-java?
>
> On Wed, Jun 10, 2015 at 3:41 PM Till Rohrmann <[hidden email]>
> wrote:
>
> > I see the reason why you want to add flink-scala as a dependency to
> > flink-streaming-core. However, it does not feel right to add an API
> package
> > to a core package IMHO.
> >
> > But I noticed that flink-streaming-core also depends on flink-java. Which
> > seems odd to me as well. I'm not a streaming expert and thus cannot tell
> > much about the reasons why a core package has a dependency on an API
> > package but for me this looks more like an indicator for a necessary
> > restructuring of our packages. Maybe someone working on the streaming
> parts
> > can chime in and shed some light on the required dependencies.
> >
> > Cheers,
> > Till
> >
> >
> > On Wed, Jun 10, 2015 at 2:13 PM Gábor Gévay <[hidden email]> wrote:
> >
> >> Hello,
> >>
> >> I would like to ask if it would be OK if I added flink-scala as a
> >> dependency to flink-streaming-core. An alternative would be to move
> >> the Scala typeutils to flink-core (to where the Java typeutils are).
> >> Why I need this:
> >>
> >> While I am implementing the fast median calculation for windows as
> >> part of my Google Summer of Code project, I am refactoring the way
> >> sum, min, max, etc. are accessing the user-specified field
> >> (https://github.com/apache/flink/pull/684). Currently both the logic
> >> of their aggregators are duplicated for the different kinds of types
> >> (tuple, pojo, array, Scala case class, simple), and also the field
> >> access logic is duplicated across the different aggregators. In my
> >> GSoC project I will implement some further methods (avg, variance,
> >> etc.) that take the same kind of parameters as sum, min, etc., so it
> >> will be neccassary to have the field access logic centralized (this is
> >> the FieldAccessor class in the PR). It would be convenient if this
> >> could also handle Scala case classes, for which CaseClassTypeInfo is
> >> needed which is currently in flink-scala.
> >>
> >> Best regards,
> >> Gabor
> >>
> >
>
Reply | Threaded
Open this post in threaded view
|

Re: Adding flink-scala as a dependency to flink-streaming-core

Gábor Gévay
In reply to this post by Till Rohrmann
> "it does not feel right to add an API package to a core package

Yes, that makes sense. I just tried removing the flink-java dependency
from flink-streaming to see what needs it, and it builds fine without
it :)

What do you think about the second option? (to move the Scala
typeutils (or just CaseClassTypeInfo and CaseClassComparator) to
flink-core, to where the Java typeutils are) This would be mixing
scala code to an otherwise java module, but I don't know whether that
is a problem.

Cheers,
Gabor

2015-06-10 15:46 GMT+02:00 Till Rohrmann <[hidden email]>:

> Btw: I noticed that all streaming modules depend on flink-core,
> flink-runtime, flink-clients and flink-java. Is there a particular reason
> why the streaming connectors depend on flink-clients and flink-java?
>
> On Wed, Jun 10, 2015 at 3:41 PM Till Rohrmann <[hidden email]> wrote:
>
>> I see the reason why you want to add flink-scala as a dependency to
>> flink-streaming-core. However, it does not feel right to add an API package
>> to a core package IMHO.
>>
>> But I noticed that flink-streaming-core also depends on flink-java. Which
>> seems odd to me as well. I'm not a streaming expert and thus cannot tell
>> much about the reasons why a core package has a dependency on an API
>> package but for me this looks more like an indicator for a necessary
>> restructuring of our packages. Maybe someone working on the streaming parts
>> can chime in and shed some light on the required dependencies.
>>
>> Cheers,
>> Till
>>
>>
>> On Wed, Jun 10, 2015 at 2:13 PM Gábor Gévay <[hidden email]> wrote:
>>
>>> Hello,
>>>
>>> I would like to ask if it would be OK if I added flink-scala as a
>>> dependency to flink-streaming-core. An alternative would be to move
>>> the Scala typeutils to flink-core (to where the Java typeutils are).
>>> Why I need this:
>>>
>>> While I am implementing the fast median calculation for windows as
>>> part of my Google Summer of Code project, I am refactoring the way
>>> sum, min, max, etc. are accessing the user-specified field
>>> (https://github.com/apache/flink/pull/684). Currently both the logic
>>> of their aggregators are duplicated for the different kinds of types
>>> (tuple, pojo, array, Scala case class, simple), and also the field
>>> access logic is duplicated across the different aggregators. In my
>>> GSoC project I will implement some further methods (avg, variance,
>>> etc.) that take the same kind of parameters as sum, min, etc., so it
>>> will be neccassary to have the field access logic centralized (this is
>>> the FieldAccessor class in the PR). It would be convenient if this
>>> could also handle Scala case classes, for which CaseClassTypeInfo is
>>> needed which is currently in flink-scala.
>>>
>>> Best regards,
>>> Gabor
>>>
>>
Reply | Threaded
Open this post in threaded view
|

Re: Adding flink-scala as a dependency to flink-streaming-core

Aljoscha Krettek-2
If they can be easily converted to Java code, that would be a good solution.

On Wed, 10 Jun 2015 at 15:56 Gábor Gévay <[hidden email]> wrote:

> > "it does not feel right to add an API package to a core package
>
> Yes, that makes sense. I just tried removing the flink-java dependency
> from flink-streaming to see what needs it, and it builds fine without
> it :)
>
> What do you think about the second option? (to move the Scala
> typeutils (or just CaseClassTypeInfo and CaseClassComparator) to
> flink-core, to where the Java typeutils are) This would be mixing
> scala code to an otherwise java module, but I don't know whether that
> is a problem.
>
> Cheers,
> Gabor
>
> 2015-06-10 15:46 GMT+02:00 Till Rohrmann <[hidden email]>:
> > Btw: I noticed that all streaming modules depend on flink-core,
> > flink-runtime, flink-clients and flink-java. Is there a particular reason
> > why the streaming connectors depend on flink-clients and flink-java?
> >
> > On Wed, Jun 10, 2015 at 3:41 PM Till Rohrmann <[hidden email]>
> wrote:
> >
> >> I see the reason why you want to add flink-scala as a dependency to
> >> flink-streaming-core. However, it does not feel right to add an API
> package
> >> to a core package IMHO.
> >>
> >> But I noticed that flink-streaming-core also depends on flink-java.
> Which
> >> seems odd to me as well. I'm not a streaming expert and thus cannot tell
> >> much about the reasons why a core package has a dependency on an API
> >> package but for me this looks more like an indicator for a necessary
> >> restructuring of our packages. Maybe someone working on the streaming
> parts
> >> can chime in and shed some light on the required dependencies.
> >>
> >> Cheers,
> >> Till
> >>
> >>
> >> On Wed, Jun 10, 2015 at 2:13 PM Gábor Gévay <[hidden email]> wrote:
> >>
> >>> Hello,
> >>>
> >>> I would like to ask if it would be OK if I added flink-scala as a
> >>> dependency to flink-streaming-core. An alternative would be to move
> >>> the Scala typeutils to flink-core (to where the Java typeutils are).
> >>> Why I need this:
> >>>
> >>> While I am implementing the fast median calculation for windows as
> >>> part of my Google Summer of Code project, I am refactoring the way
> >>> sum, min, max, etc. are accessing the user-specified field
> >>> (https://github.com/apache/flink/pull/684). Currently both the logic
> >>> of their aggregators are duplicated for the different kinds of types
> >>> (tuple, pojo, array, Scala case class, simple), and also the field
> >>> access logic is duplicated across the different aggregators. In my
> >>> GSoC project I will implement some further methods (avg, variance,
> >>> etc.) that take the same kind of parameters as sum, min, etc., so it
> >>> will be neccassary to have the field access logic centralized (this is
> >>> the FieldAccessor class in the PR). It would be convenient if this
> >>> could also handle Scala case classes, for which CaseClassTypeInfo is
> >>> needed which is currently in flink-scala.
> >>>
> >>> Best regards,
> >>> Gabor
> >>>
> >>
>
Reply | Threaded
Open this post in threaded view
|

Re: Adding flink-scala as a dependency to flink-streaming-core

Gábor Gévay
In reply to this post by Gábor Gévay
I see four options to solve this without adding the dependency:
1. Move CaseClassTypeInfo and CaseClassComparator to flink-core. Till
said that we want to avoid mixed Scala and Java modules, which rules
this out.
2. Create a new toplevel maven project scala-core, and move things there.
3. Hacky solution: What I actually need is just to determine if a
TypeInfo instance is a CaseClassTypeInfo. This could be achieved by
adding an isCaseClassTypeInfo() method to TupleTypeInfoBase returning
false, and override it in CaseClassTypeInfo to return true.
4. Reimplement CaseClassTypeInfo and CaseClassComparator in Java. I
hope that this can be avoided.

I think the decision boils down to whether similar problems will
probably arise in the future. If this is a unique issue, then 3. is
fine. Otherwise it won't be possible to avoid doing 2. later anyway.
Maybe someone who knows Flink better than me can estimate how likely
this is?

Best regards,
Gabor


2015-06-10 15:55 GMT+02:00 Gábor Gévay <[hidden email]>:

>> "it does not feel right to add an API package to a core package
>
> Yes, that makes sense. I just tried removing the flink-java dependency
> from flink-streaming to see what needs it, and it builds fine without
> it :)
>
> What do you think about the second option? (to move the Scala
> typeutils (or just CaseClassTypeInfo and CaseClassComparator) to
> flink-core, to where the Java typeutils are) This would be mixing
> scala code to an otherwise java module, but I don't know whether that
> is a problem.
>
> Cheers,
> Gabor
>
> 2015-06-10 15:46 GMT+02:00 Till Rohrmann <[hidden email]>:
>> Btw: I noticed that all streaming modules depend on flink-core,
>> flink-runtime, flink-clients and flink-java. Is there a particular reason
>> why the streaming connectors depend on flink-clients and flink-java?
>>
>> On Wed, Jun 10, 2015 at 3:41 PM Till Rohrmann <[hidden email]> wrote:
>>
>>> I see the reason why you want to add flink-scala as a dependency to
>>> flink-streaming-core. However, it does not feel right to add an API package
>>> to a core package IMHO.
>>>
>>> But I noticed that flink-streaming-core also depends on flink-java. Which
>>> seems odd to me as well. I'm not a streaming expert and thus cannot tell
>>> much about the reasons why a core package has a dependency on an API
>>> package but for me this looks more like an indicator for a necessary
>>> restructuring of our packages. Maybe someone working on the streaming parts
>>> can chime in and shed some light on the required dependencies.
>>>
>>> Cheers,
>>> Till
>>>
>>>
>>> On Wed, Jun 10, 2015 at 2:13 PM Gábor Gévay <[hidden email]> wrote:
>>>
>>>> Hello,
>>>>
>>>> I would like to ask if it would be OK if I added flink-scala as a
>>>> dependency to flink-streaming-core. An alternative would be to move
>>>> the Scala typeutils to flink-core (to where the Java typeutils are).
>>>> Why I need this:
>>>>
>>>> While I am implementing the fast median calculation for windows as
>>>> part of my Google Summer of Code project, I am refactoring the way
>>>> sum, min, max, etc. are accessing the user-specified field
>>>> (https://github.com/apache/flink/pull/684). Currently both the logic
>>>> of their aggregators are duplicated for the different kinds of types
>>>> (tuple, pojo, array, Scala case class, simple), and also the field
>>>> access logic is duplicated across the different aggregators. In my
>>>> GSoC project I will implement some further methods (avg, variance,
>>>> etc.) that take the same kind of parameters as sum, min, etc., so it
>>>> will be neccassary to have the field access logic centralized (this is
>>>> the FieldAccessor class in the PR). It would be convenient if this
>>>> could also handle Scala case classes, for which CaseClassTypeInfo is
>>>> needed which is currently in flink-scala.
>>>>
>>>> Best regards,
>>>> Gabor
>>>>
>>>
Reply | Threaded
Open this post in threaded view
|

Re: Adding flink-scala as a dependency to flink-streaming-core

Robert Metzger
I like option 1 the most ("move to flink-core"), however, it would scatter
the type extractor / type information classes accross multiple projects.
Why are we not moving the entire type extractor system into "flink-core"?

There were some thoughts in the past to restructure the maven module
layout: https://issues.apache.org/jira/browse/FLINK-1712 .. I think the
change would be in line with the move.


On Wed, Jun 10, 2015 at 7:52 AM, Gábor Gévay <[hidden email]> wrote:

> I see four options to solve this without adding the dependency:
> 1. Move CaseClassTypeInfo and CaseClassComparator to flink-core. Till
> said that we want to avoid mixed Scala and Java modules, which rules
> this out.
> 2. Create a new toplevel maven project scala-core, and move things there.
> 3. Hacky solution: What I actually need is just to determine if a
> TypeInfo instance is a CaseClassTypeInfo. This could be achieved by
> adding an isCaseClassTypeInfo() method to TupleTypeInfoBase returning
> false, and override it in CaseClassTypeInfo to return true.
> 4. Reimplement CaseClassTypeInfo and CaseClassComparator in Java. I
> hope that this can be avoided.
>
> I think the decision boils down to whether similar problems will
> probably arise in the future. If this is a unique issue, then 3. is
> fine. Otherwise it won't be possible to avoid doing 2. later anyway.
> Maybe someone who knows Flink better than me can estimate how likely
> this is?
>
> Best regards,
> Gabor
>
>
> 2015-06-10 15:55 GMT+02:00 Gábor Gévay <[hidden email]>:
> >> "it does not feel right to add an API package to a core package
> >
> > Yes, that makes sense. I just tried removing the flink-java dependency
> > from flink-streaming to see what needs it, and it builds fine without
> > it :)
> >
> > What do you think about the second option? (to move the Scala
> > typeutils (or just CaseClassTypeInfo and CaseClassComparator) to
> > flink-core, to where the Java typeutils are) This would be mixing
> > scala code to an otherwise java module, but I don't know whether that
> > is a problem.
> >
> > Cheers,
> > Gabor
> >
> > 2015-06-10 15:46 GMT+02:00 Till Rohrmann <[hidden email]>:
> >> Btw: I noticed that all streaming modules depend on flink-core,
> >> flink-runtime, flink-clients and flink-java. Is there a particular
> reason
> >> why the streaming connectors depend on flink-clients and flink-java?
> >>
> >> On Wed, Jun 10, 2015 at 3:41 PM Till Rohrmann <[hidden email]>
> wrote:
> >>
> >>> I see the reason why you want to add flink-scala as a dependency to
> >>> flink-streaming-core. However, it does not feel right to add an API
> package
> >>> to a core package IMHO.
> >>>
> >>> But I noticed that flink-streaming-core also depends on flink-java.
> Which
> >>> seems odd to me as well. I'm not a streaming expert and thus cannot
> tell
> >>> much about the reasons why a core package has a dependency on an API
> >>> package but for me this looks more like an indicator for a necessary
> >>> restructuring of our packages. Maybe someone working on the streaming
> parts
> >>> can chime in and shed some light on the required dependencies.
> >>>
> >>> Cheers,
> >>> Till
> >>>
> >>>
> >>> On Wed, Jun 10, 2015 at 2:13 PM Gábor Gévay <[hidden email]> wrote:
> >>>
> >>>> Hello,
> >>>>
> >>>> I would like to ask if it would be OK if I added flink-scala as a
> >>>> dependency to flink-streaming-core. An alternative would be to move
> >>>> the Scala typeutils to flink-core (to where the Java typeutils are).
> >>>> Why I need this:
> >>>>
> >>>> While I am implementing the fast median calculation for windows as
> >>>> part of my Google Summer of Code project, I am refactoring the way
> >>>> sum, min, max, etc. are accessing the user-specified field
> >>>> (https://github.com/apache/flink/pull/684). Currently both the logic
> >>>> of their aggregators are duplicated for the different kinds of types
> >>>> (tuple, pojo, array, Scala case class, simple), and also the field
> >>>> access logic is duplicated across the different aggregators. In my
> >>>> GSoC project I will implement some further methods (avg, variance,
> >>>> etc.) that take the same kind of parameters as sum, min, etc., so it
> >>>> will be neccassary to have the field access logic centralized (this is
> >>>> the FieldAccessor class in the PR). It would be convenient if this
> >>>> could also handle Scala case classes, for which CaseClassTypeInfo is
> >>>> needed which is currently in flink-scala.
> >>>>
> >>>> Best regards,
> >>>> Gabor
> >>>>
> >>>
>
Reply | Threaded
Open this post in threaded view
|

Re: Adding flink-scala as a dependency to flink-streaming-core

Chiwan Park
+1 for moving the whole type extractor and type information classes to "flink-core".

The user cannot see "flink-core” because it is a hidden package from user’s perspective.
So the package could be deployed with Scala version variation safely. I think that we can
mix “flink-core” and some Scala codes which are used widely.


Regards,
Chiwan Park

> On Jun 21, 2015, at 8:48 AM, Robert Metzger <[hidden email]> wrote:
>
> I like option 1 the most ("move to flink-core"), however, it would scatter
> the type extractor / type information classes accross multiple projects.
> Why are we not moving the entire type extractor system into "flink-core"?
>
> There were some thoughts in the past to restructure the maven module
> layout: https://issues.apache.org/jira/browse/FLINK-1712 .. I think the
> change would be in line with the move.
>
>
> On Wed, Jun 10, 2015 at 7:52 AM, Gábor Gévay <[hidden email]> wrote:
>
>> I see four options to solve this without adding the dependency:
>> 1. Move CaseClassTypeInfo and CaseClassComparator to flink-core. Till
>> said that we want to avoid mixed Scala and Java modules, which rules
>> this out.
>> 2. Create a new toplevel maven project scala-core, and move things there.
>> 3. Hacky solution: What I actually need is just to determine if a
>> TypeInfo instance is a CaseClassTypeInfo. This could be achieved by
>> adding an isCaseClassTypeInfo() method to TupleTypeInfoBase returning
>> false, and override it in CaseClassTypeInfo to return true.
>> 4. Reimplement CaseClassTypeInfo and CaseClassComparator in Java. I
>> hope that this can be avoided.
>>
>> I think the decision boils down to whether similar problems will
>> probably arise in the future. If this is a unique issue, then 3. is
>> fine. Otherwise it won't be possible to avoid doing 2. later anyway.
>> Maybe someone who knows Flink better than me can estimate how likely
>> this is?
>>
>> Best regards,
>> Gabor
>>
>>
>> 2015-06-10 15:55 GMT+02:00 Gábor Gévay <[hidden email]>:
>>>> "it does not feel right to add an API package to a core package
>>>
>>> Yes, that makes sense. I just tried removing the flink-java dependency
>>> from flink-streaming to see what needs it, and it builds fine without
>>> it :)
>>>
>>> What do you think about the second option? (to move the Scala
>>> typeutils (or just CaseClassTypeInfo and CaseClassComparator) to
>>> flink-core, to where the Java typeutils are) This would be mixing
>>> scala code to an otherwise java module, but I don't know whether that
>>> is a problem.
>>>
>>> Cheers,
>>> Gabor
>>>
>>> 2015-06-10 15:46 GMT+02:00 Till Rohrmann <[hidden email]>:
>>>> Btw: I noticed that all streaming modules depend on flink-core,
>>>> flink-runtime, flink-clients and flink-java. Is there a particular
>> reason
>>>> why the streaming connectors depend on flink-clients and flink-java?
>>>>
>>>> On Wed, Jun 10, 2015 at 3:41 PM Till Rohrmann <[hidden email]>
>> wrote:
>>>>
>>>>> I see the reason why you want to add flink-scala as a dependency to
>>>>> flink-streaming-core. However, it does not feel right to add an API
>> package
>>>>> to a core package IMHO.
>>>>>
>>>>> But I noticed that flink-streaming-core also depends on flink-java.
>> Which
>>>>> seems odd to me as well. I'm not a streaming expert and thus cannot
>> tell
>>>>> much about the reasons why a core package has a dependency on an API
>>>>> package but for me this looks more like an indicator for a necessary
>>>>> restructuring of our packages. Maybe someone working on the streaming
>> parts
>>>>> can chime in and shed some light on the required dependencies.
>>>>>
>>>>> Cheers,
>>>>> Till
>>>>>
>>>>>
>>>>> On Wed, Jun 10, 2015 at 2:13 PM Gábor Gévay <[hidden email]> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I would like to ask if it would be OK if I added flink-scala as a
>>>>>> dependency to flink-streaming-core. An alternative would be to move
>>>>>> the Scala typeutils to flink-core (to where the Java typeutils are).
>>>>>> Why I need this:
>>>>>>
>>>>>> While I am implementing the fast median calculation for windows as
>>>>>> part of my Google Summer of Code project, I am refactoring the way
>>>>>> sum, min, max, etc. are accessing the user-specified field
>>>>>> (https://github.com/apache/flink/pull/684). Currently both the logic
>>>>>> of their aggregators are duplicated for the different kinds of types
>>>>>> (tuple, pojo, array, Scala case class, simple), and also the field
>>>>>> access logic is duplicated across the different aggregators. In my
>>>>>> GSoC project I will implement some further methods (avg, variance,
>>>>>> etc.) that take the same kind of parameters as sum, min, etc., so it
>>>>>> will be neccassary to have the field access logic centralized (this is
>>>>>> the FieldAccessor class in the PR). It would be convenient if this
>>>>>> could also handle Scala case classes, for which CaseClassTypeInfo is
>>>>>> needed which is currently in flink-scala.
>>>>>>
>>>>>> Best regards,
>>>>>> Gabor
>>>>>>
>>>>>
>>




Reply | Threaded
Open this post in threaded view
|

Re: Adding flink-scala as a dependency to flink-streaming-core

Stephan Ewen
I like moving the type system classes to core.

But one blocker so far was that the TupleX classes are in flink-java, and
no one moved them.

We have this long discussion pending whether we want to merge flink-core
and flink-java, actually...

On Sun, Jun 21, 2015 at 12:29 AM, Chiwan Park <[hidden email]> wrote:

> +1 for moving the whole type extractor and type information classes to
> "flink-core".
>
> The user cannot see "flink-core” because it is a hidden package from
> user’s perspective.
> So the package could be deployed with Scala version variation safely. I
> think that we can
> mix “flink-core” and some Scala codes which are used widely.
>
>
> Regards,
> Chiwan Park
>
> > On Jun 21, 2015, at 8:48 AM, Robert Metzger <[hidden email]> wrote:
> >
> > I like option 1 the most ("move to flink-core"), however, it would
> scatter
> > the type extractor / type information classes accross multiple projects.
> > Why are we not moving the entire type extractor system into "flink-core"?
> >
> > There were some thoughts in the past to restructure the maven module
> > layout: https://issues.apache.org/jira/browse/FLINK-1712 .. I think the
> > change would be in line with the move.
> >
> >
> > On Wed, Jun 10, 2015 at 7:52 AM, Gábor Gévay <[hidden email]> wrote:
> >
> >> I see four options to solve this without adding the dependency:
> >> 1. Move CaseClassTypeInfo and CaseClassComparator to flink-core. Till
> >> said that we want to avoid mixed Scala and Java modules, which rules
> >> this out.
> >> 2. Create a new toplevel maven project scala-core, and move things
> there.
> >> 3. Hacky solution: What I actually need is just to determine if a
> >> TypeInfo instance is a CaseClassTypeInfo. This could be achieved by
> >> adding an isCaseClassTypeInfo() method to TupleTypeInfoBase returning
> >> false, and override it in CaseClassTypeInfo to return true.
> >> 4. Reimplement CaseClassTypeInfo and CaseClassComparator in Java. I
> >> hope that this can be avoided.
> >>
> >> I think the decision boils down to whether similar problems will
> >> probably arise in the future. If this is a unique issue, then 3. is
> >> fine. Otherwise it won't be possible to avoid doing 2. later anyway.
> >> Maybe someone who knows Flink better than me can estimate how likely
> >> this is?
> >>
> >> Best regards,
> >> Gabor
> >>
> >>
> >> 2015-06-10 15:55 GMT+02:00 Gábor Gévay <[hidden email]>:
> >>>> "it does not feel right to add an API package to a core package
> >>>
> >>> Yes, that makes sense. I just tried removing the flink-java dependency
> >>> from flink-streaming to see what needs it, and it builds fine without
> >>> it :)
> >>>
> >>> What do you think about the second option? (to move the Scala
> >>> typeutils (or just CaseClassTypeInfo and CaseClassComparator) to
> >>> flink-core, to where the Java typeutils are) This would be mixing
> >>> scala code to an otherwise java module, but I don't know whether that
> >>> is a problem.
> >>>
> >>> Cheers,
> >>> Gabor
> >>>
> >>> 2015-06-10 15:46 GMT+02:00 Till Rohrmann <[hidden email]>:
> >>>> Btw: I noticed that all streaming modules depend on flink-core,
> >>>> flink-runtime, flink-clients and flink-java. Is there a particular
> >> reason
> >>>> why the streaming connectors depend on flink-clients and flink-java?
> >>>>
> >>>> On Wed, Jun 10, 2015 at 3:41 PM Till Rohrmann <[hidden email]>
> >> wrote:
> >>>>
> >>>>> I see the reason why you want to add flink-scala as a dependency to
> >>>>> flink-streaming-core. However, it does not feel right to add an API
> >> package
> >>>>> to a core package IMHO.
> >>>>>
> >>>>> But I noticed that flink-streaming-core also depends on flink-java.
> >> Which
> >>>>> seems odd to me as well. I'm not a streaming expert and thus cannot
> >> tell
> >>>>> much about the reasons why a core package has a dependency on an API
> >>>>> package but for me this looks more like an indicator for a necessary
> >>>>> restructuring of our packages. Maybe someone working on the streaming
> >> parts
> >>>>> can chime in and shed some light on the required dependencies.
> >>>>>
> >>>>> Cheers,
> >>>>> Till
> >>>>>
> >>>>>
> >>>>> On Wed, Jun 10, 2015 at 2:13 PM Gábor Gévay <[hidden email]>
> wrote:
> >>>>>
> >>>>>> Hello,
> >>>>>>
> >>>>>> I would like to ask if it would be OK if I added flink-scala as a
> >>>>>> dependency to flink-streaming-core. An alternative would be to move
> >>>>>> the Scala typeutils to flink-core (to where the Java typeutils are).
> >>>>>> Why I need this:
> >>>>>>
> >>>>>> While I am implementing the fast median calculation for windows as
> >>>>>> part of my Google Summer of Code project, I am refactoring the way
> >>>>>> sum, min, max, etc. are accessing the user-specified field
> >>>>>> (https://github.com/apache/flink/pull/684). Currently both the
> logic
> >>>>>> of their aggregators are duplicated for the different kinds of types
> >>>>>> (tuple, pojo, array, Scala case class, simple), and also the field
> >>>>>> access logic is duplicated across the different aggregators. In my
> >>>>>> GSoC project I will implement some further methods (avg, variance,
> >>>>>> etc.) that take the same kind of parameters as sum, min, etc., so it
> >>>>>> will be neccassary to have the field access logic centralized (this
> is
> >>>>>> the FieldAccessor class in the PR). It would be convenient if this
> >>>>>> could also handle Scala case classes, for which CaseClassTypeInfo is
> >>>>>> needed which is currently in flink-scala.
> >>>>>>
> >>>>>> Best regards,
> >>>>>> Gabor
> >>>>>>
> >>>>>
> >>
>
>
>
>
>