Hi!
This is a discussion about altering the way we handle dependencies and shading in Flink. I ran into quite a view problems trying to adjust / fix some shading issues during release validation. The issue is tracked under: https://issues.apache.org/jira/browse/FLINK-6529 Bring this discussion thread up because it is a bigger issue *Problem* Currently, Flink shades dependencies like ASM and Guava into all jars of projects that reference it and relocate the classes. There are some drawbacks to that approach, let's discuss them at the example of ASM: - The ASM classes are for example in flink-core, flink-java, flink-scala, flink-runtime, etc. - Users that reference these dependencies have the classes multiple times in the classpath. That is unclean (works, through, because the classes are identical). The same happens when building the final dist. jar. - Some of these dependencies require to include license files in the shaded jar. It is hard to impossible to build a good automatic solution for that, partly due to Maven's very poor cross-project path support - Most importantly: Scala does not support shading really well. Scala classes have references to classes in more places than just the class names (apparently for Scala reflect support). Referencing a Scala project with shaded ASM still requires to add a reference to unshaded ASM (at least as a compile dependency). *Proposal* I propose that we build and deploy a asm-flink-shaded version of ASM and directly program against the relocated namespaces. Since we never use classes that we relocate in public interfaces, Flink users will never see the relocated class names. Internally, it does not hurt to use them. - Proper maven dependency management, no hidden (shaded) dependencies - One copy of each class for shaded dependencies - Proper Scala interoperability - Natural License management (license is part of deployed asm-flink-shaded jar) Happy to hear thoughts! Stephan |
I like the idea, thank you for bringing it up.
Given that the raised problems aren't really ASM specific would it make sense to create one flink-shaded module that contains all frequently shaded libraries? (or maybe even all shaded dependencies by core modules) The proposal limits the scope of this to ASM and i was wondering why. I also remember that there was a discussion recently about why we shade things at all, and the idea of working against the shaded namespaces was brought up. Back then i was expressing doubts as to whether IDE's would properly support this; what's the state on that? On 10.05.2017 18:18, Stephan Ewen wrote: > Hi! > > This is a discussion about altering the way we handle dependencies and > shading in Flink. > I ran into quite a view problems trying to adjust / fix some shading issues > during release validation. > > The issue is tracked under: https://issues.apache.org/jira/browse/FLINK-6529 > Bring this discussion thread up because it is a bigger issue > > *Problem* > > Currently, Flink shades dependencies like ASM and Guava into all jars of > projects that reference it and relocate the classes. > > There are some drawbacks to that approach, let's discuss them at the > example of ASM: > > - The ASM classes are for example in flink-core, flink-java, flink-scala, > flink-runtime, etc. > > - Users that reference these dependencies have the classes multiple times > in the classpath. That is unclean (works, through, because the classes are > identical). The same happens when building the final dist. jar. > > - Some of these dependencies require to include license files in the > shaded jar. It is hard to impossible to build a good automatic solution for > that, partly due to Maven's very poor cross-project path support > > - Most importantly: Scala does not support shading really well. Scala > classes have references to classes in more places than just the class names > (apparently for Scala reflect support). Referencing a Scala project with > shaded ASM still requires to add a reference to unshaded ASM (at least as a > compile dependency). > > *Proposal* > > I propose that we build and deploy a asm-flink-shaded version of ASM and > directly program against the relocated namespaces. Since we never use > classes that we relocate in public interfaces, Flink users will never see > the relocated class names. Internally, it does not hurt to use them. > > - Proper maven dependency management, no hidden (shaded) dependencies > > - One copy of each class for shaded dependencies > > - Proper Scala interoperability > > - Natural License management (license is part of deployed > asm-flink-shaded jar) > > > Happy to hear thoughts! > > Stephan > |
@chesnay: I used ASM as an example in the proposal. Maybe I did not say
that clearly. If we like that approach, we should deal with the other libraries (at least the frequently used ones) in the same way. I would imagine to have a project layout like that: flink-shaded-deps - flink-shaded-asm - flink-shaded-guava - flink-shaded-curator - flink-shaded-hadoop "flink-shaded-deps" would not be built every time (and not be released every time), but only when needed. On Wed, May 10, 2017 at 7:28 PM, Chesnay Schepler <[hidden email]> wrote: > I like the idea, thank you for bringing it up. > > Given that the raised problems aren't really ASM specific would it make > sense to create one flink-shaded module that contains all frequently shaded > libraries? (or maybe even all shaded dependencies by core modules) The > proposal limits the scope of this to ASM and i was wondering why. > > I also remember that there was a discussion recently about why we shade > things at all, and the idea of working against the shaded namespaces was > brought up. Back then i was expressing doubts as to whether IDE's would > properly support this; what's the state on that? > > On 10.05.2017 18:18, Stephan Ewen wrote: > >> Hi! >> >> This is a discussion about altering the way we handle dependencies and >> shading in Flink. >> I ran into quite a view problems trying to adjust / fix some shading >> issues >> during release validation. >> >> The issue is tracked under: https://issues.apache.org/jira >> /browse/FLINK-6529 >> Bring this discussion thread up because it is a bigger issue >> >> *Problem* >> >> Currently, Flink shades dependencies like ASM and Guava into all jars of >> projects that reference it and relocate the classes. >> >> There are some drawbacks to that approach, let's discuss them at the >> example of ASM: >> >> - The ASM classes are for example in flink-core, flink-java, >> flink-scala, >> flink-runtime, etc. >> >> - Users that reference these dependencies have the classes multiple >> times >> in the classpath. That is unclean (works, through, because the classes are >> identical). The same happens when building the final dist. jar. >> >> - Some of these dependencies require to include license files in the >> shaded jar. It is hard to impossible to build a good automatic solution >> for >> that, partly due to Maven's very poor cross-project path support >> >> - Most importantly: Scala does not support shading really well. Scala >> classes have references to classes in more places than just the class >> names >> (apparently for Scala reflect support). Referencing a Scala project with >> shaded ASM still requires to add a reference to unshaded ASM (at least as >> a >> compile dependency). >> >> *Proposal* >> >> I propose that we build and deploy a asm-flink-shaded version of ASM and >> directly program against the relocated namespaces. Since we never use >> classes that we relocate in public interfaces, Flink users will never see >> the relocated class names. Internally, it does not hurt to use them. >> >> - Proper maven dependency management, no hidden (shaded) dependencies >> >> - One copy of each class for shaded dependencies >> >> - Proper Scala interoperability >> >> - Natural License management (license is part of deployed >> asm-flink-shaded jar) >> >> >> Happy to hear thoughts! >> >> Stephan >> >> > |
The advantages you've listed sound really compelling to me.
- Do you have time to implement these changes or do we need a volunteer? ;) - I assume that republishing the artifacts as you propose doesn't have any new legal implications since we already publish them with our JARs, right? - We might think about adding Netty to the list of shaded artifacts since some dependency conflicts were reported recently. Would have to double check the reported issues before doing that though. ;-) – Ufuk On Wed, May 10, 2017 at 8:45 PM, Stephan Ewen <[hidden email]> wrote: > @chesnay: I used ASM as an example in the proposal. Maybe I did not say > that clearly. > > If we like that approach, we should deal with the other libraries (at least > the frequently used ones) in the same way. > > > I would imagine to have a project layout like that: > > flink-shaded-deps > - flink-shaded-asm > - flink-shaded-guava > - flink-shaded-curator > - flink-shaded-hadoop > > > "flink-shaded-deps" would not be built every time (and not be released > every time), but only when needed. > > > > > > > On Wed, May 10, 2017 at 7:28 PM, Chesnay Schepler <[hidden email]> > wrote: > >> I like the idea, thank you for bringing it up. >> >> Given that the raised problems aren't really ASM specific would it make >> sense to create one flink-shaded module that contains all frequently shaded >> libraries? (or maybe even all shaded dependencies by core modules) The >> proposal limits the scope of this to ASM and i was wondering why. >> >> I also remember that there was a discussion recently about why we shade >> things at all, and the idea of working against the shaded namespaces was >> brought up. Back then i was expressing doubts as to whether IDE's would >> properly support this; what's the state on that? >> >> On 10.05.2017 18:18, Stephan Ewen wrote: >> >>> Hi! >>> >>> This is a discussion about altering the way we handle dependencies and >>> shading in Flink. >>> I ran into quite a view problems trying to adjust / fix some shading >>> issues >>> during release validation. >>> >>> The issue is tracked under: https://issues.apache.org/jira >>> /browse/FLINK-6529 >>> Bring this discussion thread up because it is a bigger issue >>> >>> *Problem* >>> >>> Currently, Flink shades dependencies like ASM and Guava into all jars of >>> projects that reference it and relocate the classes. >>> >>> There are some drawbacks to that approach, let's discuss them at the >>> example of ASM: >>> >>> - The ASM classes are for example in flink-core, flink-java, >>> flink-scala, >>> flink-runtime, etc. >>> >>> - Users that reference these dependencies have the classes multiple >>> times >>> in the classpath. That is unclean (works, through, because the classes are >>> identical). The same happens when building the final dist. jar. >>> >>> - Some of these dependencies require to include license files in the >>> shaded jar. It is hard to impossible to build a good automatic solution >>> for >>> that, partly due to Maven's very poor cross-project path support >>> >>> - Most importantly: Scala does not support shading really well. Scala >>> classes have references to classes in more places than just the class >>> names >>> (apparently for Scala reflect support). Referencing a Scala project with >>> shaded ASM still requires to add a reference to unshaded ASM (at least as >>> a >>> compile dependency). >>> >>> *Proposal* >>> >>> I propose that we build and deploy a asm-flink-shaded version of ASM and >>> directly program against the relocated namespaces. Since we never use >>> classes that we relocate in public interfaces, Flink users will never see >>> the relocated class names. Internally, it does not hurt to use them. >>> >>> - Proper maven dependency management, no hidden (shaded) dependencies >>> >>> - One copy of each class for shaded dependencies >>> >>> - Proper Scala interoperability >>> >>> - Natural License management (license is part of deployed >>> asm-flink-shaded jar) >>> >>> >>> Happy to hear thoughts! >>> >>> Stephan >>> >>> >> |
@Ufuk - I have never set up artifact deployment in Maven, could need some
help there. Regarding shading Netty, I agree, would be good to do that as well... On Thu, May 11, 2017 at 10:52 AM, Ufuk Celebi <[hidden email]> wrote: > The advantages you've listed sound really compelling to me. > > - Do you have time to implement these changes or do we need a volunteer? ;) > > - I assume that republishing the artifacts as you propose doesn't have > any new legal implications since we already publish them with our > JARs, right? > > - We might think about adding Netty to the list of shaded artifacts > since some dependency conflicts were reported recently. Would have to > double check the reported issues before doing that though. ;-) > > – Ufuk > > > On Wed, May 10, 2017 at 8:45 PM, Stephan Ewen <[hidden email]> wrote: > > @chesnay: I used ASM as an example in the proposal. Maybe I did not say > > that clearly. > > > > If we like that approach, we should deal with the other libraries (at > least > > the frequently used ones) in the same way. > > > > > > I would imagine to have a project layout like that: > > > > flink-shaded-deps > > - flink-shaded-asm > > - flink-shaded-guava > > - flink-shaded-curator > > - flink-shaded-hadoop > > > > > > "flink-shaded-deps" would not be built every time (and not be released > > every time), but only when needed. > > > > > > > > > > > > > > On Wed, May 10, 2017 at 7:28 PM, Chesnay Schepler <[hidden email]> > > wrote: > > > >> I like the idea, thank you for bringing it up. > >> > >> Given that the raised problems aren't really ASM specific would it make > >> sense to create one flink-shaded module that contains all frequently > shaded > >> libraries? (or maybe even all shaded dependencies by core modules) The > >> proposal limits the scope of this to ASM and i was wondering why. > >> > >> I also remember that there was a discussion recently about why we shade > >> things at all, and the idea of working against the shaded namespaces was > >> brought up. Back then i was expressing doubts as to whether IDE's would > >> properly support this; what's the state on that? > >> > >> On 10.05.2017 18:18, Stephan Ewen wrote: > >> > >>> Hi! > >>> > >>> This is a discussion about altering the way we handle dependencies and > >>> shading in Flink. > >>> I ran into quite a view problems trying to adjust / fix some shading > >>> issues > >>> during release validation. > >>> > >>> The issue is tracked under: https://issues.apache.org/jira > >>> /browse/FLINK-6529 > >>> Bring this discussion thread up because it is a bigger issue > >>> > >>> *Problem* > >>> > >>> Currently, Flink shades dependencies like ASM and Guava into all jars > of > >>> projects that reference it and relocate the classes. > >>> > >>> There are some drawbacks to that approach, let's discuss them at the > >>> example of ASM: > >>> > >>> - The ASM classes are for example in flink-core, flink-java, > >>> flink-scala, > >>> flink-runtime, etc. > >>> > >>> - Users that reference these dependencies have the classes multiple > >>> times > >>> in the classpath. That is unclean (works, through, because the classes > are > >>> identical). The same happens when building the final dist. jar. > >>> > >>> - Some of these dependencies require to include license files in the > >>> shaded jar. It is hard to impossible to build a good automatic solution > >>> for > >>> that, partly due to Maven's very poor cross-project path support > >>> > >>> - Most importantly: Scala does not support shading really well. > Scala > >>> classes have references to classes in more places than just the class > >>> names > >>> (apparently for Scala reflect support). Referencing a Scala project > with > >>> shaded ASM still requires to add a reference to unshaded ASM (at least > as > >>> a > >>> compile dependency). > >>> > >>> *Proposal* > >>> > >>> I propose that we build and deploy a asm-flink-shaded version of ASM > and > >>> directly program against the relocated namespaces. Since we never use > >>> classes that we relocate in public interfaces, Flink users will never > see > >>> the relocated class names. Internally, it does not hurt to use them. > >>> > >>> - Proper maven dependency management, no hidden (shaded) > dependencies > >>> > >>> - One copy of each class for shaded dependencies > >>> > >>> - Proper Scala interoperability > >>> > >>> - Natural License management (license is part of deployed > >>> asm-flink-shaded jar) > >>> > >>> > >>> Happy to hear thoughts! > >>> > >>> Stephan > >>> > >>> > >> > |
I agree that shading is a nasty business it would make things a bit more
explicit to program directly against the relocated API of the above-mentioned libraries. I assume that we should be able to follow the examples of flakka and frocks. Have we somewhere documented how to publish artifacts on Maven central? On Thu, May 11, 2017 at 10:54 AM, Stephan Ewen <[hidden email]> wrote: > @Ufuk - I have never set up artifact deployment in Maven, could need some > help there. > > Regarding shading Netty, I agree, would be good to do that as well... > > On Thu, May 11, 2017 at 10:52 AM, Ufuk Celebi <[hidden email]> wrote: > > > The advantages you've listed sound really compelling to me. > > > > - Do you have time to implement these changes or do we need a volunteer? > ;) > > > > - I assume that republishing the artifacts as you propose doesn't have > > any new legal implications since we already publish them with our > > JARs, right? > > > > - We might think about adding Netty to the list of shaded artifacts > > since some dependency conflicts were reported recently. Would have to > > double check the reported issues before doing that though. ;-) > > > > – Ufuk > > > > > > On Wed, May 10, 2017 at 8:45 PM, Stephan Ewen <[hidden email]> wrote: > > > @chesnay: I used ASM as an example in the proposal. Maybe I did not say > > > that clearly. > > > > > > If we like that approach, we should deal with the other libraries (at > > least > > > the frequently used ones) in the same way. > > > > > > > > > I would imagine to have a project layout like that: > > > > > > flink-shaded-deps > > > - flink-shaded-asm > > > - flink-shaded-guava > > > - flink-shaded-curator > > > - flink-shaded-hadoop > > > > > > > > > "flink-shaded-deps" would not be built every time (and not be released > > > every time), but only when needed. > > > > > > > > > > > > > > > > > > > > > On Wed, May 10, 2017 at 7:28 PM, Chesnay Schepler <[hidden email]> > > > wrote: > > > > > >> I like the idea, thank you for bringing it up. > > >> > > >> Given that the raised problems aren't really ASM specific would it > make > > >> sense to create one flink-shaded module that contains all frequently > > shaded > > >> libraries? (or maybe even all shaded dependencies by core modules) The > > >> proposal limits the scope of this to ASM and i was wondering why. > > >> > > >> I also remember that there was a discussion recently about why we > shade > > >> things at all, and the idea of working against the shaded namespaces > was > > >> brought up. Back then i was expressing doubts as to whether IDE's > would > > >> properly support this; what's the state on that? > > >> > > >> On 10.05.2017 18:18, Stephan Ewen wrote: > > >> > > >>> Hi! > > >>> > > >>> This is a discussion about altering the way we handle dependencies > and > > >>> shading in Flink. > > >>> I ran into quite a view problems trying to adjust / fix some shading > > >>> issues > > >>> during release validation. > > >>> > > >>> The issue is tracked under: https://issues.apache.org/jira > > >>> /browse/FLINK-6529 > > >>> Bring this discussion thread up because it is a bigger issue > > >>> > > >>> *Problem* > > >>> > > >>> Currently, Flink shades dependencies like ASM and Guava into all jars > > of > > >>> projects that reference it and relocate the classes. > > >>> > > >>> There are some drawbacks to that approach, let's discuss them at the > > >>> example of ASM: > > >>> > > >>> - The ASM classes are for example in flink-core, flink-java, > > >>> flink-scala, > > >>> flink-runtime, etc. > > >>> > > >>> - Users that reference these dependencies have the classes > multiple > > >>> times > > >>> in the classpath. That is unclean (works, through, because the > classes > > are > > >>> identical). The same happens when building the final dist. jar. > > >>> > > >>> - Some of these dependencies require to include license files in > the > > >>> shaded jar. It is hard to impossible to build a good automatic > solution > > >>> for > > >>> that, partly due to Maven's very poor cross-project path support > > >>> > > >>> - Most importantly: Scala does not support shading really well. > > Scala > > >>> classes have references to classes in more places than just the class > > >>> names > > >>> (apparently for Scala reflect support). Referencing a Scala project > > with > > >>> shaded ASM still requires to add a reference to unshaded ASM (at > least > > as > > >>> a > > >>> compile dependency). > > >>> > > >>> *Proposal* > > >>> > > >>> I propose that we build and deploy a asm-flink-shaded version of ASM > > and > > >>> directly program against the relocated namespaces. Since we never use > > >>> classes that we relocate in public interfaces, Flink users will never > > see > > >>> the relocated class names. Internally, it does not hurt to use them. > > >>> > > >>> - Proper maven dependency management, no hidden (shaded) > > dependencies > > >>> > > >>> - One copy of each class for shaded dependencies > > >>> > > >>> - Proper Scala interoperability > > >>> > > >>> - Natural License management (license is part of deployed > > >>> asm-flink-shaded jar) > > >>> > > >>> > > >>> Happy to hear thoughts! > > >>> > > >>> Stephan > > >>> > > >>> > > >> > > > |
On Thu, May 11, 2017 at 10:59 AM, Till Rohrmann <[hidden email]> wrote:
> Have we somewhere documented how to publish > artifacts on Maven central? Pulling in Robert who published frocks. @Robert: Would you like to volunteer for this? Would really help to combine this with some docs about publishing Maven artefacts in the flink-shaded-deps README. :-) In general, I'm curious to hear your opinion on this proposal. – Ufuk |
In my opinion, the ideal approach to mitigating conflicts between
application code and Flink itself is to relocate all of Flink's dependencies. Rationale is to avoid placing the burden of relocation on the app developer, and ultimately to eliminate the need for an app uber-jar. For example, imagine enhancing Flink to directly support Maven repositories, e.g. ``` $ flink run --package org.example:app:1.0 ... Downloading: https://repo1.maven.org/maven2/org/example/app/1.0/app.pom ... ``` From that perspective, FLINK-6529 is another good step in that direction. But it seems like we'd be forking more libraries ("fetty"!). Would we need to alter the source code or rely on the shading plugin? As Chesnay mentioned, what's the impact in the IDE? In the future, could the entire flink-runtime be made an uber-jar, performing the relocation at that stage? On Thu, May 11, 2017 at 3:36 AM, Ufuk Celebi <[hidden email]> wrote: > On Thu, May 11, 2017 at 10:59 AM, Till Rohrmann <[hidden email]> > wrote: > > Have we somewhere documented how to publish > > artifacts on Maven central? > > Pulling in Robert who published frocks. @Robert: Would you like to > volunteer for this? Would really help to combine this with some docs > about publishing Maven artefacts in the flink-shaded-deps README. :-) > In general, I'm curious to hear your opinion on this proposal. > > – Ufuk > |
@eron I would try to rely almost purely on the shade plugin
On Thu, May 11, 2017 at 7:53 PM, Eron Wright <[hidden email]> wrote: > In my opinion, the ideal approach to mitigating conflicts between > application code and Flink itself is to relocate all of Flink's > dependencies. Rationale is to avoid placing the burden of relocation on > the app developer, and ultimately to eliminate the need for an app > uber-jar. > > For example, imagine enhancing Flink to directly support Maven > repositories, e.g. > ``` > $ flink run --package org.example:app:1.0 > ... > Downloading: https://repo1.maven.org/maven2/org/example/app/1.0/app.pom > ... > ``` > > From that perspective, FLINK-6529 is another good step in that direction. > But it seems like we'd be forking more libraries ("fetty"!). Would we > need to alter the source code or rely on the shading plugin? As Chesnay > mentioned, what's the impact in the IDE? > > In the future, could the entire flink-runtime be made an uber-jar, > performing the relocation at that stage? > > > On Thu, May 11, 2017 at 3:36 AM, Ufuk Celebi <[hidden email]> wrote: > > > On Thu, May 11, 2017 at 10:59 AM, Till Rohrmann <[hidden email]> > > wrote: > > > Have we somewhere documented how to publish > > > artifacts on Maven central? > > > > Pulling in Robert who published frocks. @Robert: Would you like to > > volunteer for this? Would really help to combine this with some docs > > about publishing Maven artefacts in the flink-shaded-deps README. :-) > > In general, I'm curious to hear your opinion on this proposal. > > > > – Ufuk > > > |
In reply to this post by Stephan Ewen
I would like to start working on this.
I've looked into adding a flink-shaded-guava module. Working against the shaded namespaces seems to work without problems from the IDE, and we could forbid un-shaded usages with checkstyle. So for the list of dependencies that we want to shade we currently got: * asm * guava * netty * hadoop * curator I've had a chat with Stephan Ewan and he brought up kryo + chill as well. The nice thing is that we can do this incrementally, one dependency at a time. As such i would propose to go through the whole process for guava and see what problems arise. This would include adding a flink-shaded module and a child flink-shaded-guava module to the flink repository that are not part of the build process, replacing all usages of guava in Flink, adding the checkstyle rule (optional) and deploying the artifact to maven central. On 11.05.2017 10:54, Stephan Ewen wrote: > @Ufuk - I have never set up artifact deployment in Maven, could need some > help there. > > Regarding shading Netty, I agree, would be good to do that as well... > > On Thu, May 11, 2017 at 10:52 AM, Ufuk Celebi <[hidden email]> wrote: > >> The advantages you've listed sound really compelling to me. >> >> - Do you have time to implement these changes or do we need a volunteer? ;) >> >> - I assume that republishing the artifacts as you propose doesn't have >> any new legal implications since we already publish them with our >> JARs, right? >> >> - We might think about adding Netty to the list of shaded artifacts >> since some dependency conflicts were reported recently. Would have to >> double check the reported issues before doing that though. ;-) >> >> – Ufuk >> >> >> On Wed, May 10, 2017 at 8:45 PM, Stephan Ewen <[hidden email]> wrote: >>> @chesnay: I used ASM as an example in the proposal. Maybe I did not say >>> that clearly. >>> >>> If we like that approach, we should deal with the other libraries (at >> least >>> the frequently used ones) in the same way. >>> >>> >>> I would imagine to have a project layout like that: >>> >>> flink-shaded-deps >>> - flink-shaded-asm >>> - flink-shaded-guava >>> - flink-shaded-curator >>> - flink-shaded-hadoop >>> >>> >>> "flink-shaded-deps" would not be built every time (and not be released >>> every time), but only when needed. >>> >>> >>> >>> >>> >>> >>> On Wed, May 10, 2017 at 7:28 PM, Chesnay Schepler <[hidden email]> >>> wrote: >>> >>>> I like the idea, thank you for bringing it up. >>>> >>>> Given that the raised problems aren't really ASM specific would it make >>>> sense to create one flink-shaded module that contains all frequently >> shaded >>>> libraries? (or maybe even all shaded dependencies by core modules) The >>>> proposal limits the scope of this to ASM and i was wondering why. >>>> >>>> I also remember that there was a discussion recently about why we shade >>>> things at all, and the idea of working against the shaded namespaces was >>>> brought up. Back then i was expressing doubts as to whether IDE's would >>>> properly support this; what's the state on that? >>>> >>>> On 10.05.2017 18:18, Stephan Ewen wrote: >>>> >>>>> Hi! >>>>> >>>>> This is a discussion about altering the way we handle dependencies and >>>>> shading in Flink. >>>>> I ran into quite a view problems trying to adjust / fix some shading >>>>> issues >>>>> during release validation. >>>>> >>>>> The issue is tracked under: https://issues.apache.org/jira >>>>> /browse/FLINK-6529 >>>>> Bring this discussion thread up because it is a bigger issue >>>>> >>>>> *Problem* >>>>> >>>>> Currently, Flink shades dependencies like ASM and Guava into all jars >> of >>>>> projects that reference it and relocate the classes. >>>>> >>>>> There are some drawbacks to that approach, let's discuss them at the >>>>> example of ASM: >>>>> >>>>> - The ASM classes are for example in flink-core, flink-java, >>>>> flink-scala, >>>>> flink-runtime, etc. >>>>> >>>>> - Users that reference these dependencies have the classes multiple >>>>> times >>>>> in the classpath. That is unclean (works, through, because the classes >> are >>>>> identical). The same happens when building the final dist. jar. >>>>> >>>>> - Some of these dependencies require to include license files in the >>>>> shaded jar. It is hard to impossible to build a good automatic solution >>>>> for >>>>> that, partly due to Maven's very poor cross-project path support >>>>> >>>>> - Most importantly: Scala does not support shading really well. >> Scala >>>>> classes have references to classes in more places than just the class >>>>> names >>>>> (apparently for Scala reflect support). Referencing a Scala project >> with >>>>> shaded ASM still requires to add a reference to unshaded ASM (at least >> as >>>>> a >>>>> compile dependency). >>>>> >>>>> *Proposal* >>>>> >>>>> I propose that we build and deploy a asm-flink-shaded version of ASM >> and >>>>> directly program against the relocated namespaces. Since we never use >>>>> classes that we relocate in public interfaces, Flink users will never >> see >>>>> the relocated class names. Internally, it does not hurt to use them. >>>>> >>>>> - Proper maven dependency management, no hidden (shaded) >> dependencies >>>>> - One copy of each class for shaded dependencies >>>>> >>>>> - Proper Scala interoperability >>>>> >>>>> - Natural License management (license is part of deployed >>>>> asm-flink-shaded jar) >>>>> >>>>> >>>>> Happy to hear thoughts! >>>>> >>>>> Stephan >>>>> >>>>> |
I like this approach.
Two additional things can be mention here: - We need to deploy these artifacts independently and not as part of the build. That is a manual step once per "bump" in the dependency of that library. - We reduce the shading complexity of the original build and should thus also speed up build times :-) Stephan On Tue, Jun 20, 2017 at 1:15 PM, Chesnay Schepler <[hidden email]> wrote: > I would like to start working on this. > > I've looked into adding a flink-shaded-guava module. Working against the > shaded namespaces seems > to work without problems from the IDE, and we could forbid un-shaded > usages with checkstyle. > > So for the list of dependencies that we want to shade we currently got: > > * asm > * guava > * netty > * hadoop > * curator > > I've had a chat with Stephan Ewan and he brought up kryo + chill as well. > > The nice thing is that we can do this incrementally, one dependency at a > time. As such i would propose > to go through the whole process for guava and see what problems arise. > > This would include adding a flink-shaded module and a child > flink-shaded-guava module to the flink repository > that are not part of the build process, replacing all usages of guava in > Flink, adding the > checkstyle rule (optional) and deploying the artifact to maven central. > > > On 11.05.2017 10:54, Stephan Ewen wrote: > >> @Ufuk - I have never set up artifact deployment in Maven, could need some >> help there. >> >> Regarding shading Netty, I agree, would be good to do that as well... >> >> On Thu, May 11, 2017 at 10:52 AM, Ufuk Celebi <[hidden email]> wrote: >> >> The advantages you've listed sound really compelling to me. >>> >>> - Do you have time to implement these changes or do we need a volunteer? >>> ;) >>> >>> - I assume that republishing the artifacts as you propose doesn't have >>> any new legal implications since we already publish them with our >>> JARs, right? >>> >>> - We might think about adding Netty to the list of shaded artifacts >>> since some dependency conflicts were reported recently. Would have to >>> double check the reported issues before doing that though. ;-) >>> >>> – Ufuk >>> >>> >>> On Wed, May 10, 2017 at 8:45 PM, Stephan Ewen <[hidden email]> wrote: >>> >>>> @chesnay: I used ASM as an example in the proposal. Maybe I did not say >>>> that clearly. >>>> >>>> If we like that approach, we should deal with the other libraries (at >>>> >>> least >>> >>>> the frequently used ones) in the same way. >>>> >>>> >>>> I would imagine to have a project layout like that: >>>> >>>> flink-shaded-deps >>>> - flink-shaded-asm >>>> - flink-shaded-guava >>>> - flink-shaded-curator >>>> - flink-shaded-hadoop >>>> >>>> >>>> "flink-shaded-deps" would not be built every time (and not be released >>>> every time), but only when needed. >>>> >>>> >>>> >>>> >>>> >>>> >>>> On Wed, May 10, 2017 at 7:28 PM, Chesnay Schepler <[hidden email]> >>>> wrote: >>>> >>>> I like the idea, thank you for bringing it up. >>>>> >>>>> Given that the raised problems aren't really ASM specific would it make >>>>> sense to create one flink-shaded module that contains all frequently >>>>> >>>> shaded >>> >>>> libraries? (or maybe even all shaded dependencies by core modules) The >>>>> proposal limits the scope of this to ASM and i was wondering why. >>>>> >>>>> I also remember that there was a discussion recently about why we shade >>>>> things at all, and the idea of working against the shaded namespaces >>>>> was >>>>> brought up. Back then i was expressing doubts as to whether IDE's would >>>>> properly support this; what's the state on that? >>>>> >>>>> On 10.05.2017 18:18, Stephan Ewen wrote: >>>>> >>>>> Hi! >>>>>> >>>>>> This is a discussion about altering the way we handle dependencies and >>>>>> shading in Flink. >>>>>> I ran into quite a view problems trying to adjust / fix some shading >>>>>> issues >>>>>> during release validation. >>>>>> >>>>>> The issue is tracked under: https://issues.apache.org/jira >>>>>> /browse/FLINK-6529 >>>>>> Bring this discussion thread up because it is a bigger issue >>>>>> >>>>>> *Problem* >>>>>> >>>>>> Currently, Flink shades dependencies like ASM and Guava into all jars >>>>>> >>>>> of >>> >>>> projects that reference it and relocate the classes. >>>>>> >>>>>> There are some drawbacks to that approach, let's discuss them at the >>>>>> example of ASM: >>>>>> >>>>>> - The ASM classes are for example in flink-core, flink-java, >>>>>> flink-scala, >>>>>> flink-runtime, etc. >>>>>> >>>>>> - Users that reference these dependencies have the classes >>>>>> multiple >>>>>> times >>>>>> in the classpath. That is unclean (works, through, because the classes >>>>>> >>>>> are >>> >>>> identical). The same happens when building the final dist. jar. >>>>>> >>>>>> - Some of these dependencies require to include license files in >>>>>> the >>>>>> shaded jar. It is hard to impossible to build a good automatic >>>>>> solution >>>>>> for >>>>>> that, partly due to Maven's very poor cross-project path support >>>>>> >>>>>> - Most importantly: Scala does not support shading really well. >>>>>> >>>>> Scala >>> >>>> classes have references to classes in more places than just the class >>>>>> names >>>>>> (apparently for Scala reflect support). Referencing a Scala project >>>>>> >>>>> with >>> >>>> shaded ASM still requires to add a reference to unshaded ASM (at least >>>>>> >>>>> as >>> >>>> a >>>>>> compile dependency). >>>>>> >>>>>> *Proposal* >>>>>> >>>>>> I propose that we build and deploy a asm-flink-shaded version of ASM >>>>>> >>>>> and >>> >>>> directly program against the relocated namespaces. Since we never use >>>>>> classes that we relocate in public interfaces, Flink users will never >>>>>> >>>>> see >>> >>>> the relocated class names. Internally, it does not hurt to use them. >>>>>> >>>>>> - Proper maven dependency management, no hidden (shaded) >>>>>> >>>>> dependencies >>> >>>> - One copy of each class for shaded dependencies >>>>>> >>>>>> - Proper Scala interoperability >>>>>> >>>>>> - Natural License management (license is part of deployed >>>>>> asm-flink-shaded jar) >>>>>> >>>>>> >>>>>> Happy to hear thoughts! >>>>>> >>>>>> Stephan >>>>>> >>>>>> >>>>>> > |
Its not completely clear to me how we want to version the shaded
dependencies, and where we are putting them. One concern are the official apache release rules. If we want to release something to maven central, we need to do a proper vote over a source archive. I would propose to create a new repository "flink-shaded.git" that contains the following maven module structure: - flink-shaded: 1 - flink-shaded-asm: 1-5.2 - flink-shaded-guava: 1-18.0 - ... The number indicates the version (for ASM, I've just guessed). The version for the parent "flink-shaded" needs to be updated on each parent pom change (new module added, new / changed plugins, ...) We could create a separate release script in this repository that creates the flink-shaded-src.zip from the code and deploys the artifacts to the maven staging area. The advantage of a separate repo would be that we don't need to maintain separate maven projects in the same git repo. Also, the src archives for the release vote can be created from the repo content (without much filtering). On Tue, Jun 20, 2017 at 9:44 PM, Stephan Ewen <[hidden email]> wrote: > I like this approach. > > Two additional things can be mention here: > > - We need to deploy these artifacts independently and not as part of the > build. That is a manual step once per "bump" in the dependency of that > library. > > - We reduce the shading complexity of the original build and should thus > also speed up build times :-) > > Stephan > > > > > On Tue, Jun 20, 2017 at 1:15 PM, Chesnay Schepler <[hidden email]> > wrote: > > > I would like to start working on this. > > > > I've looked into adding a flink-shaded-guava module. Working against the > > shaded namespaces seems > > to work without problems from the IDE, and we could forbid un-shaded > > usages with checkstyle. > > > > So for the list of dependencies that we want to shade we currently got: > > > > * asm > > * guava > > * netty > > * hadoop > > * curator > > > > I've had a chat with Stephan Ewan and he brought up kryo + chill as well. > > > > The nice thing is that we can do this incrementally, one dependency at a > > time. As such i would propose > > to go through the whole process for guava and see what problems arise. > > > > This would include adding a flink-shaded module and a child > > flink-shaded-guava module to the flink repository > > that are not part of the build process, replacing all usages of guava in > > Flink, adding the > > checkstyle rule (optional) and deploying the artifact to maven central. > > > > > > On 11.05.2017 10:54, Stephan Ewen wrote: > > > >> @Ufuk - I have never set up artifact deployment in Maven, could need > some > >> help there. > >> > >> Regarding shading Netty, I agree, would be good to do that as well... > >> > >> On Thu, May 11, 2017 at 10:52 AM, Ufuk Celebi <[hidden email]> wrote: > >> > >> The advantages you've listed sound really compelling to me. > >>> > >>> - Do you have time to implement these changes or do we need a > volunteer? > >>> ;) > >>> > >>> - I assume that republishing the artifacts as you propose doesn't have > >>> any new legal implications since we already publish them with our > >>> JARs, right? > >>> > >>> - We might think about adding Netty to the list of shaded artifacts > >>> since some dependency conflicts were reported recently. Would have to > >>> double check the reported issues before doing that though. ;-) > >>> > >>> – Ufuk > >>> > >>> > >>> On Wed, May 10, 2017 at 8:45 PM, Stephan Ewen <[hidden email]> > wrote: > >>> > >>>> @chesnay: I used ASM as an example in the proposal. Maybe I did not > say > >>>> that clearly. > >>>> > >>>> If we like that approach, we should deal with the other libraries (at > >>>> > >>> least > >>> > >>>> the frequently used ones) in the same way. > >>>> > >>>> > >>>> I would imagine to have a project layout like that: > >>>> > >>>> flink-shaded-deps > >>>> - flink-shaded-asm > >>>> - flink-shaded-guava > >>>> - flink-shaded-curator > >>>> - flink-shaded-hadoop > >>>> > >>>> > >>>> "flink-shaded-deps" would not be built every time (and not be released > >>>> every time), but only when needed. > >>>> > >>>> > >>>> > >>>> > >>>> > >>>> > >>>> On Wed, May 10, 2017 at 7:28 PM, Chesnay Schepler <[hidden email] > > > >>>> wrote: > >>>> > >>>> I like the idea, thank you for bringing it up. > >>>>> > >>>>> Given that the raised problems aren't really ASM specific would it > make > >>>>> sense to create one flink-shaded module that contains all frequently > >>>>> > >>>> shaded > >>> > >>>> libraries? (or maybe even all shaded dependencies by core modules) The > >>>>> proposal limits the scope of this to ASM and i was wondering why. > >>>>> > >>>>> I also remember that there was a discussion recently about why we > shade > >>>>> things at all, and the idea of working against the shaded namespaces > >>>>> was > >>>>> brought up. Back then i was expressing doubts as to whether IDE's > would > >>>>> properly support this; what's the state on that? > >>>>> > >>>>> On 10.05.2017 18:18, Stephan Ewen wrote: > >>>>> > >>>>> Hi! > >>>>>> > >>>>>> This is a discussion about altering the way we handle dependencies > and > >>>>>> shading in Flink. > >>>>>> I ran into quite a view problems trying to adjust / fix some shading > >>>>>> issues > >>>>>> during release validation. > >>>>>> > >>>>>> The issue is tracked under: https://issues.apache.org/jira > >>>>>> /browse/FLINK-6529 > >>>>>> Bring this discussion thread up because it is a bigger issue > >>>>>> > >>>>>> *Problem* > >>>>>> > >>>>>> Currently, Flink shades dependencies like ASM and Guava into all > jars > >>>>>> > >>>>> of > >>> > >>>> projects that reference it and relocate the classes. > >>>>>> > >>>>>> There are some drawbacks to that approach, let's discuss them at the > >>>>>> example of ASM: > >>>>>> > >>>>>> - The ASM classes are for example in flink-core, flink-java, > >>>>>> flink-scala, > >>>>>> flink-runtime, etc. > >>>>>> > >>>>>> - Users that reference these dependencies have the classes > >>>>>> multiple > >>>>>> times > >>>>>> in the classpath. That is unclean (works, through, because the > classes > >>>>>> > >>>>> are > >>> > >>>> identical). The same happens when building the final dist. jar. > >>>>>> > >>>>>> - Some of these dependencies require to include license files in > >>>>>> the > >>>>>> shaded jar. It is hard to impossible to build a good automatic > >>>>>> solution > >>>>>> for > >>>>>> that, partly due to Maven's very poor cross-project path support > >>>>>> > >>>>>> - Most importantly: Scala does not support shading really well. > >>>>>> > >>>>> Scala > >>> > >>>> classes have references to classes in more places than just the class > >>>>>> names > >>>>>> (apparently for Scala reflect support). Referencing a Scala project > >>>>>> > >>>>> with > >>> > >>>> shaded ASM still requires to add a reference to unshaded ASM (at least > >>>>>> > >>>>> as > >>> > >>>> a > >>>>>> compile dependency). > >>>>>> > >>>>>> *Proposal* > >>>>>> > >>>>>> I propose that we build and deploy a asm-flink-shaded version of ASM > >>>>>> > >>>>> and > >>> > >>>> directly program against the relocated namespaces. Since we never use > >>>>>> classes that we relocate in public interfaces, Flink users will > never > >>>>>> > >>>>> see > >>> > >>>> the relocated class names. Internally, it does not hurt to use them. > >>>>>> > >>>>>> - Proper maven dependency management, no hidden (shaded) > >>>>>> > >>>>> dependencies > >>> > >>>> - One copy of each class for shaded dependencies > >>>>>> > >>>>>> - Proper Scala interoperability > >>>>>> > >>>>>> - Natural License management (license is part of deployed > >>>>>> asm-flink-shaded jar) > >>>>>> > >>>>>> > >>>>>> Happy to hear thoughts! > >>>>>> > >>>>>> Stephan > >>>>>> > >>>>>> > >>>>>> > > > |
I like your suggestion Robert. A lot actually.
Having the actual dependency version (i.e 18 for guava) in the version should improve clarity a lot. Originally i intended to release 1 artifact per Flink version, with the normal versioning scheme that we use. But given that the shaded dependencies aren't changed often (even rarely might be a stretch), and aren't actually coupled to the Flink release cycle this doesn't make a lot of sense. Having separate repos looks like a reasonable separation of concerns. The release for Flink itself would work just like it does now; we don't have to modify any scripts or do extra steps. Since the build, release and development process are separate (since flink-shaded isn't part of Flink build process, has a separate release process and changes to it will /never /require immediate changes to Flink) it seems like a very good candidate to move it into a separate repo. On 21.06.2017 11:26, Robert Metzger wrote: > Its not completely clear to me how we want to version the shaded > dependencies, and where we are putting them. > > One concern are the official apache release rules. If we want to release > something to maven central, we need to do a proper vote over a source > archive. > I would propose to create a new repository "flink-shaded.git" that contains > the following maven module structure: > - flink-shaded: 1 > - flink-shaded-asm: 1-5.2 > - flink-shaded-guava: 1-18.0 > - ... > > The number indicates the version (for ASM, I've just guessed). > The version for the parent "flink-shaded" needs to be updated on each > parent pom change (new module added, new / changed plugins, ...) > > We could create a separate release script in this repository that creates > the flink-shaded-src.zip from the code and deploys the artifacts to the > maven staging area. > > The advantage of a separate repo would be that we don't need to maintain > separate maven projects in the same git repo. > Also, the src archives for the release vote can be created from the repo > content (without much filtering). > > > On Tue, Jun 20, 2017 at 9:44 PM, Stephan Ewen <[hidden email]> wrote: > >> I like this approach. >> >> Two additional things can be mention here: >> >> - We need to deploy these artifacts independently and not as part of the >> build. That is a manual step once per "bump" in the dependency of that >> library. >> >> - We reduce the shading complexity of the original build and should thus >> also speed up build times :-) >> >> Stephan >> >> >> >> >> On Tue, Jun 20, 2017 at 1:15 PM, Chesnay Schepler <[hidden email]> >> wrote: >> >>> I would like to start working on this. >>> >>> I've looked into adding a flink-shaded-guava module. Working against the >>> shaded namespaces seems >>> to work without problems from the IDE, and we could forbid un-shaded >>> usages with checkstyle. >>> >>> So for the list of dependencies that we want to shade we currently got: >>> >>> * asm >>> * guava >>> * netty >>> * hadoop >>> * curator >>> >>> I've had a chat with Stephan Ewan and he brought up kryo + chill as well. >>> >>> The nice thing is that we can do this incrementally, one dependency at a >>> time. As such i would propose >>> to go through the whole process for guava and see what problems arise. >>> >>> This would include adding a flink-shaded module and a child >>> flink-shaded-guava module to the flink repository >>> that are not part of the build process, replacing all usages of guava in >>> Flink, adding the >>> checkstyle rule (optional) and deploying the artifact to maven central. >>> >>> >>> On 11.05.2017 10:54, Stephan Ewen wrote: >>> >>>> @Ufuk - I have never set up artifact deployment in Maven, could need >> some >>>> help there. >>>> >>>> Regarding shading Netty, I agree, would be good to do that as well... >>>> >>>> On Thu, May 11, 2017 at 10:52 AM, Ufuk Celebi <[hidden email]> wrote: >>>> >>>> The advantages you've listed sound really compelling to me. >>>>> - Do you have time to implement these changes or do we need a >> volunteer? >>>>> ;) >>>>> >>>>> - I assume that republishing the artifacts as you propose doesn't have >>>>> any new legal implications since we already publish them with our >>>>> JARs, right? >>>>> >>>>> - We might think about adding Netty to the list of shaded artifacts >>>>> since some dependency conflicts were reported recently. Would have to >>>>> double check the reported issues before doing that though. ;-) >>>>> >>>>> – Ufuk >>>>> >>>>> >>>>> On Wed, May 10, 2017 at 8:45 PM, Stephan Ewen <[hidden email]> >> wrote: >>>>>> @chesnay: I used ASM as an example in the proposal. Maybe I did not >> say >>>>>> that clearly. >>>>>> >>>>>> If we like that approach, we should deal with the other libraries (at >>>>>> >>>>> least >>>>> >>>>>> the frequently used ones) in the same way. >>>>>> >>>>>> >>>>>> I would imagine to have a project layout like that: >>>>>> >>>>>> flink-shaded-deps >>>>>> - flink-shaded-asm >>>>>> - flink-shaded-guava >>>>>> - flink-shaded-curator >>>>>> - flink-shaded-hadoop >>>>>> >>>>>> >>>>>> "flink-shaded-deps" would not be built every time (and not be released >>>>>> every time), but only when needed. >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On Wed, May 10, 2017 at 7:28 PM, Chesnay Schepler <[hidden email] >>>>>> wrote: >>>>>> >>>>>> I like the idea, thank you for bringing it up. >>>>>>> Given that the raised problems aren't really ASM specific would it >> make >>>>>>> sense to create one flink-shaded module that contains all frequently >>>>>>> >>>>>> shaded >>>>>> libraries? (or maybe even all shaded dependencies by core modules) The >>>>>>> proposal limits the scope of this to ASM and i was wondering why. >>>>>>> >>>>>>> I also remember that there was a discussion recently about why we >> shade >>>>>>> things at all, and the idea of working against the shaded namespaces >>>>>>> was >>>>>>> brought up. Back then i was expressing doubts as to whether IDE's >> would >>>>>>> properly support this; what's the state on that? >>>>>>> >>>>>>> On 10.05.2017 18:18, Stephan Ewen wrote: >>>>>>> >>>>>>> Hi! >>>>>>>> This is a discussion about altering the way we handle dependencies >> and >>>>>>>> shading in Flink. >>>>>>>> I ran into quite a view problems trying to adjust / fix some shading >>>>>>>> issues >>>>>>>> during release validation. >>>>>>>> >>>>>>>> The issue is tracked under: https://issues.apache.org/jira >>>>>>>> /browse/FLINK-6529 >>>>>>>> Bring this discussion thread up because it is a bigger issue >>>>>>>> >>>>>>>> *Problem* >>>>>>>> >>>>>>>> Currently, Flink shades dependencies like ASM and Guava into all >> jars >>>>>>> of >>>>>> projects that reference it and relocate the classes. >>>>>>>> There are some drawbacks to that approach, let's discuss them at the >>>>>>>> example of ASM: >>>>>>>> >>>>>>>> - The ASM classes are for example in flink-core, flink-java, >>>>>>>> flink-scala, >>>>>>>> flink-runtime, etc. >>>>>>>> >>>>>>>> - Users that reference these dependencies have the classes >>>>>>>> multiple >>>>>>>> times >>>>>>>> in the classpath. That is unclean (works, through, because the >> classes >>>>>>> are >>>>>> identical). The same happens when building the final dist. jar. >>>>>>>> - Some of these dependencies require to include license files in >>>>>>>> the >>>>>>>> shaded jar. It is hard to impossible to build a good automatic >>>>>>>> solution >>>>>>>> for >>>>>>>> that, partly due to Maven's very poor cross-project path support >>>>>>>> >>>>>>>> - Most importantly: Scala does not support shading really well. >>>>>>>> >>>>>>> Scala >>>>>> classes have references to classes in more places than just the class >>>>>>>> names >>>>>>>> (apparently for Scala reflect support). Referencing a Scala project >>>>>>>> >>>>>>> with >>>>>> shaded ASM still requires to add a reference to unshaded ASM (at least >>>>>>> as >>>>>> a >>>>>>>> compile dependency). >>>>>>>> >>>>>>>> *Proposal* >>>>>>>> >>>>>>>> I propose that we build and deploy a asm-flink-shaded version of ASM >>>>>>>> >>>>>>> and >>>>>> directly program against the relocated namespaces. Since we never use >>>>>>>> classes that we relocate in public interfaces, Flink users will >> never >>>>>>> see >>>>>> the relocated class names. Internally, it does not hurt to use them. >>>>>>>> - Proper maven dependency management, no hidden (shaded) >>>>>>>> >>>>>>> dependencies >>>>>> - One copy of each class for shaded dependencies >>>>>>>> - Proper Scala interoperability >>>>>>>> >>>>>>>> - Natural License management (license is part of deployed >>>>>>>> asm-flink-shaded jar) >>>>>>>> >>>>>>>> >>>>>>>> Happy to hear thoughts! >>>>>>>> >>>>>>>> Stephan >>>>>>>> >>>>>>>> >>>>>>>> |
Okay, I'll request a repo for the shading.
On Wed, Jun 21, 2017 at 1:38 PM, Chesnay Schepler <[hidden email]> wrote: > I like your suggestion Robert. A lot actually. > > Having the actual dependency version (i.e 18 for guava) in the version > should improve clarity a lot. > > Originally i intended to release 1 artifact per Flink version, with the > normal versioning scheme > that we use. But given that the shaded dependencies aren't changed often > (even rarely might be a stretch), > and aren't actually coupled to the Flink release cycle this doesn't make a > lot of sense. > > Having separate repos looks like a reasonable separation of concerns. The > release for Flink itself > would work just like it does now; we don't have to modify any scripts or > do extra steps. > > Since the build, release and development process are separate (since > flink-shaded isn't part of Flink build > process, has a separate release process and changes to it will /never > /require immediate changes to Flink) > it seems like a very good candidate to move it into a separate repo. > > > On 21.06.2017 11:26, Robert Metzger wrote: > >> Its not completely clear to me how we want to version the shaded >> dependencies, and where we are putting them. >> >> One concern are the official apache release rules. If we want to release >> something to maven central, we need to do a proper vote over a source >> archive. >> I would propose to create a new repository "flink-shaded.git" that >> contains >> the following maven module structure: >> - flink-shaded: 1 >> - flink-shaded-asm: 1-5.2 >> - flink-shaded-guava: 1-18.0 >> - ... >> >> The number indicates the version (for ASM, I've just guessed). >> The version for the parent "flink-shaded" needs to be updated on each >> parent pom change (new module added, new / changed plugins, ...) >> >> We could create a separate release script in this repository that creates >> the flink-shaded-src.zip from the code and deploys the artifacts to the >> maven staging area. >> >> The advantage of a separate repo would be that we don't need to maintain >> separate maven projects in the same git repo. >> Also, the src archives for the release vote can be created from the repo >> content (without much filtering). >> >> >> On Tue, Jun 20, 2017 at 9:44 PM, Stephan Ewen <[hidden email]> wrote: >> >> I like this approach. >>> >>> Two additional things can be mention here: >>> >>> - We need to deploy these artifacts independently and not as part of >>> the >>> build. That is a manual step once per "bump" in the dependency of that >>> library. >>> >>> - We reduce the shading complexity of the original build and should >>> thus >>> also speed up build times :-) >>> >>> Stephan >>> >>> >>> >>> >>> On Tue, Jun 20, 2017 at 1:15 PM, Chesnay Schepler <[hidden email]> >>> wrote: >>> >>> I would like to start working on this. >>>> >>>> I've looked into adding a flink-shaded-guava module. Working against the >>>> shaded namespaces seems >>>> to work without problems from the IDE, and we could forbid un-shaded >>>> usages with checkstyle. >>>> >>>> So for the list of dependencies that we want to shade we currently got: >>>> >>>> * asm >>>> * guava >>>> * netty >>>> * hadoop >>>> * curator >>>> >>>> I've had a chat with Stephan Ewan and he brought up kryo + chill as >>>> well. >>>> >>>> The nice thing is that we can do this incrementally, one dependency at a >>>> time. As such i would propose >>>> to go through the whole process for guava and see what problems arise. >>>> >>>> This would include adding a flink-shaded module and a child >>>> flink-shaded-guava module to the flink repository >>>> that are not part of the build process, replacing all usages of guava in >>>> Flink, adding the >>>> checkstyle rule (optional) and deploying the artifact to maven central. >>>> >>>> >>>> On 11.05.2017 10:54, Stephan Ewen wrote: >>>> >>>> @Ufuk - I have never set up artifact deployment in Maven, could need >>>>> >>>> some >>> >>>> help there. >>>>> >>>>> Regarding shading Netty, I agree, would be good to do that as well... >>>>> >>>>> On Thu, May 11, 2017 at 10:52 AM, Ufuk Celebi <[hidden email]> wrote: >>>>> >>>>> The advantages you've listed sound really compelling to me. >>>>> >>>>>> - Do you have time to implement these changes or do we need a >>>>>> >>>>> volunteer? >>> >>>> ;) >>>>>> >>>>>> - I assume that republishing the artifacts as you propose doesn't have >>>>>> any new legal implications since we already publish them with our >>>>>> JARs, right? >>>>>> >>>>>> - We might think about adding Netty to the list of shaded artifacts >>>>>> since some dependency conflicts were reported recently. Would have to >>>>>> double check the reported issues before doing that though. ;-) >>>>>> >>>>>> – Ufuk >>>>>> >>>>>> >>>>>> On Wed, May 10, 2017 at 8:45 PM, Stephan Ewen <[hidden email]> >>>>>> >>>>> wrote: >>> >>>> @chesnay: I used ASM as an example in the proposal. Maybe I did not >>>>>>> >>>>>> say >>> >>>> that clearly. >>>>>>> >>>>>>> If we like that approach, we should deal with the other libraries (at >>>>>>> >>>>>>> least >>>>>> >>>>>> the frequently used ones) in the same way. >>>>>>> >>>>>>> >>>>>>> I would imagine to have a project layout like that: >>>>>>> >>>>>>> flink-shaded-deps >>>>>>> - flink-shaded-asm >>>>>>> - flink-shaded-guava >>>>>>> - flink-shaded-curator >>>>>>> - flink-shaded-hadoop >>>>>>> >>>>>>> >>>>>>> "flink-shaded-deps" would not be built every time (and not be >>>>>>> released >>>>>>> every time), but only when needed. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Wed, May 10, 2017 at 7:28 PM, Chesnay Schepler < >>>>>>> [hidden email] >>>>>>> wrote: >>>>>>> >>>>>>> I like the idea, thank you for bringing it up. >>>>>>> >>>>>>>> Given that the raised problems aren't really ASM specific would it >>>>>>>> >>>>>>> make >>> >>>> sense to create one flink-shaded module that contains all frequently >>>>>>>> >>>>>>>> shaded >>>>>>> libraries? (or maybe even all shaded dependencies by core modules) >>>>>>> The >>>>>>> >>>>>>>> proposal limits the scope of this to ASM and i was wondering why. >>>>>>>> >>>>>>>> I also remember that there was a discussion recently about why we >>>>>>>> >>>>>>> shade >>> >>>> things at all, and the idea of working against the shaded namespaces >>>>>>>> was >>>>>>>> brought up. Back then i was expressing doubts as to whether IDE's >>>>>>>> >>>>>>> would >>> >>>> properly support this; what's the state on that? >>>>>>>> >>>>>>>> On 10.05.2017 18:18, Stephan Ewen wrote: >>>>>>>> >>>>>>>> Hi! >>>>>>>> >>>>>>>>> This is a discussion about altering the way we handle dependencies >>>>>>>>> >>>>>>>> and >>> >>>> shading in Flink. >>>>>>>>> I ran into quite a view problems trying to adjust / fix some >>>>>>>>> shading >>>>>>>>> issues >>>>>>>>> during release validation. >>>>>>>>> >>>>>>>>> The issue is tracked under: https://issues.apache.org/jira >>>>>>>>> /browse/FLINK-6529 >>>>>>>>> Bring this discussion thread up because it is a bigger issue >>>>>>>>> >>>>>>>>> *Problem* >>>>>>>>> >>>>>>>>> Currently, Flink shades dependencies like ASM and Guava into all >>>>>>>>> >>>>>>>> jars >>> >>>> of >>>>>>>> >>>>>>> projects that reference it and relocate the classes. >>>>>>> >>>>>>>> There are some drawbacks to that approach, let's discuss them at the >>>>>>>>> example of ASM: >>>>>>>>> >>>>>>>>> - The ASM classes are for example in flink-core, flink-java, >>>>>>>>> flink-scala, >>>>>>>>> flink-runtime, etc. >>>>>>>>> >>>>>>>>> - Users that reference these dependencies have the classes >>>>>>>>> multiple >>>>>>>>> times >>>>>>>>> in the classpath. That is unclean (works, through, because the >>>>>>>>> >>>>>>>> classes >>> >>>> are >>>>>>>> >>>>>>> identical). The same happens when building the final dist. jar. >>>>>>> >>>>>>>> - Some of these dependencies require to include license files >>>>>>>>> in >>>>>>>>> the >>>>>>>>> shaded jar. It is hard to impossible to build a good automatic >>>>>>>>> solution >>>>>>>>> for >>>>>>>>> that, partly due to Maven's very poor cross-project path support >>>>>>>>> >>>>>>>>> - Most importantly: Scala does not support shading really >>>>>>>>> well. >>>>>>>>> >>>>>>>>> Scala >>>>>>>> >>>>>>> classes have references to classes in more places than just the class >>>>>>> >>>>>>>> names >>>>>>>>> (apparently for Scala reflect support). Referencing a Scala project >>>>>>>>> >>>>>>>>> with >>>>>>>> >>>>>>> shaded ASM still requires to add a reference to unshaded ASM (at >>>>>>> least >>>>>>> >>>>>>>> as >>>>>>>> >>>>>>> a >>>>>>> >>>>>>>> compile dependency). >>>>>>>>> >>>>>>>>> *Proposal* >>>>>>>>> >>>>>>>>> I propose that we build and deploy a asm-flink-shaded version of >>>>>>>>> ASM >>>>>>>>> >>>>>>>>> and >>>>>>>> >>>>>>> directly program against the relocated namespaces. Since we never use >>>>>>> >>>>>>>> classes that we relocate in public interfaces, Flink users will >>>>>>>>> >>>>>>>> never >>> >>>> see >>>>>>>> >>>>>>> the relocated class names. Internally, it does not hurt to use them. >>>>>>> >>>>>>>> - Proper maven dependency management, no hidden (shaded) >>>>>>>>> >>>>>>>>> dependencies >>>>>>>> >>>>>>> - One copy of each class for shaded dependencies >>>>>>> >>>>>>>> - Proper Scala interoperability >>>>>>>>> >>>>>>>>> - Natural License management (license is part of deployed >>>>>>>>> asm-flink-shaded jar) >>>>>>>>> >>>>>>>>> >>>>>>>>> Happy to hear thoughts! >>>>>>>>> >>>>>>>>> Stephan >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> > |
The new repo was created and is accessible here:
https://github.com/apache/flink-shaded I've already opened a PR to add a shaded guava module. Once the shaded-guava module is merged I would like to do a first release of flink-shaded, only containing guava. I already have a branch with all the changes necessary to integrate this dependency into Flink. The alternative would be to first create shaded modules for all dependencies and make a single release, but I feel that would delay things quite a bit. On 21.06.2017 17:00, Robert Metzger wrote: > Okay, I'll request a repo for the shading. > > On Wed, Jun 21, 2017 at 1:38 PM, Chesnay Schepler <[hidden email]> > wrote: > >> I like your suggestion Robert. A lot actually. >> >> Having the actual dependency version (i.e 18 for guava) in the version >> should improve clarity a lot. >> >> Originally i intended to release 1 artifact per Flink version, with the >> normal versioning scheme >> that we use. But given that the shaded dependencies aren't changed often >> (even rarely might be a stretch), >> and aren't actually coupled to the Flink release cycle this doesn't make a >> lot of sense. >> >> Having separate repos looks like a reasonable separation of concerns. The >> release for Flink itself >> would work just like it does now; we don't have to modify any scripts or >> do extra steps. >> >> Since the build, release and development process are separate (since >> flink-shaded isn't part of Flink build >> process, has a separate release process and changes to it will /never >> /require immediate changes to Flink) >> it seems like a very good candidate to move it into a separate repo. >> >> >> On 21.06.2017 11:26, Robert Metzger wrote: >> >>> Its not completely clear to me how we want to version the shaded >>> dependencies, and where we are putting them. >>> >>> One concern are the official apache release rules. If we want to release >>> something to maven central, we need to do a proper vote over a source >>> archive. >>> I would propose to create a new repository "flink-shaded.git" that >>> contains >>> the following maven module structure: >>> - flink-shaded: 1 >>> - flink-shaded-asm: 1-5.2 >>> - flink-shaded-guava: 1-18.0 >>> - ... >>> >>> The number indicates the version (for ASM, I've just guessed). >>> The version for the parent "flink-shaded" needs to be updated on each >>> parent pom change (new module added, new / changed plugins, ...) >>> >>> We could create a separate release script in this repository that creates >>> the flink-shaded-src.zip from the code and deploys the artifacts to the >>> maven staging area. >>> >>> The advantage of a separate repo would be that we don't need to maintain >>> separate maven projects in the same git repo. >>> Also, the src archives for the release vote can be created from the repo >>> content (without much filtering). >>> >>> >>> On Tue, Jun 20, 2017 at 9:44 PM, Stephan Ewen <[hidden email]> wrote: >>> >>> I like this approach. >>>> Two additional things can be mention here: >>>> >>>> - We need to deploy these artifacts independently and not as part of >>>> the >>>> build. That is a manual step once per "bump" in the dependency of that >>>> library. >>>> >>>> - We reduce the shading complexity of the original build and should >>>> thus >>>> also speed up build times :-) >>>> >>>> Stephan >>>> >>>> >>>> >>>> >>>> On Tue, Jun 20, 2017 at 1:15 PM, Chesnay Schepler <[hidden email]> >>>> wrote: >>>> >>>> I would like to start working on this. >>>>> I've looked into adding a flink-shaded-guava module. Working against the >>>>> shaded namespaces seems >>>>> to work without problems from the IDE, and we could forbid un-shaded >>>>> usages with checkstyle. >>>>> >>>>> So for the list of dependencies that we want to shade we currently got: >>>>> >>>>> * asm >>>>> * guava >>>>> * netty >>>>> * hadoop >>>>> * curator >>>>> >>>>> I've had a chat with Stephan Ewan and he brought up kryo + chill as >>>>> well. >>>>> >>>>> The nice thing is that we can do this incrementally, one dependency at a >>>>> time. As such i would propose >>>>> to go through the whole process for guava and see what problems arise. >>>>> >>>>> This would include adding a flink-shaded module and a child >>>>> flink-shaded-guava module to the flink repository >>>>> that are not part of the build process, replacing all usages of guava in >>>>> Flink, adding the >>>>> checkstyle rule (optional) and deploying the artifact to maven central. >>>>> >>>>> >>>>> On 11.05.2017 10:54, Stephan Ewen wrote: >>>>> >>>>> @Ufuk - I have never set up artifact deployment in Maven, could need >>>>> some >>>>> help there. >>>>>> Regarding shading Netty, I agree, would be good to do that as well... >>>>>> >>>>>> On Thu, May 11, 2017 at 10:52 AM, Ufuk Celebi <[hidden email]> wrote: >>>>>> >>>>>> The advantages you've listed sound really compelling to me. >>>>>> >>>>>>> - Do you have time to implement these changes or do we need a >>>>>>> >>>>>> volunteer? >>>>> ;) >>>>>>> - I assume that republishing the artifacts as you propose doesn't have >>>>>>> any new legal implications since we already publish them with our >>>>>>> JARs, right? >>>>>>> >>>>>>> - We might think about adding Netty to the list of shaded artifacts >>>>>>> since some dependency conflicts were reported recently. Would have to >>>>>>> double check the reported issues before doing that though. ;-) >>>>>>> >>>>>>> – Ufuk >>>>>>> >>>>>>> >>>>>>> On Wed, May 10, 2017 at 8:45 PM, Stephan Ewen <[hidden email]> >>>>>>> >>>>>> wrote: >>>>> @chesnay: I used ASM as an example in the proposal. Maybe I did not >>>>>>> say >>>>> that clearly. >>>>>>>> If we like that approach, we should deal with the other libraries (at >>>>>>>> >>>>>>>> least >>>>>>> the frequently used ones) in the same way. >>>>>>>> >>>>>>>> I would imagine to have a project layout like that: >>>>>>>> >>>>>>>> flink-shaded-deps >>>>>>>> - flink-shaded-asm >>>>>>>> - flink-shaded-guava >>>>>>>> - flink-shaded-curator >>>>>>>> - flink-shaded-hadoop >>>>>>>> >>>>>>>> >>>>>>>> "flink-shaded-deps" would not be built every time (and not be >>>>>>>> released >>>>>>>> every time), but only when needed. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Wed, May 10, 2017 at 7:28 PM, Chesnay Schepler < >>>>>>>> [hidden email] >>>>>>>> wrote: >>>>>>>> >>>>>>>> I like the idea, thank you for bringing it up. >>>>>>>> >>>>>>>>> Given that the raised problems aren't really ASM specific would it >>>>>>>>> >>>>>>>> make >>>>> sense to create one flink-shaded module that contains all frequently >>>>>>>>> shaded >>>>>>>> libraries? (or maybe even all shaded dependencies by core modules) >>>>>>>> The >>>>>>>> >>>>>>>>> proposal limits the scope of this to ASM and i was wondering why. >>>>>>>>> >>>>>>>>> I also remember that there was a discussion recently about why we >>>>>>>>> >>>>>>>> shade >>>>> things at all, and the idea of working against the shaded namespaces >>>>>>>>> was >>>>>>>>> brought up. Back then i was expressing doubts as to whether IDE's >>>>>>>>> >>>>>>>> would >>>>> properly support this; what's the state on that? >>>>>>>>> On 10.05.2017 18:18, Stephan Ewen wrote: >>>>>>>>> >>>>>>>>> Hi! >>>>>>>>> >>>>>>>>>> This is a discussion about altering the way we handle dependencies >>>>>>>>>> >>>>>>>>> and >>>>> shading in Flink. >>>>>>>>>> I ran into quite a view problems trying to adjust / fix some >>>>>>>>>> shading >>>>>>>>>> issues >>>>>>>>>> during release validation. >>>>>>>>>> >>>>>>>>>> The issue is tracked under: https://issues.apache.org/jira >>>>>>>>>> /browse/FLINK-6529 >>>>>>>>>> Bring this discussion thread up because it is a bigger issue >>>>>>>>>> >>>>>>>>>> *Problem* >>>>>>>>>> >>>>>>>>>> Currently, Flink shades dependencies like ASM and Guava into all >>>>>>>>>> >>>>>>>>> jars >>>>> of >>>>>>>> projects that reference it and relocate the classes. >>>>>>>> >>>>>>>>> There are some drawbacks to that approach, let's discuss them at the >>>>>>>>>> example of ASM: >>>>>>>>>> >>>>>>>>>> - The ASM classes are for example in flink-core, flink-java, >>>>>>>>>> flink-scala, >>>>>>>>>> flink-runtime, etc. >>>>>>>>>> >>>>>>>>>> - Users that reference these dependencies have the classes >>>>>>>>>> multiple >>>>>>>>>> times >>>>>>>>>> in the classpath. That is unclean (works, through, because the >>>>>>>>>> >>>>>>>>> classes >>>>> are >>>>>>>> identical). The same happens when building the final dist. jar. >>>>>>>> >>>>>>>>> - Some of these dependencies require to include license files >>>>>>>>>> in >>>>>>>>>> the >>>>>>>>>> shaded jar. It is hard to impossible to build a good automatic >>>>>>>>>> solution >>>>>>>>>> for >>>>>>>>>> that, partly due to Maven's very poor cross-project path support >>>>>>>>>> >>>>>>>>>> - Most importantly: Scala does not support shading really >>>>>>>>>> well. >>>>>>>>>> >>>>>>>>>> Scala >>>>>>>> classes have references to classes in more places than just the class >>>>>>>> >>>>>>>>> names >>>>>>>>>> (apparently for Scala reflect support). Referencing a Scala project >>>>>>>>>> >>>>>>>>>> with >>>>>>>> shaded ASM still requires to add a reference to unshaded ASM (at >>>>>>>> least >>>>>>>> >>>>>>>>> as >>>>>>>>> >>>>>>>> a >>>>>>>> >>>>>>>>> compile dependency). >>>>>>>>>> *Proposal* >>>>>>>>>> >>>>>>>>>> I propose that we build and deploy a asm-flink-shaded version of >>>>>>>>>> ASM >>>>>>>>>> >>>>>>>>>> and >>>>>>>> directly program against the relocated namespaces. Since we never use >>>>>>>> >>>>>>>>> classes that we relocate in public interfaces, Flink users will >>>>>>>>> never >>>>> see >>>>>>>> the relocated class names. Internally, it does not hurt to use them. >>>>>>>> >>>>>>>>> - Proper maven dependency management, no hidden (shaded) >>>>>>>>>> dependencies >>>>>>>> - One copy of each class for shaded dependencies >>>>>>>> >>>>>>>>> - Proper Scala interoperability >>>>>>>>>> - Natural License management (license is part of deployed >>>>>>>>>> asm-flink-shaded jar) >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Happy to hear thoughts! >>>>>>>>>> >>>>>>>>>> Stephan >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> |
Looks good, thanks Chesnay!.
How about including the dependency version names in the module names, like "flink-shaded-guava-18.0"? On Mon, Jun 26, 2017 at 11:49 AM, Chesnay Schepler <[hidden email]> wrote: > The new repo was created and is accessible here: > https://github.com/apache/flink-shaded > > I've already opened a PR to add a shaded guava module. > > Once the shaded-guava module is merged I would like to do a first release > of flink-shaded, > only containing guava. I already have a branch with all the changes > necessary to > integrate this dependency into Flink. > > The alternative would be to first create shaded modules for all > dependencies and make a > single release, but I feel that would delay things quite a bit. > > > > On 21.06.2017 17:00, Robert Metzger wrote: > >> Okay, I'll request a repo for the shading. >> >> On Wed, Jun 21, 2017 at 1:38 PM, Chesnay Schepler <[hidden email]> >> wrote: >> >> I like your suggestion Robert. A lot actually. >>> >>> Having the actual dependency version (i.e 18 for guava) in the version >>> should improve clarity a lot. >>> >>> Originally i intended to release 1 artifact per Flink version, with the >>> normal versioning scheme >>> that we use. But given that the shaded dependencies aren't changed often >>> (even rarely might be a stretch), >>> and aren't actually coupled to the Flink release cycle this doesn't make >>> a >>> lot of sense. >>> >>> Having separate repos looks like a reasonable separation of concerns. The >>> release for Flink itself >>> would work just like it does now; we don't have to modify any scripts or >>> do extra steps. >>> >>> Since the build, release and development process are separate (since >>> flink-shaded isn't part of Flink build >>> process, has a separate release process and changes to it will /never >>> /require immediate changes to Flink) >>> it seems like a very good candidate to move it into a separate repo. >>> >>> >>> On 21.06.2017 11:26, Robert Metzger wrote: >>> >>> Its not completely clear to me how we want to version the shaded >>>> dependencies, and where we are putting them. >>>> >>>> One concern are the official apache release rules. If we want to release >>>> something to maven central, we need to do a proper vote over a source >>>> archive. >>>> I would propose to create a new repository "flink-shaded.git" that >>>> contains >>>> the following maven module structure: >>>> - flink-shaded: 1 >>>> - flink-shaded-asm: 1-5.2 >>>> - flink-shaded-guava: 1-18.0 >>>> - ... >>>> >>>> The number indicates the version (for ASM, I've just guessed). >>>> The version for the parent "flink-shaded" needs to be updated on each >>>> parent pom change (new module added, new / changed plugins, ...) >>>> >>>> We could create a separate release script in this repository that >>>> creates >>>> the flink-shaded-src.zip from the code and deploys the artifacts to the >>>> maven staging area. >>>> >>>> The advantage of a separate repo would be that we don't need to maintain >>>> separate maven projects in the same git repo. >>>> Also, the src archives for the release vote can be created from the repo >>>> content (without much filtering). >>>> >>>> >>>> On Tue, Jun 20, 2017 at 9:44 PM, Stephan Ewen <[hidden email]> wrote: >>>> >>>> I like this approach. >>>> >>>>> Two additional things can be mention here: >>>>> >>>>> - We need to deploy these artifacts independently and not as part >>>>> of >>>>> the >>>>> build. That is a manual step once per "bump" in the dependency of that >>>>> library. >>>>> >>>>> - We reduce the shading complexity of the original build and should >>>>> thus >>>>> also speed up build times :-) >>>>> >>>>> Stephan >>>>> >>>>> >>>>> >>>>> >>>>> On Tue, Jun 20, 2017 at 1:15 PM, Chesnay Schepler <[hidden email]> >>>>> wrote: >>>>> >>>>> I would like to start working on this. >>>>> >>>>>> I've looked into adding a flink-shaded-guava module. Working against >>>>>> the >>>>>> shaded namespaces seems >>>>>> to work without problems from the IDE, and we could forbid un-shaded >>>>>> usages with checkstyle. >>>>>> >>>>>> So for the list of dependencies that we want to shade we currently >>>>>> got: >>>>>> >>>>>> * asm >>>>>> * guava >>>>>> * netty >>>>>> * hadoop >>>>>> * curator >>>>>> >>>>>> I've had a chat with Stephan Ewan and he brought up kryo + chill as >>>>>> well. >>>>>> >>>>>> The nice thing is that we can do this incrementally, one dependency >>>>>> at a >>>>>> time. As such i would propose >>>>>> to go through the whole process for guava and see what problems arise. >>>>>> >>>>>> This would include adding a flink-shaded module and a child >>>>>> flink-shaded-guava module to the flink repository >>>>>> that are not part of the build process, replacing all usages of guava >>>>>> in >>>>>> Flink, adding the >>>>>> checkstyle rule (optional) and deploying the artifact to maven >>>>>> central. >>>>>> >>>>>> >>>>>> On 11.05.2017 10:54, Stephan Ewen wrote: >>>>>> >>>>>> @Ufuk - I have never set up artifact deployment in Maven, could need >>>>>> some >>>>>> help there. >>>>>> >>>>>>> Regarding shading Netty, I agree, would be good to do that as well... >>>>>>> >>>>>>> On Thu, May 11, 2017 at 10:52 AM, Ufuk Celebi <[hidden email]> >>>>>>> wrote: >>>>>>> >>>>>>> The advantages you've listed sound really compelling to me. >>>>>>> >>>>>>> - Do you have time to implement these changes or do we need a >>>>>>>> >>>>>>>> volunteer? >>>>>>> >>>>>> ;) >>>>>> >>>>>>> - I assume that republishing the artifacts as you propose doesn't >>>>>>>> have >>>>>>>> any new legal implications since we already publish them with our >>>>>>>> JARs, right? >>>>>>>> >>>>>>>> - We might think about adding Netty to the list of shaded artifacts >>>>>>>> since some dependency conflicts were reported recently. Would have >>>>>>>> to >>>>>>>> double check the reported issues before doing that though. ;-) >>>>>>>> >>>>>>>> – Ufuk >>>>>>>> >>>>>>>> >>>>>>>> On Wed, May 10, 2017 at 8:45 PM, Stephan Ewen <[hidden email]> >>>>>>>> >>>>>>>> wrote: >>>>>>> >>>>>> @chesnay: I used ASM as an example in the proposal. Maybe I did not >>>>>> >>>>>>> say >>>>>>>> >>>>>>> that clearly. >>>>>> >>>>>>> If we like that approach, we should deal with the other libraries (at >>>>>>>>> >>>>>>>>> least >>>>>>>>> >>>>>>>> the frequently used ones) in the same way. >>>>>>>> >>>>>>>>> >>>>>>>>> I would imagine to have a project layout like that: >>>>>>>>> >>>>>>>>> flink-shaded-deps >>>>>>>>> - flink-shaded-asm >>>>>>>>> - flink-shaded-guava >>>>>>>>> - flink-shaded-curator >>>>>>>>> - flink-shaded-hadoop >>>>>>>>> >>>>>>>>> >>>>>>>>> "flink-shaded-deps" would not be built every time (and not be >>>>>>>>> released >>>>>>>>> every time), but only when needed. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Wed, May 10, 2017 at 7:28 PM, Chesnay Schepler < >>>>>>>>> [hidden email] >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>> I like the idea, thank you for bringing it up. >>>>>>>>> >>>>>>>>> Given that the raised problems aren't really ASM specific would it >>>>>>>>>> >>>>>>>>>> make >>>>>>>>> >>>>>>>> sense to create one flink-shaded module that contains all frequently >>>>>> >>>>>>> shaded >>>>>>>>>> >>>>>>>>> libraries? (or maybe even all shaded dependencies by core modules) >>>>>>>>> The >>>>>>>>> >>>>>>>>> proposal limits the scope of this to ASM and i was wondering why. >>>>>>>>>> >>>>>>>>>> I also remember that there was a discussion recently about why we >>>>>>>>>> >>>>>>>>>> shade >>>>>>>>> >>>>>>>> things at all, and the idea of working against the shaded namespaces >>>>>> >>>>>>> was >>>>>>>>>> brought up. Back then i was expressing doubts as to whether IDE's >>>>>>>>>> >>>>>>>>>> would >>>>>>>>> >>>>>>>> properly support this; what's the state on that? >>>>>> >>>>>>> On 10.05.2017 18:18, Stephan Ewen wrote: >>>>>>>>>> >>>>>>>>>> Hi! >>>>>>>>>> >>>>>>>>>> This is a discussion about altering the way we handle dependencies >>>>>>>>>>> >>>>>>>>>>> and >>>>>>>>>> >>>>>>>>> shading in Flink. >>>>>> >>>>>>> I ran into quite a view problems trying to adjust / fix some >>>>>>>>>>> shading >>>>>>>>>>> issues >>>>>>>>>>> during release validation. >>>>>>>>>>> >>>>>>>>>>> The issue is tracked under: https://issues.apache.org/jira >>>>>>>>>>> /browse/FLINK-6529 >>>>>>>>>>> Bring this discussion thread up because it is a bigger issue >>>>>>>>>>> >>>>>>>>>>> *Problem* >>>>>>>>>>> >>>>>>>>>>> Currently, Flink shades dependencies like ASM and Guava into all >>>>>>>>>>> >>>>>>>>>>> jars >>>>>>>>>> >>>>>>>>> of >>>>>> >>>>>>> projects that reference it and relocate the classes. >>>>>>>>> >>>>>>>>> There are some drawbacks to that approach, let's discuss them at >>>>>>>>>> the >>>>>>>>>> >>>>>>>>>>> example of ASM: >>>>>>>>>>> >>>>>>>>>>> - The ASM classes are for example in flink-core, >>>>>>>>>>> flink-java, >>>>>>>>>>> flink-scala, >>>>>>>>>>> flink-runtime, etc. >>>>>>>>>>> >>>>>>>>>>> - Users that reference these dependencies have the classes >>>>>>>>>>> multiple >>>>>>>>>>> times >>>>>>>>>>> in the classpath. That is unclean (works, through, because the >>>>>>>>>>> >>>>>>>>>>> classes >>>>>>>>>> >>>>>>>>> are >>>>>> >>>>>>> identical). The same happens when building the final dist. jar. >>>>>>>>> >>>>>>>>> - Some of these dependencies require to include license files >>>>>>>>>> >>>>>>>>>>> in >>>>>>>>>>> the >>>>>>>>>>> shaded jar. It is hard to impossible to build a good automatic >>>>>>>>>>> solution >>>>>>>>>>> for >>>>>>>>>>> that, partly due to Maven's very poor cross-project path support >>>>>>>>>>> >>>>>>>>>>> - Most importantly: Scala does not support shading really >>>>>>>>>>> well. >>>>>>>>>>> >>>>>>>>>>> Scala >>>>>>>>>>> >>>>>>>>>> classes have references to classes in more places than just the >>>>>>>>> class >>>>>>>>> >>>>>>>>> names >>>>>>>>>> >>>>>>>>>>> (apparently for Scala reflect support). Referencing a Scala >>>>>>>>>>> project >>>>>>>>>>> >>>>>>>>>>> with >>>>>>>>>>> >>>>>>>>>> shaded ASM still requires to add a reference to unshaded ASM (at >>>>>>>>> least >>>>>>>>> >>>>>>>>> as >>>>>>>>>> >>>>>>>>>> a >>>>>>>>> >>>>>>>>> compile dependency). >>>>>>>>>> >>>>>>>>>>> *Proposal* >>>>>>>>>>> >>>>>>>>>>> I propose that we build and deploy a asm-flink-shaded version of >>>>>>>>>>> ASM >>>>>>>>>>> >>>>>>>>>>> and >>>>>>>>>>> >>>>>>>>>> directly program against the relocated namespaces. Since we never >>>>>>>>> use >>>>>>>>> >>>>>>>>> classes that we relocate in public interfaces, Flink users will >>>>>>>>>> never >>>>>>>>>> >>>>>>>>> see >>>>>> >>>>>>> the relocated class names. Internally, it does not hurt to use them. >>>>>>>>> >>>>>>>>> - Proper maven dependency management, no hidden (shaded) >>>>>>>>>> >>>>>>>>>>> dependencies >>>>>>>>>>> >>>>>>>>>> - One copy of each class for shaded dependencies >>>>>>>>> >>>>>>>>> - Proper Scala interoperability >>>>>>>>>> >>>>>>>>>>> - Natural License management (license is part of deployed >>>>>>>>>>> asm-flink-shaded jar) >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Happy to hear thoughts! >>>>>>>>>>> >>>>>>>>>>> Stephan >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> > |
In reply to this post by Stephan Ewen
+1 on including dependency version.
-------- Original message --------From: Stephan Ewen <[hidden email]> Date: 6/26/17 5:01 AM (GMT-08:00) To: [hidden email] Subject: Re: [DISCUSS] Changing Flink's shading model Looks good, thanks Chesnay!. How about including the dependency version names in the module names, like "flink-shaded-guava-18.0"? On Mon, Jun 26, 2017 at 11:49 AM, Chesnay Schepler <[hidden email]> wrote: > The new repo was created and is accessible here: > https://github.com/apache/flink-shaded > > I've already opened a PR to add a shaded guava module. > > Once the shaded-guava module is merged I would like to do a first release > of flink-shaded, > only containing guava. I already have a branch with all the changes > necessary to > integrate this dependency into Flink. > > The alternative would be to first create shaded modules for all > dependencies and make a > single release, but I feel that would delay things quite a bit. > > > > On 21.06.2017 17:00, Robert Metzger wrote: > >> Okay, I'll request a repo for the shading. >> >> On Wed, Jun 21, 2017 at 1:38 PM, Chesnay Schepler <[hidden email]> >> wrote: >> >> I like your suggestion Robert. A lot actually. >>> >>> Having the actual dependency version (i.e 18 for guava) in the version >>> should improve clarity a lot. >>> >>> Originally i intended to release 1 artifact per Flink version, with the >>> normal versioning scheme >>> that we use. But given that the shaded dependencies aren't changed often >>> (even rarely might be a stretch), >>> and aren't actually coupled to the Flink release cycle this doesn't make >>> a >>> lot of sense. >>> >>> Having separate repos looks like a reasonable separation of concerns. The >>> release for Flink itself >>> would work just like it does now; we don't have to modify any scripts or >>> do extra steps. >>> >>> Since the build, release and development process are separate (since >>> flink-shaded isn't part of Flink build >>> process, has a separate release process and changes to it will /never >>> /require immediate changes to Flink) >>> it seems like a very good candidate to move it into a separate repo. >>> >>> >>> On 21.06.2017 11:26, Robert Metzger wrote: >>> >>> Its not completely clear to me how we want to version the shaded >>>> dependencies, and where we are putting them. >>>> >>>> One concern are the official apache release rules. If we want to release >>>> something to maven central, we need to do a proper vote over a source >>>> archive. >>>> I would propose to create a new repository "flink-shaded.git" that >>>> contains >>>> the following maven module structure: >>>> - flink-shaded: 1 >>>> - flink-shaded-asm: 1-5.2 >>>> - flink-shaded-guava: 1-18.0 >>>> - ... >>>> >>>> The number indicates the version (for ASM, I've just guessed). >>>> The version for the parent "flink-shaded" needs to be updated on each >>>> parent pom change (new module added, new / changed plugins, ...) >>>> >>>> We could create a separate release script in this repository that >>>> creates >>>> the flink-shaded-src.zip from the code and deploys the artifacts to the >>>> maven staging area. >>>> >>>> The advantage of a separate repo would be that we don't need to maintain >>>> separate maven projects in the same git repo. >>>> Also, the src archives for the release vote can be created from the repo >>>> content (without much filtering). >>>> >>>> >>>> On Tue, Jun 20, 2017 at 9:44 PM, Stephan Ewen <[hidden email]> wrote: >>>> >>>> I like this approach. >>>> >>>>> Two additional things can be mention here: >>>>> >>>>> - We need to deploy these artifacts independently and not as part >>>>> of >>>>> the >>>>> build. That is a manual step once per "bump" in the dependency of that >>>>> library. >>>>> >>>>> - We reduce the shading complexity of the original build and should >>>>> thus >>>>> also speed up build times :-) >>>>> >>>>> Stephan >>>>> >>>>> >>>>> >>>>> >>>>> On Tue, Jun 20, 2017 at 1:15 PM, Chesnay Schepler <[hidden email]> >>>>> wrote: >>>>> >>>>> I would like to start working on this. >>>>> >>>>>> I've looked into adding a flink-shaded-guava module. Working against >>>>>> the >>>>>> shaded namespaces seems >>>>>> to work without problems from the IDE, and we could forbid un-shaded >>>>>> usages with checkstyle. >>>>>> >>>>>> So for the list of dependencies that we want to shade we currently >>>>>> got: >>>>>> >>>>>> * asm >>>>>> * guava >>>>>> * netty >>>>>> * hadoop >>>>>> * curator >>>>>> >>>>>> I've had a chat with Stephan Ewan and he brought up kryo + chill as >>>>>> well. >>>>>> >>>>>> The nice thing is that we can do this incrementally, one dependency >>>>>> at a >>>>>> time. As such i would propose >>>>>> to go through the whole process for guava and see what problems arise. >>>>>> >>>>>> This would include adding a flink-shaded module and a child >>>>>> flink-shaded-guava module to the flink repository >>>>>> that are not part of the build process, replacing all usages of guava >>>>>> in >>>>>> Flink, adding the >>>>>> checkstyle rule (optional) and deploying the artifact to maven >>>>>> central. >>>>>> >>>>>> >>>>>> On 11.05.2017 10:54, Stephan Ewen wrote: >>>>>> >>>>>> @Ufuk - I have never set up artifact deployment in Maven, could need >>>>>> some >>>>>> help there. >>>>>> >>>>>>> Regarding shading Netty, I agree, would be good to do that as well... >>>>>>> >>>>>>> On Thu, May 11, 2017 at 10:52 AM, Ufuk Celebi <[hidden email]> >>>>>>> wrote: >>>>>>> >>>>>>> The advantages you've listed sound really compelling to me. >>>>>>> >>>>>>> - Do you have time to implement these changes or do we need a >>>>>>>> >>>>>>>> volunteer? >>>>>>> >>>>>> ;) >>>>>> >>>>>>> - I assume that republishing the artifacts as you propose doesn't >>>>>>>> have >>>>>>>> any new legal implications since we already publish them with our >>>>>>>> JARs, right? >>>>>>>> >>>>>>>> - We might think about adding Netty to the list of shaded artifacts >>>>>>>> since some dependency conflicts were reported recently. Would have >>>>>>>> to >>>>>>>> double check the reported issues before doing that though. ;-) >>>>>>>> >>>>>>>> – Ufuk >>>>>>>> >>>>>>>> >>>>>>>> On Wed, May 10, 2017 at 8:45 PM, Stephan Ewen <[hidden email]> >>>>>>>> >>>>>>>> wrote: >>>>>>> >>>>>> @chesnay: I used ASM as an example in the proposal. Maybe I did not >>>>>> >>>>>>> say >>>>>>>> >>>>>>> that clearly. >>>>>> >>>>>>> If we like that approach, we should deal with the other libraries (at >>>>>>>>> >>>>>>>>> least >>>>>>>>> >>>>>>>> the frequently used ones) in the same way. >>>>>>>> >>>>>>>>> >>>>>>>>> I would imagine to have a project layout like that: >>>>>>>>> >>>>>>>>> flink-shaded-deps >>>>>>>>> - flink-shaded-asm >>>>>>>>> - flink-shaded-guava >>>>>>>>> - flink-shaded-curator >>>>>>>>> - flink-shaded-hadoop >>>>>>>>> >>>>>>>>> >>>>>>>>> "flink-shaded-deps" would not be built every time (and not be >>>>>>>>> released >>>>>>>>> every time), but only when needed. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Wed, May 10, 2017 at 7:28 PM, Chesnay Schepler < >>>>>>>>> [hidden email] >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>> I like the idea, thank you for bringing it up. >>>>>>>>> >>>>>>>>> Given that the raised problems aren't really ASM specific would it >>>>>>>>>> >>>>>>>>>> make >>>>>>>>> >>>>>>>> sense to create one flink-shaded module that contains all frequently >>>>>> >>>>>>> shaded >>>>>>>>>> >>>>>>>>> libraries? (or maybe even all shaded dependencies by core modules) >>>>>>>>> The >>>>>>>>> >>>>>>>>> proposal limits the scope of this to ASM and i was wondering why. >>>>>>>>>> >>>>>>>>>> I also remember that there was a discussion recently about why we >>>>>>>>>> >>>>>>>>>> shade >>>>>>>>> >>>>>>>> things at all, and the idea of working against the shaded namespaces >>>>>> >>>>>>> was >>>>>>>>>> brought up. Back then i was expressing doubts as to whether IDE's >>>>>>>>>> >>>>>>>>>> would >>>>>>>>> >>>>>>>> properly support this; what's the state on that? >>>>>> >>>>>>> On 10.05.2017 18:18, Stephan Ewen wrote: >>>>>>>>>> >>>>>>>>>> Hi! >>>>>>>>>> >>>>>>>>>> This is a discussion about altering the way we handle dependencies >>>>>>>>>>> >>>>>>>>>>> and >>>>>>>>>> >>>>>>>>> shading in Flink. >>>>>> >>>>>>> I ran into quite a view problems trying to adjust / fix some >>>>>>>>>>> shading >>>>>>>>>>> issues >>>>>>>>>>> during release validation. >>>>>>>>>>> >>>>>>>>>>> The issue is tracked under: https://issues.apache.org/jira >>>>>>>>>>> /browse/FLINK-6529 >>>>>>>>>>> Bring this discussion thread up because it is a bigger issue >>>>>>>>>>> >>>>>>>>>>> *Problem* >>>>>>>>>>> >>>>>>>>>>> Currently, Flink shades dependencies like ASM and Guava into all >>>>>>>>>>> >>>>>>>>>>> jars >>>>>>>>>> >>>>>>>>> of >>>>>> >>>>>>> projects that reference it and relocate the classes. >>>>>>>>> >>>>>>>>> There are some drawbacks to that approach, let's discuss them at >>>>>>>>>> the >>>>>>>>>> >>>>>>>>>>> example of ASM: >>>>>>>>>>> >>>>>>>>>>> - The ASM classes are for example in flink-core, >>>>>>>>>>> flink-java, >>>>>>>>>>> flink-scala, >>>>>>>>>>> flink-runtime, etc. >>>>>>>>>>> >>>>>>>>>>> - Users that reference these dependencies have the classes >>>>>>>>>>> multiple >>>>>>>>>>> times >>>>>>>>>>> in the classpath. That is unclean (works, through, because the >>>>>>>>>>> >>>>>>>>>>> classes >>>>>>>>>> >>>>>>>>> are >>>>>> >>>>>>> identical). The same happens when building the final dist. jar. >>>>>>>>> >>>>>>>>> - Some of these dependencies require to include license files >>>>>>>>>> >>>>>>>>>>> in >>>>>>>>>>> the >>>>>>>>>>> shaded jar. It is hard to impossible to build a good automatic >>>>>>>>>>> solution >>>>>>>>>>> for >>>>>>>>>>> that, partly due to Maven's very poor cross-project path support >>>>>>>>>>> >>>>>>>>>>> - Most importantly: Scala does not support shading really >>>>>>>>>>> well. >>>>>>>>>>> >>>>>>>>>>> Scala >>>>>>>>>>> >>>>>>>>>> classes have references to classes in more places than just the >>>>>>>>> class >>>>>>>>> >>>>>>>>> names >>>>>>>>>> >>>>>>>>>>> (apparently for Scala reflect support). Referencing a Scala >>>>>>>>>>> project >>>>>>>>>>> >>>>>>>>>>> with >>>>>>>>>>> >>>>>>>>>> shaded ASM still requires to add a reference to unshaded ASM (at >>>>>>>>> least >>>>>>>>> >>>>>>>>> as >>>>>>>>>> >>>>>>>>>> a >>>>>>>>> >>>>>>>>> compile dependency). >>>>>>>>>> >>>>>>>>>>> *Proposal* >>>>>>>>>>> >>>>>>>>>>> I propose that we build and deploy a asm-flink-shaded version of >>>>>>>>>>> ASM >>>>>>>>>>> >>>>>>>>>>> and >>>>>>>>>>> >>>>>>>>>> directly program against the relocated namespaces. Since we never >>>>>>>>> use >>>>>>>>> >>>>>>>>> classes that we relocate in public interfaces, Flink users will >>>>>>>>>> never >>>>>>>>>> >>>>>>>>> see >>>>>> >>>>>>> the relocated class names. Internally, it does not hurt to use them. >>>>>>>>> >>>>>>>>> - Proper maven dependency management, no hidden (shaded) >>>>>>>>>> >>>>>>>>>>> dependencies >>>>>>>>>>> >>>>>>>>>> - One copy of each class for shaded dependencies >>>>>>>>> >>>>>>>>> - Proper Scala interoperability >>>>>>>>>> >>>>>>>>>>> - Natural License management (license is part of deployed >>>>>>>>>>> asm-flink-shaded jar) >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Happy to hear thoughts! >>>>>>>>>>> >>>>>>>>>>> Stephan >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> > |
In reply to this post by Stephan Ewen
the guava version is already included in the version of the flink-shaded
module. For example, for the first release of flink-shaded-guava the version would be 1-18.0. 1 is the version of flink-shaded itself, 18.0 is the guava version. On 26.06.2017 14:01, Stephan Ewen wrote: > Looks good, thanks Chesnay!. > > How about including the dependency version names in the module names, like > "flink-shaded-guava-18.0"? > > On Mon, Jun 26, 2017 at 11:49 AM, Chesnay Schepler <[hidden email]> > wrote: > >> The new repo was created and is accessible here: >> https://github.com/apache/flink-shaded >> >> I've already opened a PR to add a shaded guava module. >> >> Once the shaded-guava module is merged I would like to do a first release >> of flink-shaded, >> only containing guava. I already have a branch with all the changes >> necessary to >> integrate this dependency into Flink. >> >> The alternative would be to first create shaded modules for all >> dependencies and make a >> single release, but I feel that would delay things quite a bit. >> >> >> >> On 21.06.2017 17:00, Robert Metzger wrote: >> >>> Okay, I'll request a repo for the shading. >>> >>> On Wed, Jun 21, 2017 at 1:38 PM, Chesnay Schepler <[hidden email]> >>> wrote: >>> >>> I like your suggestion Robert. A lot actually. >>>> Having the actual dependency version (i.e 18 for guava) in the version >>>> should improve clarity a lot. >>>> >>>> Originally i intended to release 1 artifact per Flink version, with the >>>> normal versioning scheme >>>> that we use. But given that the shaded dependencies aren't changed often >>>> (even rarely might be a stretch), >>>> and aren't actually coupled to the Flink release cycle this doesn't make >>>> a >>>> lot of sense. >>>> >>>> Having separate repos looks like a reasonable separation of concerns. The >>>> release for Flink itself >>>> would work just like it does now; we don't have to modify any scripts or >>>> do extra steps. >>>> >>>> Since the build, release and development process are separate (since >>>> flink-shaded isn't part of Flink build >>>> process, has a separate release process and changes to it will /never >>>> /require immediate changes to Flink) >>>> it seems like a very good candidate to move it into a separate repo. >>>> >>>> >>>> On 21.06.2017 11:26, Robert Metzger wrote: >>>> >>>> Its not completely clear to me how we want to version the shaded >>>>> dependencies, and where we are putting them. >>>>> >>>>> One concern are the official apache release rules. If we want to release >>>>> something to maven central, we need to do a proper vote over a source >>>>> archive. >>>>> I would propose to create a new repository "flink-shaded.git" that >>>>> contains >>>>> the following maven module structure: >>>>> - flink-shaded: 1 >>>>> - flink-shaded-asm: 1-5.2 >>>>> - flink-shaded-guava: 1-18.0 >>>>> - ... >>>>> >>>>> The number indicates the version (for ASM, I've just guessed). >>>>> The version for the parent "flink-shaded" needs to be updated on each >>>>> parent pom change (new module added, new / changed plugins, ...) >>>>> >>>>> We could create a separate release script in this repository that >>>>> creates >>>>> the flink-shaded-src.zip from the code and deploys the artifacts to the >>>>> maven staging area. >>>>> >>>>> The advantage of a separate repo would be that we don't need to maintain >>>>> separate maven projects in the same git repo. >>>>> Also, the src archives for the release vote can be created from the repo >>>>> content (without much filtering). >>>>> >>>>> >>>>> On Tue, Jun 20, 2017 at 9:44 PM, Stephan Ewen <[hidden email]> wrote: >>>>> >>>>> I like this approach. >>>>> >>>>>> Two additional things can be mention here: >>>>>> >>>>>> - We need to deploy these artifacts independently and not as part >>>>>> of >>>>>> the >>>>>> build. That is a manual step once per "bump" in the dependency of that >>>>>> library. >>>>>> >>>>>> - We reduce the shading complexity of the original build and should >>>>>> thus >>>>>> also speed up build times :-) >>>>>> >>>>>> Stephan >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On Tue, Jun 20, 2017 at 1:15 PM, Chesnay Schepler <[hidden email]> >>>>>> wrote: >>>>>> >>>>>> I would like to start working on this. >>>>>> >>>>>>> I've looked into adding a flink-shaded-guava module. Working against >>>>>>> the >>>>>>> shaded namespaces seems >>>>>>> to work without problems from the IDE, and we could forbid un-shaded >>>>>>> usages with checkstyle. >>>>>>> >>>>>>> So for the list of dependencies that we want to shade we currently >>>>>>> got: >>>>>>> >>>>>>> * asm >>>>>>> * guava >>>>>>> * netty >>>>>>> * hadoop >>>>>>> * curator >>>>>>> >>>>>>> I've had a chat with Stephan Ewan and he brought up kryo + chill as >>>>>>> well. >>>>>>> >>>>>>> The nice thing is that we can do this incrementally, one dependency >>>>>>> at a >>>>>>> time. As such i would propose >>>>>>> to go through the whole process for guava and see what problems arise. >>>>>>> >>>>>>> This would include adding a flink-shaded module and a child >>>>>>> flink-shaded-guava module to the flink repository >>>>>>> that are not part of the build process, replacing all usages of guava >>>>>>> in >>>>>>> Flink, adding the >>>>>>> checkstyle rule (optional) and deploying the artifact to maven >>>>>>> central. >>>>>>> >>>>>>> >>>>>>> On 11.05.2017 10:54, Stephan Ewen wrote: >>>>>>> >>>>>>> @Ufuk - I have never set up artifact deployment in Maven, could need >>>>>>> some >>>>>>> help there. >>>>>>> >>>>>>>> Regarding shading Netty, I agree, would be good to do that as well... >>>>>>>> >>>>>>>> On Thu, May 11, 2017 at 10:52 AM, Ufuk Celebi <[hidden email]> >>>>>>>> wrote: >>>>>>>> >>>>>>>> The advantages you've listed sound really compelling to me. >>>>>>>> >>>>>>>> - Do you have time to implement these changes or do we need a >>>>>>>>> volunteer? >>>>>>> ;) >>>>>>> >>>>>>>> - I assume that republishing the artifacts as you propose doesn't >>>>>>>>> have >>>>>>>>> any new legal implications since we already publish them with our >>>>>>>>> JARs, right? >>>>>>>>> >>>>>>>>> - We might think about adding Netty to the list of shaded artifacts >>>>>>>>> since some dependency conflicts were reported recently. Would have >>>>>>>>> to >>>>>>>>> double check the reported issues before doing that though. ;-) >>>>>>>>> >>>>>>>>> – Ufuk >>>>>>>>> >>>>>>>>> >>>>>>>>> On Wed, May 10, 2017 at 8:45 PM, Stephan Ewen <[hidden email]> >>>>>>>>> >>>>>>>>> wrote: >>>>>>> @chesnay: I used ASM as an example in the proposal. Maybe I did not >>>>>>> >>>>>>>> say >>>>>>>> that clearly. >>>>>>>> If we like that approach, we should deal with the other libraries (at >>>>>>>>>> least >>>>>>>>>> >>>>>>>>> the frequently used ones) in the same way. >>>>>>>>> >>>>>>>>>> I would imagine to have a project layout like that: >>>>>>>>>> >>>>>>>>>> flink-shaded-deps >>>>>>>>>> - flink-shaded-asm >>>>>>>>>> - flink-shaded-guava >>>>>>>>>> - flink-shaded-curator >>>>>>>>>> - flink-shaded-hadoop >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> "flink-shaded-deps" would not be built every time (and not be >>>>>>>>>> released >>>>>>>>>> every time), but only when needed. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Wed, May 10, 2017 at 7:28 PM, Chesnay Schepler < >>>>>>>>>> [hidden email] >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> I like the idea, thank you for bringing it up. >>>>>>>>>> >>>>>>>>>> Given that the raised problems aren't really ASM specific would it >>>>>>>>>>> make >>>>>>>>> sense to create one flink-shaded module that contains all frequently >>>>>>>> shaded >>>>>>>>>> libraries? (or maybe even all shaded dependencies by core modules) >>>>>>>>>> The >>>>>>>>>> >>>>>>>>>> proposal limits the scope of this to ASM and i was wondering why. >>>>>>>>>>> I also remember that there was a discussion recently about why we >>>>>>>>>>> >>>>>>>>>>> shade >>>>>>>>> things at all, and the idea of working against the shaded namespaces >>>>>>>> was >>>>>>>>>>> brought up. Back then i was expressing doubts as to whether IDE's >>>>>>>>>>> >>>>>>>>>>> would >>>>>>>>> properly support this; what's the state on that? >>>>>>>> On 10.05.2017 18:18, Stephan Ewen wrote: >>>>>>>>>>> Hi! >>>>>>>>>>> >>>>>>>>>>> This is a discussion about altering the way we handle dependencies >>>>>>>>>>>> and >>>>>>>>>> shading in Flink. >>>>>>>> I ran into quite a view problems trying to adjust / fix some >>>>>>>>>>>> shading >>>>>>>>>>>> issues >>>>>>>>>>>> during release validation. >>>>>>>>>>>> >>>>>>>>>>>> The issue is tracked under: https://issues.apache.org/jira >>>>>>>>>>>> /browse/FLINK-6529 >>>>>>>>>>>> Bring this discussion thread up because it is a bigger issue >>>>>>>>>>>> >>>>>>>>>>>> *Problem* >>>>>>>>>>>> >>>>>>>>>>>> Currently, Flink shades dependencies like ASM and Guava into all >>>>>>>>>>>> >>>>>>>>>>>> jars >>>>>>>>>> of >>>>>>>> projects that reference it and relocate the classes. >>>>>>>>>> There are some drawbacks to that approach, let's discuss them at >>>>>>>>>>> the >>>>>>>>>>> >>>>>>>>>>>> example of ASM: >>>>>>>>>>>> >>>>>>>>>>>> - The ASM classes are for example in flink-core, >>>>>>>>>>>> flink-java, >>>>>>>>>>>> flink-scala, >>>>>>>>>>>> flink-runtime, etc. >>>>>>>>>>>> >>>>>>>>>>>> - Users that reference these dependencies have the classes >>>>>>>>>>>> multiple >>>>>>>>>>>> times >>>>>>>>>>>> in the classpath. That is unclean (works, through, because the >>>>>>>>>>>> >>>>>>>>>>>> classes >>>>>>>>>> are >>>>>>>> identical). The same happens when building the final dist. jar. >>>>>>>>>> - Some of these dependencies require to include license files >>>>>>>>>>>> in >>>>>>>>>>>> the >>>>>>>>>>>> shaded jar. It is hard to impossible to build a good automatic >>>>>>>>>>>> solution >>>>>>>>>>>> for >>>>>>>>>>>> that, partly due to Maven's very poor cross-project path support >>>>>>>>>>>> >>>>>>>>>>>> - Most importantly: Scala does not support shading really >>>>>>>>>>>> well. >>>>>>>>>>>> >>>>>>>>>>>> Scala >>>>>>>>>>>> >>>>>>>>>>> classes have references to classes in more places than just the >>>>>>>>>> class >>>>>>>>>> >>>>>>>>>> names >>>>>>>>>>>> (apparently for Scala reflect support). Referencing a Scala >>>>>>>>>>>> project >>>>>>>>>>>> >>>>>>>>>>>> with >>>>>>>>>>>> >>>>>>>>>>> shaded ASM still requires to add a reference to unshaded ASM (at >>>>>>>>>> least >>>>>>>>>> >>>>>>>>>> as >>>>>>>>>>> a >>>>>>>>>> compile dependency). >>>>>>>>>>>> *Proposal* >>>>>>>>>>>> >>>>>>>>>>>> I propose that we build and deploy a asm-flink-shaded version of >>>>>>>>>>>> ASM >>>>>>>>>>>> >>>>>>>>>>>> and >>>>>>>>>>>> >>>>>>>>>>> directly program against the relocated namespaces. Since we never >>>>>>>>>> use >>>>>>>>>> >>>>>>>>>> classes that we relocate in public interfaces, Flink users will >>>>>>>>>>> never >>>>>>>>>>> >>>>>>>>>> see >>>>>>>> the relocated class names. Internally, it does not hurt to use them. >>>>>>>>>> - Proper maven dependency management, no hidden (shaded) >>>>>>>>>>>> dependencies >>>>>>>>>>>> >>>>>>>>>>> - One copy of each class for shaded dependencies >>>>>>>>>> - Proper Scala interoperability >>>>>>>>>>>> - Natural License management (license is part of deployed >>>>>>>>>>>> asm-flink-shaded jar) >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Happy to hear thoughts! >>>>>>>>>>>> >>>>>>>>>>>> Stephan >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> |
How should it work when one of the shaded dependencies is updated? We
probably do not want to release all of them, just because the overall version number is in their version number. How about that: - Under normal circumstances, we only increase the version of the root project, when we add / bump a shaded dependency version - Shaded dependencies include the version in the same. That way we can possibly have two different versions of a dependency, such as "flink-shaded-kryo-2" and "flink-shaded-kryo-3". - Shaded dependencies should have the version in the relocation pattern as well, for the same reason as above (unless the two versions have separate namespaces already). - The released version of the module and artifact is always "1.0" unless we find that we did shading/relocation errors, in which case we need to re-release that artifact (1.1, 1.2, ...) On Mon, Jun 26, 2017 at 2:23 PM, Chesnay Schepler <[hidden email]> wrote: > the guava version is already included in the version of the flink-shaded > module. > > For example, for the first release of flink-shaded-guava the version would > be 1-18.0. > > 1 is the version of flink-shaded itself, 18.0 is the guava version. > > > On 26.06.2017 14:01, Stephan Ewen wrote: > >> Looks good, thanks Chesnay!. >> >> How about including the dependency version names in the module names, like >> "flink-shaded-guava-18.0"? >> >> On Mon, Jun 26, 2017 at 11:49 AM, Chesnay Schepler <[hidden email]> >> wrote: >> >> The new repo was created and is accessible here: >>> https://github.com/apache/flink-shaded >>> >>> I've already opened a PR to add a shaded guava module. >>> >>> Once the shaded-guava module is merged I would like to do a first release >>> of flink-shaded, >>> only containing guava. I already have a branch with all the changes >>> necessary to >>> integrate this dependency into Flink. >>> >>> The alternative would be to first create shaded modules for all >>> dependencies and make a >>> single release, but I feel that would delay things quite a bit. >>> >>> >>> >>> On 21.06.2017 17:00, Robert Metzger wrote: >>> >>> Okay, I'll request a repo for the shading. >>>> >>>> On Wed, Jun 21, 2017 at 1:38 PM, Chesnay Schepler <[hidden email]> >>>> wrote: >>>> >>>> I like your suggestion Robert. A lot actually. >>>> >>>>> Having the actual dependency version (i.e 18 for guava) in the version >>>>> should improve clarity a lot. >>>>> >>>>> Originally i intended to release 1 artifact per Flink version, with the >>>>> normal versioning scheme >>>>> that we use. But given that the shaded dependencies aren't changed >>>>> often >>>>> (even rarely might be a stretch), >>>>> and aren't actually coupled to the Flink release cycle this doesn't >>>>> make >>>>> a >>>>> lot of sense. >>>>> >>>>> Having separate repos looks like a reasonable separation of concerns. >>>>> The >>>>> release for Flink itself >>>>> would work just like it does now; we don't have to modify any scripts >>>>> or >>>>> do extra steps. >>>>> >>>>> Since the build, release and development process are separate (since >>>>> flink-shaded isn't part of Flink build >>>>> process, has a separate release process and changes to it will /never >>>>> /require immediate changes to Flink) >>>>> it seems like a very good candidate to move it into a separate repo. >>>>> >>>>> >>>>> On 21.06.2017 11:26, Robert Metzger wrote: >>>>> >>>>> Its not completely clear to me how we want to version the shaded >>>>> >>>>>> dependencies, and where we are putting them. >>>>>> >>>>>> One concern are the official apache release rules. If we want to >>>>>> release >>>>>> something to maven central, we need to do a proper vote over a source >>>>>> archive. >>>>>> I would propose to create a new repository "flink-shaded.git" that >>>>>> contains >>>>>> the following maven module structure: >>>>>> - flink-shaded: 1 >>>>>> - flink-shaded-asm: 1-5.2 >>>>>> - flink-shaded-guava: 1-18.0 >>>>>> - ... >>>>>> >>>>>> The number indicates the version (for ASM, I've just guessed). >>>>>> The version for the parent "flink-shaded" needs to be updated on each >>>>>> parent pom change (new module added, new / changed plugins, ...) >>>>>> >>>>>> We could create a separate release script in this repository that >>>>>> creates >>>>>> the flink-shaded-src.zip from the code and deploys the artifacts to >>>>>> the >>>>>> maven staging area. >>>>>> >>>>>> The advantage of a separate repo would be that we don't need to >>>>>> maintain >>>>>> separate maven projects in the same git repo. >>>>>> Also, the src archives for the release vote can be created from the >>>>>> repo >>>>>> content (without much filtering). >>>>>> >>>>>> >>>>>> On Tue, Jun 20, 2017 at 9:44 PM, Stephan Ewen <[hidden email]> >>>>>> wrote: >>>>>> >>>>>> I like this approach. >>>>>> >>>>>> Two additional things can be mention here: >>>>>>> >>>>>>> - We need to deploy these artifacts independently and not as >>>>>>> part >>>>>>> of >>>>>>> the >>>>>>> build. That is a manual step once per "bump" in the dependency of >>>>>>> that >>>>>>> library. >>>>>>> >>>>>>> - We reduce the shading complexity of the original build and >>>>>>> should >>>>>>> thus >>>>>>> also speed up build times :-) >>>>>>> >>>>>>> Stephan >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Tue, Jun 20, 2017 at 1:15 PM, Chesnay Schepler < >>>>>>> [hidden email]> >>>>>>> wrote: >>>>>>> >>>>>>> I would like to start working on this. >>>>>>> >>>>>>> I've looked into adding a flink-shaded-guava module. Working against >>>>>>>> the >>>>>>>> shaded namespaces seems >>>>>>>> to work without problems from the IDE, and we could forbid un-shaded >>>>>>>> usages with checkstyle. >>>>>>>> >>>>>>>> So for the list of dependencies that we want to shade we currently >>>>>>>> got: >>>>>>>> >>>>>>>> * asm >>>>>>>> * guava >>>>>>>> * netty >>>>>>>> * hadoop >>>>>>>> * curator >>>>>>>> >>>>>>>> I've had a chat with Stephan Ewan and he brought up kryo + chill as >>>>>>>> well. >>>>>>>> >>>>>>>> The nice thing is that we can do this incrementally, one dependency >>>>>>>> at a >>>>>>>> time. As such i would propose >>>>>>>> to go through the whole process for guava and see what problems >>>>>>>> arise. >>>>>>>> >>>>>>>> This would include adding a flink-shaded module and a child >>>>>>>> flink-shaded-guava module to the flink repository >>>>>>>> that are not part of the build process, replacing all usages of >>>>>>>> guava >>>>>>>> in >>>>>>>> Flink, adding the >>>>>>>> checkstyle rule (optional) and deploying the artifact to maven >>>>>>>> central. >>>>>>>> >>>>>>>> >>>>>>>> On 11.05.2017 10:54, Stephan Ewen wrote: >>>>>>>> >>>>>>>> @Ufuk - I have never set up artifact deployment in Maven, could >>>>>>>> need >>>>>>>> some >>>>>>>> help there. >>>>>>>> >>>>>>>> Regarding shading Netty, I agree, would be good to do that as >>>>>>>>> well... >>>>>>>>> >>>>>>>>> On Thu, May 11, 2017 at 10:52 AM, Ufuk Celebi <[hidden email]> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>> The advantages you've listed sound really compelling to me. >>>>>>>>> >>>>>>>>> - Do you have time to implement these changes or do we need a >>>>>>>>> >>>>>>>>>> volunteer? >>>>>>>>>> >>>>>>>>> ;) >>>>>>>> >>>>>>>> - I assume that republishing the artifacts as you propose doesn't >>>>>>>>> >>>>>>>>>> have >>>>>>>>>> any new legal implications since we already publish them with our >>>>>>>>>> JARs, right? >>>>>>>>>> >>>>>>>>>> - We might think about adding Netty to the list of shaded >>>>>>>>>> artifacts >>>>>>>>>> since some dependency conflicts were reported recently. Would have >>>>>>>>>> to >>>>>>>>>> double check the reported issues before doing that though. ;-) >>>>>>>>>> >>>>>>>>>> – Ufuk >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Wed, May 10, 2017 at 8:45 PM, Stephan Ewen <[hidden email]> >>>>>>>>>> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>> @chesnay: I used ASM as an example in the proposal. Maybe I did not >>>>>>>> >>>>>>>> say >>>>>>>>> that clearly. >>>>>>>>> If we like that approach, we should deal with the other libraries >>>>>>>>> (at >>>>>>>>> >>>>>>>>>> least >>>>>>>>>>> >>>>>>>>>>> the frequently used ones) in the same way. >>>>>>>>>> >>>>>>>>>> I would imagine to have a project layout like that: >>>>>>>>>>> >>>>>>>>>>> flink-shaded-deps >>>>>>>>>>> - flink-shaded-asm >>>>>>>>>>> - flink-shaded-guava >>>>>>>>>>> - flink-shaded-curator >>>>>>>>>>> - flink-shaded-hadoop >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> "flink-shaded-deps" would not be built every time (and not be >>>>>>>>>>> released >>>>>>>>>>> every time), but only when needed. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Wed, May 10, 2017 at 7:28 PM, Chesnay Schepler < >>>>>>>>>>> [hidden email] >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>> I like the idea, thank you for bringing it up. >>>>>>>>>>> >>>>>>>>>>> Given that the raised problems aren't really ASM specific would >>>>>>>>>>> it >>>>>>>>>>> >>>>>>>>>>>> make >>>>>>>>>>>> >>>>>>>>>>> sense to create one flink-shaded module that contains all >>>>>>>>>> frequently >>>>>>>>>> >>>>>>>>> shaded >>>>>>>>> >>>>>>>>>> libraries? (or maybe even all shaded dependencies by core modules) >>>>>>>>>>> The >>>>>>>>>>> >>>>>>>>>>> proposal limits the scope of this to ASM and i was wondering why. >>>>>>>>>>> >>>>>>>>>>>> I also remember that there was a discussion recently about why >>>>>>>>>>>> we >>>>>>>>>>>> >>>>>>>>>>>> shade >>>>>>>>>>>> >>>>>>>>>>> things at all, and the idea of working against the shaded >>>>>>>>>> namespaces >>>>>>>>>> >>>>>>>>> was >>>>>>>>> >>>>>>>>>> brought up. Back then i was expressing doubts as to whether IDE's >>>>>>>>>>>> >>>>>>>>>>>> would >>>>>>>>>>>> >>>>>>>>>>> properly support this; what's the state on that? >>>>>>>>>> >>>>>>>>> On 10.05.2017 18:18, Stephan Ewen wrote: >>>>>>>>> >>>>>>>>>> Hi! >>>>>>>>>>>> >>>>>>>>>>>> This is a discussion about altering the way we handle >>>>>>>>>>>> dependencies >>>>>>>>>>>> >>>>>>>>>>>>> and >>>>>>>>>>>>> >>>>>>>>>>>> shading in Flink. >>>>>>>>>>> >>>>>>>>>> I ran into quite a view problems trying to adjust / fix some >>>>>>>>> >>>>>>>>>> shading >>>>>>>>>>>>> issues >>>>>>>>>>>>> during release validation. >>>>>>>>>>>>> >>>>>>>>>>>>> The issue is tracked under: https://issues.apache.org/jira >>>>>>>>>>>>> /browse/FLINK-6529 >>>>>>>>>>>>> Bring this discussion thread up because it is a bigger issue >>>>>>>>>>>>> >>>>>>>>>>>>> *Problem* >>>>>>>>>>>>> >>>>>>>>>>>>> Currently, Flink shades dependencies like ASM and Guava into >>>>>>>>>>>>> all >>>>>>>>>>>>> >>>>>>>>>>>>> jars >>>>>>>>>>>>> >>>>>>>>>>>> of >>>>>>>>>>> >>>>>>>>>> projects that reference it and relocate the classes. >>>>>>>>> >>>>>>>>>> There are some drawbacks to that approach, let's discuss them at >>>>>>>>>>> >>>>>>>>>>>> the >>>>>>>>>>>> >>>>>>>>>>>> example of ASM: >>>>>>>>>>>>> >>>>>>>>>>>>> - The ASM classes are for example in flink-core, >>>>>>>>>>>>> flink-java, >>>>>>>>>>>>> flink-scala, >>>>>>>>>>>>> flink-runtime, etc. >>>>>>>>>>>>> >>>>>>>>>>>>> - Users that reference these dependencies have the >>>>>>>>>>>>> classes >>>>>>>>>>>>> multiple >>>>>>>>>>>>> times >>>>>>>>>>>>> in the classpath. That is unclean (works, through, because the >>>>>>>>>>>>> >>>>>>>>>>>>> classes >>>>>>>>>>>>> >>>>>>>>>>>> are >>>>>>>>>>> >>>>>>>>>> identical). The same happens when building the final dist. jar. >>>>>>>>> >>>>>>>>>> - Some of these dependencies require to include license >>>>>>>>>>> files >>>>>>>>>>> >>>>>>>>>>>> in >>>>>>>>>>>>> the >>>>>>>>>>>>> shaded jar. It is hard to impossible to build a good automatic >>>>>>>>>>>>> solution >>>>>>>>>>>>> for >>>>>>>>>>>>> that, partly due to Maven's very poor cross-project path >>>>>>>>>>>>> support >>>>>>>>>>>>> >>>>>>>>>>>>> - Most importantly: Scala does not support shading >>>>>>>>>>>>> really >>>>>>>>>>>>> well. >>>>>>>>>>>>> >>>>>>>>>>>>> Scala >>>>>>>>>>>>> >>>>>>>>>>>>> classes have references to classes in more places than just the >>>>>>>>>>>> >>>>>>>>>>> class >>>>>>>>>>> >>>>>>>>>>> names >>>>>>>>>>> >>>>>>>>>>>> (apparently for Scala reflect support). Referencing a Scala >>>>>>>>>>>>> project >>>>>>>>>>>>> >>>>>>>>>>>>> with >>>>>>>>>>>>> >>>>>>>>>>>>> shaded ASM still requires to add a reference to unshaded ASM >>>>>>>>>>>> (at >>>>>>>>>>>> >>>>>>>>>>> least >>>>>>>>>>> >>>>>>>>>>> as >>>>>>>>>>> >>>>>>>>>>>> a >>>>>>>>>>>> >>>>>>>>>>> compile dependency). >>>>>>>>>>> >>>>>>>>>>>> *Proposal* >>>>>>>>>>>>> >>>>>>>>>>>>> I propose that we build and deploy a asm-flink-shaded version >>>>>>>>>>>>> of >>>>>>>>>>>>> ASM >>>>>>>>>>>>> >>>>>>>>>>>>> and >>>>>>>>>>>>> >>>>>>>>>>>>> directly program against the relocated namespaces. Since we >>>>>>>>>>>> never >>>>>>>>>>>> >>>>>>>>>>> use >>>>>>>>>>> >>>>>>>>>>> classes that we relocate in public interfaces, Flink users will >>>>>>>>>>> >>>>>>>>>>>> never >>>>>>>>>>>> >>>>>>>>>>>> see >>>>>>>>>>> >>>>>>>>>> the relocated class names. Internally, it does not hurt to use >>>>>>>>> them. >>>>>>>>> >>>>>>>>>> - Proper maven dependency management, no hidden (shaded) >>>>>>>>>>> >>>>>>>>>>>> dependencies >>>>>>>>>>>>> >>>>>>>>>>>>> - One copy of each class for shaded dependencies >>>>>>>>>>>> >>>>>>>>>>> - Proper Scala interoperability >>>>>>>>>>> >>>>>>>>>>>> - Natural License management (license is part of deployed >>>>>>>>>>>>> asm-flink-shaded jar) >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Happy to hear thoughts! >>>>>>>>>>>>> >>>>>>>>>>>>> Stephan >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> > |
You're raising good points, now i see why having the version in the name
is useful. I'll adjust my PR accordingly. And yes, ideally we only release the modified modules, not everything again. On 26.06.2017 14:29, Stephan Ewen wrote: > How should it work when one of the shaded dependencies is updated? We > probably do not want to release all of them, just because the overall > version number is in their version number. > > How about that: > > - Under normal circumstances, we only increase the version of the root > project, when we add / bump a shaded dependency version > > - Shaded dependencies include the version in the same. That way we can > possibly have two different versions of a dependency, such as > "flink-shaded-kryo-2" and "flink-shaded-kryo-3". > - Shaded dependencies should have the version in the relocation pattern > as well, for the same reason as above (unless the two versions have > separate namespaces already). > > - The released version of the module and artifact is always "1.0" unless > we find that we did shading/relocation errors, in which case we need to > re-release that artifact (1.1, 1.2, ...) > > > > On Mon, Jun 26, 2017 at 2:23 PM, Chesnay Schepler <[hidden email]> > wrote: > >> the guava version is already included in the version of the flink-shaded >> module. >> >> For example, for the first release of flink-shaded-guava the version would >> be 1-18.0. >> >> 1 is the version of flink-shaded itself, 18.0 is the guava version. >> >> >> On 26.06.2017 14:01, Stephan Ewen wrote: >> >>> Looks good, thanks Chesnay!. >>> >>> How about including the dependency version names in the module names, like >>> "flink-shaded-guava-18.0"? >>> >>> On Mon, Jun 26, 2017 at 11:49 AM, Chesnay Schepler <[hidden email]> >>> wrote: >>> >>> The new repo was created and is accessible here: >>>> https://github.com/apache/flink-shaded >>>> >>>> I've already opened a PR to add a shaded guava module. >>>> >>>> Once the shaded-guava module is merged I would like to do a first release >>>> of flink-shaded, >>>> only containing guava. I already have a branch with all the changes >>>> necessary to >>>> integrate this dependency into Flink. >>>> >>>> The alternative would be to first create shaded modules for all >>>> dependencies and make a >>>> single release, but I feel that would delay things quite a bit. >>>> >>>> >>>> >>>> On 21.06.2017 17:00, Robert Metzger wrote: >>>> >>>> Okay, I'll request a repo for the shading. >>>>> On Wed, Jun 21, 2017 at 1:38 PM, Chesnay Schepler <[hidden email]> >>>>> wrote: >>>>> >>>>> I like your suggestion Robert. A lot actually. >>>>> >>>>>> Having the actual dependency version (i.e 18 for guava) in the version >>>>>> should improve clarity a lot. >>>>>> >>>>>> Originally i intended to release 1 artifact per Flink version, with the >>>>>> normal versioning scheme >>>>>> that we use. But given that the shaded dependencies aren't changed >>>>>> often >>>>>> (even rarely might be a stretch), >>>>>> and aren't actually coupled to the Flink release cycle this doesn't >>>>>> make >>>>>> a >>>>>> lot of sense. >>>>>> >>>>>> Having separate repos looks like a reasonable separation of concerns. >>>>>> The >>>>>> release for Flink itself >>>>>> would work just like it does now; we don't have to modify any scripts >>>>>> or >>>>>> do extra steps. >>>>>> >>>>>> Since the build, release and development process are separate (since >>>>>> flink-shaded isn't part of Flink build >>>>>> process, has a separate release process and changes to it will /never >>>>>> /require immediate changes to Flink) >>>>>> it seems like a very good candidate to move it into a separate repo. >>>>>> >>>>>> >>>>>> On 21.06.2017 11:26, Robert Metzger wrote: >>>>>> >>>>>> Its not completely clear to me how we want to version the shaded >>>>>> >>>>>>> dependencies, and where we are putting them. >>>>>>> >>>>>>> One concern are the official apache release rules. If we want to >>>>>>> release >>>>>>> something to maven central, we need to do a proper vote over a source >>>>>>> archive. >>>>>>> I would propose to create a new repository "flink-shaded.git" that >>>>>>> contains >>>>>>> the following maven module structure: >>>>>>> - flink-shaded: 1 >>>>>>> - flink-shaded-asm: 1-5.2 >>>>>>> - flink-shaded-guava: 1-18.0 >>>>>>> - ... >>>>>>> >>>>>>> The number indicates the version (for ASM, I've just guessed). >>>>>>> The version for the parent "flink-shaded" needs to be updated on each >>>>>>> parent pom change (new module added, new / changed plugins, ...) >>>>>>> >>>>>>> We could create a separate release script in this repository that >>>>>>> creates >>>>>>> the flink-shaded-src.zip from the code and deploys the artifacts to >>>>>>> the >>>>>>> maven staging area. >>>>>>> >>>>>>> The advantage of a separate repo would be that we don't need to >>>>>>> maintain >>>>>>> separate maven projects in the same git repo. >>>>>>> Also, the src archives for the release vote can be created from the >>>>>>> repo >>>>>>> content (without much filtering). >>>>>>> >>>>>>> >>>>>>> On Tue, Jun 20, 2017 at 9:44 PM, Stephan Ewen <[hidden email]> >>>>>>> wrote: >>>>>>> >>>>>>> I like this approach. >>>>>>> >>>>>>> Two additional things can be mention here: >>>>>>>> - We need to deploy these artifacts independently and not as >>>>>>>> part >>>>>>>> of >>>>>>>> the >>>>>>>> build. That is a manual step once per "bump" in the dependency of >>>>>>>> that >>>>>>>> library. >>>>>>>> >>>>>>>> - We reduce the shading complexity of the original build and >>>>>>>> should >>>>>>>> thus >>>>>>>> also speed up build times :-) >>>>>>>> >>>>>>>> Stephan >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Tue, Jun 20, 2017 at 1:15 PM, Chesnay Schepler < >>>>>>>> [hidden email]> >>>>>>>> wrote: >>>>>>>> >>>>>>>> I would like to start working on this. >>>>>>>> >>>>>>>> I've looked into adding a flink-shaded-guava module. Working against >>>>>>>>> the >>>>>>>>> shaded namespaces seems >>>>>>>>> to work without problems from the IDE, and we could forbid un-shaded >>>>>>>>> usages with checkstyle. >>>>>>>>> >>>>>>>>> So for the list of dependencies that we want to shade we currently >>>>>>>>> got: >>>>>>>>> >>>>>>>>> * asm >>>>>>>>> * guava >>>>>>>>> * netty >>>>>>>>> * hadoop >>>>>>>>> * curator >>>>>>>>> >>>>>>>>> I've had a chat with Stephan Ewan and he brought up kryo + chill as >>>>>>>>> well. >>>>>>>>> >>>>>>>>> The nice thing is that we can do this incrementally, one dependency >>>>>>>>> at a >>>>>>>>> time. As such i would propose >>>>>>>>> to go through the whole process for guava and see what problems >>>>>>>>> arise. >>>>>>>>> >>>>>>>>> This would include adding a flink-shaded module and a child >>>>>>>>> flink-shaded-guava module to the flink repository >>>>>>>>> that are not part of the build process, replacing all usages of >>>>>>>>> guava >>>>>>>>> in >>>>>>>>> Flink, adding the >>>>>>>>> checkstyle rule (optional) and deploying the artifact to maven >>>>>>>>> central. >>>>>>>>> >>>>>>>>> >>>>>>>>> On 11.05.2017 10:54, Stephan Ewen wrote: >>>>>>>>> >>>>>>>>> @Ufuk - I have never set up artifact deployment in Maven, could >>>>>>>>> need >>>>>>>>> some >>>>>>>>> help there. >>>>>>>>> >>>>>>>>> Regarding shading Netty, I agree, would be good to do that as >>>>>>>>>> well... >>>>>>>>>> >>>>>>>>>> On Thu, May 11, 2017 at 10:52 AM, Ufuk Celebi <[hidden email]> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> The advantages you've listed sound really compelling to me. >>>>>>>>>> >>>>>>>>>> - Do you have time to implement these changes or do we need a >>>>>>>>>> >>>>>>>>>>> volunteer? >>>>>>>>>>> >>>>>>>>>> ;) >>>>>>>>> - I assume that republishing the artifacts as you propose doesn't >>>>>>>>>>> have >>>>>>>>>>> any new legal implications since we already publish them with our >>>>>>>>>>> JARs, right? >>>>>>>>>>> >>>>>>>>>>> - We might think about adding Netty to the list of shaded >>>>>>>>>>> artifacts >>>>>>>>>>> since some dependency conflicts were reported recently. Would have >>>>>>>>>>> to >>>>>>>>>>> double check the reported issues before doing that though. ;-) >>>>>>>>>>> >>>>>>>>>>> – Ufuk >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Wed, May 10, 2017 at 8:45 PM, Stephan Ewen <[hidden email]> >>>>>>>>>>> >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>> @chesnay: I used ASM as an example in the proposal. Maybe I did not >>>>>>>>> say >>>>>>>>>> that clearly. >>>>>>>>>> If we like that approach, we should deal with the other libraries >>>>>>>>>> (at >>>>>>>>>> >>>>>>>>>>> least >>>>>>>>>>>> the frequently used ones) in the same way. >>>>>>>>>>> I would imagine to have a project layout like that: >>>>>>>>>>>> flink-shaded-deps >>>>>>>>>>>> - flink-shaded-asm >>>>>>>>>>>> - flink-shaded-guava >>>>>>>>>>>> - flink-shaded-curator >>>>>>>>>>>> - flink-shaded-hadoop >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> "flink-shaded-deps" would not be built every time (and not be >>>>>>>>>>>> released >>>>>>>>>>>> every time), but only when needed. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Wed, May 10, 2017 at 7:28 PM, Chesnay Schepler < >>>>>>>>>>>> [hidden email] >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>> I like the idea, thank you for bringing it up. >>>>>>>>>>>> >>>>>>>>>>>> Given that the raised problems aren't really ASM specific would >>>>>>>>>>>> it >>>>>>>>>>>> >>>>>>>>>>>>> make >>>>>>>>>>>>> >>>>>>>>>>>> sense to create one flink-shaded module that contains all >>>>>>>>>>> frequently >>>>>>>>>>> >>>>>>>>>> shaded >>>>>>>>>> >>>>>>>>>>> libraries? (or maybe even all shaded dependencies by core modules) >>>>>>>>>>>> The >>>>>>>>>>>> >>>>>>>>>>>> proposal limits the scope of this to ASM and i was wondering why. >>>>>>>>>>>> >>>>>>>>>>>>> I also remember that there was a discussion recently about why >>>>>>>>>>>>> we >>>>>>>>>>>>> >>>>>>>>>>>>> shade >>>>>>>>>>>>> >>>>>>>>>>>> things at all, and the idea of working against the shaded >>>>>>>>>>> namespaces >>>>>>>>>>> >>>>>>>>>> was >>>>>>>>>> >>>>>>>>>>> brought up. Back then i was expressing doubts as to whether IDE's >>>>>>>>>>>>> would >>>>>>>>>>>>> >>>>>>>>>>>> properly support this; what's the state on that? >>>>>>>>>> On 10.05.2017 18:18, Stephan Ewen wrote: >>>>>>>>>> >>>>>>>>>>> Hi! >>>>>>>>>>>>> This is a discussion about altering the way we handle >>>>>>>>>>>>> dependencies >>>>>>>>>>>>> >>>>>>>>>>>>>> and >>>>>>>>>>>>>> >>>>>>>>>>>>> shading in Flink. >>>>>>>>>>> I ran into quite a view problems trying to adjust / fix some >>>>>>>>>>> shading >>>>>>>>>>>>>> issues >>>>>>>>>>>>>> during release validation. >>>>>>>>>>>>>> >>>>>>>>>>>>>> The issue is tracked under: https://issues.apache.org/jira >>>>>>>>>>>>>> /browse/FLINK-6529 >>>>>>>>>>>>>> Bring this discussion thread up because it is a bigger issue >>>>>>>>>>>>>> >>>>>>>>>>>>>> *Problem* >>>>>>>>>>>>>> >>>>>>>>>>>>>> Currently, Flink shades dependencies like ASM and Guava into >>>>>>>>>>>>>> all >>>>>>>>>>>>>> >>>>>>>>>>>>>> jars >>>>>>>>>>>>>> >>>>>>>>>>>>> of >>>>>>>>>>> projects that reference it and relocate the classes. >>>>>>>>>>> There are some drawbacks to that approach, let's discuss them at >>>>>>>>>>>>> the >>>>>>>>>>>>> >>>>>>>>>>>>> example of ASM: >>>>>>>>>>>>>> - The ASM classes are for example in flink-core, >>>>>>>>>>>>>> flink-java, >>>>>>>>>>>>>> flink-scala, >>>>>>>>>>>>>> flink-runtime, etc. >>>>>>>>>>>>>> >>>>>>>>>>>>>> - Users that reference these dependencies have the >>>>>>>>>>>>>> classes >>>>>>>>>>>>>> multiple >>>>>>>>>>>>>> times >>>>>>>>>>>>>> in the classpath. That is unclean (works, through, because the >>>>>>>>>>>>>> >>>>>>>>>>>>>> classes >>>>>>>>>>>>>> >>>>>>>>>>>>> are >>>>>>>>>>> identical). The same happens when building the final dist. jar. >>>>>>>>>>> - Some of these dependencies require to include license >>>>>>>>>>>> files >>>>>>>>>>>> >>>>>>>>>>>>> in >>>>>>>>>>>>>> the >>>>>>>>>>>>>> shaded jar. It is hard to impossible to build a good automatic >>>>>>>>>>>>>> solution >>>>>>>>>>>>>> for >>>>>>>>>>>>>> that, partly due to Maven's very poor cross-project path >>>>>>>>>>>>>> support >>>>>>>>>>>>>> >>>>>>>>>>>>>> - Most importantly: Scala does not support shading >>>>>>>>>>>>>> really >>>>>>>>>>>>>> well. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Scala >>>>>>>>>>>>>> >>>>>>>>>>>>>> classes have references to classes in more places than just the >>>>>>>>>>>> class >>>>>>>>>>>> >>>>>>>>>>>> names >>>>>>>>>>>> >>>>>>>>>>>>> (apparently for Scala reflect support). Referencing a Scala >>>>>>>>>>>>>> project >>>>>>>>>>>>>> >>>>>>>>>>>>>> with >>>>>>>>>>>>>> >>>>>>>>>>>>>> shaded ASM still requires to add a reference to unshaded ASM >>>>>>>>>>>>> (at >>>>>>>>>>>>> >>>>>>>>>>>> least >>>>>>>>>>>> >>>>>>>>>>>> as >>>>>>>>>>>> >>>>>>>>>>>>> a >>>>>>>>>>>>> >>>>>>>>>>>> compile dependency). >>>>>>>>>>>> >>>>>>>>>>>>> *Proposal* >>>>>>>>>>>>>> I propose that we build and deploy a asm-flink-shaded version >>>>>>>>>>>>>> of >>>>>>>>>>>>>> ASM >>>>>>>>>>>>>> >>>>>>>>>>>>>> and >>>>>>>>>>>>>> >>>>>>>>>>>>>> directly program against the relocated namespaces. Since we >>>>>>>>>>>>> never >>>>>>>>>>>>> >>>>>>>>>>>> use >>>>>>>>>>>> >>>>>>>>>>>> classes that we relocate in public interfaces, Flink users will >>>>>>>>>>>> >>>>>>>>>>>>> never >>>>>>>>>>>>> >>>>>>>>>>>>> see >>>>>>>>>>> the relocated class names. Internally, it does not hurt to use >>>>>>>>>> them. >>>>>>>>>> >>>>>>>>>>> - Proper maven dependency management, no hidden (shaded) >>>>>>>>>>>>> dependencies >>>>>>>>>>>>>> - One copy of each class for shaded dependencies >>>>>>>>>>>> - Proper Scala interoperability >>>>>>>>>>>> >>>>>>>>>>>>> - Natural License management (license is part of deployed >>>>>>>>>>>>>> asm-flink-shaded jar) >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Happy to hear thoughts! >>>>>>>>>>>>>> >>>>>>>>>>>>>> Stephan >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> |
Free forum by Nabble | Edit this page |