Hi,
I have come across an issue with classloading in Flink's MiniCluster. The issue is that when I run local flink job from a thread, that has a non-default context classloader (for whatever reason), this classloader is not taken into account when classloading user defined functions. This is due to [1]. Is this behavior intentional, or can I file a JIRA and use Thread.currentThread.getContextClassLoader() there? I have validated that it fixes issues I'm facing. Jan [1] https://github.com/apache/flink/blob/ce557839d762b5f1ec92aa1885fd3d2ae33d0d0b/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/BlobLibraryCacheManager.java#L280 |
Hi Jan,
this looks to me like a bug for which you could create a JIRA and PR to fix it. Just to make sure, I've pulled in Aljoscha who is the author of this change to check with him whether we are forgetting something. Cheers, Till On Fri, Aug 30, 2019 at 3:44 PM Jan Lukavský <[hidden email]> wrote: > Hi, > > I have come across an issue with classloading in Flink's MiniCluster. > The issue is that when I run local flink job from a thread, that has a > non-default context classloader (for whatever reason), this classloader > is not taken into account when classloading user defined functions. This > is due to [1]. Is this behavior intentional, or can I file a JIRA and > use Thread.currentThread.getContextClassLoader() there? I have validated > that it fixes issues I'm facing. > > Jan > > [1] > > https://github.com/apache/flink/blob/ce557839d762b5f1ec92aa1885fd3d2ae33d0d0b/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/BlobLibraryCacheManager.java#L280 > > |
Hi,
I actually don’t know whether that change would be ok. FlinkUserCodeClassLoader has taken FlinkUserCodeClassLoader.class.getClassLoader() as the parent ClassLoader before my change. See: https://github.com/apache/flink/blob/release-1.2/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/FlinkUserCodeClassLoader.java <https://github.com/apache/flink/blob/release-1.2/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/FlinkUserCodeClassLoader.java>. I have the feeling that this might be on purpose because we want to have the ClassLoader of the Flink Framework components be the parent ClassLoader, but I could be wrong. Maybe Stephan would be most appropriate for answering this. Best, Aljoscha > On 30. Aug 2019, at 16:28, Till Rohrmann <[hidden email]> wrote: > > Hi Jan, > > this looks to me like a bug for which you could create a JIRA and PR to fix it. Just to make sure, I've pulled in Aljoscha who is the author of this change to check with him whether we are forgetting something. > > Cheers, > Till > > On Fri, Aug 30, 2019 at 3:44 PM Jan Lukavský <[hidden email] <mailto:[hidden email]>> wrote: > Hi, > > I have come across an issue with classloading in Flink's MiniCluster. > The issue is that when I run local flink job from a thread, that has a > non-default context classloader (for whatever reason), this classloader > is not taken into account when classloading user defined functions. This > is due to [1]. Is this behavior intentional, or can I file a JIRA and > use Thread.currentThread.getContextClassLoader() there? I have validated > that it fixes issues I'm facing. > > Jan > > [1] > https://github.com/apache/flink/blob/ce557839d762b5f1ec92aa1885fd3d2ae33d0d0b/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/BlobLibraryCacheManager.java#L280 <https://github.com/apache/flink/blob/ce557839d762b5f1ec92aa1885fd3d2ae33d0d0b/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/BlobLibraryCacheManager.java#L280> > |
Essentially, the class loader of Flink should be present in parent
hierarchy of context class loader. If FlinkUserCodeClassLoader doesn't use context class loader, then it is actually impossible to use a hierarchy like this: system class loader -> application class loader -> user-defined class loader (defines some UDFs to be used in flink program) Flink now uses the application class loader, and so the UDFs fail to deserialize on local flink, because the user-defined class loader is bypassed. Moreover, there is no way to add additional classpath elements for LocalEnvironment (as opposed to RemoteEnvironment). I'm able to hack this by calling addURL method on the application class loader (which is terribly hackish), but that works only on JDK <= 8. No sensible workaround is available for JDK >= 9. Alternative solution would be to enable adding jars to class loader when using LocalEnvironment, but that looks a little odd. Jan On 9/2/19 11:02 AM, Aljoscha Krettek wrote: > Hi, > > I actually don’t know whether that change would be ok. FlinkUserCodeClassLoader has taken FlinkUserCodeClassLoader.class.getClassLoader() as the parent ClassLoader before my change. See: https://github.com/apache/flink/blob/release-1.2/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/FlinkUserCodeClassLoader.java <https://github.com/apache/flink/blob/release-1.2/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/FlinkUserCodeClassLoader.java>. > > I have the feeling that this might be on purpose because we want to have the ClassLoader of the Flink Framework components be the parent ClassLoader, but I could be wrong. Maybe Stephan would be most appropriate for answering this. > > Best, > Aljoscha > >> On 30. Aug 2019, at 16:28, Till Rohrmann <[hidden email]> wrote: >> >> Hi Jan, >> >> this looks to me like a bug for which you could create a JIRA and PR to fix it. Just to make sure, I've pulled in Aljoscha who is the author of this change to check with him whether we are forgetting something. >> >> Cheers, >> Till >> >> On Fri, Aug 30, 2019 at 3:44 PM Jan Lukavský <[hidden email] <mailto:[hidden email]>> wrote: >> Hi, >> >> I have come across an issue with classloading in Flink's MiniCluster. >> The issue is that when I run local flink job from a thread, that has a >> non-default context classloader (for whatever reason), this classloader >> is not taken into account when classloading user defined functions. This >> is due to [1]. Is this behavior intentional, or can I file a JIRA and >> use Thread.currentThread.getContextClassLoader() there? I have validated >> that it fixes issues I'm facing. >> >> Jan >> >> [1] >> https://github.com/apache/flink/blob/ce557839d762b5f1ec92aa1885fd3d2ae33d0d0b/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/BlobLibraryCacheManager.java#L280 <https://github.com/apache/flink/blob/ce557839d762b5f1ec92aa1885fd3d2ae33d0d0b/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/BlobLibraryCacheManager.java#L280> >> > |
I’m not saying we can’t change that code to use the context class loader. I’m just not sure whether this might break other things.
Best, Aljoscha > On 2. Sep 2019, at 11:24, Jan Lukavský <[hidden email]> wrote: > > Essentially, the class loader of Flink should be present in parent hierarchy of context class loader. If FlinkUserCodeClassLoader doesn't use context class loader, then it is actually impossible to use a hierarchy like this: > > system class loader -> application class loader -> user-defined class loader (defines some UDFs to be used in flink program) > > Flink now uses the application class loader, and so the UDFs fail to deserialize on local flink, because the user-defined class loader is bypassed. Moreover, there is no way to add additional classpath elements for LocalEnvironment (as opposed to RemoteEnvironment). I'm able to hack this by calling addURL method on the application class loader (which is terribly hackish), but that works only on JDK <= 8. No sensible workaround is available for JDK >= 9. > > Alternative solution would be to enable adding jars to class loader when using LocalEnvironment, but that looks a little odd. > > Jan > > On 9/2/19 11:02 AM, Aljoscha Krettek wrote: >> Hi, >> >> I actually don’t know whether that change would be ok. FlinkUserCodeClassLoader has taken FlinkUserCodeClassLoader.class.getClassLoader() as the parent ClassLoader before my change. See: https://github.com/apache/flink/blob/release-1.2/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/FlinkUserCodeClassLoader.java <https://github.com/apache/flink/blob/release-1.2/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/FlinkUserCodeClassLoader.java>. >> >> I have the feeling that this might be on purpose because we want to have the ClassLoader of the Flink Framework components be the parent ClassLoader, but I could be wrong. Maybe Stephan would be most appropriate for answering this. >> >> Best, >> Aljoscha >> >>> On 30. Aug 2019, at 16:28, Till Rohrmann <[hidden email]> wrote: >>> >>> Hi Jan, >>> >>> this looks to me like a bug for which you could create a JIRA and PR to fix it. Just to make sure, I've pulled in Aljoscha who is the author of this change to check with him whether we are forgetting something. >>> >>> Cheers, >>> Till >>> >>> On Fri, Aug 30, 2019 at 3:44 PM Jan Lukavský <[hidden email] <mailto:[hidden email]>> wrote: >>> Hi, >>> >>> I have come across an issue with classloading in Flink's MiniCluster. >>> The issue is that when I run local flink job from a thread, that has a >>> non-default context classloader (for whatever reason), this classloader >>> is not taken into account when classloading user defined functions. This >>> is due to [1]. Is this behavior intentional, or can I file a JIRA and >>> use Thread.currentThread.getContextClassLoader() there? I have validated >>> that it fixes issues I'm facing. >>> >>> Jan >>> >>> [1] >>> https://github.com/apache/flink/blob/ce557839d762b5f1ec92aa1885fd3d2ae33d0d0b/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/BlobLibraryCacheManager.java#L280 <https://github.com/apache/flink/blob/ce557839d762b5f1ec92aa1885fd3d2ae33d0d0b/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/BlobLibraryCacheManager.java#L280> >>> >> |
Hi Jan,
I've talked with Aljoscha and Stephan offline and we concluded that we would like to avoid the usage of context class loaders if possible. The reason for this is that using the context class loader can easily mess up an otherwise clear class loader hierarchy which makes it hard to reason about and to debug class loader issues. Given this, I think it would help to better understand the exact problem you are trying to solve by using the context class loader. Usually the usage of the context class loader points towards an API deficiency which we might be able to address differently. Cheers, Till On Mon, Sep 2, 2019 at 11:32 AM Aljoscha Krettek <[hidden email]> wrote: > I’m not saying we can’t change that code to use the context class loader. > I’m just not sure whether this might break other things. > > Best, > Aljoscha > > > On 2. Sep 2019, at 11:24, Jan Lukavský <[hidden email]> wrote: > > > > Essentially, the class loader of Flink should be present in parent > hierarchy of context class loader. If FlinkUserCodeClassLoader doesn't use > context class loader, then it is actually impossible to use a hierarchy > like this: > > > > system class loader -> application class loader -> user-defined class > loader (defines some UDFs to be used in flink program) > > > > Flink now uses the application class loader, and so the UDFs fail to > deserialize on local flink, because the user-defined class loader is > bypassed. Moreover, there is no way to add additional classpath elements > for LocalEnvironment (as opposed to RemoteEnvironment). I'm able to hack > this by calling addURL method on the application class loader (which is > terribly hackish), but that works only on JDK <= 8. No sensible workaround > is available for JDK >= 9. > > > > Alternative solution would be to enable adding jars to class loader when > using LocalEnvironment, but that looks a little odd. > > > > Jan > > > > On 9/2/19 11:02 AM, Aljoscha Krettek wrote: > >> Hi, > >> > >> I actually don’t know whether that change would be ok. > FlinkUserCodeClassLoader has taken > FlinkUserCodeClassLoader.class.getClassLoader() as the parent ClassLoader > before my change. See: > https://github.com/apache/flink/blob/release-1.2/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/FlinkUserCodeClassLoader.java > < > https://github.com/apache/flink/blob/release-1.2/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/FlinkUserCodeClassLoader.java > >. > >> > >> I have the feeling that this might be on purpose because we want to > have the ClassLoader of the Flink Framework components be the parent > ClassLoader, but I could be wrong. Maybe Stephan would be most appropriate > for answering this. > >> > >> Best, > >> Aljoscha > >> > >>> On 30. Aug 2019, at 16:28, Till Rohrmann <[hidden email]> wrote: > >>> > >>> Hi Jan, > >>> > >>> this looks to me like a bug for which you could create a JIRA and PR > to fix it. Just to make sure, I've pulled in Aljoscha who is the author of > this change to check with him whether we are forgetting something. > >>> > >>> Cheers, > >>> Till > >>> > >>> On Fri, Aug 30, 2019 at 3:44 PM Jan Lukavský <[hidden email] <mailto: > [hidden email]>> wrote: > >>> Hi, > >>> > >>> I have come across an issue with classloading in Flink's MiniCluster. > >>> The issue is that when I run local flink job from a thread, that has a > >>> non-default context classloader (for whatever reason), this classloader > >>> is not taken into account when classloading user defined functions. > This > >>> is due to [1]. Is this behavior intentional, or can I file a JIRA and > >>> use Thread.currentThread.getContextClassLoader() there? I have > validated > >>> that it fixes issues I'm facing. > >>> > >>> Jan > >>> > >>> [1] > >>> > https://github.com/apache/flink/blob/ce557839d762b5f1ec92aa1885fd3d2ae33d0d0b/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/BlobLibraryCacheManager.java#L280 > < > https://github.com/apache/flink/blob/ce557839d762b5f1ec92aa1885fd3d2ae33d0d0b/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/BlobLibraryCacheManager.java#L280 > > > >>> > >> > > |
Hi Till,
the use-case is pretty much simple - I have a REPL shell in groovy, which generates classes at runtime. The actual hierarchy is therefore system class loader -> application classloader -> repl classloader (GroovyClassLoader actually) now, when a terminal (sink) operation in the shell occurs, I'm able to build a jar, which I can submit to remote cluster (in distributed case). But - during testing - I run the code using local flink. There is no (legal) way of adding this new runtime generated jar to local flink. As I said, I have a hackish solution which works on JDK <= 8, because it uses reflection to call addURL on the application classloader (and thefore "pretends", that the generated jar was there all the time from the JVM startup). This breaks on JDK >= 9. It might be possible to work around this somehow, but I think that the reason why LocalEnvironment is not having a way to add jars (as in case of RemoteEnvironment) is that is assumes, that you actually have all of the on classpath when using local runner. I think that this implies that it either has to use context classloader (to be able to work on top of any classloading user might have), or is wrong and would need be fixed, so that LocalEnvironment would accept files to "stage" - which would mean adding them to a class loader (probably URLClassLoader with the application class loader as parent). Or, would you see any other option? Jan On 9/3/19 2:00 PM, Till Rohrmann wrote: > Hi Jan, > > I've talked with Aljoscha and Stephan offline and we concluded that we > would like to avoid the usage of context class loaders if possible. The > reason for this is that using the context class loader can easily mess up > an otherwise clear class loader hierarchy which makes it hard to reason > about and to debug class loader issues. > > Given this, I think it would help to better understand the exact problem > you are trying to solve by using the context class loader. Usually the > usage of the context class loader points towards an API deficiency which we > might be able to address differently. > > Cheers, > Till > > On Mon, Sep 2, 2019 at 11:32 AM Aljoscha Krettek <[hidden email]> > wrote: > >> I’m not saying we can’t change that code to use the context class loader. >> I’m just not sure whether this might break other things. >> >> Best, >> Aljoscha >> >>> On 2. Sep 2019, at 11:24, Jan Lukavský <[hidden email]> wrote: >>> >>> Essentially, the class loader of Flink should be present in parent >> hierarchy of context class loader. If FlinkUserCodeClassLoader doesn't use >> context class loader, then it is actually impossible to use a hierarchy >> like this: >>> system class loader -> application class loader -> user-defined class >> loader (defines some UDFs to be used in flink program) >>> Flink now uses the application class loader, and so the UDFs fail to >> deserialize on local flink, because the user-defined class loader is >> bypassed. Moreover, there is no way to add additional classpath elements >> for LocalEnvironment (as opposed to RemoteEnvironment). I'm able to hack >> this by calling addURL method on the application class loader (which is >> terribly hackish), but that works only on JDK <= 8. No sensible workaround >> is available for JDK >= 9. >>> Alternative solution would be to enable adding jars to class loader when >> using LocalEnvironment, but that looks a little odd. >>> Jan >>> >>> On 9/2/19 11:02 AM, Aljoscha Krettek wrote: >>>> Hi, >>>> >>>> I actually don’t know whether that change would be ok. >> FlinkUserCodeClassLoader has taken >> FlinkUserCodeClassLoader.class.getClassLoader() as the parent ClassLoader >> before my change. See: >> https://github.com/apache/flink/blob/release-1.2/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/FlinkUserCodeClassLoader.java >> < >> https://github.com/apache/flink/blob/release-1.2/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/FlinkUserCodeClassLoader.java >>> . >>>> I have the feeling that this might be on purpose because we want to >> have the ClassLoader of the Flink Framework components be the parent >> ClassLoader, but I could be wrong. Maybe Stephan would be most appropriate >> for answering this. >>>> Best, >>>> Aljoscha >>>> >>>>> On 30. Aug 2019, at 16:28, Till Rohrmann <[hidden email]> wrote: >>>>> >>>>> Hi Jan, >>>>> >>>>> this looks to me like a bug for which you could create a JIRA and PR >> to fix it. Just to make sure, I've pulled in Aljoscha who is the author of >> this change to check with him whether we are forgetting something. >>>>> Cheers, >>>>> Till >>>>> >>>>> On Fri, Aug 30, 2019 at 3:44 PM Jan Lukavský <[hidden email] <mailto: >> [hidden email]>> wrote: >>>>> Hi, >>>>> >>>>> I have come across an issue with classloading in Flink's MiniCluster. >>>>> The issue is that when I run local flink job from a thread, that has a >>>>> non-default context classloader (for whatever reason), this classloader >>>>> is not taken into account when classloading user defined functions. >> This >>>>> is due to [1]. Is this behavior intentional, or can I file a JIRA and >>>>> use Thread.currentThread.getContextClassLoader() there? I have >> validated >>>>> that it fixes issues I'm facing. >>>>> >>>>> Jan >>>>> >>>>> [1] >>>>> >> https://github.com/apache/flink/blob/ce557839d762b5f1ec92aa1885fd3d2ae33d0d0b/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/BlobLibraryCacheManager.java#L280 >> < >> https://github.com/apache/flink/blob/ce557839d762b5f1ec92aa1885fd3d2ae33d0d0b/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/BlobLibraryCacheManager.java#L280 >> |
[hidden email] From: Jan Lukavský Date: 2019-09-03 20:17 To: dev Subject: Re: ClassLoader created by BlobLibraryCacheManager is not using context classloader Hi Till, the use-case is pretty much simple - I have a REPL shell in groovy, which generates classes at runtime. The actual hierarchy is therefore system class loader -> application classloader -> repl classloader (GroovyClassLoader actually) now, when a terminal (sink) operation in the shell occurs, I'm able to build a jar, which I can submit to remote cluster (in distributed case). But - during testing - I run the code using local flink. There is no (legal) way of adding this new runtime generated jar to local flink. As I said, I have a hackish solution which works on JDK <= 8, because it uses reflection to call addURL on the application classloader (and thefore "pretends", that the generated jar was there all the time from the JVM startup). This breaks on JDK >= 9. It might be possible to work around this somehow, but I think that the reason why LocalEnvironment is not having a way to add jars (as in case of RemoteEnvironment) is that is assumes, that you actually have all of the on classpath when using local runner. I think that this implies that it either has to use context classloader (to be able to work on top of any classloading user might have), or is wrong and would need be fixed, so that LocalEnvironment would accept files to "stage" - which would mean adding them to a class loader (probably URLClassLoader with the application class loader as parent). Or, would you see any other option? Jan On 9/3/19 2:00 PM, Till Rohrmann wrote: > Hi Jan, > > I've talked with Aljoscha and Stephan offline and we concluded that we > would like to avoid the usage of context class loaders if possible. The > reason for this is that using the context class loader can easily mess up > an otherwise clear class loader hierarchy which makes it hard to reason > about and to debug class loader issues. > > Given this, I think it would help to better understand the exact problem > you are trying to solve by using the context class loader. Usually the > usage of the context class loader points towards an API deficiency which we > might be able to address differently. > > Cheers, > Till > > On Mon, Sep 2, 2019 at 11:32 AM Aljoscha Krettek <[hidden email]> > wrote: > >> I’m not saying we can’t change that code to use the context class loader. >> I’m just not sure whether this might break other things. >> >> Best, >> Aljoscha >> >>> On 2. Sep 2019, at 11:24, Jan Lukavský <[hidden email]> wrote: >>> >>> Essentially, the class loader of Flink should be present in parent >> hierarchy of context class loader. If FlinkUserCodeClassLoader doesn't use >> context class loader, then it is actually impossible to use a hierarchy >> like this: >>> system class loader -> application class loader -> user-defined class >> loader (defines some UDFs to be used in flink program) >>> Flink now uses the application class loader, and so the UDFs fail to >> deserialize on local flink, because the user-defined class loader is >> bypassed. Moreover, there is no way to add additional classpath elements >> for LocalEnvironment (as opposed to RemoteEnvironment). I'm able to hack >> this by calling addURL method on the application class loader (which is >> terribly hackish), but that works only on JDK <= 8. No sensible workaround >> is available for JDK >= 9. >>> Alternative solution would be to enable adding jars to class loader when >> using LocalEnvironment, but that looks a little odd. >>> Jan >>> >>> On 9/2/19 11:02 AM, Aljoscha Krettek wrote: >>>> Hi, >>>> >>>> I actually don’t know whether that change would be ok. >> FlinkUserCodeClassLoader has taken >> FlinkUserCodeClassLoader.class.getClassLoader() as the parent ClassLoader >> before my change. See: >> https://github.com/apache/flink/blob/release-1.2/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/FlinkUserCodeClassLoader.java >> < >> https://github.com/apache/flink/blob/release-1.2/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/FlinkUserCodeClassLoader.java >>> . >>>> I have the feeling that this might be on purpose because we want to >> have the ClassLoader of the Flink Framework components be the parent >> ClassLoader, but I could be wrong. Maybe Stephan would be most appropriate >> for answering this. >>>> Best, >>>> Aljoscha >>>> >>>>> On 30. Aug 2019, at 16:28, Till Rohrmann <[hidden email]> wrote: >>>>> >>>>> Hi Jan, >>>>> >>>>> this looks to me like a bug for which you could create a JIRA and PR >> to fix it. Just to make sure, I've pulled in Aljoscha who is the author of >> this change to check with him whether we are forgetting something. >>>>> Cheers, >>>>> Till >>>>> >>>>> On Fri, Aug 30, 2019 at 3:44 PM Jan Lukavský <[hidden email] <mailto: >> [hidden email]>> wrote: >>>>> Hi, >>>>> >>>>> I have come across an issue with classloading in Flink's MiniCluster. >>>>> The issue is that when I run local flink job from a thread, that has a >>>>> non-default context classloader (for whatever reason), this classloader >>>>> is not taken into account when classloading user defined functions. >> This >>>>> is due to [1]. Is this behavior intentional, or can I file a JIRA and >>>>> use Thread.currentThread.getContextClassLoader() there? I have >> validated >>>>> that it fixes issues I'm facing. >>>>> >>>>> Jan >>>>> >>>>> [1] >>>>> >> https://github.com/apache/flink/blob/ce557839d762b5f1ec92aa1885fd3d2ae33d0d0b/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/BlobLibraryCacheManager.java#L280 >> < >> https://github.com/apache/flink/blob/ce557839d762b5f1ec92aa1885fd3d2ae33d0d0b/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/BlobLibraryCacheManager.java#L280 >> |
[hidden email] From: [hidden email] Date: 2019-09-03 20:23 To: dev Subject: Re: Re: ClassLoader created by BlobLibraryCacheManager is not using context classloader [hidden email] From: Jan Lukavský Date: 2019-09-03 20:17 To: dev Subject: Re: ClassLoader created by BlobLibraryCacheManager is not using context classloader Hi Till, the use-case is pretty much simple - I have a REPL shell in groovy, which generates classes at runtime. The actual hierarchy is therefore system class loader -> application classloader -> repl classloader (GroovyClassLoader actually) now, when a terminal (sink) operation in the shell occurs, I'm able to build a jar, which I can submit to remote cluster (in distributed case). But - during testing - I run the code using local flink. There is no (legal) way of adding this new runtime generated jar to local flink. As I said, I have a hackish solution which works on JDK <= 8, because it uses reflection to call addURL on the application classloader (and thefore "pretends", that the generated jar was there all the time from the JVM startup). This breaks on JDK >= 9. It might be possible to work around this somehow, but I think that the reason why LocalEnvironment is not having a way to add jars (as in case of RemoteEnvironment) is that is assumes, that you actually have all of the on classpath when using local runner. I think that this implies that it either has to use context classloader (to be able to work on top of any classloading user might have), or is wrong and would need be fixed, so that LocalEnvironment would accept files to "stage" - which would mean adding them to a class loader (probably URLClassLoader with the application class loader as parent). Or, would you see any other option? Jan On 9/3/19 2:00 PM, Till Rohrmann wrote: > Hi Jan, > > I've talked with Aljoscha and Stephan offline and we concluded that we > would like to avoid the usage of context class loaders if possible. The > reason for this is that using the context class loader can easily mess up > an otherwise clear class loader hierarchy which makes it hard to reason > about and to debug class loader issues. > > Given this, I think it would help to better understand the exact problem > you are trying to solve by using the context class loader. Usually the > usage of the context class loader points towards an API deficiency which we > might be able to address differently. > > Cheers, > Till > > On Mon, Sep 2, 2019 at 11:32 AM Aljoscha Krettek <[hidden email]> > wrote: > >> I’m not saying we can’t change that code to use the context class loader. >> I’m just not sure whether this might break other things. >> >> Best, >> Aljoscha >> >>> On 2. Sep 2019, at 11:24, Jan Lukavský <[hidden email]> wrote: >>> >>> Essentially, the class loader of Flink should be present in parent >> hierarchy of context class loader. If FlinkUserCodeClassLoader doesn't use >> context class loader, then it is actually impossible to use a hierarchy >> like this: >>> system class loader -> application class loader -> user-defined class >> loader (defines some UDFs to be used in flink program) >>> Flink now uses the application class loader, and so the UDFs fail to >> deserialize on local flink, because the user-defined class loader is >> bypassed. Moreover, there is no way to add additional classpath elements >> for LocalEnvironment (as opposed to RemoteEnvironment). I'm able to hack >> this by calling addURL method on the application class loader (which is >> terribly hackish), but that works only on JDK <= 8. No sensible workaround >> is available for JDK >= 9. >>> Alternative solution would be to enable adding jars to class loader when >> using LocalEnvironment, but that looks a little odd. >>> Jan >>> >>> On 9/2/19 11:02 AM, Aljoscha Krettek wrote: >>>> Hi, >>>> >>>> I actually don’t know whether that change would be ok. >> FlinkUserCodeClassLoader has taken >> FlinkUserCodeClassLoader.class.getClassLoader() as the parent ClassLoader >> before my change. See: >> https://github.com/apache/flink/blob/release-1.2/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/FlinkUserCodeClassLoader.java >> < >> https://github.com/apache/flink/blob/release-1.2/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/FlinkUserCodeClassLoader.java >>> . >>>> I have the feeling that this might be on purpose because we want to >> have the ClassLoader of the Flink Framework components be the parent >> ClassLoader, but I could be wrong. Maybe Stephan would be most appropriate >> for answering this. >>>> Best, >>>> Aljoscha >>>> >>>>> On 30. Aug 2019, at 16:28, Till Rohrmann <[hidden email]> wrote: >>>>> >>>>> Hi Jan, >>>>> >>>>> this looks to me like a bug for which you could create a JIRA and PR >> to fix it. Just to make sure, I've pulled in Aljoscha who is the author of >> this change to check with him whether we are forgetting something. >>>>> Cheers, >>>>> Till >>>>> >>>>> On Fri, Aug 30, 2019 at 3:44 PM Jan Lukavský <[hidden email] <mailto: >> [hidden email]>> wrote: >>>>> Hi, >>>>> >>>>> I have come across an issue with classloading in Flink's MiniCluster. >>>>> The issue is that when I run local flink job from a thread, that has a >>>>> non-default context classloader (for whatever reason), this classloader >>>>> is not taken into account when classloading user defined functions. >> This >>>>> is due to [1]. Is this behavior intentional, or can I file a JIRA and >>>>> use Thread.currentThread.getContextClassLoader() there? I have >> validated >>>>> that it fixes issues I'm facing. >>>>> >>>>> Jan >>>>> >>>>> [1] >>>>> >> https://github.com/apache/flink/blob/ce557839d762b5f1ec92aa1885fd3d2ae33d0d0b/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/BlobLibraryCacheManager.java#L280 >> < >> https://github.com/apache/flink/blob/ce557839d762b5f1ec92aa1885fd3d2ae33d0d0b/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/BlobLibraryCacheManager.java#L280 >> |
[hidden email] From: [hidden email] Date: 2019-09-03 20:25 To: dev Subject: Re: Re: ClassLoader created by BlobLibraryCacheManager is not using context classloader [hidden email] From: [hidden email] Date: 2019-09-03 20:23 To: dev Subject: Re: Re: ClassLoader created by BlobLibraryCacheManager is not using context classloader [hidden email] From: Jan Lukavský Date: 2019-09-03 20:17 To: dev Subject: Re: ClassLoader created by BlobLibraryCacheManager is not using context classloader Hi Till, the use-case is pretty much simple - I have a REPL shell in groovy, which generates classes at runtime. The actual hierarchy is therefore system class loader -> application classloader -> repl classloader (GroovyClassLoader actually) now, when a terminal (sink) operation in the shell occurs, I'm able to build a jar, which I can submit to remote cluster (in distributed case). But - during testing - I run the code using local flink. There is no (legal) way of adding this new runtime generated jar to local flink. As I said, I have a hackish solution which works on JDK <= 8, because it uses reflection to call addURL on the application classloader (and thefore "pretends", that the generated jar was there all the time from the JVM startup). This breaks on JDK >= 9. It might be possible to work around this somehow, but I think that the reason why LocalEnvironment is not having a way to add jars (as in case of RemoteEnvironment) is that is assumes, that you actually have all of the on classpath when using local runner. I think that this implies that it either has to use context classloader (to be able to work on top of any classloading user might have), or is wrong and would need be fixed, so that LocalEnvironment would accept files to "stage" - which would mean adding them to a class loader (probably URLClassLoader with the application class loader as parent). Or, would you see any other option? Jan On 9/3/19 2:00 PM, Till Rohrmann wrote: > Hi Jan, > > I've talked with Aljoscha and Stephan offline and we concluded that we > would like to avoid the usage of context class loaders if possible. The > reason for this is that using the context class loader can easily mess up > an otherwise clear class loader hierarchy which makes it hard to reason > about and to debug class loader issues. > > Given this, I think it would help to better understand the exact problem > you are trying to solve by using the context class loader. Usually the > usage of the context class loader points towards an API deficiency which we > might be able to address differently. > > Cheers, > Till > > On Mon, Sep 2, 2019 at 11:32 AM Aljoscha Krettek <[hidden email]> > wrote: > >> I’m not saying we can’t change that code to use the context class loader. >> I’m just not sure whether this might break other things. >> >> Best, >> Aljoscha >> >>> On 2. Sep 2019, at 11:24, Jan Lukavský <[hidden email]> wrote: >>> >>> Essentially, the class loader of Flink should be present in parent >> hierarchy of context class loader. If FlinkUserCodeClassLoader doesn't use >> context class loader, then it is actually impossible to use a hierarchy >> like this: >>> system class loader -> application class loader -> user-defined class >> loader (defines some UDFs to be used in flink program) >>> Flink now uses the application class loader, and so the UDFs fail to >> deserialize on local flink, because the user-defined class loader is >> bypassed. Moreover, there is no way to add additional classpath elements >> for LocalEnvironment (as opposed to RemoteEnvironment). I'm able to hack >> this by calling addURL method on the application class loader (which is >> terribly hackish), but that works only on JDK <= 8. No sensible workaround >> is available for JDK >= 9. >>> Alternative solution would be to enable adding jars to class loader when >> using LocalEnvironment, but that looks a little odd. >>> Jan >>> >>> On 9/2/19 11:02 AM, Aljoscha Krettek wrote: >>>> Hi, >>>> >>>> I actually don’t know whether that change would be ok. >> FlinkUserCodeClassLoader has taken >> FlinkUserCodeClassLoader.class.getClassLoader() as the parent ClassLoader >> before my change. See: >> https://github.com/apache/flink/blob/release-1.2/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/FlinkUserCodeClassLoader.java >> < >> https://github.com/apache/flink/blob/release-1.2/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/FlinkUserCodeClassLoader.java >>> . >>>> I have the feeling that this might be on purpose because we want to >> have the ClassLoader of the Flink Framework components be the parent >> ClassLoader, but I could be wrong. Maybe Stephan would be most appropriate >> for answering this. >>>> Best, >>>> Aljoscha >>>> >>>>> On 30. Aug 2019, at 16:28, Till Rohrmann <[hidden email]> wrote: >>>>> >>>>> Hi Jan, >>>>> >>>>> this looks to me like a bug for which you could create a JIRA and PR >> to fix it. Just to make sure, I've pulled in Aljoscha who is the author of >> this change to check with him whether we are forgetting something. >>>>> Cheers, >>>>> Till >>>>> >>>>> On Fri, Aug 30, 2019 at 3:44 PM Jan Lukavský <[hidden email] <mailto: >> [hidden email]>> wrote: >>>>> Hi, >>>>> >>>>> I have come across an issue with classloading in Flink's MiniCluster. >>>>> The issue is that when I run local flink job from a thread, that has a >>>>> non-default context classloader (for whatever reason), this classloader >>>>> is not taken into account when classloading user defined functions. >> This >>>>> is due to [1]. Is this behavior intentional, or can I file a JIRA and >>>>> use Thread.currentThread.getContextClassLoader() there? I have >> validated >>>>> that it fixes issues I'm facing. >>>>> >>>>> Jan >>>>> >>>>> [1] >>>>> >> https://github.com/apache/flink/blob/ce557839d762b5f1ec92aa1885fd3d2ae33d0d0b/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/BlobLibraryCacheManager.java#L280 >> < >> https://github.com/apache/flink/blob/ce557839d762b5f1ec92aa1885fd3d2ae33d0d0b/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/BlobLibraryCacheManager.java#L280 >> |
I see the problem Jan. What about the following proposal: Instead of using
the LocalEnvironment for local tests you always use the RemoteEnvironment but when testing it locally you spin up a MiniCluster in the same process and initialize the RemoteEnvironment with `MiniCluster#getRestAddress`. That way you would always submit a jar with the generated classes and, hence, not having to set the context class loader. The contract of the LocalEnvironment is indeed that all classes it is supposed t execute must be present when being started. Cheers, Till On Tue, Sep 3, 2019 at 2:27 PM [hidden email] < [hidden email]> wrote: > > > > > [hidden email] > > From: [hidden email] > Date: 2019-09-03 20:25 > To: dev > Subject: Re: Re: ClassLoader created by BlobLibraryCacheManager is not > using context classloader > > > > > [hidden email] > From: [hidden email] > Date: 2019-09-03 20:23 > To: dev > Subject: Re: Re: ClassLoader created by BlobLibraryCacheManager is not > using context classloader > [hidden email] > From: Jan Lukavský > Date: 2019-09-03 20:17 > To: dev > Subject: Re: ClassLoader created by BlobLibraryCacheManager is not using > context classloader > Hi Till, > the use-case is pretty much simple - I have a REPL shell in groovy, > which generates classes at runtime. The actual hierarchy is therefore > system class loader -> application classloader -> repl classloader > (GroovyClassLoader actually) > now, when a terminal (sink) operation in the shell occurs, I'm able to > build a jar, which I can submit to remote cluster (in distributed case). > But - during testing - I run the code using local flink. There is no > (legal) way of adding this new runtime generated jar to local flink. As > I said, I have a hackish solution which works on JDK <= 8, because it > uses reflection to call addURL on the application classloader (and > thefore "pretends", that the generated jar was there all the time from > the JVM startup). This breaks on JDK >= 9. It might be possible to work > around this somehow, but I think that the reason why LocalEnvironment is > not having a way to add jars (as in case of RemoteEnvironment) is that > is assumes, that you actually have all of the on classpath when using > local runner. I think that this implies that it either has to use > context classloader (to be able to work on top of any classloading user > might have), or is wrong and would need be fixed, so that > LocalEnvironment would accept files to "stage" - which would mean adding > them to a class loader (probably URLClassLoader with the application > class loader as parent). > Or, would you see any other option? > Jan > On 9/3/19 2:00 PM, Till Rohrmann wrote: > > Hi Jan, > > > > I've talked with Aljoscha and Stephan offline and we concluded that we > > would like to avoid the usage of context class loaders if possible. The > > reason for this is that using the context class loader can easily mess up > > an otherwise clear class loader hierarchy which makes it hard to reason > > about and to debug class loader issues. > > > > Given this, I think it would help to better understand the exact problem > > you are trying to solve by using the context class loader. Usually the > > usage of the context class loader points towards an API deficiency which > we > > might be able to address differently. > > > > Cheers, > > Till > > > > On Mon, Sep 2, 2019 at 11:32 AM Aljoscha Krettek <[hidden email]> > > wrote: > > > >> I’m not saying we can’t change that code to use the context class > loader. > >> I’m just not sure whether this might break other things. > >> > >> Best, > >> Aljoscha > >> > >>> On 2. Sep 2019, at 11:24, Jan Lukavský <[hidden email]> wrote: > >>> > >>> Essentially, the class loader of Flink should be present in parent > >> hierarchy of context class loader. If FlinkUserCodeClassLoader doesn't > use > >> context class loader, then it is actually impossible to use a hierarchy > >> like this: > >>> system class loader -> application class loader -> user-defined class > >> loader (defines some UDFs to be used in flink program) > >>> Flink now uses the application class loader, and so the UDFs fail to > >> deserialize on local flink, because the user-defined class loader is > >> bypassed. Moreover, there is no way to add additional classpath elements > >> for LocalEnvironment (as opposed to RemoteEnvironment). I'm able to hack > >> this by calling addURL method on the application class loader (which is > >> terribly hackish), but that works only on JDK <= 8. No sensible > workaround > >> is available for JDK >= 9. > >>> Alternative solution would be to enable adding jars to class loader > when > >> using LocalEnvironment, but that looks a little odd. > >>> Jan > >>> > >>> On 9/2/19 11:02 AM, Aljoscha Krettek wrote: > >>>> Hi, > >>>> > >>>> I actually don’t know whether that change would be ok. > >> FlinkUserCodeClassLoader has taken > >> FlinkUserCodeClassLoader.class.getClassLoader() as the parent > ClassLoader > >> before my change. See: > >> > https://github.com/apache/flink/blob/release-1.2/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/FlinkUserCodeClassLoader.java > >> < > >> > https://github.com/apache/flink/blob/release-1.2/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/FlinkUserCodeClassLoader.java > >>> . > >>>> I have the feeling that this might be on purpose because we want to > >> have the ClassLoader of the Flink Framework components be the parent > >> ClassLoader, but I could be wrong. Maybe Stephan would be most > appropriate > >> for answering this. > >>>> Best, > >>>> Aljoscha > >>>> > >>>>> On 30. Aug 2019, at 16:28, Till Rohrmann <[hidden email]> > wrote: > >>>>> > >>>>> Hi Jan, > >>>>> > >>>>> this looks to me like a bug for which you could create a JIRA and PR > >> to fix it. Just to make sure, I've pulled in Aljoscha who is the author > of > >> this change to check with him whether we are forgetting something. > >>>>> Cheers, > >>>>> Till > >>>>> > >>>>> On Fri, Aug 30, 2019 at 3:44 PM Jan Lukavský <[hidden email] > <mailto: > >> [hidden email]>> wrote: > >>>>> Hi, > >>>>> > >>>>> I have come across an issue with classloading in Flink's MiniCluster. > >>>>> The issue is that when I run local flink job from a thread, that has > a > >>>>> non-default context classloader (for whatever reason), this > classloader > >>>>> is not taken into account when classloading user defined functions. > >> This > >>>>> is due to [1]. Is this behavior intentional, or can I file a JIRA and > >>>>> use Thread.currentThread.getContextClassLoader() there? I have > >> validated > >>>>> that it fixes issues I'm facing. > >>>>> > >>>>> Jan > >>>>> > >>>>> [1] > >>>>> > >> > https://github.com/apache/flink/blob/ce557839d762b5f1ec92aa1885fd3d2ae33d0d0b/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/BlobLibraryCacheManager.java#L280 > >> < > >> > https://github.com/apache/flink/blob/ce557839d762b5f1ec92aa1885fd3d2ae33d0d0b/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/BlobLibraryCacheManager.java#L280 > >> > |
Hi Till,
hmm, that sounds it might work. I would have to incorporate this (either as default, or on demand) into Apache Beam. Would you see any disadvantages of this approach? Would you suggest to make this default behavior for local beam FlinkRunner? I can introduce a configuration option to turn this behavior on, but that would bring additional maintenance burden, etc., etc. Jan On 9/3/19 3:38 PM, Till Rohrmann wrote: > I see the problem Jan. What about the following proposal: Instead of using > the LocalEnvironment for local tests you always use the RemoteEnvironment > but when testing it locally you spin up a MiniCluster in the same process > and initialize the RemoteEnvironment with `MiniCluster#getRestAddress`. > That way you would always submit a jar with the generated classes and, > hence, not having to set the context class loader. > > The contract of the LocalEnvironment is indeed that all classes it is > supposed t execute must be present when being started. > > Cheers, > Till > > On Tue, Sep 3, 2019 at 2:27 PM [hidden email] < > [hidden email]> wrote: > >> >> >> >> [hidden email] >> >> From: [hidden email] >> Date: 2019-09-03 20:25 >> To: dev >> Subject: Re: Re: ClassLoader created by BlobLibraryCacheManager is not >> using context classloader >> >> >> >> >> [hidden email] >> From: [hidden email] >> Date: 2019-09-03 20:23 >> To: dev >> Subject: Re: Re: ClassLoader created by BlobLibraryCacheManager is not >> using context classloader >> [hidden email] >> From: Jan Lukavský >> Date: 2019-09-03 20:17 >> To: dev >> Subject: Re: ClassLoader created by BlobLibraryCacheManager is not using >> context classloader >> Hi Till, >> the use-case is pretty much simple - I have a REPL shell in groovy, >> which generates classes at runtime. The actual hierarchy is therefore >> system class loader -> application classloader -> repl classloader >> (GroovyClassLoader actually) >> now, when a terminal (sink) operation in the shell occurs, I'm able to >> build a jar, which I can submit to remote cluster (in distributed case). >> But - during testing - I run the code using local flink. There is no >> (legal) way of adding this new runtime generated jar to local flink. As >> I said, I have a hackish solution which works on JDK <= 8, because it >> uses reflection to call addURL on the application classloader (and >> thefore "pretends", that the generated jar was there all the time from >> the JVM startup). This breaks on JDK >= 9. It might be possible to work >> around this somehow, but I think that the reason why LocalEnvironment is >> not having a way to add jars (as in case of RemoteEnvironment) is that >> is assumes, that you actually have all of the on classpath when using >> local runner. I think that this implies that it either has to use >> context classloader (to be able to work on top of any classloading user >> might have), or is wrong and would need be fixed, so that >> LocalEnvironment would accept files to "stage" - which would mean adding >> them to a class loader (probably URLClassLoader with the application >> class loader as parent). >> Or, would you see any other option? >> Jan >> On 9/3/19 2:00 PM, Till Rohrmann wrote: >>> Hi Jan, >>> >>> I've talked with Aljoscha and Stephan offline and we concluded that we >>> would like to avoid the usage of context class loaders if possible. The >>> reason for this is that using the context class loader can easily mess up >>> an otherwise clear class loader hierarchy which makes it hard to reason >>> about and to debug class loader issues. >>> >>> Given this, I think it would help to better understand the exact problem >>> you are trying to solve by using the context class loader. Usually the >>> usage of the context class loader points towards an API deficiency which >> we >>> might be able to address differently. >>> >>> Cheers, >>> Till >>> >>> On Mon, Sep 2, 2019 at 11:32 AM Aljoscha Krettek <[hidden email]> >>> wrote: >>> >>>> I’m not saying we can’t change that code to use the context class >> loader. >>>> I’m just not sure whether this might break other things. >>>> >>>> Best, >>>> Aljoscha >>>> >>>>> On 2. Sep 2019, at 11:24, Jan Lukavský <[hidden email]> wrote: >>>>> >>>>> Essentially, the class loader of Flink should be present in parent >>>> hierarchy of context class loader. If FlinkUserCodeClassLoader doesn't >> use >>>> context class loader, then it is actually impossible to use a hierarchy >>>> like this: >>>>> system class loader -> application class loader -> user-defined class >>>> loader (defines some UDFs to be used in flink program) >>>>> Flink now uses the application class loader, and so the UDFs fail to >>>> deserialize on local flink, because the user-defined class loader is >>>> bypassed. Moreover, there is no way to add additional classpath elements >>>> for LocalEnvironment (as opposed to RemoteEnvironment). I'm able to hack >>>> this by calling addURL method on the application class loader (which is >>>> terribly hackish), but that works only on JDK <= 8. No sensible >> workaround >>>> is available for JDK >= 9. >>>>> Alternative solution would be to enable adding jars to class loader >> when >>>> using LocalEnvironment, but that looks a little odd. >>>>> Jan >>>>> >>>>> On 9/2/19 11:02 AM, Aljoscha Krettek wrote: >>>>>> Hi, >>>>>> >>>>>> I actually don’t know whether that change would be ok. >>>> FlinkUserCodeClassLoader has taken >>>> FlinkUserCodeClassLoader.class.getClassLoader() as the parent >> ClassLoader >>>> before my change. See: >>>> >> https://github.com/apache/flink/blob/release-1.2/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/FlinkUserCodeClassLoader.java >>>> < >>>> >> https://github.com/apache/flink/blob/release-1.2/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/FlinkUserCodeClassLoader.java >>>>> . >>>>>> I have the feeling that this might be on purpose because we want to >>>> have the ClassLoader of the Flink Framework components be the parent >>>> ClassLoader, but I could be wrong. Maybe Stephan would be most >> appropriate >>>> for answering this. >>>>>> Best, >>>>>> Aljoscha >>>>>> >>>>>>> On 30. Aug 2019, at 16:28, Till Rohrmann <[hidden email]> >> wrote: >>>>>>> Hi Jan, >>>>>>> >>>>>>> this looks to me like a bug for which you could create a JIRA and PR >>>> to fix it. Just to make sure, I've pulled in Aljoscha who is the author >> of >>>> this change to check with him whether we are forgetting something. >>>>>>> Cheers, >>>>>>> Till >>>>>>> >>>>>>> On Fri, Aug 30, 2019 at 3:44 PM Jan Lukavský <[hidden email] >> <mailto: >>>> [hidden email]>> wrote: >>>>>>> Hi, >>>>>>> >>>>>>> I have come across an issue with classloading in Flink's MiniCluster. >>>>>>> The issue is that when I run local flink job from a thread, that has >> a >>>>>>> non-default context classloader (for whatever reason), this >> classloader >>>>>>> is not taken into account when classloading user defined functions. >>>> This >>>>>>> is due to [1]. Is this behavior intentional, or can I file a JIRA and >>>>>>> use Thread.currentThread.getContextClassLoader() there? I have >>>> validated >>>>>>> that it fixes issues I'm facing. >>>>>>> >>>>>>> Jan >>>>>>> >>>>>>> [1] >>>>>>> >> https://github.com/apache/flink/blob/ce557839d762b5f1ec92aa1885fd3d2ae33d0d0b/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/BlobLibraryCacheManager.java#L280 >>>> < >>>> >> https://github.com/apache/flink/blob/ce557839d762b5f1ec92aa1885fd3d2ae33d0d0b/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/BlobLibraryCacheManager.java#L280 |
On the other hand, if you say, that the contract of LocalEnvironment is
to execute as if it had all classes on its class loader, then it currently breaks this contract. :-) Jan On 9/3/19 3:45 PM, Jan Lukavský wrote: > Hi Till, > > hmm, that sounds it might work. I would have to incorporate this > (either as default, or on demand) into Apache Beam. Would you see any > disadvantages of this approach? Would you suggest to make this default > behavior for local beam FlinkRunner? I can introduce a configuration > option to turn this behavior on, but that would bring additional > maintenance burden, etc., etc. > > Jan > > On 9/3/19 3:38 PM, Till Rohrmann wrote: >> I see the problem Jan. What about the following proposal: Instead of >> using >> the LocalEnvironment for local tests you always use the >> RemoteEnvironment >> but when testing it locally you spin up a MiniCluster in the same >> process >> and initialize the RemoteEnvironment with `MiniCluster#getRestAddress`. >> That way you would always submit a jar with the generated classes and, >> hence, not having to set the context class loader. >> >> The contract of the LocalEnvironment is indeed that all classes it is >> supposed t execute must be present when being started. >> >> Cheers, >> Till >> >> On Tue, Sep 3, 2019 at 2:27 PM [hidden email] < >> [hidden email]> wrote: >> >>> >>> >>> >>> [hidden email] >>> >>> From: [hidden email] >>> Date: 2019-09-03 20:25 >>> To: dev >>> Subject: Re: Re: ClassLoader created by BlobLibraryCacheManager is not >>> using context classloader >>> >>> >>> >>> >>> [hidden email] >>> From: [hidden email] >>> Date: 2019-09-03 20:23 >>> To: dev >>> Subject: Re: Re: ClassLoader created by BlobLibraryCacheManager is not >>> using context classloader >>> [hidden email] >>> From: Jan Lukavský >>> Date: 2019-09-03 20:17 >>> To: dev >>> Subject: Re: ClassLoader created by BlobLibraryCacheManager is not >>> using >>> context classloader >>> Hi Till, >>> the use-case is pretty much simple - I have a REPL shell in groovy, >>> which generates classes at runtime. The actual hierarchy is therefore >>> system class loader -> application classloader -> repl classloader >>> (GroovyClassLoader actually) >>> now, when a terminal (sink) operation in the shell occurs, I'm able to >>> build a jar, which I can submit to remote cluster (in distributed >>> case). >>> But - during testing - I run the code using local flink. There is no >>> (legal) way of adding this new runtime generated jar to local flink. As >>> I said, I have a hackish solution which works on JDK <= 8, because it >>> uses reflection to call addURL on the application classloader (and >>> thefore "pretends", that the generated jar was there all the time from >>> the JVM startup). This breaks on JDK >= 9. It might be possible to work >>> around this somehow, but I think that the reason why >>> LocalEnvironment is >>> not having a way to add jars (as in case of RemoteEnvironment) is that >>> is assumes, that you actually have all of the on classpath when using >>> local runner. I think that this implies that it either has to use >>> context classloader (to be able to work on top of any classloading user >>> might have), or is wrong and would need be fixed, so that >>> LocalEnvironment would accept files to "stage" - which would mean >>> adding >>> them to a class loader (probably URLClassLoader with the application >>> class loader as parent). >>> Or, would you see any other option? >>> Jan >>> On 9/3/19 2:00 PM, Till Rohrmann wrote: >>>> Hi Jan, >>>> >>>> I've talked with Aljoscha and Stephan offline and we concluded that we >>>> would like to avoid the usage of context class loaders if possible. >>>> The >>>> reason for this is that using the context class loader can easily >>>> mess up >>>> an otherwise clear class loader hierarchy which makes it hard to >>>> reason >>>> about and to debug class loader issues. >>>> >>>> Given this, I think it would help to better understand the exact >>>> problem >>>> you are trying to solve by using the context class loader. Usually the >>>> usage of the context class loader points towards an API deficiency >>>> which >>> we >>>> might be able to address differently. >>>> >>>> Cheers, >>>> Till >>>> >>>> On Mon, Sep 2, 2019 at 11:32 AM Aljoscha Krettek <[hidden email]> >>>> wrote: >>>> >>>>> I’m not saying we can’t change that code to use the context class >>> loader. >>>>> I’m just not sure whether this might break other things. >>>>> >>>>> Best, >>>>> Aljoscha >>>>> >>>>>> On 2. Sep 2019, at 11:24, Jan Lukavský <[hidden email]> wrote: >>>>>> >>>>>> Essentially, the class loader of Flink should be present in parent >>>>> hierarchy of context class loader. If FlinkUserCodeClassLoader >>>>> doesn't >>> use >>>>> context class loader, then it is actually impossible to use a >>>>> hierarchy >>>>> like this: >>>>>> system class loader -> application class loader -> >>>>>> user-defined class >>>>> loader (defines some UDFs to be used in flink program) >>>>>> Flink now uses the application class loader, and so the UDFs fail to >>>>> deserialize on local flink, because the user-defined class loader is >>>>> bypassed. Moreover, there is no way to add additional classpath >>>>> elements >>>>> for LocalEnvironment (as opposed to RemoteEnvironment). I'm able >>>>> to hack >>>>> this by calling addURL method on the application class loader >>>>> (which is >>>>> terribly hackish), but that works only on JDK <= 8. No sensible >>> workaround >>>>> is available for JDK >= 9. >>>>>> Alternative solution would be to enable adding jars to class loader >>> when >>>>> using LocalEnvironment, but that looks a little odd. >>>>>> Jan >>>>>> >>>>>> On 9/2/19 11:02 AM, Aljoscha Krettek wrote: >>>>>>> Hi, >>>>>>> >>>>>>> I actually don’t know whether that change would be ok. >>>>> FlinkUserCodeClassLoader has taken >>>>> FlinkUserCodeClassLoader.class.getClassLoader() as the parent >>> ClassLoader >>>>> before my change. See: >>>>> >>> https://github.com/apache/flink/blob/release-1.2/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/FlinkUserCodeClassLoader.java >>> >>>>> < >>>>> >>> https://github.com/apache/flink/blob/release-1.2/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/FlinkUserCodeClassLoader.java >>> >>>>>> . >>>>>>> I have the feeling that this might be on purpose because we want to >>>>> have the ClassLoader of the Flink Framework components be the parent >>>>> ClassLoader, but I could be wrong. Maybe Stephan would be most >>> appropriate >>>>> for answering this. >>>>>>> Best, >>>>>>> Aljoscha >>>>>>> >>>>>>>> On 30. Aug 2019, at 16:28, Till Rohrmann <[hidden email]> >>> wrote: >>>>>>>> Hi Jan, >>>>>>>> >>>>>>>> this looks to me like a bug for which you could create a JIRA >>>>>>>> and PR >>>>> to fix it. Just to make sure, I've pulled in Aljoscha who is the >>>>> author >>> of >>>>> this change to check with him whether we are forgetting something. >>>>>>>> Cheers, >>>>>>>> Till >>>>>>>> >>>>>>>> On Fri, Aug 30, 2019 at 3:44 PM Jan Lukavský <[hidden email] >>> <mailto: >>>>> [hidden email]>> wrote: >>>>>>>> Hi, >>>>>>>> >>>>>>>> I have come across an issue with classloading in Flink's >>>>>>>> MiniCluster. >>>>>>>> The issue is that when I run local flink job from a thread, >>>>>>>> that has >>> a >>>>>>>> non-default context classloader (for whatever reason), this >>> classloader >>>>>>>> is not taken into account when classloading user defined >>>>>>>> functions. >>>>> This >>>>>>>> is due to [1]. Is this behavior intentional, or can I file a >>>>>>>> JIRA and >>>>>>>> use Thread.currentThread.getContextClassLoader() there? I have >>>>> validated >>>>>>>> that it fixes issues I'm facing. >>>>>>>> >>>>>>>> Jan >>>>>>>> >>>>>>>> [1] >>>>>>>> >>> https://github.com/apache/flink/blob/ce557839d762b5f1ec92aa1885fd3d2ae33d0d0b/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/BlobLibraryCacheManager.java#L280 >>> >>>>> < >>>>> >>> https://github.com/apache/flink/blob/ce557839d762b5f1ec92aa1885fd3d2ae33d0d0b/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/BlobLibraryCacheManager.java#L280 >>> |
How so? Does your REPL add the generated classes to the system class
loader? I assume the system class loader is used to load the Flink classes. Ideally, what you would like to have is the option to provide the parent class loader which is used load user code to the LocalEnvironment. This one could then be forwarded to the TaskExecutor where it is used to generate the user code class loader. But this is a bigger effort. The downside to this approach is that it requires you to create a jar file and to submit it via a REST call. The upside is that it is closer to the production setting. Cheers, Till On Tue, Sep 3, 2019 at 3:47 PM Jan Lukavský <[hidden email]> wrote: > On the other hand, if you say, that the contract of LocalEnvironment is > to execute as if it had all classes on its class loader, then it > currently breaks this contract. :-) > > Jan > > On 9/3/19 3:45 PM, Jan Lukavský wrote: > > Hi Till, > > > > hmm, that sounds it might work. I would have to incorporate this > > (either as default, or on demand) into Apache Beam. Would you see any > > disadvantages of this approach? Would you suggest to make this default > > behavior for local beam FlinkRunner? I can introduce a configuration > > option to turn this behavior on, but that would bring additional > > maintenance burden, etc., etc. > > > > Jan > > > > On 9/3/19 3:38 PM, Till Rohrmann wrote: > >> I see the problem Jan. What about the following proposal: Instead of > >> using > >> the LocalEnvironment for local tests you always use the > >> RemoteEnvironment > >> but when testing it locally you spin up a MiniCluster in the same > >> process > >> and initialize the RemoteEnvironment with `MiniCluster#getRestAddress`. > >> That way you would always submit a jar with the generated classes and, > >> hence, not having to set the context class loader. > >> > >> The contract of the LocalEnvironment is indeed that all classes it is > >> supposed t execute must be present when being started. > >> > >> Cheers, > >> Till > >> > >> On Tue, Sep 3, 2019 at 2:27 PM [hidden email] < > >> [hidden email]> wrote: > >> > >>> > >>> > >>> > >>> [hidden email] > >>> > >>> From: [hidden email] > >>> Date: 2019-09-03 20:25 > >>> To: dev > >>> Subject: Re: Re: ClassLoader created by BlobLibraryCacheManager is not > >>> using context classloader > >>> > >>> > >>> > >>> > >>> [hidden email] > >>> From: [hidden email] > >>> Date: 2019-09-03 20:23 > >>> To: dev > >>> Subject: Re: Re: ClassLoader created by BlobLibraryCacheManager is not > >>> using context classloader > >>> [hidden email] > >>> From: Jan Lukavský > >>> Date: 2019-09-03 20:17 > >>> To: dev > >>> Subject: Re: ClassLoader created by BlobLibraryCacheManager is not > >>> using > >>> context classloader > >>> Hi Till, > >>> the use-case is pretty much simple - I have a REPL shell in groovy, > >>> which generates classes at runtime. The actual hierarchy is therefore > >>> system class loader -> application classloader -> repl classloader > >>> (GroovyClassLoader actually) > >>> now, when a terminal (sink) operation in the shell occurs, I'm able to > >>> build a jar, which I can submit to remote cluster (in distributed > >>> case). > >>> But - during testing - I run the code using local flink. There is no > >>> (legal) way of adding this new runtime generated jar to local flink. As > >>> I said, I have a hackish solution which works on JDK <= 8, because it > >>> uses reflection to call addURL on the application classloader (and > >>> thefore "pretends", that the generated jar was there all the time from > >>> the JVM startup). This breaks on JDK >= 9. It might be possible to work > >>> around this somehow, but I think that the reason why > >>> LocalEnvironment is > >>> not having a way to add jars (as in case of RemoteEnvironment) is that > >>> is assumes, that you actually have all of the on classpath when using > >>> local runner. I think that this implies that it either has to use > >>> context classloader (to be able to work on top of any classloading user > >>> might have), or is wrong and would need be fixed, so that > >>> LocalEnvironment would accept files to "stage" - which would mean > >>> adding > >>> them to a class loader (probably URLClassLoader with the application > >>> class loader as parent). > >>> Or, would you see any other option? > >>> Jan > >>> On 9/3/19 2:00 PM, Till Rohrmann wrote: > >>>> Hi Jan, > >>>> > >>>> I've talked with Aljoscha and Stephan offline and we concluded that we > >>>> would like to avoid the usage of context class loaders if possible. > >>>> The > >>>> reason for this is that using the context class loader can easily > >>>> mess up > >>>> an otherwise clear class loader hierarchy which makes it hard to > >>>> reason > >>>> about and to debug class loader issues. > >>>> > >>>> Given this, I think it would help to better understand the exact > >>>> problem > >>>> you are trying to solve by using the context class loader. Usually the > >>>> usage of the context class loader points towards an API deficiency > >>>> which > >>> we > >>>> might be able to address differently. > >>>> > >>>> Cheers, > >>>> Till > >>>> > >>>> On Mon, Sep 2, 2019 at 11:32 AM Aljoscha Krettek <[hidden email] > > > >>>> wrote: > >>>> > >>>>> I’m not saying we can’t change that code to use the context class > >>> loader. > >>>>> I’m just not sure whether this might break other things. > >>>>> > >>>>> Best, > >>>>> Aljoscha > >>>>> > >>>>>> On 2. Sep 2019, at 11:24, Jan Lukavský <[hidden email]> wrote: > >>>>>> > >>>>>> Essentially, the class loader of Flink should be present in parent > >>>>> hierarchy of context class loader. If FlinkUserCodeClassLoader > >>>>> doesn't > >>> use > >>>>> context class loader, then it is actually impossible to use a > >>>>> hierarchy > >>>>> like this: > >>>>>> system class loader -> application class loader -> > >>>>>> user-defined class > >>>>> loader (defines some UDFs to be used in flink program) > >>>>>> Flink now uses the application class loader, and so the UDFs fail to > >>>>> deserialize on local flink, because the user-defined class loader is > >>>>> bypassed. Moreover, there is no way to add additional classpath > >>>>> elements > >>>>> for LocalEnvironment (as opposed to RemoteEnvironment). I'm able > >>>>> to hack > >>>>> this by calling addURL method on the application class loader > >>>>> (which is > >>>>> terribly hackish), but that works only on JDK <= 8. No sensible > >>> workaround > >>>>> is available for JDK >= 9. > >>>>>> Alternative solution would be to enable adding jars to class loader > >>> when > >>>>> using LocalEnvironment, but that looks a little odd. > >>>>>> Jan > >>>>>> > >>>>>> On 9/2/19 11:02 AM, Aljoscha Krettek wrote: > >>>>>>> Hi, > >>>>>>> > >>>>>>> I actually don’t know whether that change would be ok. > >>>>> FlinkUserCodeClassLoader has taken > >>>>> FlinkUserCodeClassLoader.class.getClassLoader() as the parent > >>> ClassLoader > >>>>> before my change. See: > >>>>> > >>> > https://github.com/apache/flink/blob/release-1.2/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/FlinkUserCodeClassLoader.java > >>> > >>>>> < > >>>>> > >>> > https://github.com/apache/flink/blob/release-1.2/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/FlinkUserCodeClassLoader.java > >>> > >>>>>> . > >>>>>>> I have the feeling that this might be on purpose because we want to > >>>>> have the ClassLoader of the Flink Framework components be the parent > >>>>> ClassLoader, but I could be wrong. Maybe Stephan would be most > >>> appropriate > >>>>> for answering this. > >>>>>>> Best, > >>>>>>> Aljoscha > >>>>>>> > >>>>>>>> On 30. Aug 2019, at 16:28, Till Rohrmann <[hidden email]> > >>> wrote: > >>>>>>>> Hi Jan, > >>>>>>>> > >>>>>>>> this looks to me like a bug for which you could create a JIRA > >>>>>>>> and PR > >>>>> to fix it. Just to make sure, I've pulled in Aljoscha who is the > >>>>> author > >>> of > >>>>> this change to check with him whether we are forgetting something. > >>>>>>>> Cheers, > >>>>>>>> Till > >>>>>>>> > >>>>>>>> On Fri, Aug 30, 2019 at 3:44 PM Jan Lukavský <[hidden email] > >>> <mailto: > >>>>> [hidden email]>> wrote: > >>>>>>>> Hi, > >>>>>>>> > >>>>>>>> I have come across an issue with classloading in Flink's > >>>>>>>> MiniCluster. > >>>>>>>> The issue is that when I run local flink job from a thread, > >>>>>>>> that has > >>> a > >>>>>>>> non-default context classloader (for whatever reason), this > >>> classloader > >>>>>>>> is not taken into account when classloading user defined > >>>>>>>> functions. > >>>>> This > >>>>>>>> is due to [1]. Is this behavior intentional, or can I file a > >>>>>>>> JIRA and > >>>>>>>> use Thread.currentThread.getContextClassLoader() there? I have > >>>>> validated > >>>>>>>> that it fixes issues I'm facing. > >>>>>>>> > >>>>>>>> Jan > >>>>>>>> > >>>>>>>> [1] > >>>>>>>> > >>> > https://github.com/apache/flink/blob/ce557839d762b5f1ec92aa1885fd3d2ae33d0d0b/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/BlobLibraryCacheManager.java#L280 > >>> > >>>>> < > >>>>> > >>> > https://github.com/apache/flink/blob/ce557839d762b5f1ec92aa1885fd3d2ae33d0d0b/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/BlobLibraryCacheManager.java#L280 > >>> > |
Answers inline.
On 9/3/19 4:01 PM, Till Rohrmann wrote: > How so? Does your REPL add the generated classes to the system class > loader? I assume the system class loader is used to load the Flink classes. No, it does not. It cannot on JDK >= 9 (or would have to hack into jdk.internal.loader.ClassLoaders, which I don't want to :)). It just creates another class loader, and is able to create a jar from generated files. The jar is used for remote execution. > > Ideally, what you would like to have is the option to provide the parent > class loader which is used load user code to the LocalEnvironment. This one > could then be forwarded to the TaskExecutor where it is used to generate > the user code class loader. But this is a bigger effort. I'm not sure how this differs from using context classloader? Maybe there is subtle difference in that this is a little more explicit. On the other hand, users normally do not modify class loaders, so the practical impact is IMHO negligible. But maybe this opens another possibility - we probably could add optional ClassLoader parameter to LocalEnvironment, with default value of FlinkRunner.class.getClassLoader()? That might be a good compromise. > > The downside to this approach is that it requires you to create a jar file > and to submit it via a REST call. The upside is that it is closer to the > production setting. Yes, a REPL has to do that anyway to support distributed computing, so this is not an issue. Jan > > Cheers, > Till > > On Tue, Sep 3, 2019 at 3:47 PM Jan Lukavský <[hidden email]> wrote: > >> On the other hand, if you say, that the contract of LocalEnvironment is >> to execute as if it had all classes on its class loader, then it >> currently breaks this contract. :-) >> >> Jan >> >> On 9/3/19 3:45 PM, Jan Lukavský wrote: >>> Hi Till, >>> >>> hmm, that sounds it might work. I would have to incorporate this >>> (either as default, or on demand) into Apache Beam. Would you see any >>> disadvantages of this approach? Would you suggest to make this default >>> behavior for local beam FlinkRunner? I can introduce a configuration >>> option to turn this behavior on, but that would bring additional >>> maintenance burden, etc., etc. >>> >>> Jan >>> >>> On 9/3/19 3:38 PM, Till Rohrmann wrote: >>>> I see the problem Jan. What about the following proposal: Instead of >>>> using >>>> the LocalEnvironment for local tests you always use the >>>> RemoteEnvironment >>>> but when testing it locally you spin up a MiniCluster in the same >>>> process >>>> and initialize the RemoteEnvironment with `MiniCluster#getRestAddress`. >>>> That way you would always submit a jar with the generated classes and, >>>> hence, not having to set the context class loader. >>>> >>>> The contract of the LocalEnvironment is indeed that all classes it is >>>> supposed t execute must be present when being started. >>>> >>>> Cheers, >>>> Till >>>> >>>> On Tue, Sep 3, 2019 at 2:27 PM [hidden email] < >>>> [hidden email]> wrote: >>>> >>>>> >>>>> >>>>> [hidden email] >>>>> >>>>> From: [hidden email] >>>>> Date: 2019-09-03 20:25 >>>>> To: dev >>>>> Subject: Re: Re: ClassLoader created by BlobLibraryCacheManager is not >>>>> using context classloader >>>>> >>>>> >>>>> >>>>> >>>>> [hidden email] >>>>> From: [hidden email] >>>>> Date: 2019-09-03 20:23 >>>>> To: dev >>>>> Subject: Re: Re: ClassLoader created by BlobLibraryCacheManager is not >>>>> using context classloader >>>>> [hidden email] >>>>> From: Jan Lukavský >>>>> Date: 2019-09-03 20:17 >>>>> To: dev >>>>> Subject: Re: ClassLoader created by BlobLibraryCacheManager is not >>>>> using >>>>> context classloader >>>>> Hi Till, >>>>> the use-case is pretty much simple - I have a REPL shell in groovy, >>>>> which generates classes at runtime. The actual hierarchy is therefore >>>>> system class loader -> application classloader -> repl classloader >>>>> (GroovyClassLoader actually) >>>>> now, when a terminal (sink) operation in the shell occurs, I'm able to >>>>> build a jar, which I can submit to remote cluster (in distributed >>>>> case). >>>>> But - during testing - I run the code using local flink. There is no >>>>> (legal) way of adding this new runtime generated jar to local flink. As >>>>> I said, I have a hackish solution which works on JDK <= 8, because it >>>>> uses reflection to call addURL on the application classloader (and >>>>> thefore "pretends", that the generated jar was there all the time from >>>>> the JVM startup). This breaks on JDK >= 9. It might be possible to work >>>>> around this somehow, but I think that the reason why >>>>> LocalEnvironment is >>>>> not having a way to add jars (as in case of RemoteEnvironment) is that >>>>> is assumes, that you actually have all of the on classpath when using >>>>> local runner. I think that this implies that it either has to use >>>>> context classloader (to be able to work on top of any classloading user >>>>> might have), or is wrong and would need be fixed, so that >>>>> LocalEnvironment would accept files to "stage" - which would mean >>>>> adding >>>>> them to a class loader (probably URLClassLoader with the application >>>>> class loader as parent). >>>>> Or, would you see any other option? >>>>> Jan >>>>> On 9/3/19 2:00 PM, Till Rohrmann wrote: >>>>>> Hi Jan, >>>>>> >>>>>> I've talked with Aljoscha and Stephan offline and we concluded that we >>>>>> would like to avoid the usage of context class loaders if possible. >>>>>> The >>>>>> reason for this is that using the context class loader can easily >>>>>> mess up >>>>>> an otherwise clear class loader hierarchy which makes it hard to >>>>>> reason >>>>>> about and to debug class loader issues. >>>>>> >>>>>> Given this, I think it would help to better understand the exact >>>>>> problem >>>>>> you are trying to solve by using the context class loader. Usually the >>>>>> usage of the context class loader points towards an API deficiency >>>>>> which >>>>> we >>>>>> might be able to address differently. >>>>>> >>>>>> Cheers, >>>>>> Till >>>>>> >>>>>> On Mon, Sep 2, 2019 at 11:32 AM Aljoscha Krettek <[hidden email] >>>>>> wrote: >>>>>> >>>>>>> I’m not saying we can’t change that code to use the context class >>>>> loader. >>>>>>> I’m just not sure whether this might break other things. >>>>>>> >>>>>>> Best, >>>>>>> Aljoscha >>>>>>> >>>>>>>> On 2. Sep 2019, at 11:24, Jan Lukavský <[hidden email]> wrote: >>>>>>>> >>>>>>>> Essentially, the class loader of Flink should be present in parent >>>>>>> hierarchy of context class loader. If FlinkUserCodeClassLoader >>>>>>> doesn't >>>>> use >>>>>>> context class loader, then it is actually impossible to use a >>>>>>> hierarchy >>>>>>> like this: >>>>>>>> system class loader -> application class loader -> >>>>>>>> user-defined class >>>>>>> loader (defines some UDFs to be used in flink program) >>>>>>>> Flink now uses the application class loader, and so the UDFs fail to >>>>>>> deserialize on local flink, because the user-defined class loader is >>>>>>> bypassed. Moreover, there is no way to add additional classpath >>>>>>> elements >>>>>>> for LocalEnvironment (as opposed to RemoteEnvironment). I'm able >>>>>>> to hack >>>>>>> this by calling addURL method on the application class loader >>>>>>> (which is >>>>>>> terribly hackish), but that works only on JDK <= 8. No sensible >>>>> workaround >>>>>>> is available for JDK >= 9. >>>>>>>> Alternative solution would be to enable adding jars to class loader >>>>> when >>>>>>> using LocalEnvironment, but that looks a little odd. >>>>>>>> Jan >>>>>>>> >>>>>>>> On 9/2/19 11:02 AM, Aljoscha Krettek wrote: >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> I actually don’t know whether that change would be ok. >>>>>>> FlinkUserCodeClassLoader has taken >>>>>>> FlinkUserCodeClassLoader.class.getClassLoader() as the parent >>>>> ClassLoader >>>>>>> before my change. See: >>>>>>> >> https://github.com/apache/flink/blob/release-1.2/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/FlinkUserCodeClassLoader.java >>>>>>> < >>>>>>> >> https://github.com/apache/flink/blob/release-1.2/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/FlinkUserCodeClassLoader.java >>>>>>>> . >>>>>>>>> I have the feeling that this might be on purpose because we want to >>>>>>> have the ClassLoader of the Flink Framework components be the parent >>>>>>> ClassLoader, but I could be wrong. Maybe Stephan would be most >>>>> appropriate >>>>>>> for answering this. >>>>>>>>> Best, >>>>>>>>> Aljoscha >>>>>>>>> >>>>>>>>>> On 30. Aug 2019, at 16:28, Till Rohrmann <[hidden email]> >>>>> wrote: >>>>>>>>>> Hi Jan, >>>>>>>>>> >>>>>>>>>> this looks to me like a bug for which you could create a JIRA >>>>>>>>>> and PR >>>>>>> to fix it. Just to make sure, I've pulled in Aljoscha who is the >>>>>>> author >>>>> of >>>>>>> this change to check with him whether we are forgetting something. >>>>>>>>>> Cheers, >>>>>>>>>> Till >>>>>>>>>> >>>>>>>>>> On Fri, Aug 30, 2019 at 3:44 PM Jan Lukavský <[hidden email] >>>>> <mailto: >>>>>>> [hidden email]>> wrote: >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> I have come across an issue with classloading in Flink's >>>>>>>>>> MiniCluster. >>>>>>>>>> The issue is that when I run local flink job from a thread, >>>>>>>>>> that has >>>>> a >>>>>>>>>> non-default context classloader (for whatever reason), this >>>>> classloader >>>>>>>>>> is not taken into account when classloading user defined >>>>>>>>>> functions. >>>>>>> This >>>>>>>>>> is due to [1]. Is this behavior intentional, or can I file a >>>>>>>>>> JIRA and >>>>>>>>>> use Thread.currentThread.getContextClassLoader() there? I have >>>>>>> validated >>>>>>>>>> that it fixes issues I'm facing. >>>>>>>>>> >>>>>>>>>> Jan >>>>>>>>>> >>>>>>>>>> [1] >>>>>>>>>> >> https://github.com/apache/flink/blob/ce557839d762b5f1ec92aa1885fd3d2ae33d0d0b/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/BlobLibraryCacheManager.java#L280 >>>>>>> < >>>>>>> >> https://github.com/apache/flink/blob/ce557839d762b5f1ec92aa1885fd3d2ae33d0d0b/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/BlobLibraryCacheManager.java#L280 |
Hi Till and Aljoscha,
I was investigating the other options, but unfortunately all of them look a little complicated (although possible, of course). But before going into a more complicated solutions, I'd like to know what issues do you actually see with using the context class loader. I can think of one difficulty - if (for whatever reason), the context class loader doesn't contain (in itself or as a parent) class loader that loaded flink core classes, that would probably cause troubles. So, what about a solution that we take as parent class loader of FlinkUserCodeClassLoaders a class loader that is: a) context class loader of current thread, if it either is actually class loader of flink core classes, or if it contains this class loader in its parent hierarchy, or b) class loader of flink core classes That way, class loader of flink core classes would always be in parent hierarchy of FlinkUserCodeClassLoaders. Would that solve the issues you see? It works for me. Jan On 9/3/19 4:52 PM, Jan Lukavský wrote: > Answers inline. > > On 9/3/19 4:01 PM, Till Rohrmann wrote: >> How so? Does your REPL add the generated classes to the system class >> loader? I assume the system class loader is used to load the Flink >> classes. > No, it does not. It cannot on JDK >= 9 (or would have to hack into > jdk.internal.loader.ClassLoaders, which I don't want to :)). It just > creates another class loader, and is able to create a jar from > generated files. The jar is used for remote execution. >> >> Ideally, what you would like to have is the option to provide the parent >> class loader which is used load user code to the LocalEnvironment. >> This one >> could then be forwarded to the TaskExecutor where it is used to generate >> the user code class loader. But this is a bigger effort. > I'm not sure how this differs from using context classloader? Maybe > there is subtle difference in that this is a little more explicit. On > the other hand, users normally do not modify class loaders, so the > practical impact is IMHO negligible. But maybe this opens another > possibility - we probably could add optional ClassLoader parameter to > LocalEnvironment, with default value of > FlinkRunner.class.getClassLoader()? That might be a good compromise. >> >> The downside to this approach is that it requires you to create a jar >> file >> and to submit it via a REST call. The upside is that it is closer to the >> production setting. > > Yes, a REPL has to do that anyway to support distributed computing, so > this is not an issue. > > Jan > >> >> Cheers, >> Till >> >> On Tue, Sep 3, 2019 at 3:47 PM Jan Lukavský <[hidden email]> wrote: >> >>> On the other hand, if you say, that the contract of LocalEnvironment is >>> to execute as if it had all classes on its class loader, then it >>> currently breaks this contract. :-) >>> >>> Jan >>> >>> On 9/3/19 3:45 PM, Jan Lukavský wrote: >>>> Hi Till, >>>> >>>> hmm, that sounds it might work. I would have to incorporate this >>>> (either as default, or on demand) into Apache Beam. Would you see any >>>> disadvantages of this approach? Would you suggest to make this default >>>> behavior for local beam FlinkRunner? I can introduce a configuration >>>> option to turn this behavior on, but that would bring additional >>>> maintenance burden, etc., etc. >>>> >>>> Jan >>>> >>>> On 9/3/19 3:38 PM, Till Rohrmann wrote: >>>>> I see the problem Jan. What about the following proposal: Instead of >>>>> using >>>>> the LocalEnvironment for local tests you always use the >>>>> RemoteEnvironment >>>>> but when testing it locally you spin up a MiniCluster in the same >>>>> process >>>>> and initialize the RemoteEnvironment with >>>>> `MiniCluster#getRestAddress`. >>>>> That way you would always submit a jar with the generated classes >>>>> and, >>>>> hence, not having to set the context class loader. >>>>> >>>>> The contract of the LocalEnvironment is indeed that all classes it is >>>>> supposed t execute must be present when being started. >>>>> >>>>> Cheers, >>>>> Till >>>>> >>>>> On Tue, Sep 3, 2019 at 2:27 PM [hidden email] < >>>>> [hidden email]> wrote: >>>>> >>>>>> >>>>>> >>>>>> [hidden email] >>>>>> >>>>>> From: [hidden email] >>>>>> Date: 2019-09-03 20:25 >>>>>> To: dev >>>>>> Subject: Re: Re: ClassLoader created by BlobLibraryCacheManager >>>>>> is not >>>>>> using context classloader >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> [hidden email] >>>>>> From: [hidden email] >>>>>> Date: 2019-09-03 20:23 >>>>>> To: dev >>>>>> Subject: Re: Re: ClassLoader created by BlobLibraryCacheManager >>>>>> is not >>>>>> using context classloader >>>>>> [hidden email] >>>>>> From: Jan Lukavský >>>>>> Date: 2019-09-03 20:17 >>>>>> To: dev >>>>>> Subject: Re: ClassLoader created by BlobLibraryCacheManager is not >>>>>> using >>>>>> context classloader >>>>>> Hi Till, >>>>>> the use-case is pretty much simple - I have a REPL shell in groovy, >>>>>> which generates classes at runtime. The actual hierarchy is >>>>>> therefore >>>>>> system class loader -> application classloader -> repl classloader >>>>>> (GroovyClassLoader actually) >>>>>> now, when a terminal (sink) operation in the shell occurs, I'm >>>>>> able to >>>>>> build a jar, which I can submit to remote cluster (in distributed >>>>>> case). >>>>>> But - during testing - I run the code using local flink. There >>>>>> is no >>>>>> (legal) way of adding this new runtime generated jar to local >>>>>> flink. As >>>>>> I said, I have a hackish solution which works on JDK <= 8, >>>>>> because it >>>>>> uses reflection to call addURL on the application classloader (and >>>>>> thefore "pretends", that the generated jar was there all the time >>>>>> from >>>>>> the JVM startup). This breaks on JDK >= 9. It might be possible >>>>>> to work >>>>>> around this somehow, but I think that the reason why >>>>>> LocalEnvironment is >>>>>> not having a way to add jars (as in case of RemoteEnvironment) is >>>>>> that >>>>>> is assumes, that you actually have all of the on classpath when >>>>>> using >>>>>> local runner. I think that this implies that it either has to use >>>>>> context classloader (to be able to work on top of any >>>>>> classloading user >>>>>> might have), or is wrong and would need be fixed, so that >>>>>> LocalEnvironment would accept files to "stage" - which would mean >>>>>> adding >>>>>> them to a class loader (probably URLClassLoader with the application >>>>>> class loader as parent). >>>>>> Or, would you see any other option? >>>>>> Jan >>>>>> On 9/3/19 2:00 PM, Till Rohrmann wrote: >>>>>>> Hi Jan, >>>>>>> >>>>>>> I've talked with Aljoscha and Stephan offline and we concluded >>>>>>> that we >>>>>>> would like to avoid the usage of context class loaders if possible. >>>>>>> The >>>>>>> reason for this is that using the context class loader can easily >>>>>>> mess up >>>>>>> an otherwise clear class loader hierarchy which makes it hard to >>>>>>> reason >>>>>>> about and to debug class loader issues. >>>>>>> >>>>>>> Given this, I think it would help to better understand the exact >>>>>>> problem >>>>>>> you are trying to solve by using the context class loader. >>>>>>> Usually the >>>>>>> usage of the context class loader points towards an API deficiency >>>>>>> which >>>>>> we >>>>>>> might be able to address differently. >>>>>>> >>>>>>> Cheers, >>>>>>> Till >>>>>>> >>>>>>> On Mon, Sep 2, 2019 at 11:32 AM Aljoscha Krettek >>>>>>> <[hidden email] >>>>>>> wrote: >>>>>>> >>>>>>>> I’m not saying we can’t change that code to use the context class >>>>>> loader. >>>>>>>> I’m just not sure whether this might break other things. >>>>>>>> >>>>>>>> Best, >>>>>>>> Aljoscha >>>>>>>> >>>>>>>>> On 2. Sep 2019, at 11:24, Jan Lukavský <[hidden email]> wrote: >>>>>>>>> >>>>>>>>> Essentially, the class loader of Flink should be present in >>>>>>>>> parent >>>>>>>> hierarchy of context class loader. If FlinkUserCodeClassLoader >>>>>>>> doesn't >>>>>> use >>>>>>>> context class loader, then it is actually impossible to use a >>>>>>>> hierarchy >>>>>>>> like this: >>>>>>>>> system class loader -> application class loader -> >>>>>>>>> user-defined class >>>>>>>> loader (defines some UDFs to be used in flink program) >>>>>>>>> Flink now uses the application class loader, and so the UDFs >>>>>>>>> fail to >>>>>>>> deserialize on local flink, because the user-defined class >>>>>>>> loader is >>>>>>>> bypassed. Moreover, there is no way to add additional classpath >>>>>>>> elements >>>>>>>> for LocalEnvironment (as opposed to RemoteEnvironment). I'm able >>>>>>>> to hack >>>>>>>> this by calling addURL method on the application class loader >>>>>>>> (which is >>>>>>>> terribly hackish), but that works only on JDK <= 8. No sensible >>>>>> workaround >>>>>>>> is available for JDK >= 9. >>>>>>>>> Alternative solution would be to enable adding jars to class >>>>>>>>> loader >>>>>> when >>>>>>>> using LocalEnvironment, but that looks a little odd. >>>>>>>>> Jan >>>>>>>>> >>>>>>>>> On 9/2/19 11:02 AM, Aljoscha Krettek wrote: >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> I actually don’t know whether that change would be ok. >>>>>>>> FlinkUserCodeClassLoader has taken >>>>>>>> FlinkUserCodeClassLoader.class.getClassLoader() as the parent >>>>>> ClassLoader >>>>>>>> before my change. See: >>>>>>>> >>> https://github.com/apache/flink/blob/release-1.2/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/FlinkUserCodeClassLoader.java >>> >>>>>>>> < >>>>>>>> >>> https://github.com/apache/flink/blob/release-1.2/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/FlinkUserCodeClassLoader.java >>> >>>>>>>>> . >>>>>>>>>> I have the feeling that this might be on purpose because we >>>>>>>>>> want to >>>>>>>> have the ClassLoader of the Flink Framework components be the >>>>>>>> parent >>>>>>>> ClassLoader, but I could be wrong. Maybe Stephan would be most >>>>>> appropriate >>>>>>>> for answering this. >>>>>>>>>> Best, >>>>>>>>>> Aljoscha >>>>>>>>>> >>>>>>>>>>> On 30. Aug 2019, at 16:28, Till Rohrmann <[hidden email]> >>>>>> wrote: >>>>>>>>>>> Hi Jan, >>>>>>>>>>> >>>>>>>>>>> this looks to me like a bug for which you could create a JIRA >>>>>>>>>>> and PR >>>>>>>> to fix it. Just to make sure, I've pulled in Aljoscha who is the >>>>>>>> author >>>>>> of >>>>>>>> this change to check with him whether we are forgetting something. >>>>>>>>>>> Cheers, >>>>>>>>>>> Till >>>>>>>>>>> >>>>>>>>>>> On Fri, Aug 30, 2019 at 3:44 PM Jan Lukavský <[hidden email] >>>>>> <mailto: >>>>>>>> [hidden email]>> wrote: >>>>>>>>>>> Hi, >>>>>>>>>>> >>>>>>>>>>> I have come across an issue with classloading in Flink's >>>>>>>>>>> MiniCluster. >>>>>>>>>>> The issue is that when I run local flink job from a thread, >>>>>>>>>>> that has >>>>>> a >>>>>>>>>>> non-default context classloader (for whatever reason), this >>>>>> classloader >>>>>>>>>>> is not taken into account when classloading user defined >>>>>>>>>>> functions. >>>>>>>> This >>>>>>>>>>> is due to [1]. Is this behavior intentional, or can I file a >>>>>>>>>>> JIRA and >>>>>>>>>>> use Thread.currentThread.getContextClassLoader() there? I have >>>>>>>> validated >>>>>>>>>>> that it fixes issues I'm facing. >>>>>>>>>>> >>>>>>>>>>> Jan >>>>>>>>>>> >>>>>>>>>>> [1] >>>>>>>>>>> >>> https://github.com/apache/flink/blob/ce557839d762b5f1ec92aa1885fd3d2ae33d0d0b/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/BlobLibraryCacheManager.java#L280 >>> >>>>>>>> < >>>>>>>> >>> https://github.com/apache/flink/blob/ce557839d762b5f1ec92aa1885fd3d2ae33d0d0b/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/BlobLibraryCacheManager.java#L280 >>> |
Hi Jan,
sorry for my late response. I think the main concern about using the context class loader is the unpredictability and magic it adds to a component where you actually don't wanna be surprised. Moreover, if we now add support for the context class loader, then at a later time another component might use this feature as well. This component would then have to make sure that the new class loader it sets has the current context class loader as its parent because otherwise it might break your use case. But on the other hand, I admit that I don't have a good solution to your problem at hand other than using the RemoveExecutionEnvironment. One comment concerning your proposed class loader resolution. I think it adds a bit too much magic which is hard to understand for the user. It would be better if the system would fail with a descriptive error message instead. Cheers, Till On Thu, Sep 5, 2019 at 12:55 PM Jan Lukavský <[hidden email]> wrote: > Hi Till and Aljoscha, > > I was investigating the other options, but unfortunately all of them > look a little complicated (although possible, of course). But before > going into a more complicated solutions, I'd like to know what issues do > you actually see with using the context class loader. I can think of one > difficulty - if (for whatever reason), the context class loader doesn't > contain (in itself or as a parent) class loader that loaded flink core > classes, that would probably cause troubles. So, what about a solution > that we take as parent class loader of FlinkUserCodeClassLoaders a class > loader that is: > > a) context class loader of current thread, if it either is actually > class loader of flink core classes, or if it contains this class loader > in its parent hierarchy, or > > b) class loader of flink core classes > > That way, class loader of flink core classes would always be in parent > hierarchy of FlinkUserCodeClassLoaders. Would that solve the issues you > see? It works for me. > > Jan > > On 9/3/19 4:52 PM, Jan Lukavský wrote: > > Answers inline. > > > > On 9/3/19 4:01 PM, Till Rohrmann wrote: > >> How so? Does your REPL add the generated classes to the system class > >> loader? I assume the system class loader is used to load the Flink > >> classes. > > No, it does not. It cannot on JDK >= 9 (or would have to hack into > > jdk.internal.loader.ClassLoaders, which I don't want to :)). It just > > creates another class loader, and is able to create a jar from > > generated files. The jar is used for remote execution. > >> > >> Ideally, what you would like to have is the option to provide the parent > >> class loader which is used load user code to the LocalEnvironment. > >> This one > >> could then be forwarded to the TaskExecutor where it is used to generate > >> the user code class loader. But this is a bigger effort. > > I'm not sure how this differs from using context classloader? Maybe > > there is subtle difference in that this is a little more explicit. On > > the other hand, users normally do not modify class loaders, so the > > practical impact is IMHO negligible. But maybe this opens another > > possibility - we probably could add optional ClassLoader parameter to > > LocalEnvironment, with default value of > > FlinkRunner.class.getClassLoader()? That might be a good compromise. > >> > >> The downside to this approach is that it requires you to create a jar > >> file > >> and to submit it via a REST call. The upside is that it is closer to the > >> production setting. > > > > Yes, a REPL has to do that anyway to support distributed computing, so > > this is not an issue. > > > > Jan > > > >> > >> Cheers, > >> Till > >> > >> On Tue, Sep 3, 2019 at 3:47 PM Jan Lukavský <[hidden email]> wrote: > >> > >>> On the other hand, if you say, that the contract of LocalEnvironment is > >>> to execute as if it had all classes on its class loader, then it > >>> currently breaks this contract. :-) > >>> > >>> Jan > >>> > >>> On 9/3/19 3:45 PM, Jan Lukavský wrote: > >>>> Hi Till, > >>>> > >>>> hmm, that sounds it might work. I would have to incorporate this > >>>> (either as default, or on demand) into Apache Beam. Would you see any > >>>> disadvantages of this approach? Would you suggest to make this default > >>>> behavior for local beam FlinkRunner? I can introduce a configuration > >>>> option to turn this behavior on, but that would bring additional > >>>> maintenance burden, etc., etc. > >>>> > >>>> Jan > >>>> > >>>> On 9/3/19 3:38 PM, Till Rohrmann wrote: > >>>>> I see the problem Jan. What about the following proposal: Instead of > >>>>> using > >>>>> the LocalEnvironment for local tests you always use the > >>>>> RemoteEnvironment > >>>>> but when testing it locally you spin up a MiniCluster in the same > >>>>> process > >>>>> and initialize the RemoteEnvironment with > >>>>> `MiniCluster#getRestAddress`. > >>>>> That way you would always submit a jar with the generated classes > >>>>> and, > >>>>> hence, not having to set the context class loader. > >>>>> > >>>>> The contract of the LocalEnvironment is indeed that all classes it is > >>>>> supposed t execute must be present when being started. > >>>>> > >>>>> Cheers, > >>>>> Till > >>>>> > >>>>> On Tue, Sep 3, 2019 at 2:27 PM [hidden email] < > >>>>> [hidden email]> wrote: > >>>>> > >>>>>> > >>>>>> > >>>>>> [hidden email] > >>>>>> > >>>>>> From: [hidden email] > >>>>>> Date: 2019-09-03 20:25 > >>>>>> To: dev > >>>>>> Subject: Re: Re: ClassLoader created by BlobLibraryCacheManager > >>>>>> is not > >>>>>> using context classloader > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>> [hidden email] > >>>>>> From: [hidden email] > >>>>>> Date: 2019-09-03 20:23 > >>>>>> To: dev > >>>>>> Subject: Re: Re: ClassLoader created by BlobLibraryCacheManager > >>>>>> is not > >>>>>> using context classloader > >>>>>> [hidden email] > >>>>>> From: Jan Lukavský > >>>>>> Date: 2019-09-03 20:17 > >>>>>> To: dev > >>>>>> Subject: Re: ClassLoader created by BlobLibraryCacheManager is not > >>>>>> using > >>>>>> context classloader > >>>>>> Hi Till, > >>>>>> the use-case is pretty much simple - I have a REPL shell in groovy, > >>>>>> which generates classes at runtime. The actual hierarchy is > >>>>>> therefore > >>>>>> system class loader -> application classloader -> repl classloader > >>>>>> (GroovyClassLoader actually) > >>>>>> now, when a terminal (sink) operation in the shell occurs, I'm > >>>>>> able to > >>>>>> build a jar, which I can submit to remote cluster (in distributed > >>>>>> case). > >>>>>> But - during testing - I run the code using local flink. There > >>>>>> is no > >>>>>> (legal) way of adding this new runtime generated jar to local > >>>>>> flink. As > >>>>>> I said, I have a hackish solution which works on JDK <= 8, > >>>>>> because it > >>>>>> uses reflection to call addURL on the application classloader (and > >>>>>> thefore "pretends", that the generated jar was there all the time > >>>>>> from > >>>>>> the JVM startup). This breaks on JDK >= 9. It might be possible > >>>>>> to work > >>>>>> around this somehow, but I think that the reason why > >>>>>> LocalEnvironment is > >>>>>> not having a way to add jars (as in case of RemoteEnvironment) is > >>>>>> that > >>>>>> is assumes, that you actually have all of the on classpath when > >>>>>> using > >>>>>> local runner. I think that this implies that it either has to use > >>>>>> context classloader (to be able to work on top of any > >>>>>> classloading user > >>>>>> might have), or is wrong and would need be fixed, so that > >>>>>> LocalEnvironment would accept files to "stage" - which would mean > >>>>>> adding > >>>>>> them to a class loader (probably URLClassLoader with the application > >>>>>> class loader as parent). > >>>>>> Or, would you see any other option? > >>>>>> Jan > >>>>>> On 9/3/19 2:00 PM, Till Rohrmann wrote: > >>>>>>> Hi Jan, > >>>>>>> > >>>>>>> I've talked with Aljoscha and Stephan offline and we concluded > >>>>>>> that we > >>>>>>> would like to avoid the usage of context class loaders if possible. > >>>>>>> The > >>>>>>> reason for this is that using the context class loader can easily > >>>>>>> mess up > >>>>>>> an otherwise clear class loader hierarchy which makes it hard to > >>>>>>> reason > >>>>>>> about and to debug class loader issues. > >>>>>>> > >>>>>>> Given this, I think it would help to better understand the exact > >>>>>>> problem > >>>>>>> you are trying to solve by using the context class loader. > >>>>>>> Usually the > >>>>>>> usage of the context class loader points towards an API deficiency > >>>>>>> which > >>>>>> we > >>>>>>> might be able to address differently. > >>>>>>> > >>>>>>> Cheers, > >>>>>>> Till > >>>>>>> > >>>>>>> On Mon, Sep 2, 2019 at 11:32 AM Aljoscha Krettek > >>>>>>> <[hidden email] > >>>>>>> wrote: > >>>>>>> > >>>>>>>> I’m not saying we can’t change that code to use the context class > >>>>>> loader. > >>>>>>>> I’m just not sure whether this might break other things. > >>>>>>>> > >>>>>>>> Best, > >>>>>>>> Aljoscha > >>>>>>>> > >>>>>>>>> On 2. Sep 2019, at 11:24, Jan Lukavský <[hidden email]> wrote: > >>>>>>>>> > >>>>>>>>> Essentially, the class loader of Flink should be present in > >>>>>>>>> parent > >>>>>>>> hierarchy of context class loader. If FlinkUserCodeClassLoader > >>>>>>>> doesn't > >>>>>> use > >>>>>>>> context class loader, then it is actually impossible to use a > >>>>>>>> hierarchy > >>>>>>>> like this: > >>>>>>>>> system class loader -> application class loader -> > >>>>>>>>> user-defined class > >>>>>>>> loader (defines some UDFs to be used in flink program) > >>>>>>>>> Flink now uses the application class loader, and so the UDFs > >>>>>>>>> fail to > >>>>>>>> deserialize on local flink, because the user-defined class > >>>>>>>> loader is > >>>>>>>> bypassed. Moreover, there is no way to add additional classpath > >>>>>>>> elements > >>>>>>>> for LocalEnvironment (as opposed to RemoteEnvironment). I'm able > >>>>>>>> to hack > >>>>>>>> this by calling addURL method on the application class loader > >>>>>>>> (which is > >>>>>>>> terribly hackish), but that works only on JDK <= 8. No sensible > >>>>>> workaround > >>>>>>>> is available for JDK >= 9. > >>>>>>>>> Alternative solution would be to enable adding jars to class > >>>>>>>>> loader > >>>>>> when > >>>>>>>> using LocalEnvironment, but that looks a little odd. > >>>>>>>>> Jan > >>>>>>>>> > >>>>>>>>> On 9/2/19 11:02 AM, Aljoscha Krettek wrote: > >>>>>>>>>> Hi, > >>>>>>>>>> > >>>>>>>>>> I actually don’t know whether that change would be ok. > >>>>>>>> FlinkUserCodeClassLoader has taken > >>>>>>>> FlinkUserCodeClassLoader.class.getClassLoader() as the parent > >>>>>> ClassLoader > >>>>>>>> before my change. See: > >>>>>>>> > >>> > https://github.com/apache/flink/blob/release-1.2/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/FlinkUserCodeClassLoader.java > >>> > >>>>>>>> < > >>>>>>>> > >>> > https://github.com/apache/flink/blob/release-1.2/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/FlinkUserCodeClassLoader.java > >>> > >>>>>>>>> . > >>>>>>>>>> I have the feeling that this might be on purpose because we > >>>>>>>>>> want to > >>>>>>>> have the ClassLoader of the Flink Framework components be the > >>>>>>>> parent > >>>>>>>> ClassLoader, but I could be wrong. Maybe Stephan would be most > >>>>>> appropriate > >>>>>>>> for answering this. > >>>>>>>>>> Best, > >>>>>>>>>> Aljoscha > >>>>>>>>>> > >>>>>>>>>>> On 30. Aug 2019, at 16:28, Till Rohrmann <[hidden email] > > > >>>>>> wrote: > >>>>>>>>>>> Hi Jan, > >>>>>>>>>>> > >>>>>>>>>>> this looks to me like a bug for which you could create a JIRA > >>>>>>>>>>> and PR > >>>>>>>> to fix it. Just to make sure, I've pulled in Aljoscha who is the > >>>>>>>> author > >>>>>> of > >>>>>>>> this change to check with him whether we are forgetting something. > >>>>>>>>>>> Cheers, > >>>>>>>>>>> Till > >>>>>>>>>>> > >>>>>>>>>>> On Fri, Aug 30, 2019 at 3:44 PM Jan Lukavský <[hidden email] > >>>>>> <mailto: > >>>>>>>> [hidden email]>> wrote: > >>>>>>>>>>> Hi, > >>>>>>>>>>> > >>>>>>>>>>> I have come across an issue with classloading in Flink's > >>>>>>>>>>> MiniCluster. > >>>>>>>>>>> The issue is that when I run local flink job from a thread, > >>>>>>>>>>> that has > >>>>>> a > >>>>>>>>>>> non-default context classloader (for whatever reason), this > >>>>>> classloader > >>>>>>>>>>> is not taken into account when classloading user defined > >>>>>>>>>>> functions. > >>>>>>>> This > >>>>>>>>>>> is due to [1]. Is this behavior intentional, or can I file a > >>>>>>>>>>> JIRA and > >>>>>>>>>>> use Thread.currentThread.getContextClassLoader() there? I have > >>>>>>>> validated > >>>>>>>>>>> that it fixes issues I'm facing. > >>>>>>>>>>> > >>>>>>>>>>> Jan > >>>>>>>>>>> > >>>>>>>>>>> [1] > >>>>>>>>>>> > >>> > https://github.com/apache/flink/blob/ce557839d762b5f1ec92aa1885fd3d2ae33d0d0b/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/BlobLibraryCacheManager.java#L280 > >>> > >>>>>>>> < > >>>>>>>> > >>> > https://github.com/apache/flink/blob/ce557839d762b5f1ec92aa1885fd3d2ae33d0d0b/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/BlobLibraryCacheManager.java#L280 > >>> > |
Hi Jan!
You can also add an additional class path URL to take classes from. Does that help? Is there a directly where the generated class files go, so that a local file URL could reference them and add them to the user code class loader? Best, Stephan On Tue, Sep 10, 2019 at 10:28 AM Till Rohrmann <[hidden email]> wrote: > Hi Jan, > > sorry for my late response. I think the main concern about using the > context class loader is the unpredictability and magic it adds to a > component where you actually don't wanna be surprised. Moreover, if we now > add support for the context class loader, then at a later time another > component might use this feature as well. This component would then have to > make sure that the new class loader it sets has the current context class > loader as its parent because otherwise it might break your use case. > > But on the other hand, I admit that I don't have a good solution to your > problem at hand other than using the RemoveExecutionEnvironment. > > One comment concerning your proposed class loader resolution. I think it > adds a bit too much magic which is hard to understand for the user. It > would be better if the system would fail with a descriptive error message > instead. > > Cheers, > Till > > On Thu, Sep 5, 2019 at 12:55 PM Jan Lukavský <[hidden email]> wrote: > > > Hi Till and Aljoscha, > > > > I was investigating the other options, but unfortunately all of them > > look a little complicated (although possible, of course). But before > > going into a more complicated solutions, I'd like to know what issues do > > you actually see with using the context class loader. I can think of one > > difficulty - if (for whatever reason), the context class loader doesn't > > contain (in itself or as a parent) class loader that loaded flink core > > classes, that would probably cause troubles. So, what about a solution > > that we take as parent class loader of FlinkUserCodeClassLoaders a class > > loader that is: > > > > a) context class loader of current thread, if it either is actually > > class loader of flink core classes, or if it contains this class loader > > in its parent hierarchy, or > > > > b) class loader of flink core classes > > > > That way, class loader of flink core classes would always be in parent > > hierarchy of FlinkUserCodeClassLoaders. Would that solve the issues you > > see? It works for me. > > > > Jan > > > > On 9/3/19 4:52 PM, Jan Lukavský wrote: > > > Answers inline. > > > > > > On 9/3/19 4:01 PM, Till Rohrmann wrote: > > >> How so? Does your REPL add the generated classes to the system class > > >> loader? I assume the system class loader is used to load the Flink > > >> classes. > > > No, it does not. It cannot on JDK >= 9 (or would have to hack into > > > jdk.internal.loader.ClassLoaders, which I don't want to :)). It just > > > creates another class loader, and is able to create a jar from > > > generated files. The jar is used for remote execution. > > >> > > >> Ideally, what you would like to have is the option to provide the > parent > > >> class loader which is used load user code to the LocalEnvironment. > > >> This one > > >> could then be forwarded to the TaskExecutor where it is used to > generate > > >> the user code class loader. But this is a bigger effort. > > > I'm not sure how this differs from using context classloader? Maybe > > > there is subtle difference in that this is a little more explicit. On > > > the other hand, users normally do not modify class loaders, so the > > > practical impact is IMHO negligible. But maybe this opens another > > > possibility - we probably could add optional ClassLoader parameter to > > > LocalEnvironment, with default value of > > > FlinkRunner.class.getClassLoader()? That might be a good compromise. > > >> > > >> The downside to this approach is that it requires you to create a jar > > >> file > > >> and to submit it via a REST call. The upside is that it is closer to > the > > >> production setting. > > > > > > Yes, a REPL has to do that anyway to support distributed computing, so > > > this is not an issue. > > > > > > Jan > > > > > >> > > >> Cheers, > > >> Till > > >> > > >> On Tue, Sep 3, 2019 at 3:47 PM Jan Lukavský <[hidden email]> wrote: > > >> > > >>> On the other hand, if you say, that the contract of LocalEnvironment > is > > >>> to execute as if it had all classes on its class loader, then it > > >>> currently breaks this contract. :-) > > >>> > > >>> Jan > > >>> > > >>> On 9/3/19 3:45 PM, Jan Lukavský wrote: > > >>>> Hi Till, > > >>>> > > >>>> hmm, that sounds it might work. I would have to incorporate this > > >>>> (either as default, or on demand) into Apache Beam. Would you see > any > > >>>> disadvantages of this approach? Would you suggest to make this > default > > >>>> behavior for local beam FlinkRunner? I can introduce a configuration > > >>>> option to turn this behavior on, but that would bring additional > > >>>> maintenance burden, etc., etc. > > >>>> > > >>>> Jan > > >>>> > > >>>> On 9/3/19 3:38 PM, Till Rohrmann wrote: > > >>>>> I see the problem Jan. What about the following proposal: Instead > of > > >>>>> using > > >>>>> the LocalEnvironment for local tests you always use the > > >>>>> RemoteEnvironment > > >>>>> but when testing it locally you spin up a MiniCluster in the same > > >>>>> process > > >>>>> and initialize the RemoteEnvironment with > > >>>>> `MiniCluster#getRestAddress`. > > >>>>> That way you would always submit a jar with the generated classes > > >>>>> and, > > >>>>> hence, not having to set the context class loader. > > >>>>> > > >>>>> The contract of the LocalEnvironment is indeed that all classes it > is > > >>>>> supposed t execute must be present when being started. > > >>>>> > > >>>>> Cheers, > > >>>>> Till > > >>>>> > > >>>>> On Tue, Sep 3, 2019 at 2:27 PM [hidden email] < > > >>>>> [hidden email]> wrote: > > >>>>> > > >>>>>> > > >>>>>> > > >>>>>> [hidden email] > > >>>>>> > > >>>>>> From: [hidden email] > > >>>>>> Date: 2019-09-03 20:25 > > >>>>>> To: dev > > >>>>>> Subject: Re: Re: ClassLoader created by BlobLibraryCacheManager > > >>>>>> is not > > >>>>>> using context classloader > > >>>>>> > > >>>>>> > > >>>>>> > > >>>>>> > > >>>>>> [hidden email] > > >>>>>> From: [hidden email] > > >>>>>> Date: 2019-09-03 20:23 > > >>>>>> To: dev > > >>>>>> Subject: Re: Re: ClassLoader created by BlobLibraryCacheManager > > >>>>>> is not > > >>>>>> using context classloader > > >>>>>> [hidden email] > > >>>>>> From: Jan Lukavský > > >>>>>> Date: 2019-09-03 20:17 > > >>>>>> To: dev > > >>>>>> Subject: Re: ClassLoader created by BlobLibraryCacheManager is not > > >>>>>> using > > >>>>>> context classloader > > >>>>>> Hi Till, > > >>>>>> the use-case is pretty much simple - I have a REPL shell in > groovy, > > >>>>>> which generates classes at runtime. The actual hierarchy is > > >>>>>> therefore > > >>>>>> system class loader -> application classloader -> repl classloader > > >>>>>> (GroovyClassLoader actually) > > >>>>>> now, when a terminal (sink) operation in the shell occurs, I'm > > >>>>>> able to > > >>>>>> build a jar, which I can submit to remote cluster (in distributed > > >>>>>> case). > > >>>>>> But - during testing - I run the code using local flink. There > > >>>>>> is no > > >>>>>> (legal) way of adding this new runtime generated jar to local > > >>>>>> flink. As > > >>>>>> I said, I have a hackish solution which works on JDK <= 8, > > >>>>>> because it > > >>>>>> uses reflection to call addURL on the application classloader (and > > >>>>>> thefore "pretends", that the generated jar was there all the time > > >>>>>> from > > >>>>>> the JVM startup). This breaks on JDK >= 9. It might be possible > > >>>>>> to work > > >>>>>> around this somehow, but I think that the reason why > > >>>>>> LocalEnvironment is > > >>>>>> not having a way to add jars (as in case of RemoteEnvironment) is > > >>>>>> that > > >>>>>> is assumes, that you actually have all of the on classpath when > > >>>>>> using > > >>>>>> local runner. I think that this implies that it either has to use > > >>>>>> context classloader (to be able to work on top of any > > >>>>>> classloading user > > >>>>>> might have), or is wrong and would need be fixed, so that > > >>>>>> LocalEnvironment would accept files to "stage" - which would mean > > >>>>>> adding > > >>>>>> them to a class loader (probably URLClassLoader with the > application > > >>>>>> class loader as parent). > > >>>>>> Or, would you see any other option? > > >>>>>> Jan > > >>>>>> On 9/3/19 2:00 PM, Till Rohrmann wrote: > > >>>>>>> Hi Jan, > > >>>>>>> > > >>>>>>> I've talked with Aljoscha and Stephan offline and we concluded > > >>>>>>> that we > > >>>>>>> would like to avoid the usage of context class loaders if > possible. > > >>>>>>> The > > >>>>>>> reason for this is that using the context class loader can easily > > >>>>>>> mess up > > >>>>>>> an otherwise clear class loader hierarchy which makes it hard to > > >>>>>>> reason > > >>>>>>> about and to debug class loader issues. > > >>>>>>> > > >>>>>>> Given this, I think it would help to better understand the exact > > >>>>>>> problem > > >>>>>>> you are trying to solve by using the context class loader. > > >>>>>>> Usually the > > >>>>>>> usage of the context class loader points towards an API > deficiency > > >>>>>>> which > > >>>>>> we > > >>>>>>> might be able to address differently. > > >>>>>>> > > >>>>>>> Cheers, > > >>>>>>> Till > > >>>>>>> > > >>>>>>> On Mon, Sep 2, 2019 at 11:32 AM Aljoscha Krettek > > >>>>>>> <[hidden email] > > >>>>>>> wrote: > > >>>>>>> > > >>>>>>>> I’m not saying we can’t change that code to use the context > class > > >>>>>> loader. > > >>>>>>>> I’m just not sure whether this might break other things. > > >>>>>>>> > > >>>>>>>> Best, > > >>>>>>>> Aljoscha > > >>>>>>>> > > >>>>>>>>> On 2. Sep 2019, at 11:24, Jan Lukavský <[hidden email]> > wrote: > > >>>>>>>>> > > >>>>>>>>> Essentially, the class loader of Flink should be present in > > >>>>>>>>> parent > > >>>>>>>> hierarchy of context class loader. If FlinkUserCodeClassLoader > > >>>>>>>> doesn't > > >>>>>> use > > >>>>>>>> context class loader, then it is actually impossible to use a > > >>>>>>>> hierarchy > > >>>>>>>> like this: > > >>>>>>>>> system class loader -> application class loader -> > > >>>>>>>>> user-defined class > > >>>>>>>> loader (defines some UDFs to be used in flink program) > > >>>>>>>>> Flink now uses the application class loader, and so the UDFs > > >>>>>>>>> fail to > > >>>>>>>> deserialize on local flink, because the user-defined class > > >>>>>>>> loader is > > >>>>>>>> bypassed. Moreover, there is no way to add additional classpath > > >>>>>>>> elements > > >>>>>>>> for LocalEnvironment (as opposed to RemoteEnvironment). I'm able > > >>>>>>>> to hack > > >>>>>>>> this by calling addURL method on the application class loader > > >>>>>>>> (which is > > >>>>>>>> terribly hackish), but that works only on JDK <= 8. No sensible > > >>>>>> workaround > > >>>>>>>> is available for JDK >= 9. > > >>>>>>>>> Alternative solution would be to enable adding jars to class > > >>>>>>>>> loader > > >>>>>> when > > >>>>>>>> using LocalEnvironment, but that looks a little odd. > > >>>>>>>>> Jan > > >>>>>>>>> > > >>>>>>>>> On 9/2/19 11:02 AM, Aljoscha Krettek wrote: > > >>>>>>>>>> Hi, > > >>>>>>>>>> > > >>>>>>>>>> I actually don’t know whether that change would be ok. > > >>>>>>>> FlinkUserCodeClassLoader has taken > > >>>>>>>> FlinkUserCodeClassLoader.class.getClassLoader() as the parent > > >>>>>> ClassLoader > > >>>>>>>> before my change. See: > > >>>>>>>> > > >>> > > > https://github.com/apache/flink/blob/release-1.2/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/FlinkUserCodeClassLoader.java > > >>> > > >>>>>>>> < > > >>>>>>>> > > >>> > > > https://github.com/apache/flink/blob/release-1.2/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/FlinkUserCodeClassLoader.java > > >>> > > >>>>>>>>> . > > >>>>>>>>>> I have the feeling that this might be on purpose because we > > >>>>>>>>>> want to > > >>>>>>>> have the ClassLoader of the Flink Framework components be the > > >>>>>>>> parent > > >>>>>>>> ClassLoader, but I could be wrong. Maybe Stephan would be most > > >>>>>> appropriate > > >>>>>>>> for answering this. > > >>>>>>>>>> Best, > > >>>>>>>>>> Aljoscha > > >>>>>>>>>> > > >>>>>>>>>>> On 30. Aug 2019, at 16:28, Till Rohrmann < > [hidden email] > > > > > >>>>>> wrote: > > >>>>>>>>>>> Hi Jan, > > >>>>>>>>>>> > > >>>>>>>>>>> this looks to me like a bug for which you could create a JIRA > > >>>>>>>>>>> and PR > > >>>>>>>> to fix it. Just to make sure, I've pulled in Aljoscha who is the > > >>>>>>>> author > > >>>>>> of > > >>>>>>>> this change to check with him whether we are forgetting > something. > > >>>>>>>>>>> Cheers, > > >>>>>>>>>>> Till > > >>>>>>>>>>> > > >>>>>>>>>>> On Fri, Aug 30, 2019 at 3:44 PM Jan Lukavský < > [hidden email] > > >>>>>> <mailto: > > >>>>>>>> [hidden email]>> wrote: > > >>>>>>>>>>> Hi, > > >>>>>>>>>>> > > >>>>>>>>>>> I have come across an issue with classloading in Flink's > > >>>>>>>>>>> MiniCluster. > > >>>>>>>>>>> The issue is that when I run local flink job from a thread, > > >>>>>>>>>>> that has > > >>>>>> a > > >>>>>>>>>>> non-default context classloader (for whatever reason), this > > >>>>>> classloader > > >>>>>>>>>>> is not taken into account when classloading user defined > > >>>>>>>>>>> functions. > > >>>>>>>> This > > >>>>>>>>>>> is due to [1]. Is this behavior intentional, or can I file a > > >>>>>>>>>>> JIRA and > > >>>>>>>>>>> use Thread.currentThread.getContextClassLoader() there? I > have > > >>>>>>>> validated > > >>>>>>>>>>> that it fixes issues I'm facing. > > >>>>>>>>>>> > > >>>>>>>>>>> Jan > > >>>>>>>>>>> > > >>>>>>>>>>> [1] > > >>>>>>>>>>> > > >>> > > > https://github.com/apache/flink/blob/ce557839d762b5f1ec92aa1885fd3d2ae33d0d0b/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/BlobLibraryCacheManager.java#L280 > > >>> > > >>>>>>>> < > > >>>>>>>> > > >>> > > > https://github.com/apache/flink/blob/ce557839d762b5f1ec92aa1885fd3d2ae33d0d0b/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/BlobLibraryCacheManager.java#L280 > > >>> > > > |
Hi Stephen,
that sounds like a kind of option. I'm not sure if this would work on JDK11, though. JDK11 where I found this issue in first place, because on JDK < 9 I can inject the generated file into application class loader (which is URLClassLoader). This class loader changed in JDK9 and this is not (straightforwardly) possible anymore. I think that the way to go would be to use the RemoteEnvironment with manually spawn MiniCluster. But thanks for suggestion! Jan On 9/19/19 12:36 PM, Stephan Ewen wrote: > Hi Jan! > > You can also add an additional class path URL to take classes from. Does > that help? > Is there a directly where the generated class files go, so that a > local file URL could reference them and add them to the user code class > loader? > > Best, > Stephan > > > On Tue, Sep 10, 2019 at 10:28 AM Till Rohrmann <[hidden email]> wrote: > >> Hi Jan, >> >> sorry for my late response. I think the main concern about using the >> context class loader is the unpredictability and magic it adds to a >> component where you actually don't wanna be surprised. Moreover, if we now >> add support for the context class loader, then at a later time another >> component might use this feature as well. This component would then have to >> make sure that the new class loader it sets has the current context class >> loader as its parent because otherwise it might break your use case. >> >> But on the other hand, I admit that I don't have a good solution to your >> problem at hand other than using the RemoveExecutionEnvironment. >> >> One comment concerning your proposed class loader resolution. I think it >> adds a bit too much magic which is hard to understand for the user. It >> would be better if the system would fail with a descriptive error message >> instead. >> >> Cheers, >> Till >> >> On Thu, Sep 5, 2019 at 12:55 PM Jan Lukavský <[hidden email]> wrote: >> >>> Hi Till and Aljoscha, >>> >>> I was investigating the other options, but unfortunately all of them >>> look a little complicated (although possible, of course). But before >>> going into a more complicated solutions, I'd like to know what issues do >>> you actually see with using the context class loader. I can think of one >>> difficulty - if (for whatever reason), the context class loader doesn't >>> contain (in itself or as a parent) class loader that loaded flink core >>> classes, that would probably cause troubles. So, what about a solution >>> that we take as parent class loader of FlinkUserCodeClassLoaders a class >>> loader that is: >>> >>> a) context class loader of current thread, if it either is actually >>> class loader of flink core classes, or if it contains this class loader >>> in its parent hierarchy, or >>> >>> b) class loader of flink core classes >>> >>> That way, class loader of flink core classes would always be in parent >>> hierarchy of FlinkUserCodeClassLoaders. Would that solve the issues you >>> see? It works for me. >>> >>> Jan >>> >>> On 9/3/19 4:52 PM, Jan Lukavský wrote: >>>> Answers inline. >>>> >>>> On 9/3/19 4:01 PM, Till Rohrmann wrote: >>>>> How so? Does your REPL add the generated classes to the system class >>>>> loader? I assume the system class loader is used to load the Flink >>>>> classes. >>>> No, it does not. It cannot on JDK >= 9 (or would have to hack into >>>> jdk.internal.loader.ClassLoaders, which I don't want to :)). It just >>>> creates another class loader, and is able to create a jar from >>>> generated files. The jar is used for remote execution. >>>>> Ideally, what you would like to have is the option to provide the >> parent >>>>> class loader which is used load user code to the LocalEnvironment. >>>>> This one >>>>> could then be forwarded to the TaskExecutor where it is used to >> generate >>>>> the user code class loader. But this is a bigger effort. >>>> I'm not sure how this differs from using context classloader? Maybe >>>> there is subtle difference in that this is a little more explicit. On >>>> the other hand, users normally do not modify class loaders, so the >>>> practical impact is IMHO negligible. But maybe this opens another >>>> possibility - we probably could add optional ClassLoader parameter to >>>> LocalEnvironment, with default value of >>>> FlinkRunner.class.getClassLoader()? That might be a good compromise. >>>>> The downside to this approach is that it requires you to create a jar >>>>> file >>>>> and to submit it via a REST call. The upside is that it is closer to >> the >>>>> production setting. >>>> Yes, a REPL has to do that anyway to support distributed computing, so >>>> this is not an issue. >>>> >>>> Jan >>>> >>>>> Cheers, >>>>> Till >>>>> >>>>> On Tue, Sep 3, 2019 at 3:47 PM Jan Lukavský <[hidden email]> wrote: >>>>> >>>>>> On the other hand, if you say, that the contract of LocalEnvironment >> is >>>>>> to execute as if it had all classes on its class loader, then it >>>>>> currently breaks this contract. :-) >>>>>> >>>>>> Jan >>>>>> >>>>>> On 9/3/19 3:45 PM, Jan Lukavský wrote: >>>>>>> Hi Till, >>>>>>> >>>>>>> hmm, that sounds it might work. I would have to incorporate this >>>>>>> (either as default, or on demand) into Apache Beam. Would you see >> any >>>>>>> disadvantages of this approach? Would you suggest to make this >> default >>>>>>> behavior for local beam FlinkRunner? I can introduce a configuration >>>>>>> option to turn this behavior on, but that would bring additional >>>>>>> maintenance burden, etc., etc. >>>>>>> >>>>>>> Jan >>>>>>> >>>>>>> On 9/3/19 3:38 PM, Till Rohrmann wrote: >>>>>>>> I see the problem Jan. What about the following proposal: Instead >> of >>>>>>>> using >>>>>>>> the LocalEnvironment for local tests you always use the >>>>>>>> RemoteEnvironment >>>>>>>> but when testing it locally you spin up a MiniCluster in the same >>>>>>>> process >>>>>>>> and initialize the RemoteEnvironment with >>>>>>>> `MiniCluster#getRestAddress`. >>>>>>>> That way you would always submit a jar with the generated classes >>>>>>>> and, >>>>>>>> hence, not having to set the context class loader. >>>>>>>> >>>>>>>> The contract of the LocalEnvironment is indeed that all classes it >> is >>>>>>>> supposed t execute must be present when being started. >>>>>>>> >>>>>>>> Cheers, >>>>>>>> Till >>>>>>>> >>>>>>>> On Tue, Sep 3, 2019 at 2:27 PM [hidden email] < >>>>>>>> [hidden email]> wrote: >>>>>>>> >>>>>>>>> >>>>>>>>> [hidden email] >>>>>>>>> >>>>>>>>> From: [hidden email] >>>>>>>>> Date: 2019-09-03 20:25 >>>>>>>>> To: dev >>>>>>>>> Subject: Re: Re: ClassLoader created by BlobLibraryCacheManager >>>>>>>>> is not >>>>>>>>> using context classloader >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> [hidden email] >>>>>>>>> From: [hidden email] >>>>>>>>> Date: 2019-09-03 20:23 >>>>>>>>> To: dev >>>>>>>>> Subject: Re: Re: ClassLoader created by BlobLibraryCacheManager >>>>>>>>> is not >>>>>>>>> using context classloader >>>>>>>>> [hidden email] >>>>>>>>> From: Jan Lukavský >>>>>>>>> Date: 2019-09-03 20:17 >>>>>>>>> To: dev >>>>>>>>> Subject: Re: ClassLoader created by BlobLibraryCacheManager is not >>>>>>>>> using >>>>>>>>> context classloader >>>>>>>>> Hi Till, >>>>>>>>> the use-case is pretty much simple - I have a REPL shell in >> groovy, >>>>>>>>> which generates classes at runtime. The actual hierarchy is >>>>>>>>> therefore >>>>>>>>> system class loader -> application classloader -> repl classloader >>>>>>>>> (GroovyClassLoader actually) >>>>>>>>> now, when a terminal (sink) operation in the shell occurs, I'm >>>>>>>>> able to >>>>>>>>> build a jar, which I can submit to remote cluster (in distributed >>>>>>>>> case). >>>>>>>>> But - during testing - I run the code using local flink. There >>>>>>>>> is no >>>>>>>>> (legal) way of adding this new runtime generated jar to local >>>>>>>>> flink. As >>>>>>>>> I said, I have a hackish solution which works on JDK <= 8, >>>>>>>>> because it >>>>>>>>> uses reflection to call addURL on the application classloader (and >>>>>>>>> thefore "pretends", that the generated jar was there all the time >>>>>>>>> from >>>>>>>>> the JVM startup). This breaks on JDK >= 9. It might be possible >>>>>>>>> to work >>>>>>>>> around this somehow, but I think that the reason why >>>>>>>>> LocalEnvironment is >>>>>>>>> not having a way to add jars (as in case of RemoteEnvironment) is >>>>>>>>> that >>>>>>>>> is assumes, that you actually have all of the on classpath when >>>>>>>>> using >>>>>>>>> local runner. I think that this implies that it either has to use >>>>>>>>> context classloader (to be able to work on top of any >>>>>>>>> classloading user >>>>>>>>> might have), or is wrong and would need be fixed, so that >>>>>>>>> LocalEnvironment would accept files to "stage" - which would mean >>>>>>>>> adding >>>>>>>>> them to a class loader (probably URLClassLoader with the >> application >>>>>>>>> class loader as parent). >>>>>>>>> Or, would you see any other option? >>>>>>>>> Jan >>>>>>>>> On 9/3/19 2:00 PM, Till Rohrmann wrote: >>>>>>>>>> Hi Jan, >>>>>>>>>> >>>>>>>>>> I've talked with Aljoscha and Stephan offline and we concluded >>>>>>>>>> that we >>>>>>>>>> would like to avoid the usage of context class loaders if >> possible. >>>>>>>>>> The >>>>>>>>>> reason for this is that using the context class loader can easily >>>>>>>>>> mess up >>>>>>>>>> an otherwise clear class loader hierarchy which makes it hard to >>>>>>>>>> reason >>>>>>>>>> about and to debug class loader issues. >>>>>>>>>> >>>>>>>>>> Given this, I think it would help to better understand the exact >>>>>>>>>> problem >>>>>>>>>> you are trying to solve by using the context class loader. >>>>>>>>>> Usually the >>>>>>>>>> usage of the context class loader points towards an API >> deficiency >>>>>>>>>> which >>>>>>>>> we >>>>>>>>>> might be able to address differently. >>>>>>>>>> >>>>>>>>>> Cheers, >>>>>>>>>> Till >>>>>>>>>> >>>>>>>>>> On Mon, Sep 2, 2019 at 11:32 AM Aljoscha Krettek >>>>>>>>>> <[hidden email] >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> I’m not saying we can’t change that code to use the context >> class >>>>>>>>> loader. >>>>>>>>>>> I’m just not sure whether this might break other things. >>>>>>>>>>> >>>>>>>>>>> Best, >>>>>>>>>>> Aljoscha >>>>>>>>>>> >>>>>>>>>>>> On 2. Sep 2019, at 11:24, Jan Lukavský <[hidden email]> >> wrote: >>>>>>>>>>>> Essentially, the class loader of Flink should be present in >>>>>>>>>>>> parent >>>>>>>>>>> hierarchy of context class loader. If FlinkUserCodeClassLoader >>>>>>>>>>> doesn't >>>>>>>>> use >>>>>>>>>>> context class loader, then it is actually impossible to use a >>>>>>>>>>> hierarchy >>>>>>>>>>> like this: >>>>>>>>>>>> system class loader -> application class loader -> >>>>>>>>>>>> user-defined class >>>>>>>>>>> loader (defines some UDFs to be used in flink program) >>>>>>>>>>>> Flink now uses the application class loader, and so the UDFs >>>>>>>>>>>> fail to >>>>>>>>>>> deserialize on local flink, because the user-defined class >>>>>>>>>>> loader is >>>>>>>>>>> bypassed. Moreover, there is no way to add additional classpath >>>>>>>>>>> elements >>>>>>>>>>> for LocalEnvironment (as opposed to RemoteEnvironment). I'm able >>>>>>>>>>> to hack >>>>>>>>>>> this by calling addURL method on the application class loader >>>>>>>>>>> (which is >>>>>>>>>>> terribly hackish), but that works only on JDK <= 8. No sensible >>>>>>>>> workaround >>>>>>>>>>> is available for JDK >= 9. >>>>>>>>>>>> Alternative solution would be to enable adding jars to class >>>>>>>>>>>> loader >>>>>>>>> when >>>>>>>>>>> using LocalEnvironment, but that looks a little odd. >>>>>>>>>>>> Jan >>>>>>>>>>>> >>>>>>>>>>>> On 9/2/19 11:02 AM, Aljoscha Krettek wrote: >>>>>>>>>>>>> Hi, >>>>>>>>>>>>> >>>>>>>>>>>>> I actually don’t know whether that change would be ok. >>>>>>>>>>> FlinkUserCodeClassLoader has taken >>>>>>>>>>> FlinkUserCodeClassLoader.class.getClassLoader() as the parent >>>>>>>>> ClassLoader >>>>>>>>>>> before my change. See: >>>>>>>>>>> >> https://github.com/apache/flink/blob/release-1.2/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/FlinkUserCodeClassLoader.java >>>>>>>>>>> < >>>>>>>>>>> >> https://github.com/apache/flink/blob/release-1.2/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/FlinkUserCodeClassLoader.java >>>>>>>>>>>> . >>>>>>>>>>>>> I have the feeling that this might be on purpose because we >>>>>>>>>>>>> want to >>>>>>>>>>> have the ClassLoader of the Flink Framework components be the >>>>>>>>>>> parent >>>>>>>>>>> ClassLoader, but I could be wrong. Maybe Stephan would be most >>>>>>>>> appropriate >>>>>>>>>>> for answering this. >>>>>>>>>>>>> Best, >>>>>>>>>>>>> Aljoscha >>>>>>>>>>>>> >>>>>>>>>>>>>> On 30. Aug 2019, at 16:28, Till Rohrmann < >> [hidden email] >>>>>>>>> wrote: >>>>>>>>>>>>>> Hi Jan, >>>>>>>>>>>>>> >>>>>>>>>>>>>> this looks to me like a bug for which you could create a JIRA >>>>>>>>>>>>>> and PR >>>>>>>>>>> to fix it. Just to make sure, I've pulled in Aljoscha who is the >>>>>>>>>>> author >>>>>>>>> of >>>>>>>>>>> this change to check with him whether we are forgetting >> something. >>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>> Till >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Fri, Aug 30, 2019 at 3:44 PM Jan Lukavský < >> [hidden email] >>>>>>>>> <mailto: >>>>>>>>>>> [hidden email]>> wrote: >>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>> >>>>>>>>>>>>>> I have come across an issue with classloading in Flink's >>>>>>>>>>>>>> MiniCluster. >>>>>>>>>>>>>> The issue is that when I run local flink job from a thread, >>>>>>>>>>>>>> that has >>>>>>>>> a >>>>>>>>>>>>>> non-default context classloader (for whatever reason), this >>>>>>>>> classloader >>>>>>>>>>>>>> is not taken into account when classloading user defined >>>>>>>>>>>>>> functions. >>>>>>>>>>> This >>>>>>>>>>>>>> is due to [1]. Is this behavior intentional, or can I file a >>>>>>>>>>>>>> JIRA and >>>>>>>>>>>>>> use Thread.currentThread.getContextClassLoader() there? I >> have >>>>>>>>>>> validated >>>>>>>>>>>>>> that it fixes issues I'm facing. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Jan >>>>>>>>>>>>>> >>>>>>>>>>>>>> [1] >>>>>>>>>>>>>> >> https://github.com/apache/flink/blob/ce557839d762b5f1ec92aa1885fd3d2ae33d0d0b/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/BlobLibraryCacheManager.java#L280 >>>>>>>>>>> < >>>>>>>>>>> >> https://github.com/apache/flink/blob/ce557839d762b5f1ec92aa1885fd3d2ae33d0d0b/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/BlobLibraryCacheManager.java#L280 |
Free forum by Nabble | Edit this page |