Fine grained batch recovery vs. native libraries

Fine grained batch recovery vs. native libraries

David Morávek
Hi,

we're testing the newly released batch recovery and are running into
class-loading issues.

1) We have a per-job Flink cluster
2) We use BATCH execution mode + the region failover strategy

Point 1) should imply a single user-code class loader per task manager
(because there is only a single pipeline, which reuses the class loader
cached in BlobLibraryCacheManager). We need this property because we have
UDFs that access C libraries using JNI (I think this may be a fairly common
use case when dealing with legacy code). JDK internals
<https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/ClassLoader.java#L2466>
make sure that a given native library can be loaded by only one class loader
per JVM.
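
To make the constraint concrete, here is a minimal sketch that models the
rule in plain Java. It is a simplified model, not the JDK implementation:
the class names, the library path and the exception message are all
illustrative assumptions, and no real native library is loaded.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Simplified model (assumption): tracks which class loader "owns" each native library. */
final class NativeLibraryRegistryModel {
    private final Map<String, ClassLoader> owners = new ConcurrentHashMap<>();

    /** Succeeds for the first loader (and for repeats by it); fails for any other loader. */
    void load(String libraryPath, ClassLoader requester) {
        ClassLoader owner = owners.putIfAbsent(libraryPath, requester);
        if (owner != null && owner != requester) {
            throw new UnsatisfiedLinkError(
                    "Native library " + libraryPath + " already loaded in another class loader");
        }
    }

    /** Models the library being unloaded once its owning loader is gone. */
    void unloadAllOwnedBy(ClassLoader loader) {
        owners.values().removeIf(owner -> owner == loader);
    }
}

final class NativeLibraryModelDemo {
    /** Returns true if a second class loader is rejected for the same library. */
    static boolean secondLoaderIsRejected() {
        NativeLibraryRegistryModel registry = new NativeLibraryRegistryModel();
        ClassLoader first = new URLClassLoader(new URL[0]);
        ClassLoader second = new URLClassLoader(new URL[0]);
        registry.load("/tmp/libexample.so", first); // hypothetical path
        try {
            registry.load("/tmp/libexample.so", second);
            return false;
        } catch (UnsatisfiedLinkError expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(secondLoaderIsRejected());
    }
}
```

In the real JVM the library is only released after the owning class loader
has been garbage collected, which user code cannot force reliably.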

When region recovery is triggered, the vertices that need to recover are
first reset back to the CREATED state and then rescheduled. If all tasks in
a task manager are reset, the cached class loader is released
<https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/BlobLibraryCacheManager.java#L338>.
This unfortunately causes the job to fail, because we then try to reload the
native library in a newly created class loader.
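
The release behaviour can be illustrated with a small reference-counting
sketch. This is a hypothetical stand-in, not the BlobLibraryCacheManager
code: the class and method names are made up for illustration, and the
class loader is an empty URLClassLoader.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified stand-in for a per-job class loader cache. */
final class RefCountedLoaderCache {
    private static final class Entry {
        final ClassLoader loader = new URLClassLoader(new URL[0]);
        int references;
    }

    private final Map<String, Entry> entries = new HashMap<>();

    /** Called when a task of the job starts on this task manager. */
    synchronized ClassLoader registerTask(String jobId) {
        Entry entry = entries.computeIfAbsent(jobId, id -> new Entry());
        entry.references++;
        return entry.loader;
    }

    /** Called when a task finishes or is reset; the last release drops the loader. */
    synchronized void unregisterTask(String jobId) {
        Entry entry = entries.get(jobId);
        if (entry != null && --entry.references == 0) {
            entries.remove(jobId);
        }
    }
}

final class RefCountedLoaderCacheDemo {
    /** Returns true if the same loader is handed out again after all tasks were reset. */
    static boolean loaderSurvivesFullReset() {
        RefCountedLoaderCache cache = new RefCountedLoaderCache();
        ClassLoader before = cache.registerTask("job-1");
        cache.registerTask("job-1");
        cache.unregisterTask("job-1");
        cache.unregisterTask("job-1"); // all tasks reset: the entry is dropped
        ClassLoader after = cache.registerTask("job-1"); // rescheduled task
        return before == after;
    }

    public static void main(String[] args) {
        System.out.println(loaderSurvivesFullReset());
    }
}
```

In this model the rescheduled task gets a fresh class loader, which is the
situation in which a second load of the same native library fails.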

I know that it is always possible to distribute the native libraries with
Flink's libs and load them using the system class loader, but this introduces
build & operations overhead and makes things really unfriendly for the
cluster user, so I'd rather not work around the issue this way (a per-job
cluster should be more user-friendly).

I believe the correct approach would be not to release the cached class
loader while the job is recovering, even though there are no tasks currently
registered with the TM.
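
One way to picture this proposal is a job-level lease that keeps the cache
entry alive independently of task references. The sketch below is only an
illustration under that assumption: the lease concept and all names are
hypothetical and do not describe existing Flink APIs or any eventual fix.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical cache: an entry lives while tasks OR a job-level lease reference it. */
final class JobLeasedLoaderCache {
    private static final class Entry {
        final ClassLoader loader = new URLClassLoader(new URL[0]);
        int taskReferences;
        boolean jobLeaseHeld;
    }

    private final Map<String, Entry> entries = new HashMap<>();

    /** Taken while the job is running or recovering on this task manager. */
    synchronized void acquireJobLease(String jobId) {
        entries.computeIfAbsent(jobId, id -> new Entry()).jobLeaseHeld = true;
    }

    /** Dropped once the job has reached a final state. */
    synchronized void releaseJobLease(String jobId) {
        Entry entry = entries.get(jobId);
        if (entry != null) {
            entry.jobLeaseHeld = false;
            removeIfUnused(jobId, entry);
        }
    }

    synchronized ClassLoader registerTask(String jobId) {
        Entry entry = entries.computeIfAbsent(jobId, id -> new Entry());
        entry.taskReferences++;
        return entry.loader;
    }

    synchronized void unregisterTask(String jobId) {
        Entry entry = entries.get(jobId);
        if (entry != null && entry.taskReferences > 0) {
            entry.taskReferences--;
            removeIfUnused(jobId, entry);
        }
    }

    // Only called from synchronized methods.
    private void removeIfUnused(String jobId, Entry entry) {
        if (entry.taskReferences == 0 && !entry.jobLeaseHeld) {
            entries.remove(jobId);
        }
    }
}

final class JobLeasedLoaderCacheDemo {
    /** Returns true if the loader is reused after every task has been reset. */
    static boolean loaderSurvivesFullReset() {
        JobLeasedLoaderCache cache = new JobLeasedLoaderCache();
        cache.acquireJobLease("job-1");
        ClassLoader before = cache.registerTask("job-1");
        cache.unregisterTask("job-1"); // all tasks reset, lease still held
        ClassLoader after = cache.registerTask("job-1"); // rescheduled task
        return before == after;
    }

    public static void main(String[] args) {
        System.out.println(loaderSurvivesFullReset());
    }
}
```

With the lease held, the rescheduled task sees the same class loader, so the
native library does not have to be loaded a second time.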

What do you think? Thanks for your help.

D.
Re: Fine grained batch recovery vs. native libraries

Chesnay Schepler-3
This sounds like a serious bug, please open a JIRA ticket.

On 04/09/2019 13:41, David Morávek wrote:

Re: Fine grained batch recovery vs. native libraries

David Morávek
Hi Chesnay, I've created FLINK-13958
<https://issues.apache.org/jira/browse/FLINK-13958> to track the issue.

Thanks,
D.

On Wed, Sep 4, 2019 at 1:56 PM Chesnay Schepler <[hidden email]> wrote:

Re: Fine grained batch recovery vs. native libraries

Fabian Hueske-2
Thanks for reporting the problem, David!

Cheers,
Fabian

On Wed, Sep 4, 2019 at 2:09 PM David Morávek <[hidden email]> wrote:
