Behavior of lib directory shipping on YARN

stefanobaghino
Hello everybody,

Over the past few days my colleagues and I ran some tests with Flink on
YARN and noticed a possible inconsistency in how the contents of the
flink/lib directory are shipped to the cluster, depending on whether jobs
are deployed individually or onto a long-running session.

After some discussion on the user mailing list
<http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-and-YARN-ship-folder-td5458.html>
we were under the impression that the contents of that folder are always
supposed to be copied so that all the nodes have access to them.
Furthermore, we've found a comment in the code
<https://github.com/apache/flink/blob/master/flink-yarn/src/main/java/org/apache/flink/yarn/FlinkYarnClientBase.java#L254-L263>
that states:

// remove uberjar from ship list (by default everything in the lib/ folder
// is added to the list of files to ship, but we handle the uberjar
// separately.

However, after looking at parts of the code, I'm not sure this is actually
the case. The long-running Flink YARN session does ship the contents,
because this is specified in the yarn-session.sh script
<https://github.com/apache/flink/blob/master/flink-dist/src/main/flink-bin/yarn-bin/yarn-session.sh#L55>.
Running a single job on YARN, however, does not automatically ship the
contents of the lib folder.

The behavior is not documented, and I'd like to add a few lines to the docs
to make clear what is shipped in which case. Also, if there is agreement on
how single jobs on YARN should behave, I can provide a fix for it. My
feeling is that running a job on YARN should end up in having more or less
the same effect, regardless of the way the job is run.

Let me know what you think. Thank you for your attention.

--
BR,
Stefano Baghino

Software Engineer @ Radicalbit

Re: Behavior of lib directory shipping on YARN

Ufuk Celebi-2
On Tue, Mar 22, 2016 at 8:42 PM, Stefano Baghino
<[hidden email]> wrote:
> My feeling is that running a job on YARN should
> end up in having more or less the same effect, regardless of the way the
> job is run.

+1

I think that the current behaviour is buggy. The resource management
is currently undergoing a massive refactoring
(https://github.com/apache/flink/pull/1741). Maybe it's already fixed
there (if the issue is independent of the scripts).

Would be great to have a fix for this. If #1741 does not fix it, feel
free to open an issue and PR. :-)

– Ufuk

Re: Behavior of lib directory shipping on YARN

stefanobaghino
Thanks for pointing out Max's work (awesome PR, btw). It actually seems to
have introduced an environment variable for ship directories; it would be
good to have his feedback on this.


Re: Behavior of lib directory shipping on YARN

mxm
Hi Stefano,

Thanks for pointing out this bug. Your analysis is correct. The per-job
cluster does not ship the /lib directory by default. Would you like to open
an issue/PR? We should let the ship_path default to the /lib directory.
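
A minimal sketch of that default, for illustration only. This is not
Flink's actual code: the class and method names (ShipFilesSketch,
resolveShipFiles) are hypothetical, and it assumes the per-job client
already knows the lib directory and the uberjar. If the user specified no
ship files, it falls back to everything in lib/ except the uberjar, which
is handled separately.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Flink.
final class ShipFilesSketch {

    /**
     * Returns the files to ship. Explicit user choices win; otherwise the
     * contents of libDir are used, minus the uberjar.
     */
    static List<File> resolveShipFiles(List<File> userShipFiles, File libDir, File uberJar) {
        if (userShipFiles != null && !userShipFiles.isEmpty()) {
            return new ArrayList<File>(userShipFiles);
        }
        List<File> result = new ArrayList<File>();
        File[] libContents = libDir.listFiles(); // null if libDir is not a directory
        if (libContents != null) {
            Arrays.sort(libContents); // deterministic order
            for (File f : libContents) {
                // Assumption: the uberjar is identified by its file name.
                if (!f.getName().equals(uberJar.getName())) {
                    result.add(f);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Build a throwaway lib directory with an uberjar and one extra jar.
        File libDir = Files.createTempDirectory("lib").toFile();
        File uberJar = new File(libDir, "flink-dist.jar");
        File extra = new File(libDir, "extra.jar");
        uberJar.createNewFile();
        extra.createNewFile();
        // No user-specified files: only extra.jar is selected for shipping.
        System.out.println(resolveShipFiles(new ArrayList<File>(), libDir, uberJar));
    }
}
```

With this shape, the session script and the per-job path could share the
same default, so both ways of running a job would ship the same files.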

The mechanism with the environment variables is the same. They used to be
defined in a different location (FlinkYarnClient) and have been moved to a
separate class (YarnConfigKeys).

Cheers,
Max




Re: Behavior of lib directory shipping on YARN

stefanobaghino
Yup, I shall open an issue for both this one and my other thread (re:
Kerberos).
Thanks for the pointer on this issue.
