Changed the behavior of "DataSet.print()"

Stephan Ewen
Hi all!

We merged a patch yesterday that changed the API behavior of the
"DataSet.print()" function.

"print()" now prints to stdout on the client process, rather than the
TaskManager process, as before. This is much nicer for debugging and
exploring data sets.

One implication of this is that print() is now an eager method (like
collect() or count()). That means that calling "print()" immediately
triggers the execution, and no "env.execute()" is required any more.

Greetings,
Stephan
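
The difference between the old (lazy) and new (eager) behavior described above can be sketched with a small toy model. This is only an illustration in plain Java: ToyEnv, ToyDataSet, lazyPrint() and eagerPrint() are hypothetical names invented for this sketch and are not part of Flink's API; the two lists merely stand in for the stdout of a TaskManager and of the client.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: ToyEnv and ToyDataSet are hypothetical and are NOT Flink classes.
class EagerPrintSketch {

    static final class ToyEnv {
        // Stand-ins for the stdout of the client and of a TaskManager.
        final List<String> clientOut = new ArrayList<>();
        final List<String> taskManagerOut = new ArrayList<>();
        private final List<Runnable> pendingSinks = new ArrayList<>();

        ToyDataSet fromElements(String... values) {
            return new ToyDataSet(this, new ArrayList<>(Arrays.asList(values)));
        }

        // Lazy sinks only produce output once the plan is executed.
        void execute() {
            pendingSinks.forEach(Runnable::run);
            pendingSinks.clear();
        }
    }

    static final class ToyDataSet {
        private final ToyEnv env;
        private final List<String> values;

        ToyDataSet(ToyEnv env, List<String> values) {
            this.env = env;
            this.values = values;
        }

        // Old behavior: register a sink; nothing is printed until env.execute().
        void lazyPrint() {
            env.pendingSinks.add(() -> env.taskManagerOut.addAll(values));
        }

        // New behavior: run immediately and deliver the records to the client.
        void eagerPrint() {
            env.clientOut.addAll(values);
        }
    }

    public static void main(String[] args) {
        ToyEnv env = new ToyEnv();
        ToyDataSet data = env.fromElements("a", "b");

        data.lazyPrint();
        System.out.println("after lazyPrint():  " + env.taskManagerOut);
        env.execute();
        System.out.println("after execute():    " + env.taskManagerOut);

        data.eagerPrint();
        System.out.println("after eagerPrint(): " + env.clientOut);
    }
}
```

In this sketch the lazy variant produces nothing until execute() is called, while the eager variant produces its output right away, which mirrors why no "env.execute()" is needed after the new print().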

Re: Changed the behavior of "DataSet.print()"

Robert Metzger
I've filed a JIRA to update the documentation:
https://issues.apache.org/jira/browse/FLINK-2092

On Fri, May 22, 2015 at 11:08 AM, Stephan Ewen <[hidden email]> wrote:

> Hi all!
>
> We merged a patch yesterday that changed the API behavior of the
> "DataSet.print()" function.
>
> "print()" now prints to stdout on the client process, rather than the
> TaskManager process, as before. This is much nicer for debugging and
> exploring data sets.
>
> One implication of this is that print() is now an eager method (like
> collect() or count()). That means that calling "print()" immediately
> triggers the execution, and no "env.execute()" is required any more.
>
> Greetings,
> Stephan
>
>

RE: Changed the behavior of "DataSet.print()"

Kruse, Sebastian
Hi everyone,

I am a bit worried about the recent change to the print() method. I can understand the rationale that obtaining the stdout from all the TaskManagers is cumbersome (although for local debugging the old print() was fine).
However, a major problem I see with the new print() is that you can now have only one print() per plan, as the plan is executed as soon as print() is invoked. If you regard print() as a debugging tool, this is a severe restriction.
I see use cases for both print() implementations, but I would at least provide some kind of backwards compatibility, be it a parameter, a legacyPrint() method, or anything else. As I assume print() is very frequently used, a lot of existing programs would benefit from this and might otherwise not be directly portable to newer Flink versions. What do you think?

Cheers,
Sebastian
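
The concern about several print() calls in one program can be illustrated with a toy counter. This is a sketch in plain Java under the assumption that every eager call runs as its own job and re-reads the source; readSource(), eagerPrint() and executeWithSinks() are hypothetical names and not Flink methods.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

// Toy model only: all names here are hypothetical, none of this is Flink's API.
class OnePrintPerPlanSketch {

    // Counts how often the (possibly expensive) source is read.
    static int sourceReads = 0;

    static List<Integer> readSource() {
        sourceReads++;
        return Arrays.asList(1, 2, 3);
    }

    // Eager style: every call is a separate job that re-reads the source.
    static List<Integer> eagerPrint(UnaryOperator<Integer> transform) {
        List<Integer> printed = new ArrayList<>();
        for (Integer value : readSource()) {
            printed.add(transform.apply(value));
        }
        return printed;
    }

    // Lazy style: several sinks share one execution and one read of the source.
    static List<List<Integer>> executeWithSinks(List<UnaryOperator<Integer>> transforms) {
        List<Integer> input = readSource();
        List<List<Integer>> outputs = new ArrayList<>();
        for (UnaryOperator<Integer> transform : transforms) {
            List<Integer> printed = new ArrayList<>();
            for (Integer value : input) {
                printed.add(transform.apply(value));
            }
            outputs.add(printed);
        }
        return outputs;
    }

    public static void main(String[] args) {
        UnaryOperator<Integer> doubled = x -> x * 2;
        UnaryOperator<Integer> squared = x -> x * x;

        eagerPrint(doubled);
        eagerPrint(squared);
        System.out.println("eager: source read " + sourceReads + " times");

        sourceReads = 0;
        executeWithSinks(Arrays.asList(doubled, squared));
        System.out.println("lazy:  source read " + sourceReads + " time");
    }
}
```

Under these assumptions, two eager prints cost two full runs, whereas two lazy sinks attached to the same plan are served by a single run. That is the trade-off behind asking for a way to keep the old, lazy behavior.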

-----Original Message-----
From: Robert Metzger [mailto:[hidden email]]
Sent: Tuesday, May 26, 2015 11:12
To: [hidden email]
Subject: Re: Changed the behavior of "DataSet.print()"

I've filed a JIRA to update the documentation:
https://issues.apache.org/jira/browse/FLINK-2092


Re: Changed the behavior of "DataSet.print()"

Robert Metzger
Hi Sebastian,

thank you for the feedback. I agree that both variants have a right to
exist.

I would vote for adding another method to the DataSet called "printLocal()"
that has the old behavior.

On Thu, May 28, 2015 at 1:01 PM, Kruse, Sebastian <[hidden email]>
wrote:

> Hi everyone,
>
> I am a bit worried about the recent change to the print() method. I can
> understand the rationale that obtaining the stdout from all the
> TaskManagers is cumbersome (although for local debugging the old print()
> was fine).
> However, a major problem I see with the new print() is that you can now
> have only one print() per plan, as the plan is executed as soon as
> print() is invoked. If you regard print() as a debugging tool, this is a
> severe restriction.
> I see use cases for both print() implementations, but I would at least
> provide some kind of backwards compatibility, be it a parameter, a
> legacyPrint() method, or anything else. As I assume print() is very
> frequently used, a lot of existing programs would benefit from this and
> might otherwise not be directly portable to newer Flink versions. What do
> you think?
>
> Cheers,
> Sebastian

Re: Changed the behavior of "DataSet.print()"

Fabian Hueske-2
+1 for both.

printLocal() might not be the best name, because "local" is not well
defined and could also be understood as the local machine of the user.
How about naming the method completely differently (writeToWorkerStdOut()?)
to make sure users are not confused with eager and lazy execution?


2015-05-28 13:44 GMT+02:00 Robert Metzger <[hidden email]>:

> Hi Sebastian,
>
> thank you for the feedback. I agree that both variants have a right to
> exist.
>
> I would vote for adding another method to the DataSet called "printLocal()"
> that has the old behavior.

Re: Changed the behavior of "DataSet.print()"

Robert Metzger
Okay, you are right, local is actually confusing.
I'm against introducing "worker" as a term in the API. It's still called
"TaskManager". Maybe "printOnTaskManager()" ?

On Thu, May 28, 2015 at 2:06 PM, Fabian Hueske <[hidden email]> wrote:

> +1 for both.
>
> printLocal() might not be the best name, because "local" is not well
> defined and could also be understood as the local machine of the user.
> How about naming the method completely differently (writeToWorkerStdOut()?)
> to make sure users are not confused with eager and lazy execution?

Re: Changed the behavior of "DataSet.print()"

Fabian Hueske-2
I would avoid calling it printXYZ, since print()'s behavior changed to
eager execution.

2015-05-28 14:10 GMT+02:00 Robert Metzger <[hidden email]>:

> Okay, you are right, local is actually confusing.
> I'm against introducing "worker" as a term in the API. It's still called
> "TaskManager". Maybe "printOnTaskManager()" ?

Re: Changed the behavior of "DataSet.print()"

Stephan Ewen
Actually, there is a method "print(String prefix)" which still writes to the
stdout of the process where the job is executed.

Let's give that one the name "printOnTaskManager()" and then we should have
it...

On Thu, May 28, 2015 at 2:13 PM, Fabian Hueske <[hidden email]> wrote:

> I would avoid calling it printXYZ, since print()'s behavior changed to
> eager execution.

Re: Changed the behavior of "DataSet.print()"

Fabian Hueske-2
As I said, the common print prefix might indicate eager execution.

I know that writeToTaskManagerStdOut() is quite bulky, but we should make
the difference in the behavior very clear, IMO.

2015-05-28 14:29 GMT+02:00 Stephan Ewen <[hidden email]>:

> Actually, there is a method "print(String prefix)" which still writes to the
> stdout of the process where the job is executed.
>
> Let's give that one the name "printOnTaskManager()" and then we should have
> it...

RE: Changed the behavior of "DataSet.print()"

Kruse, Sebastian
Thanks for your quick responses!

I also think that renaming the old print method should do the trick. As a contribution to your brainstorming for a name, I propose logOnTaskManager() ;)

Cheers,
Sebastian

-----Original Message-----
From: Fabian Hueske [mailto:[hidden email]]
Sent: Thursday, May 28, 2015 14:34
To: [hidden email]
Subject: Re: Changed the behavior of "DataSet.print()"

As I said, the common print prefix might indicate eager execution.

I know that writeToTaskManagerStdOut() is quite bulky, but we should make the difference in the behavior very clear, IMO.


Re: Changed the behavior of "DataSet.print()"

mxm
+1 for printOnTaskManager()

On Thu, May 28, 2015 at 2:53 PM, Kruse, Sebastian <[hidden email]>
wrote:

> Thanks for your quick responses!
>
> I also think that renaming the old print method should do the trick. As a
> contribution to your brainstorming for a name, I propose logOnTaskManager()
> ;)
>
> Cheers,
> Sebastian
>
> -----Original Message-----
> From: Fabian Hueske [mailto:[hidden email]]
> Sent: Donnerstag, 28. Mai 2015 14:34
> To: [hidden email]
> Subject: Re: Changed the behavior of "DataSet.print()"
>
> As I said, the common print prefix might indicate eager execution.
>
> I know that writeToTaskManagerStdOut() is quite bulky, but we should make
> the difference in the behavior very clear, IMO.
>
> 2015-05-28 14:29 GMT+02:00 Stephan Ewen <[hidden email]>:
>
> > Actually, there is a method "print(String prefix)" which still goes to
> > the sysout of where the job is executed.
> >
> > Let's give that one the name "printOnTaskManager()" and then we should
> > have it...
> >
> > On Thu, May 28, 2015 at 2:13 PM, Fabian Hueske <[hidden email]>
> wrote:
> >
> > > I would avoid to call it printXYZ, since print()'s behavior changed
> > > to eager execution.
> > >
> > > 2015-05-28 14:10 GMT+02:00 Robert Metzger <[hidden email]>:
> > >
> > > > Okay, you are right, local is actually confusing.
> > > > I'm against introducing "worker" as a term in the API. Its still
> > > > called "TaskManager". Maybe "printOnTaskManager()" ?
> > > >
> > > > On Thu, May 28, 2015 at 2:06 PM, Fabian Hueske <[hidden email]>
> > > wrote:
> > > >
> > > > > +1 for both.
> > > > >

Re: Changed the behavior of "DataSet.print()"

Chiwan Park
I agree that avoiding a name that starts with “print” is better.

Regards,
Chiwan Park

> On May 28, 2015, at 11:35 PM, Maximilian Michels <[hidden email]> wrote:
>
> +1 for printOnTaskManager()
>


Re: Changed the behavior of "DataSet.print()"

Robert Metzger
I would like to reach consensus on this before the 0.9 release.

So far we have the following ideas:

writeToWorkerStdOut(prefix)
printOnTaskManager(prefix) (+1)
logOnTaskManager(prefix)

I'm against logOnTaskManager(prefix) because we are not logging the output;
we are writing or printing it.


*I would vote for deprecating "print(prefix)" and adding
"writeToWorkerStdOut(prefix)"*
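
To make the distinction behind the naming debate concrete, here is a toy model in Java. It is only a sketch and not Flink code: the class ToyDataSet, its fields, and the "prefix> value" output format are invented for illustration. Only the method names print() and printOnTaskManager(prefix) and the eager-versus-lazy behavior are taken from this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy stand-in for a data set. Hypothetical; this is NOT Flink's DataSet. */
final class ToyDataSet {
    private final List<String> elements;
    // Prefixes of lazy sinks that wait until execute() is called.
    private final List<String> pendingPrefixes = new ArrayList<>();
    // Stand-ins for the stdout of the client and of the TaskManagers.
    final List<String> clientOut = new ArrayList<>();
    final List<String> taskManagerOut = new ArrayList<>();

    ToyDataSet(List<String> elements) {
        this.elements = new ArrayList<>(elements);
    }

    /** Eager: "runs the plan" right away and writes to the client output. */
    void print() {
        clientOut.addAll(elements);
    }

    /** Lazy: only registers a sink; nothing is written yet. */
    void printOnTaskManager(String prefix) {
        pendingPrefixes.add(prefix);
    }

    /** Runs every registered lazy sink, writing to the TaskManager output. */
    void execute() {
        for (String prefix : pendingPrefixes) {
            for (String element : elements) {
                taskManagerOut.add(prefix + "> " + element);
            }
        }
        pendingPrefixes.clear();
    }

    public static void main(String[] args) {
        ToyDataSet ds = new ToyDataSet(Arrays.asList("a", "b"));
        ds.printOnTaskManager("dbg");          // lazy, no output yet
        ds.print();                            // eager, output appears at once
        System.out.println(ds.clientOut);      // [a, b]
        System.out.println(ds.taskManagerOut); // []
        ds.execute();                          // now the lazy sink runs
        System.out.println(ds.taskManagerOut); // [dbg> a, dbg> b]
    }
}
```

In this model several printOnTaskManager(prefix) calls can share one execute(), while each print() call triggers a run on its own, which is the restriction Sebastian raised.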



On Thu, May 28, 2015 at 5:00 PM, Chiwan Park <[hidden email]> wrote:

> I agree that avoiding name which starts with “print” is better.
>
> Regards,
> Chiwan Park
>

Re: Changed the behavior of "DataSet.print()"

Aljoscha Krettek-2
+1 for printOnTaskManager(prefix)

On Tue, Jun 2, 2015 at 11:35 AM, Robert Metzger <[hidden email]> wrote:

> I would like to reach consensus on this before the 0.9 release.
>
> So far we have the following ideas:
>
> writeToWorkerStdOut(prefix)
> printOnTaskManager(prefix) (+1)
> logOnTaskManager(prefix)
>
> I'm against logOnTM because we are not logging the output, we are writing
> or printing it.
>
>
> *I would vote for deprecating "print(prefix)" and adding
> "writeToWorkerStdOut(prefix)"*
>

Re: Changed the behavior of "DataSet.print()"

Fabian Hueske-2
+1 for writeToWorkerStdOut(prefix)

On Jun 2, 2015 11:42, "Aljoscha Krettek" <[hidden email]> wrote:

> +1 for printOnTaskManager(prefix)
>

Re: Changed the behavior of "DataSet.print()"

Till Rohrmann
+1 for printOnTaskManager(prefix)

On Tue, Jun 2, 2015 at 12:08 PM, Fabian Hueske <[hidden email]> wrote:

> +1 for writeToWorkerStdOut(prefix)
> On Jun 2, 2015 11:42, "Aljoscha Krettek" <[hidden email]> wrote:
>
> > +1 for printOnTaskManager(prefix)
> >
> > On Tue, Jun 2, 2015 at 11:35 AM, Robert Metzger <[hidden email]>
> > wrote:
> > > I would like to reach consensus on this before the 0.9 release.
> > >
> > > So far we have the following ideas:
> > >
> > > writeToWorkerStdOut(prefix)
> > > printOnTaskManager(prefix) (+1)
> > > logOnTaskManager(prefix)
> > >
> > > I'm against logOnTM because we are not logging the output, we are
> writing
> > > or printing it.
> > >
> > >
> > > *I would vote for deprecating "print(prefix)" and adding
> > > "writeToWorkerStdOut(prefix)"*
> > >
> > >
> > >
> > > On Thu, May 28, 2015 at 5:00 PM, Chiwan Park <[hidden email]>
> > wrote:
> > >
> > >> I agree that avoiding name which starts with “print” is better.
> > >>
> > >> Regards,
> > >> Chiwan Park
> > >>
> > >> > On May 28, 2015, at 11:35 PM, Maximilian Michels <[hidden email]>
> > wrote:
> > >> >
> > >> > +1 for printOnTaskManager()
> > >> >
> > >> > On Thu, May 28, 2015 at 2:53 PM, Kruse, Sebastian <
> > >> [hidden email]>
> > >> > wrote:
> > >> >
> > >> >> Thanks, for your quick responses!
> > >> >>
> > >> >> I also think that renaming the old print method should do the
> trick.
> > As
> > >> a
> > >> >> contribution to your brainstorming for a name, I propose
> > >> logOnTaskManager()
> > >> >> ;)
> > >> >>
> > >> >> Cheers,
> > >> >> Sebastian
> > >> >>
> > >> >> -----Original Message-----
> > >> >> From: Fabian Hueske [mailto:[hidden email]]
> > >> >> Sent: Donnerstag, 28. Mai 2015 14:34
> > >> >> To: [hidden email]
> > >> >> Subject: Re: Changed the behavior of "DataSet.print()"
> > >> >>
> > >> >> As I said, the common print prefix might indicate eager execution.
> > >> >>
> > >> >> I know that writeToTaskManagerStdOut() is quite bulky, but we
> should
> > >> make
> > >> >> the difference in the behavior very clear, IMO.
> > >> >>
> > >> >> 2015-05-28 14:29 GMT+02:00 Stephan Ewen <[hidden email]>:
> > >> >>
> > >> >>> Actually, there is a method "print(String prefix)" which still
> goes
> > to
> > >> >>> the sysout of where the job is executed.
> > >> >>>
> > >> >>> Let's give that one the name "printOnTaskManager()" and then we
> > should
> > >> >>> have it...
> > >> >>>
> > >> >>> On Thu, May 28, 2015 at 2:13 PM, Fabian Hueske <[hidden email]
> >
> > >> >> wrote:
> > >> >>>
> > >> >>>> I would avoid to call it printXYZ, since print()'s behavior
> changed
> > >> >>>> to eager execution.
> > >> >>>>
> > >> >>>> 2015-05-28 14:10 GMT+02:00 Robert Metzger <[hidden email]>:
> > >> >>>>
> > >> >>>>> Okay, you are right, local is actually confusing.
> > >> >>>>> I'm against introducing "worker" as a term in the API. Its still
> > >> >>>>> called "TaskManager". Maybe "printOnTaskManager()" ?
> > >> >>>>>
> > >> >>>>> On Thu, May 28, 2015 at 2:06 PM, Fabian Hueske <
> [hidden email]
> > >
> > >> >>>> wrote:
> > >> >>>>>
> > >> >>>>>> +1 for both.
> > >> >>>>>>
> > >> >>>>>> printLocal() might not be the best name, because "local" is not
> > >> >>>>>> well defined and could also be understood as the local machine
> > >> >>>>>> of the
> > >> >>> user.
> > >> >>>>>> How about naming the method completely different
> > >> >>>> (writeToWorkerStdOut()?)
> > >> >>>>>> to make sure users are not confused with eager and lazy
> > execution?
> > >> >>>>>>
> > >> >>>>>>
> > >> >>>>>> 2015-05-28 13:44 GMT+02:00 Robert Metzger <[hidden email]
> >:
> > >> >>>>>>
> > >> >>>>>>> Hi Sebastian,
> > >> >>>>>>>
> > >> >>>>>>> thank you for the feedback. I agree that both variants have a
> > >> >>>>>>> right
> > >> >>>> to
> > >> >>>>>>> exist.
> > >> >>>>>>>
> > >> >>>>>>> I would vote for adding another method to the DataSet called
> > >> >>>>>> "printLocal()"
> > >> >>>>>>> that has the old behavior.
> > >> >>>>>>>
> > >> >>>>>>> On Thu, May 28, 2015 at 1:01 PM, Kruse, Sebastian <
> > >> >>>>>> [hidden email]>
> > >> >>>>>>> wrote:
> > >> >>>>>>>
> > >> >>>>>>>> Hi everyone,
> > >> >>>>>>>>
> > >> >>>>>>>> I am a bit worried about that recent change of the print()
> > >> >>> method.
> > >> >>>> I
> > >> >>>>>> can
> > >> >>>>>>>> understand the rationale that obtaining the stdout from all
> > >> >>>>>>>> the taskmanagers is cumbersome (although, for local
> > >> >>>>>>>> debugging the old
> > >> >>>>>> print()
> > >> >>>>>>>> was fine).
> > >> >>>>>>>> However, a major problem, I see with the new print(), is,
> > >> >>>>>>>> that
> > >> >>> now
> > >> >>>>> you
> > >> >>>>>>> can
> > >> >>>>>>>> only have one print() per plan, as the plan is directly
> > >> >>>>>>>> executed
> > >> >>> as
> > >> >>>>>> soon
> > >> >>>>>>> as
> > >> >>>>>>>> print() is invoked. If you regard print() as a debugging
> > >> >>>>>>>> means,
> > >> >>>> this
> > >> >>>>>> is a
> > >> >>>>>>>> severe restriction.
> > >> >>>>>>>> I see use cases for both print() implementations, but I
> > >> >>>>>>>> would at
> > >> >>>>> least
> > >> >>>>>>>> provide some kind of backwards compatibility, be at a
> > >> >>>>>>>> parameter
> > >> >>> or
> > >> >>>> a
> > >> >>>>>>>> legacyPrint() method or anything else. As I assume print()
> > >> >>>>>>>> to be
> > >> >>>> very
> > >> >>>>>>>> frequently used, a lot of existing programs would benefit
> > >> >>>>>>>> from
> > >> >>> this
> > >> >>>>> and
> > >> >>>>>>>> might otherwise not be directly portable to newer Flink
> > >> >> versions.
> > >> >>>>> What
> > >> >>>>>> do
> > >> >>>>>>>> you think?
> > >> >>>>>>>>
> > >> >>>>>>>> Cheers,
> > >> >>>>>>>> Sebastian
> > >> >>>>>>>>
> > >> >>>>>>>> -----Original Message-----
> > >> >>>>>>>> From: Robert Metzger [mailto:[hidden email]]
> > >> >>>>>>>> Sent: Dienstag, 26. Mai 2015 11:12
> > >> >>>>>>>> To: [hidden email]
> > >> >>>>>>>> Subject: Re: Changed the behavior of "DataSet.print()"
> > >> >>>>>>>>
> > >> >>>>>>>> I've filed a JIRA to update the documentation:
> > >> >>>>>>>> https://issues.apache.org/jira/browse/FLINK-2092
> > >> >>>>>>>>
> > >> >>>>>>>> On Fri, May 22, 2015 at 11:08 AM, Stephan Ewen <[hidden email]> wrote:
> > >> >>>>>>>>
> > >> >>>>>>>>> Hi all!
> > >> >>>>>>>>>
> > >> >>>>>>>>> We merged a patch yesterday that changed the API behavior
> > >> >>>>>>>>> of the "DataSet.print()" function.
> > >> >>>>>>>>>
> > >> >>>>>>>>> "print()" now prints to stdout on the client process,
> > >> >>>>>>>>> rather than the TaskManager process, as before. This is
> > >> >>>>>>>>> much nicer for debugging and exploring data sets.
> > >> >>>>>>>>>
> > >> >>>>>>>>> One implication of this is that print() is now an eager
> > >> >>>>>>>>> method (like collect() or count()). That means that calling
> > >> >>>>>>>>> "print()" immediately triggers the execution, and no
> > >> >>>>>>>>> "env.execute()" is required any more.
> > >> >>>>>>>>>
> > >> >>>>>>>>> Greetings,
> > >> >>>>>>>>> Stephan
> > >> >>>>>>>>>
> > >> >>>>>>>>>
> > >> >>>>>>>>
> > >> >>>>>>>
> > >> >>>>>>
> > >> >>>>>
> > >> >>>>
> > >> >>>
> > >> >>
> > >>
> > >>
> > >>
> > >>
> > >>
> >
>

Re: Changed the behavior of "DataSet.print()"

Kostas Tzoumas-2
+1 for printOnTaskManager(prefix)


Re: Changed the behavior of "DataSet.print()"

mxm
+1 for printOnTaskManager(prefix)


Re: Changed the behavior of "DataSet.print()"

Aljoscha Krettek-2
By the way, we also should rename the corresponding Streaming API
method accordingly.


Re: Changed the behavior of "DataSet.print()"

Stephan Ewen
+1 for printOnTaskManager(prefix)

+1 for deprecating the print(prefix) method.

On Tue, Jun 2, 2015 at 5:24 PM, Aljoscha Krettek <[hidden email]>
wrote:

> By the way, we also should rename the corresponding Streaming API
> method accordingly.
>
> On Tue, Jun 2, 2015 at 3:24 PM, Maximilian Michels <[hidden email]> wrote:
> > +1 for printOnTaskManager(prefix)
> >
> > On Tue, Jun 2, 2015 at 1:54 PM, Kostas Tzoumas <[hidden email]>
> wrote:
> >
> >> +1 for printOnTaskManager(prefix)
> >>
> >> On Tue, Jun 2, 2015 at 1:35 PM, Till Rohrmann <[hidden email]>
> >> wrote:
> >>
> >> > +1 for printOnTaskManager(prefix)
> >> >
> >> > On Tue, Jun 2, 2015 at 12:08 PM, Fabian Hueske <[hidden email]>
> >> wrote:
> >> >
> >> > > +1 for writeToWorkerStdOut(prefix)
> >> > > On Jun 2, 2015 11:42, "Aljoscha Krettek" <[hidden email]>
> wrote:
> >> > >
> >> > > > +1 for printOnTaskManager(prefix)
> >> > > >
> >> > > > On Tue, Jun 2, 2015 at 11:35 AM, Robert Metzger <
> [hidden email]
> >> >
> >> > > > wrote:
> >> > > > > I would like to reach consensus on this before the 0.9 release.
> >> > > > >
> >> > > > > So far we have the following ideas:
> >> > > > >
> >> > > > > writeToWorkerStdOut(prefix)
> >> > > > > printOnTaskManager(prefix) (+1)
> >> > > > > logOnTaskManager(prefix)
> >> > > > >
> >> > > > > I'm against logOnTM because we are not logging the output; we are
> >> > > > > writing or printing it.
> >> > > > >
> >> > > > >
> >> > > > > *I would vote for deprecating "print(prefix)" and adding
> >> > > > > "writeToWorkerStdOut(prefix)"*
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > > On Thu, May 28, 2015 at 5:00 PM, Chiwan Park <
> >> [hidden email]>
> >> > > > wrote:
> >> > > > >
> >> > > > >> I agree that avoiding a name that starts with “print” is better.
> >> > > > >>
> >> > > > >> Regards,
> >> > > > >> Chiwan Park
> >> > > > >>
> >> > > > >> > On May 28, 2015, at 11:35 PM, Maximilian Michels <
> >> [hidden email]>
> >> > > > wrote:
> >> > > > >> >
> >> > > > >> > +1 for printOnTaskManager()
> >> > > > >> >
> >> > > > >> > On Thu, May 28, 2015 at 2:53 PM, Kruse, Sebastian <
> >> > > > >> [hidden email]>
> >> > > > >> > wrote:
> >> > > > >> >
> >> > > > >> >> Thanks for your quick responses!
> >> > > > >> >>
> >> > > > >> >> I also think that renaming the old print method should do the
> >> > > > >> >> trick. As a contribution to your brainstorming for a name, I
> >> > > > >> >> propose logOnTaskManager() ;)
> >> > > > >> >>
> >> > > > >> >> Cheers,
> >> > > > >> >> Sebastian
> >> > > > >> >>
> >> > > > >> >> -----Original Message-----
> >> > > > >> >> From: Fabian Hueske [mailto:[hidden email]]
> >> > > > >> >> Sent: Thursday, May 28, 2015 14:34
> >> > > > >> >> To: [hidden email]
> >> > > > >> >> Subject: Re: Changed the behavior of "DataSet.print()"
> >> > > > >> >>
> >> > > > >> >> As I said, the common print prefix might indicate eager
> >> > > > >> >> execution.
> >> > > > >> >>
> >> > > > >> >> I know that writeToTaskManagerStdOut() is quite bulky, but we
> >> > > > >> >> should make the difference in the behavior very clear, IMO.
> >> > > > >> >>
> >> > > > >> >> 2015-05-28 14:29 GMT+02:00 Stephan Ewen <[hidden email]>:
> >> > > > >> >>
> >> > > > >> >>> Actually, there is a method "print(String prefix)" which still
> >> > > > >> >>> goes to the sysout of where the job is executed.
> >> > > > >> >>>
> >> > > > >> >>> Let's give that one the name "printOnTaskManager()" and then we
> >> > > > >> >>> should have it...
> >> > > > >> >>>
> >> > > > >> >>> On Thu, May 28, 2015 at 2:13 PM, Fabian Hueske <
> >> > [hidden email]
> >> > > >
> >> > > > >> >> wrote:
> >> > > > >> >>>
> >> > > > >> >>>> I would avoid calling it printXYZ, since print()'s behavior
> >> > > > >> >>>> changed to eager execution.
> >> > > > >> >>>>
> >> > > > >> >>>> 2015-05-28 14:10 GMT+02:00 Robert Metzger <
> >> [hidden email]
> >> > >:
> >> > > > >> >>>>
> >> > > > >> >>>>> Okay, you are right, local is actually confusing.
> >> > > > >> >>>>> I'm against introducing "worker" as a term in the API.
> >> > > > >> >>>>> It's still called "TaskManager". Maybe "printOnTaskManager()"?
> >> > > > >> >>>>>
> >> > > > >> >>>>> On Thu, May 28, 2015 at 2:06 PM, Fabian Hueske <
> >> > > [hidden email]
> >> > > > >
> >> > > > >> >>>> wrote:
> >> > > > >> >>>>>
> >> > > > >> >>>>>> +1 for both.
> >> > > > >> >>>>>>
> >> > > > >> >>>>>> printLocal() might not be the best name, because "local" is
> >> > > > >> >>>>>> not well defined and could also be understood as the local
> >> > > > >> >>>>>> machine of the user.
> >> > > > >> >>>>>> How about naming the method completely differently
> >> > > > >> >>>>>> (writeToWorkerStdOut()?) to make sure users are not confused
> >> > > > >> >>>>>> about eager and lazy execution?
> >> > > > >> >>>>>>
> >> > > > >> >>>>>>
> >> > > > >> >>>>>> 2015-05-28 13:44 GMT+02:00 Robert Metzger <[hidden email]>:
> >> > > > >> >>>>>>
> >> > > > >> >>>>>>> Hi Sebastian,
> >> > > > >> >>>>>>>
> >> > > > >> >>>>>>> thank you for the feedback. I agree that both variants have
> >> > > > >> >>>>>>> a right to exist.
> >> > > > >> >>>>>>>
> >> > > > >> >>>>>>> I would vote for adding another method to the DataSet called
> >> > > > >> >>>>>>> "printLocal()" that has the old behavior.
> >> > > > >> >>>>>>>
> >> > > > >> >>>>>>> On Thu, May 28, 2015 at 1:01 PM, Kruse, Sebastian
> >> > > > >> >>>>>>> <[hidden email]> wrote:
> >> > > > >> >>>>>>>
> >> > > > >> >>>>>>>> Hi everyone,
> >> > > > >> >>>>>>>>
> >> > > > >> >>>>>>>> I am a bit worried about the recent change to the print()
> >> > > > >> >>>>>>>> method. I can understand the rationale that obtaining the
> >> > > > >> >>>>>>>> stdout from all the TaskManagers is cumbersome (although
> >> > > > >> >>>>>>>> for local debugging the old print() was fine).
> >> > > > >> >>>>>>>> However, a major problem I see with the new print() is
> >> > > > >> >>>>>>>> that now you can only have one print() per plan, as the
> >> > > > >> >>>>>>>> plan is directly executed as soon as print() is invoked.
> >> > > > >> >>>>>>>> If you regard print() as a debugging means, this is a
> >> > > > >> >>>>>>>> severe restriction.
> >> > > > >> >>>>>>>> I see use cases for both print() implementations, but I
> >> > > > >> >>>>>>>> would at least provide some kind of backwards
> >> > > > >> >>>>>>>> compatibility, be it a parameter or a legacyPrint() method
> >> > > > >> >>>>>>>> or anything else. As I assume print() to be very
> >> > > > >> >>>>>>>> frequently used, a lot of existing programs would benefit
> >> > > > >> >>>>>>>> from this and might otherwise not be directly portable to
> >> > > > >> >>>>>>>> newer Flink versions. What do you think?
> >> > > > >> >>>>>>>>
> >> > > > >> >>>>>>>> Cheers,
> >> > > > >> >>>>>>>> Sebastian
> >> > > > >> >>>>>>>>
> >> > > > >> >>>>>>>> -----Original Message-----
> >> > > > >> >>>>>>>> From: Robert Metzger [mailto:[hidden email]]
> >> > > > >> >>>>>>>> Sent: Tuesday, 26 May 2015 11:12
> >> > > > >> >>>>>>>> To: [hidden email]
> >> > > > >> >>>>>>>> Subject: Re: Changed the behavior of "DataSet.print()"
> >> > > > >> >>>>>>>>
> >> > > > >> >>>>>>>> I've filed a JIRA to update the documentation:
> >> > > > >> >>>>>>>> https://issues.apache.org/jira/browse/FLINK-2092
> >> > > > >> >>>>>>>>
> >> > > > >> >>>>>>>> On Fri, May 22, 2015 at 11:08 AM, Stephan Ewen
> >> > > > >> >>>>>>>> <[hidden email]> wrote:
> >> > > > >> >>>>>>>>
> >> > > > >> >>>>>>>>> Hi all!
> >> > > > >> >>>>>>>>>
> >> > > > >> >>>>>>>>> We merged a patch yesterday that changed the API
> >> > > > >> >>>>>>>>> behavior of the "DataSet.print()" function.
> >> > > > >> >>>>>>>>>
> >> > > > >> >>>>>>>>> "print()" now prints to stdout on the client process,
> >> > > > >> >>>>>>>>> rather than the TaskManager process, as before. This is
> >> > > > >> >>>>>>>>> much nicer for debugging and exploring data sets.
> >> > > > >> >>>>>>>>>
> >> > > > >> >>>>>>>>> One implication of this is that print() is now an eager
> >> > > > >> >>>>>>>>> method (like collect() or count()). That means that
> >> > > > >> >>>>>>>>> calling "print()" immediately triggers the execution,
> >> > > > >> >>>>>>>>> and no "env.execute()" is required any more.
> >> > > > >> >>>>>>>>>
> >> > > > >> >>>>>>>>> Greetings,
> >> > > > >> >>>>>>>>> Stephan