Dear Flink community,
I have stopped to count how many people on the user list and during Flink trainings have asked why their Flink program throws an Exception when they just one to print a DataSet. The reason for this is that print() now executes eagerly, thus, executes the Flink program. Subsequent calls to execute() need to define new DataSinks and throw an exception otherwise. We have recently introduced a flag in the ExecutionEnvironment that checks whether the user executed before (explicitly via execute() or implicitly through collect()/print()/count()). That enabled us to print a nicer exception message. However, users either do not read the exception message or do not understand it. They do ask this question a lot. That's why I propose to ignore calls to execute() entirely if no sinks are defined. That will get rid of one of the core annoyances for Flink users. I know, that this is painfully for us programmers because we understand how Flink works internally but let's step back once and see that it wouldn't be so bad if execute didn't do anything in case of no new sinks. What would be the downside of this change? Users might call execute() and wonder that nothing happens. We would then simply print a warning that their program didn't define any sinks. That is a big difference to the behavior before because users are scared of exceptions. If they just get a warning they will double-check their program and investigate why nothing happens. Most of the cases they do actually have defined sinks but simply left a call to execute() when they were printing a DataSet. What are you opinions on this issue? I have opened a JIRA for this as well: https://issues.apache.org/jira/browse/FLINK-2249 Best, Max |
+1 for ignoring execute() call with warning.
But I'm concerned for how the user catches the error in program without any data sinks. By the way, eager execution is not well documented in data sinks section but is in program skeleton section. [1] This makes the user’s confusion. We should clean up documents. There are many codes calling execute() method after print() method. [2][3] We should add a description for count() method to documents too. [1] http://ci.apache.org/projects/flink/flink-docs-master/apis/programming_guide.html#data-sinks [2] http://ci.apache.org/projects/flink/flink-docs-master/apis/programming_guide.html#parallel-execution [3] http://ci.apache.org/projects/flink/flink-docs-master/apis/programming_guide.html#iteration-operators Regards, Chiwan Park > On Jun 19, 2015, at 9:15 PM, Maximilian Michels <[hidden email]> wrote: > > Dear Flink community, > > I have stopped to count how many people on the user list and during Flink > trainings have asked why their Flink program throws an Exception when they > just one to print a DataSet. The reason for this is that print() now > executes eagerly, thus, executes the Flink program. Subsequent calls to > execute() need to define new DataSinks and throw an exception otherwise. > > We have recently introduced a flag in the ExecutionEnvironment that checks > whether the user executed before (explicitly via execute() or implicitly > through collect()/print()/count()). That enabled us to print a nicer > exception message. However, users either do not read the exception message > or do not understand it. They do ask this question a lot. > > That's why I propose to ignore calls to execute() entirely if no sinks are > defined. That will get rid of one of the core annoyances for Flink users. I > know, that this is painfully for us programmers because we understand how > Flink works internally but let's step back once and see that it wouldn't be > so bad if execute didn't do anything in case of no new sinks. > > What would be the downside of this change? Users might call execute() and > wonder that nothing happens. We would then simply print a warning that > their program didn't define any sinks. That is a big difference to the > behavior before because users are scared of exceptions. If they just get a > warning they will double-check their program and investigate why nothing > happens. Most of the cases they do actually have defined sinks but simply > left a call to execute() when they were printing a DataSet. > > What are you opinions on this issue? I have opened a JIRA for this as well: > https://issues.apache.org/jira/browse/FLINK-2249 > > Best, > Max |
We could also add a link to the documentation into the exception that
explains the behavior. On Fri, Jun 19, 2015 at 5:52 AM, Chiwan Park <[hidden email]> wrote: > +1 for ignoring execute() call with warning. > > But I'm concerned for how the user catches the error in program without > any data sinks. > > By the way, eager execution is not well documented in data sinks section > but is in program > skeleton section. [1] This makes the user’s confusion. We should clean up > documents. > There are many codes calling execute() method after print() method. [2][3] > > We should add a description for count() method to documents too. > > [1] > http://ci.apache.org/projects/flink/flink-docs-master/apis/programming_guide.html#data-sinks > [2] > http://ci.apache.org/projects/flink/flink-docs-master/apis/programming_guide.html#parallel-execution > [3] > http://ci.apache.org/projects/flink/flink-docs-master/apis/programming_guide.html#iteration-operators > > Regards, > Chiwan Park > > > On Jun 19, 2015, at 9:15 PM, Maximilian Michels <[hidden email]> wrote: > > > > Dear Flink community, > > > > I have stopped to count how many people on the user list and during Flink > > trainings have asked why their Flink program throws an Exception when > they > > just one to print a DataSet. The reason for this is that print() now > > executes eagerly, thus, executes the Flink program. Subsequent calls to > > execute() need to define new DataSinks and throw an exception otherwise. > > > > We have recently introduced a flag in the ExecutionEnvironment that > checks > > whether the user executed before (explicitly via execute() or implicitly > > through collect()/print()/count()). That enabled us to print a nicer > > exception message. However, users either do not read the exception > message > > or do not understand it. They do ask this question a lot. > > > > That's why I propose to ignore calls to execute() entirely if no sinks > are > > defined. That will get rid of one of the core annoyances for Flink > users. I > > know, that this is painfully for us programmers because we understand how > > Flink works internally but let's step back once and see that it wouldn't > be > > so bad if execute didn't do anything in case of no new sinks. > > > > What would be the downside of this change? Users might call execute() and > > wonder that nothing happens. We would then simply print a warning that > > their program didn't define any sinks. That is a big difference to the > > behavior before because users are scared of exceptions. If they just get > a > > warning they will double-check their program and investigate why nothing > > happens. Most of the cases they do actually have defined sinks but simply > > left a call to execute() when they were printing a DataSet. > > > > What are you opinions on this issue? I have opened a JIRA for this as > well: > > https://issues.apache.org/jira/browse/FLINK-2249 > > > > Best, > > Max > > > > > |
+1 for cleaning up the documentation
+1 for adding a link to the documentation (should be a permalink) +1 for printing a warning instead of an exception On Sun, Jun 21, 2015 at 12:25 AM, Robert Metzger <[hidden email]> wrote: > We could also add a link to the documentation into the exception that > explains the behavior. > > On Fri, Jun 19, 2015 at 5:52 AM, Chiwan Park <[hidden email]> > wrote: > > > +1 for ignoring execute() call with warning. > > > > But I'm concerned for how the user catches the error in program without > > any data sinks. > > > > By the way, eager execution is not well documented in data sinks section > > but is in program > > skeleton section. [1] This makes the user’s confusion. We should clean up > > documents. > > There are many codes calling execute() method after print() method. > [2][3] > > > > We should add a description for count() method to documents too. > > > > [1] > > > http://ci.apache.org/projects/flink/flink-docs-master/apis/programming_guide.html#data-sinks > > [2] > > > http://ci.apache.org/projects/flink/flink-docs-master/apis/programming_guide.html#parallel-execution > > [3] > > > http://ci.apache.org/projects/flink/flink-docs-master/apis/programming_guide.html#iteration-operators > > > > Regards, > > Chiwan Park > > > > > On Jun 19, 2015, at 9:15 PM, Maximilian Michels <[hidden email]> > wrote: > > > > > > Dear Flink community, > > > > > > I have stopped to count how many people on the user list and during > Flink > > > trainings have asked why their Flink program throws an Exception when > > they > > > just one to print a DataSet. The reason for this is that print() now > > > executes eagerly, thus, executes the Flink program. Subsequent calls to > > > execute() need to define new DataSinks and throw an exception > otherwise. > > > > > > We have recently introduced a flag in the ExecutionEnvironment that > > checks > > > whether the user executed before (explicitly via execute() or > implicitly > > > through collect()/print()/count()). That enabled us to print a nicer > > > exception message. However, users either do not read the exception > > message > > > or do not understand it. They do ask this question a lot. > > > > > > That's why I propose to ignore calls to execute() entirely if no sinks > > are > > > defined. That will get rid of one of the core annoyances for Flink > > users. I > > > know, that this is painfully for us programmers because we understand > how > > > Flink works internally but let's step back once and see that it > wouldn't > > be > > > so bad if execute didn't do anything in case of no new sinks. > > > > > > What would be the downside of this change? Users might call execute() > and > > > wonder that nothing happens. We would then simply print a warning that > > > their program didn't define any sinks. That is a big difference to the > > > behavior before because users are scared of exceptions. If they just > get > > a > > > warning they will double-check their program and investigate why > nothing > > > happens. Most of the cases they do actually have defined sinks but > simply > > > left a call to execute() when they were printing a DataSet. > > > > > > What are you opinions on this issue? I have opened a JIRA for this as > > well: > > > https://issues.apache.org/jira/browse/FLINK-2249 > > > > > > Best, > > > Max > > > > > > > > > > > |
We have two situations to trade off here, and fixing one will make the
other worse: 1) env.execute() after collect() - see Max's mail 2) env.execute() on empty sinks program. Not throwing an exception makes people wonder why nothing happens (if they write the program to just test whether it runs or if they want to measure time). Both choices make one behave nice and the other not. So far, the idea was that throwing an exception on empty sinks is that the error message will help people figure out what is wrong fast. Debugging why nothing happens can be slow. It is hard to say if we would not introduce another source of confusion by fixing one... On Mon, Jun 22, 2015 at 10:26 AM, Maximilian Michels <[hidden email]> wrote: > +1 for cleaning up the documentation > +1 for adding a link to the documentation (should be a permalink) > +1 for printing a warning instead of an exception > > On Sun, Jun 21, 2015 at 12:25 AM, Robert Metzger <[hidden email]> > wrote: > > > We could also add a link to the documentation into the exception that > > explains the behavior. > > > > On Fri, Jun 19, 2015 at 5:52 AM, Chiwan Park <[hidden email]> > > wrote: > > > > > +1 for ignoring execute() call with warning. > > > > > > But I'm concerned for how the user catches the error in program without > > > any data sinks. > > > > > > By the way, eager execution is not well documented in data sinks > section > > > but is in program > > > skeleton section. [1] This makes the user’s confusion. We should clean > up > > > documents. > > > There are many codes calling execute() method after print() method. > > [2][3] > > > > > > We should add a description for count() method to documents too. > > > > > > [1] > > > > > > http://ci.apache.org/projects/flink/flink-docs-master/apis/programming_guide.html#data-sinks > > > [2] > > > > > > http://ci.apache.org/projects/flink/flink-docs-master/apis/programming_guide.html#parallel-execution > > > [3] > > > > > > http://ci.apache.org/projects/flink/flink-docs-master/apis/programming_guide.html#iteration-operators > > > > > > Regards, > > > Chiwan Park > > > > > > > On Jun 19, 2015, at 9:15 PM, Maximilian Michels <[hidden email]> > > wrote: > > > > > > > > Dear Flink community, > > > > > > > > I have stopped to count how many people on the user list and during > > Flink > > > > trainings have asked why their Flink program throws an Exception when > > > they > > > > just one to print a DataSet. The reason for this is that print() now > > > > executes eagerly, thus, executes the Flink program. Subsequent calls > to > > > > execute() need to define new DataSinks and throw an exception > > otherwise. > > > > > > > > We have recently introduced a flag in the ExecutionEnvironment that > > > checks > > > > whether the user executed before (explicitly via execute() or > > implicitly > > > > through collect()/print()/count()). That enabled us to print a nicer > > > > exception message. However, users either do not read the exception > > > message > > > > or do not understand it. They do ask this question a lot. > > > > > > > > That's why I propose to ignore calls to execute() entirely if no > sinks > > > are > > > > defined. That will get rid of one of the core annoyances for Flink > > > users. I > > > > know, that this is painfully for us programmers because we understand > > how > > > > Flink works internally but let's step back once and see that it > > wouldn't > > > be > > > > so bad if execute didn't do anything in case of no new sinks. > > > > > > > > What would be the downside of this change? Users might call execute() > > and > > > > wonder that nothing happens. We would then simply print a warning > that > > > > their program didn't define any sinks. That is a big difference to > the > > > > behavior before because users are scared of exceptions. If they just > > get > > > a > > > > warning they will double-check their program and investigate why > > nothing > > > > happens. Most of the cases they do actually have defined sinks but > > simply > > > > left a call to execute() when they were printing a DataSet. > > > > > > > > What are you opinions on this issue? I have opened a JIRA for this as > > > well: > > > > https://issues.apache.org/jira/browse/FLINK-2249 > > > > > > > > Best, > > > > Max > > > > > > > > > > > > > > > > > > |
What about adding some state state to the DataBag internals that tracks the
following conditions 1. whether the last job execution was triggered by an "enforcer" API method like print() / collect(); 2. whether a DataSource / lazy operator was created after that; If 1 is true and 2 is false, a WARN can be displayed. Otherwise, we can still throw an error. 2015-06-22 18:17 GMT+02:00 Stephan Ewen <[hidden email]>: > We have two situations to trade off here, and fixing one will make the > other worse: > > 1) env.execute() after collect() - see Max's mail > > 2) env.execute() on empty sinks program. Not throwing an exception makes > people wonder why nothing happens (if they write the program to just test > whether it runs or if they want to measure time). > > Both choices make one behave nice and the other not. So far, the idea was > that throwing an exception on empty sinks is that the error message will > help people figure out what is wrong fast. Debugging why nothing happens > can be slow. > > > It is hard to say if we would not introduce another source of confusion by > fixing one... > > > > > > On Mon, Jun 22, 2015 at 10:26 AM, Maximilian Michels <[hidden email]> > wrote: > > > +1 for cleaning up the documentation > > +1 for adding a link to the documentation (should be a permalink) > > +1 for printing a warning instead of an exception > > > > On Sun, Jun 21, 2015 at 12:25 AM, Robert Metzger <[hidden email]> > > wrote: > > > > > We could also add a link to the documentation into the exception that > > > explains the behavior. > > > > > > On Fri, Jun 19, 2015 at 5:52 AM, Chiwan Park <[hidden email]> > > > wrote: > > > > > > > +1 for ignoring execute() call with warning. > > > > > > > > But I'm concerned for how the user catches the error in program > without > > > > any data sinks. > > > > > > > > By the way, eager execution is not well documented in data sinks > > section > > > > but is in program > > > > skeleton section. [1] This makes the user’s confusion. We should > clean > > up > > > > documents. > > > > There are many codes calling execute() method after print() method. > > > [2][3] > > > > > > > > We should add a description for count() method to documents too. > > > > > > > > [1] > > > > > > > > > > http://ci.apache.org/projects/flink/flink-docs-master/apis/programming_guide.html#data-sinks > > > > [2] > > > > > > > > > > http://ci.apache.org/projects/flink/flink-docs-master/apis/programming_guide.html#parallel-execution > > > > [3] > > > > > > > > > > http://ci.apache.org/projects/flink/flink-docs-master/apis/programming_guide.html#iteration-operators > > > > > > > > Regards, > > > > Chiwan Park > > > > > > > > > On Jun 19, 2015, at 9:15 PM, Maximilian Michels <[hidden email]> > > > wrote: > > > > > > > > > > Dear Flink community, > > > > > > > > > > I have stopped to count how many people on the user list and during > > > Flink > > > > > trainings have asked why their Flink program throws an Exception > when > > > > they > > > > > just one to print a DataSet. The reason for this is that print() > now > > > > > executes eagerly, thus, executes the Flink program. Subsequent > calls > > to > > > > > execute() need to define new DataSinks and throw an exception > > > otherwise. > > > > > > > > > > We have recently introduced a flag in the ExecutionEnvironment that > > > > checks > > > > > whether the user executed before (explicitly via execute() or > > > implicitly > > > > > through collect()/print()/count()). That enabled us to print a > nicer > > > > > exception message. However, users either do not read the exception > > > > message > > > > > or do not understand it. They do ask this question a lot. > > > > > > > > > > That's why I propose to ignore calls to execute() entirely if no > > sinks > > > > are > > > > > defined. That will get rid of one of the core annoyances for Flink > > > > users. I > > > > > know, that this is painfully for us programmers because we > understand > > > how > > > > > Flink works internally but let's step back once and see that it > > > wouldn't > > > > be > > > > > so bad if execute didn't do anything in case of no new sinks. > > > > > > > > > > What would be the downside of this change? Users might call > execute() > > > and > > > > > wonder that nothing happens. We would then simply print a warning > > that > > > > > their program didn't define any sinks. That is a big difference to > > the > > > > > behavior before because users are scared of exceptions. If they > just > > > get > > > > a > > > > > warning they will double-check their program and investigate why > > > nothing > > > > > happens. Most of the cases they do actually have defined sinks but > > > simply > > > > > left a call to execute() when they were printing a DataSet. > > > > > > > > > > What are you opinions on this issue? I have opened a JIRA for this > as > > > > well: > > > > > https://issues.apache.org/jira/browse/FLINK-2249 > > > > > > > > > > Best, > > > > > Max > > > > > > > > > > > > > > > > > > > > > > > > > > |
@Stephan I understand your concerns that the user might wonder that nothing
happens when executing. However, in this case a warning will provide a hint to the user that he didn't define any sinks. In the case where he immediately calls execute() after an eager execution, the program is actually executed and he still gets an Exception although everything already ran. I think that's much worse because the user sees that something executed and thinks it failed. @Alexander I like your proposal. It adds a bit more logic to the ExecutionEnvironment but I think that's acceptable for the sake of the user experience. On Mon, Jun 22, 2015 at 9:37 PM, Alexander Alexandrov < [hidden email]> wrote: > What about adding some state state to the DataBag internals that tracks the > following conditions > > 1. whether the last job execution was triggered by an "enforcer" API method > like print() / collect(); > 2. whether a DataSource / lazy operator was created after that; > > If 1 is true and 2 is false, a WARN can be displayed. Otherwise, we can > still throw an error. > > > 2015-06-22 18:17 GMT+02:00 Stephan Ewen <[hidden email]>: > > > We have two situations to trade off here, and fixing one will make the > > other worse: > > > > 1) env.execute() after collect() - see Max's mail > > > > 2) env.execute() on empty sinks program. Not throwing an exception makes > > people wonder why nothing happens (if they write the program to just test > > whether it runs or if they want to measure time). > > > > Both choices make one behave nice and the other not. So far, the idea was > > that throwing an exception on empty sinks is that the error message will > > help people figure out what is wrong fast. Debugging why nothing happens > > can be slow. > > > > > > It is hard to say if we would not introduce another source of confusion > by > > fixing one... > > > > > > > > > > > > On Mon, Jun 22, 2015 at 10:26 AM, Maximilian Michels <[hidden email]> > > wrote: > > > > > +1 for cleaning up the documentation > > > +1 for adding a link to the documentation (should be a permalink) > > > +1 for printing a warning instead of an exception > > > > > > On Sun, Jun 21, 2015 at 12:25 AM, Robert Metzger <[hidden email]> > > > wrote: > > > > > > > We could also add a link to the documentation into the exception that > > > > explains the behavior. > > > > > > > > On Fri, Jun 19, 2015 at 5:52 AM, Chiwan Park <[hidden email]> > > > > wrote: > > > > > > > > > +1 for ignoring execute() call with warning. > > > > > > > > > > But I'm concerned for how the user catches the error in program > > without > > > > > any data sinks. > > > > > > > > > > By the way, eager execution is not well documented in data sinks > > > section > > > > > but is in program > > > > > skeleton section. [1] This makes the user’s confusion. We should > > clean > > > up > > > > > documents. > > > > > There are many codes calling execute() method after print() method. > > > > [2][3] > > > > > > > > > > We should add a description for count() method to documents too. > > > > > > > > > > [1] > > > > > > > > > > > > > > > http://ci.apache.org/projects/flink/flink-docs-master/apis/programming_guide.html#data-sinks > > > > > [2] > > > > > > > > > > > > > > > http://ci.apache.org/projects/flink/flink-docs-master/apis/programming_guide.html#parallel-execution > > > > > [3] > > > > > > > > > > > > > > > http://ci.apache.org/projects/flink/flink-docs-master/apis/programming_guide.html#iteration-operators > > > > > > > > > > Regards, > > > > > Chiwan Park > > > > > > > > > > > On Jun 19, 2015, at 9:15 PM, Maximilian Michels <[hidden email]> > > > > wrote: > > > > > > > > > > > > Dear Flink community, > > > > > > > > > > > > I have stopped to count how many people on the user list and > during > > > > Flink > > > > > > trainings have asked why their Flink program throws an Exception > > when > > > > > they > > > > > > just one to print a DataSet. The reason for this is that print() > > now > > > > > > executes eagerly, thus, executes the Flink program. Subsequent > > calls > > > to > > > > > > execute() need to define new DataSinks and throw an exception > > > > otherwise. > > > > > > > > > > > > We have recently introduced a flag in the ExecutionEnvironment > that > > > > > checks > > > > > > whether the user executed before (explicitly via execute() or > > > > implicitly > > > > > > through collect()/print()/count()). That enabled us to print a > > nicer > > > > > > exception message. However, users either do not read the > exception > > > > > message > > > > > > or do not understand it. They do ask this question a lot. > > > > > > > > > > > > That's why I propose to ignore calls to execute() entirely if no > > > sinks > > > > > are > > > > > > defined. That will get rid of one of the core annoyances for > Flink > > > > > users. I > > > > > > know, that this is painfully for us programmers because we > > understand > > > > how > > > > > > Flink works internally but let's step back once and see that it > > > > wouldn't > > > > > be > > > > > > so bad if execute didn't do anything in case of no new sinks. > > > > > > > > > > > > What would be the downside of this change? Users might call > > execute() > > > > and > > > > > > wonder that nothing happens. We would then simply print a warning > > > that > > > > > > their program didn't define any sinks. That is a big difference > to > > > the > > > > > > behavior before because users are scared of exceptions. If they > > just > > > > get > > > > > a > > > > > > warning they will double-check their program and investigate why > > > > nothing > > > > > > happens. Most of the cases they do actually have defined sinks > but > > > > simply > > > > > > left a call to execute() when they were printing a DataSet. > > > > > > > > > > > > What are you opinions on this issue? I have opened a JIRA for > this > > as > > > > > well: > > > > > > https://issues.apache.org/jira/browse/FLINK-2249 > > > > > > > > > > > > Best, > > > > > > Max > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > |
In reply to this post by aalexandrov
That would help to get around many cases, it still leaves some open, like
forgetting to create a sink after transformations after a collect() call. Would probably be a good improvement over the status quo, though... On Mon, Jun 22, 2015 at 9:37 PM, Alexander Alexandrov < [hidden email]> wrote: > What about adding some state state to the DataBag internals that tracks the > following conditions > > 1. whether the last job execution was triggered by an "enforcer" API method > like print() / collect(); > 2. whether a DataSource / lazy operator was created after that; > > If 1 is true and 2 is false, a WARN can be displayed. Otherwise, we can > still throw an error. > > > 2015-06-22 18:17 GMT+02:00 Stephan Ewen <[hidden email]>: > > > We have two situations to trade off here, and fixing one will make the > > other worse: > > > > 1) env.execute() after collect() - see Max's mail > > > > 2) env.execute() on empty sinks program. Not throwing an exception makes > > people wonder why nothing happens (if they write the program to just test > > whether it runs or if they want to measure time). > > > > Both choices make one behave nice and the other not. So far, the idea was > > that throwing an exception on empty sinks is that the error message will > > help people figure out what is wrong fast. Debugging why nothing happens > > can be slow. > > > > > > It is hard to say if we would not introduce another source of confusion > by > > fixing one... > > > > > > > > > > > > On Mon, Jun 22, 2015 at 10:26 AM, Maximilian Michels <[hidden email]> > > wrote: > > > > > +1 for cleaning up the documentation > > > +1 for adding a link to the documentation (should be a permalink) > > > +1 for printing a warning instead of an exception > > > > > > On Sun, Jun 21, 2015 at 12:25 AM, Robert Metzger <[hidden email]> > > > wrote: > > > > > > > We could also add a link to the documentation into the exception that > > > > explains the behavior. > > > > > > > > On Fri, Jun 19, 2015 at 5:52 AM, Chiwan Park <[hidden email]> > > > > wrote: > > > > > > > > > +1 for ignoring execute() call with warning. > > > > > > > > > > But I'm concerned for how the user catches the error in program > > without > > > > > any data sinks. > > > > > > > > > > By the way, eager execution is not well documented in data sinks > > > section > > > > > but is in program > > > > > skeleton section. [1] This makes the user’s confusion. We should > > clean > > > up > > > > > documents. > > > > > There are many codes calling execute() method after print() method. > > > > [2][3] > > > > > > > > > > We should add a description for count() method to documents too. > > > > > > > > > > [1] > > > > > > > > > > > > > > > http://ci.apache.org/projects/flink/flink-docs-master/apis/programming_guide.html#data-sinks > > > > > [2] > > > > > > > > > > > > > > > http://ci.apache.org/projects/flink/flink-docs-master/apis/programming_guide.html#parallel-execution > > > > > [3] > > > > > > > > > > > > > > > http://ci.apache.org/projects/flink/flink-docs-master/apis/programming_guide.html#iteration-operators > > > > > > > > > > Regards, > > > > > Chiwan Park > > > > > > > > > > > On Jun 19, 2015, at 9:15 PM, Maximilian Michels <[hidden email]> > > > > wrote: > > > > > > > > > > > > Dear Flink community, > > > > > > > > > > > > I have stopped to count how many people on the user list and > during > > > > Flink > > > > > > trainings have asked why their Flink program throws an Exception > > when > > > > > they > > > > > > just one to print a DataSet. The reason for this is that print() > > now > > > > > > executes eagerly, thus, executes the Flink program. Subsequent > > calls > > > to > > > > > > execute() need to define new DataSinks and throw an exception > > > > otherwise. > > > > > > > > > > > > We have recently introduced a flag in the ExecutionEnvironment > that > > > > > checks > > > > > > whether the user executed before (explicitly via execute() or > > > > implicitly > > > > > > through collect()/print()/count()). That enabled us to print a > > nicer > > > > > > exception message. However, users either do not read the > exception > > > > > message > > > > > > or do not understand it. They do ask this question a lot. > > > > > > > > > > > > That's why I propose to ignore calls to execute() entirely if no > > > sinks > > > > > are > > > > > > defined. That will get rid of one of the core annoyances for > Flink > > > > > users. I > > > > > > know, that this is painfully for us programmers because we > > understand > > > > how > > > > > > Flink works internally but let's step back once and see that it > > > > wouldn't > > > > > be > > > > > > so bad if execute didn't do anything in case of no new sinks. > > > > > > > > > > > > What would be the downside of this change? Users might call > > execute() > > > > and > > > > > > wonder that nothing happens. We would then simply print a warning > > > that > > > > > > their program didn't define any sinks. That is a big difference > to > > > the > > > > > > behavior before because users are scared of exceptions. If they > > just > > > > get > > > > > a > > > > > > warning they will double-check their program and investigate why > > > > nothing > > > > > > happens. Most of the cases they do actually have defined sinks > but > > > > simply > > > > > > left a call to execute() when they were printing a DataSet. > > > > > > > > > > > > What are you opinions on this issue? I have opened a JIRA for > this > > as > > > > > well: > > > > > > https://issues.apache.org/jira/browse/FLINK-2249 > > > > > > > > > > > > Best, > > > > > > Max > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > |
Free forum by Nabble | Edit this page |