Hi,
currently, the only way to stop a streaming job is to "cancel" the job. This has multiple disadvantages:

1) a "clean" stop is not possible (see https://issues.apache.org/jira/browse/FLINK-1929 -- I think a clean stop is a prerequisite for FLINK-1929), and
2) as a minor issue, all canceled jobs are listed as "canceled" in the history (which is somewhat confusing for the user -- at least it was for me when I started to work with Flink Streaming).

This issue has been raised a few times already; however, no final conclusion was reached (if I remember correctly). I could not find a JIRA for it either.

From my understanding of the system, there would be two ways to implement a clean way of stopping streaming jobs:

1) "Task"s are distinguished between "batch" and "streaming":
   -> canceling a batch job works as it always has
   -> canceling a streaming job only sends a "canceling" signal to the sources and waits until the job finishes (i.e., the sources stop emitting data and finish regularly, triggering the finishing of all operators).
In this case, streaming jobs are stopped in a "clean way" (as if the input had been finite) and the job is listed as "finished" in the history.

This approach has the advantage that it should be simpler to implement. However, the disadvantages are that (1) a "hard" canceling of jobs is no longer possible, and (2) Flink must be able to distinguish between batch and streaming jobs (I don't think the Flink runtime can distinguish the two right now?).

2) A new message "terminate" (or similar) is introduced that can only be used for streaming jobs (it would be ignored for batch jobs); it stops the sources and waits until the job finishes regularly.

This approach has the advantage that the current system behavior is preserved (it only adds a new feature). The disadvantage is that all clients need to be touched, and it must be clear to the user that "terminate" does not work for batch jobs. If an error/warning should be raised when a user tries to "terminate" a batch job, Flink must be able to distinguish between batch and streaming jobs, too. As an alternative, "terminate" on batch jobs could also be interpreted as "cancel".

I personally think the second approach is better. Please give feedback. If we can reach a conclusion on how to implement it, I would like to work on it.

-Matthias
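For illustration, here is a minimal sketch (in Java) of how option 2 could look at the source level. The stop() hook and its wiring to a "terminate" message are assumptions for this proposal, not an existing Flink API: the JobManager would forward "terminate" only to the source tasks, each source would leave its run loop cleanly, and the job would finish as if the input had been finite.

import org.apache.flink.streaming.api.functions.source.SourceFunction;

// Sketch only: a user-defined source that can be stopped cleanly.
// The stop() method and its invocation via a "terminate" message are
// hypothetical; cancel() would remain the hard, immediate shutdown path.
public class StoppableCounterSource implements SourceFunction<Long> {

    private volatile boolean isRunning = true;
    private long counter = 0;

    @Override
    public void run(SourceContext<Long> ctx) throws Exception {
        while (isRunning) {
            synchronized (ctx.getCheckpointLock()) {
                ctx.collect(counter++);
            }
        }
        // Returning from run() lets the source task finish regularly,
        // which propagates end-of-stream downstream and completes the job.
    }

    // Hypothetical hook: would be invoked when the JobManager forwards
    // the proposed "terminate" message to the source tasks.
    public void stop() {
        isRunning = false;
    }

    @Override
    public void cancel() {
        isRunning = false;
    }
}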
Hey,
I would also strongly prefer the second option; users need to have the option to force-cancel a program in case of unwanted behaviour.

Cheers,
Gyula

Matthias J. Sax <[hidden email]> wrote (on Wednesday, 27 May 2015, 1:20):
+1 for the second option:
It would also provide the possibility to properly commit a state checkpoint after the terminate message is triggered. In some cases this can be a desirable behaviour.

On Wed, May 27, 2015 at 8:46 AM, Gyula Fóra <[hidden email]> wrote:
I would also prefer the second option. The first is rather a hack, not an option. :D

On May 27, 2015 9:14 AM, "Márton Balassi" <[hidden email]> wrote:
+1 for the second option.
How about we allow passing a flag that indicates whether a checkpoint should be taken together with the cancellation?

On Wed, May 27, 2015 at 12:27 PM, Aljoscha Krettek <[hidden email]> wrote:
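For illustration of the flag Stephan suggests above, the new client-to-JobManager message could simply carry it along with the job id. This is a purely hypothetical sketch; the class name and the checkpointBeforeStop field are illustrative only and not existing Flink code.

import java.io.Serializable;
import org.apache.flink.api.common.JobID;

// Hypothetical message for the proposed "terminate" command.
public final class StopJob implements Serializable {

    private static final long serialVersionUID = 1L;

    private final JobID jobId;
    // If true, take a final checkpoint before signalling the sources to stop.
    private final boolean checkpointBeforeStop;

    public StopJob(JobID jobId, boolean checkpointBeforeStop) {
        this.jobId = jobId;
        this.checkpointBeforeStop = checkpointBeforeStop;
    }

    public JobID getJobId() {
        return jobId;
    }

    public boolean isCheckpointBeforeStop() {
        return checkpointBeforeStop;
    }
}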
Stephan, I am not sure what you mean by this exactly... but I guess this is an "add-on" that can be done later. It seems to be related to https://issues.apache.org/jira/browse/FLINK-1929

I will open a JIRA for the new "terminate" message and assign it to myself.

-Matthias

On 05/27/2015 12:36 PM, Stephan Ewen wrote: