[DISCUSS] FLIP-134: DataStream Semantics for Bounded Input


Re: [DISCUSS] FLIP-134: DataStream Semantics for Bounded Input

David Anderson-3
Aljoscha,

Thanks for the thorough response. I'm still wanting to think about and
discuss the Trigger topic some more, but I'm content with where you've left
it for now. Everything else seems good.

David

On Fri, Sep 11, 2020 at 2:08 PM Aljoscha Krettek <[hidden email]>
wrote:

> Thanks for the thoughtful comments! I'll try and address them inline
> below. I'm hoping to start a VOTE thread soon if there are no other
> comments by the end of today.
>
> On 10.09.20 15:40, David Anderson wrote:
> > Having just re-read FLIP-134, I think it mostly makes sense, though I'm
> > not exactly looking forward to figuring out how to explain it without
> > making it seem overly complicated.
>
> Which are the points where you see the explanation becoming too
> complex? For me, the only difference in behaviour is processing-time
> timers, which will fail hard in BATCH execution mode. Things like
> shuffle-mode and schedule-mode should be transparent and I would not
> mention them in the documentation except in an advanced section.
>
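
For illustration, a minimal sketch of the case under discussion: the same DataStream program switched to BATCH execution, with a processing-time timer that the proposal would fail hard on. The setRuntimeMode/RuntimeExecutionMode switch shown here is the execution-mode selection this FLIP introduces; treat the exact names as an assumption at the time of this thread.

    import org.apache.flink.api.common.RuntimeExecutionMode;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
    import org.apache.flink.util.Collector;

    public class ProcessingTimeInBatch {

        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                    StreamExecutionEnvironment.getExecutionEnvironment();

            // Select BATCH execution: with bounded input, processing-time
            // timers are the one behavioural difference and are meant to
            // fail hard instead of silently never firing.
            env.setRuntimeMode(RuntimeExecutionMode.BATCH);

            env.fromElements("a", "b", "a")
                    .keyBy(word -> word)
                    .process(new KeyedProcessFunction<String, String, String>() {
                        @Override
                        public void processElement(
                                String value, Context ctx, Collector<String> out) {
                            // Fine in STREAMING; expected to fail in BATCH
                            // under the FLIP-134 proposal.
                            ctx.timerService().registerProcessingTimeTimer(
                                    ctx.timerService().currentProcessingTime() + 1000L);
                            out.collect(value);
                        }
                    })
                    .print();

            env.execute("processing-time-in-batch");
        }
    }
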
> > I'm a bit confused by the discussion around custom window Triggers. Yes,
> > I agree that complex, mixed Triggers are sometimes useful. And I buy
> > into the argument that we want to FAIL hard for processing-time on
> > BATCH. But why not go ahead and FAIL Triggers that can't work, rather
> > than ignoring all custom Triggers?
>
> The motivation is to allow the same program to work on BATCH and on
> STREAMING, and in reality DataStream programs often have Triggers that
> you wouldn't need for BATCH execution.
>
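
To make the Trigger point concrete, a sketch of the kind of pipeline in question: an event-time window with an early-firing processing-time trigger, which is useful in STREAMING but has no meaningful counterpart in BATCH, where the window result is only produced once all input has been consumed. The specific trigger class is just one example of such a Trigger.

    import org.apache.flink.api.common.eventtime.WatermarkStrategy;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
    import org.apache.flink.streaming.api.windowing.time.Time;
    import org.apache.flink.streaming.api.windowing.triggers.ContinuousProcessingTimeTrigger;

    public class EarlyFiringTrigger {

        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                    StreamExecutionEnvironment.getExecutionEnvironment();

            env.fromElements(Tuple2.of("a", 1L), Tuple2.of("b", 2L), Tuple2.of("a", 3L))
                    .assignTimestampsAndWatermarks(
                            WatermarkStrategy
                                    .<Tuple2<String, Long>>forMonotonousTimestamps()
                                    .withTimestampAssigner((event, ts) -> event.f1))
                    .keyBy(event -> event.f0)
                    .window(TumblingEventTimeWindows.of(Time.minutes(1)))
                    // Early firings driven by the wall clock: handy for
                    // STREAMING jobs with long windows, irrelevant for BATCH.
                    .trigger(ContinuousProcessingTimeTrigger.of(Time.seconds(10)))
                    .sum(1)
                    .print();

            env.execute("early-firing-trigger");
        }
    }
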
> I do think that this topic is too important to have it as a sub-section
> in this FLIP. I will remove it and write another FLIP just about this
> topic. This will mean that DataStream programs that have Triggers that
> use processing-time will simply fail hard, which is acceptable for an
> initial version, I think.
>
> > I do think it's critical that bounded streaming has the same
> > configuration as unbounded streaming. Users expect/need things like
> > processing time timers in bounded streaming during development. If I've
> > understood the proposal correctly, this will be the case.
>
> If you're referring to the case where you have STREAMING execution mode
> but your sources are bounded (for development), then yes, I think we're
> on the same page.
>
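
A sketch of that development setup, for reference: explicitly staying in STREAMING execution mode while reading a bounded source, so processing-time timers and the rest of the usual streaming behaviour remain available. As above, the runtime-mode setter is assumed from this FLIP's proposal.

    import org.apache.flink.api.common.RuntimeExecutionMode;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class BoundedInputInStreamingMode {

        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                    StreamExecutionEnvironment.getExecutionEnvironment();

            // Stay in STREAMING mode even though the source below is bounded:
            // useful during development, because the program behaves exactly
            // as it will against an unbounded source.
            env.setRuntimeMode(RuntimeExecutionMode.STREAMING);

            env.fromElements(1, 2, 3, 4, 5)  // bounded source
                    .map(n -> n * 2)
                    .print();

            env.execute("bounded-input-streaming-mode");
        }
    }
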
> > I would prefer WARN over IGNORE as the default for cases where users
> > have explicitly specified something that isn’t going to happen. (I would
> > also like to see a warning given for any job that uses event time timers
> > without having a watermark strategy, though that's unrelated to the
> > topic at hand.)
>
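
On the watermark point: the silent-failure case described is a job that registers event-time timers but never assigns timestamps and watermarks, so those timers may never fire. A hedged sketch of the fix using the WatermarkStrategy API (the tuple schema and delay are arbitrary):

    import java.time.Duration;

    import org.apache.flink.api.common.eventtime.WatermarkStrategy;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class WithWatermarks {

        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                    StreamExecutionEnvironment.getExecutionEnvironment();

            env.fromElements(Tuple2.of("a", 1_000L), Tuple2.of("b", 3_000L))
                    // Without this step, downstream event-time timers may
                    // never fire; this is the case a warning could flag.
                    .assignTimestampsAndWatermarks(
                            WatermarkStrategy
                                    .<Tuple2<String, Long>>forBoundedOutOfOrderness(
                                            Duration.ofSeconds(5))
                                    .withTimestampAssigner((event, ts) -> event.f1))
                    .print();

            env.execute("with-watermarks");
        }
    }
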
> Agreed, that's why I'm proposing pipeline.processing-time.allow: FAIL as
> the default setting for BATCH execution mode. Is there another setting
> where we currently propose IGNORE but you think it should be FAIL? There
> is pipeline.processing-time.end-of-input: IGNORE, which is in line with
> the current behaviour, and failing when timers are set means there won't
> be any to fire in BATCH execution mode.
>
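
For completeness, a sketch of how such settings could be supplied. The option keys are the names proposed in this thread and may not match what is eventually released, and passing a Configuration into getExecutionEnvironment assumes a Flink version that supports that overload.

    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class ProcessingTimeConfig {

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();

            // Keys as proposed in this thread (FLIP-134 draft); treat them
            // as placeholders, not confirmed configuration options.
            conf.setString("pipeline.processing-time.allow", "FAIL");
            conf.setString("pipeline.processing-time.end-of-input", "IGNORE");

            StreamExecutionEnvironment env =
                    StreamExecutionEnvironment.getExecutionEnvironment(conf);

            env.fromElements(1, 2, 3).print();
            env.execute("processing-time-config");
        }
    }
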
> Aljoscha
>
>

Re: [DISCUSS] FLIP-134: DataStream Semantics for Bounded Input

Kostas Kloudas-4
Hi all,

Thanks for keeping the discussion running while I was on holidays!
I am catching up currently and I will post in the voting thread if I
have any comments :)

Cheers,
Kostas
