Terminating streaming test

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

Terminating streaming test

Thomas Weise
Hi,

I'm looking for an example of an integration test that runs a streaming job
and terminates when the expected result becomes available. I could think of
2 approaches:

1. Modified version of LocalStreamEnvironment that executes the job
asynchronously and polls for the result or

2. Source that emits a final watermark that causes the topology to
terminate after the watermark has traversed the topology. Is that possible
with Flink?

But probably this is a rather common testing need that's already solved?!

Thanks,
Thomas
Reply | Threaded
Open this post in threaded view
|

Re: Terminating streaming test

Ken Krugler
Hi Thomas,

Normally the streaming job will terminate when the sources are exhausted and all records have been processed.

I assume you have some unbounded source(s), thus this doesn’t work for your case.

We’d run into a similar situation with a streaming job that has iterations.

Our solution was your option #1 below, where we created a modified version of LocalStreamEnvironment <https://raw.githubusercontent.com/ScaleUnlimited/flink-crawler/master/src/main/java/org/apache/flink/streaming/api/environment/LocalStreamEnvironmentWithAsyncExecution.java> that supports async execution.

— Ken


> On Feb 6, 2018, at 4:21 PM, Thomas Weise <[hidden email]> wrote:
>
> Hi,
>
> I'm looking for an example of an integration test that runs a streaming job
> and terminates when the expected result becomes available. I could think of
> 2 approaches:
>
> 1. Modified version of LocalStreamEnvironment that executes the job
> asynchronously and polls for the result or
>
> 2. Source that emits a final watermark that causes the topology to
> terminate after the watermark has traversed the topology. Is that possible
> with Flink?
>
> But probably this is a rather common testing need that's already solved?!
>
> Thanks,
> Thomas

--------------------------
Ken Krugler
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr

Reply | Threaded
Open this post in threaded view
|

Re: Terminating streaming test

Thomas Weise
Hi Ken,

Thanks! I would expect more folks to run into this and hence surprised to
not find this in LocalStreamEnvironment. Is there a reason for that?

In the specific case, we have an unbounded source (Kinesis), but for
testing we would like to make it bounded. Hence the earlier question
whether it is possible to terminate a topology with a final watermark or
different means from within the source, similar to how a bounded source in
Beam would behave.

Thanks,
Thomas






On Tue, Feb 6, 2018 at 5:16 PM, Ken Krugler <[hidden email]>
wrote:

> Hi Thomas,
>
> Normally the streaming job will terminate when the sources are exhausted
> and all records have been processed.
>
> I assume you have some unbounded source(s), thus this doesn’t work for
> your case.
>
> We’d run into a similar situation with a streaming job that has iterations.
>
> Our solution was your option #1 below, where we created a modified version
> of LocalStreamEnvironment <https://raw.githubusercontent.com/
> ScaleUnlimited/flink-crawler/master/src/main/java/org/
> apache/flink/streaming/api/environment/LocalStreamEnvironmentWithAsyn
> cExecution.java> that supports async execution.
>
> — Ken
>
>
> > On Feb 6, 2018, at 4:21 PM, Thomas Weise <[hidden email]> wrote:
> >
> > Hi,
> >
> > I'm looking for an example of an integration test that runs a streaming
> job
> > and terminates when the expected result becomes available. I could think
> of
> > 2 approaches:
> >
> > 1. Modified version of LocalStreamEnvironment that executes the job
> > asynchronously and polls for the result or
> >
> > 2. Source that emits a final watermark that causes the topology to
> > terminate after the watermark has traversed the topology. Is that
> possible
> > with Flink?
> >
> > But probably this is a rather common testing need that's already solved?!
> >
> > Thanks,
> > Thomas
>
> --------------------------
> Ken Krugler
> http://www.scaleunlimited.com
> custom big data solutions & training
> Hadoop, Cascading, Cassandra & Solr
>
>
Reply | Threaded
Open this post in threaded view
|

Re: Terminating streaming test

Aljoscha Krettek-2
There is StatefulJobSavepointMigrationITCase, which executes a proper unbounded pipeline on a locally started cluster and "listens" for some criteria via accumulators before cancelling the job and shutting down the cluster. The communication with the cluster is quite custom here, but I would really like to have a framework that comes with Flink that allows writing such tests. Somewhat similar to how PAssert works in Beam.

Best,
Aljoscha

> On 7. Feb 2018, at 04:34, Thomas Weise <[hidden email]> wrote:
>
> Hi Ken,
>
> Thanks! I would expect more folks to run into this and hence surprised to
> not find this in LocalStreamEnvironment. Is there a reason for that?
>
> In the specific case, we have an unbounded source (Kinesis), but for
> testing we would like to make it bounded. Hence the earlier question
> whether it is possible to terminate a topology with a final watermark or
> different means from within the source, similar to how a bounded source in
> Beam would behave.
>
> Thanks,
> Thomas
>
>
>
>
>
>
> On Tue, Feb 6, 2018 at 5:16 PM, Ken Krugler <[hidden email]>
> wrote:
>
>> Hi Thomas,
>>
>> Normally the streaming job will terminate when the sources are exhausted
>> and all records have been processed.
>>
>> I assume you have some unbounded source(s), thus this doesn’t work for
>> your case.
>>
>> We’d run into a similar situation with a streaming job that has iterations.
>>
>> Our solution was your option #1 below, where we created a modified version
>> of LocalStreamEnvironment <https://raw.githubusercontent.com/
>> ScaleUnlimited/flink-crawler/master/src/main/java/org/
>> apache/flink/streaming/api/environment/LocalStreamEnvironmentWithAsyn
>> cExecution.java> that supports async execution.
>>
>> — Ken
>>
>>
>>> On Feb 6, 2018, at 4:21 PM, Thomas Weise <[hidden email]> wrote:
>>>
>>> Hi,
>>>
>>> I'm looking for an example of an integration test that runs a streaming
>> job
>>> and terminates when the expected result becomes available. I could think
>> of
>>> 2 approaches:
>>>
>>> 1. Modified version of LocalStreamEnvironment that executes the job
>>> asynchronously and polls for the result or
>>>
>>> 2. Source that emits a final watermark that causes the topology to
>>> terminate after the watermark has traversed the topology. Is that
>> possible
>>> with Flink?
>>>
>>> But probably this is a rather common testing need that's already solved?!
>>>
>>> Thanks,
>>> Thomas
>>
>> --------------------------
>> Ken Krugler
>> http://www.scaleunlimited.com
>> custom big data solutions & training
>> Hadoop, Cascading, Cassandra & Solr
>>
>>

Reply | Threaded
Open this post in threaded view
|

Re: Terminating streaming test

Thomas Weise
Thanks! It would indeed be nice to have this as framework that makes test
fun and easy to write ;-)

Looking at SavepointMigrationTestBase, I see that the cluster is eventually
stopped in teardown, but I don't find where the individual job is
terminated after the expected results are in? Also, CheckingRestoringSource
will run until cancelled, is there a way where the source can trigger
pipeline termination?

Thanks,
Thomas


On Wed, Feb 7, 2018 at 7:56 AM, Aljoscha Krettek <[hidden email]>
wrote:

> There is StatefulJobSavepointMigrationITCase, which executes a proper
> unbounded pipeline on a locally started cluster and "listens" for some
> criteria via accumulators before cancelling the job and shutting down the
> cluster. The communication with the cluster is quite custom here, but I
> would really like to have a framework that comes with Flink that allows
> writing such tests. Somewhat similar to how PAssert works in Beam.
>
> Best,
> Aljoscha
>
> > On 7. Feb 2018, at 04:34, Thomas Weise <[hidden email]> wrote:
> >
> > Hi Ken,
> >
> > Thanks! I would expect more folks to run into this and hence surprised to
> > not find this in LocalStreamEnvironment. Is there a reason for that?
> >
> > In the specific case, we have an unbounded source (Kinesis), but for
> > testing we would like to make it bounded. Hence the earlier question
> > whether it is possible to terminate a topology with a final watermark or
> > different means from within the source, similar to how a bounded source
> in
> > Beam would behave.
> >
> > Thanks,
> > Thomas
> >
> >
> >
> >
> >
> >
> > On Tue, Feb 6, 2018 at 5:16 PM, Ken Krugler <[hidden email]
> >
> > wrote:
> >
> >> Hi Thomas,
> >>
> >> Normally the streaming job will terminate when the sources are exhausted
> >> and all records have been processed.
> >>
> >> I assume you have some unbounded source(s), thus this doesn’t work for
> >> your case.
> >>
> >> We’d run into a similar situation with a streaming job that has
> iterations.
> >>
> >> Our solution was your option #1 below, where we created a modified
> version
> >> of LocalStreamEnvironment <https://raw.githubusercontent.com/
> >> ScaleUnlimited/flink-crawler/master/src/main/java/org/
> >> apache/flink/streaming/api/environment/LocalStreamEnvironmentWithAsyn
> >> cExecution.java> that supports async execution.
> >>
> >> — Ken
> >>
> >>
> >>> On Feb 6, 2018, at 4:21 PM, Thomas Weise <[hidden email]> wrote:
> >>>
> >>> Hi,
> >>>
> >>> I'm looking for an example of an integration test that runs a streaming
> >> job
> >>> and terminates when the expected result becomes available. I could
> think
> >> of
> >>> 2 approaches:
> >>>
> >>> 1. Modified version of LocalStreamEnvironment that executes the job
> >>> asynchronously and polls for the result or
> >>>
> >>> 2. Source that emits a final watermark that causes the topology to
> >>> terminate after the watermark has traversed the topology. Is that
> >> possible
> >>> with Flink?
> >>>
> >>> But probably this is a rather common testing need that's already
> solved?!
> >>>
> >>> Thanks,
> >>> Thomas
> >>
> >> --------------------------
> >> Ken Krugler
> >> http://www.scaleunlimited.com
> >> custom big data solutions & training
> >> Hadoop, Cascading, Cassandra & Solr
> >>
> >>
>
>
Reply | Threaded
Open this post in threaded view
|

Re: Terminating streaming test

Aljoscha Krettek-2
Hi,

The job is not explicitly stopped, bringing down the cluster will also bring down the job. (Which is maybe not the nicest way of doing things but it works.)

Sources can trigger pipeline termination by returning from their run() method.

Best,
Aljoscha

> On 7. Feb 2018, at 21:15, Thomas Weise <[hidden email]> wrote:
>
> Thanks! It would indeed be nice to have this as framework that makes test
> fun and easy to write ;-)
>
> Looking at SavepointMigrationTestBase, I see that the cluster is eventually
> stopped in teardown, but I don't find where the individual job is
> terminated after the expected results are in? Also, CheckingRestoringSource
> will run until cancelled, is there a way where the source can trigger
> pipeline termination?
>
> Thanks,
> Thomas
>
>
> On Wed, Feb 7, 2018 at 7:56 AM, Aljoscha Krettek <[hidden email]>
> wrote:
>
>> There is StatefulJobSavepointMigrationITCase, which executes a proper
>> unbounded pipeline on a locally started cluster and "listens" for some
>> criteria via accumulators before cancelling the job and shutting down the
>> cluster. The communication with the cluster is quite custom here, but I
>> would really like to have a framework that comes with Flink that allows
>> writing such tests. Somewhat similar to how PAssert works in Beam.
>>
>> Best,
>> Aljoscha
>>
>>> On 7. Feb 2018, at 04:34, Thomas Weise <[hidden email]> wrote:
>>>
>>> Hi Ken,
>>>
>>> Thanks! I would expect more folks to run into this and hence surprised to
>>> not find this in LocalStreamEnvironment. Is there a reason for that?
>>>
>>> In the specific case, we have an unbounded source (Kinesis), but for
>>> testing we would like to make it bounded. Hence the earlier question
>>> whether it is possible to terminate a topology with a final watermark or
>>> different means from within the source, similar to how a bounded source
>> in
>>> Beam would behave.
>>>
>>> Thanks,
>>> Thomas
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Tue, Feb 6, 2018 at 5:16 PM, Ken Krugler <[hidden email]
>>>
>>> wrote:
>>>
>>>> Hi Thomas,
>>>>
>>>> Normally the streaming job will terminate when the sources are exhausted
>>>> and all records have been processed.
>>>>
>>>> I assume you have some unbounded source(s), thus this doesn’t work for
>>>> your case.
>>>>
>>>> We’d run into a similar situation with a streaming job that has
>> iterations.
>>>>
>>>> Our solution was your option #1 below, where we created a modified
>> version
>>>> of LocalStreamEnvironment <https://raw.githubusercontent.com/
>>>> ScaleUnlimited/flink-crawler/master/src/main/java/org/
>>>> apache/flink/streaming/api/environment/LocalStreamEnvironmentWithAsyn
>>>> cExecution.java> that supports async execution.
>>>>
>>>> — Ken
>>>>
>>>>
>>>>> On Feb 6, 2018, at 4:21 PM, Thomas Weise <[hidden email]> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I'm looking for an example of an integration test that runs a streaming
>>>> job
>>>>> and terminates when the expected result becomes available. I could
>> think
>>>> of
>>>>> 2 approaches:
>>>>>
>>>>> 1. Modified version of LocalStreamEnvironment that executes the job
>>>>> asynchronously and polls for the result or
>>>>>
>>>>> 2. Source that emits a final watermark that causes the topology to
>>>>> terminate after the watermark has traversed the topology. Is that
>>>> possible
>>>>> with Flink?
>>>>>
>>>>> But probably this is a rather common testing need that's already
>> solved?!
>>>>>
>>>>> Thanks,
>>>>> Thomas
>>>>
>>>> --------------------------
>>>> Ken Krugler
>>>> http://www.scaleunlimited.com
>>>> custom big data solutions & training
>>>> Hadoop, Cascading, Cassandra & Solr
>>>>
>>>>
>>
>>