Verifying watermarks in integration test

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

Verifying watermarks in integration test

Thomas Weise
Hi,

I have a streaming integration test with two operators. A source that emits
records and watermarks, and a sink that collects the records. The topology
runs in embedded mode and the results are collected in memory. Now, in
addition to the records, I also want to verify that watermarks have been
emitted. What's the recommended way of doing that?

Thanks
Reply | Threaded
Open this post in threaded view
|

Re: Verifying watermarks in integration test

xingcanc
Hi Thomas,

some test cases in JoinHarnessTest <https://github.com/apache/flink/blob/release-1.4/flink-libraries/flink-table/src/test/scala/org/apache/flink/table/runtime/harness/JoinHarnessTest.scala> show how to verify the emitted watermarks.

Hope this helps.

Best,
Xingcan

> On 21 Feb 2018, at 2:09 PM, Thomas Weise <[hidden email]> wrote:
>
> Hi,
>
> I have a streaming integration test with two operators. A source that emits
> records and watermarks, and a sink that collects the records. The topology
> runs in embedded mode and the results are collected in memory. Now, in
> addition to the records, I also want to verify that watermarks have been
> emitted. What's the recommended way of doing that?
>
> Thanks

Reply | Threaded
Open this post in threaded view
|

Re: Verifying watermarks in integration test

Thomas Weise
Hi Xingcan,

thanks, this is a good way of testing an individual operator. I had written
my own mock code to intercept source context and collect the results, this
is a much better approach for operator testing.

I wonder how I can verify with an embedded Flink cluster though. Even
though my single operator test passes, the results are not emitted as
expected within a topology (not observed in the attached sink). What's the
test approach there?

Thanks,
Thomas


On Wed, Feb 21, 2018 at 12:43 AM, Xingcan Cui <[hidden email]> wrote:

> Hi Thomas,
>
> some test cases in JoinHarnessTest <https://github.com/apache/
> flink/blob/release-1.4/flink-libraries/flink-table/src/
> test/scala/org/apache/flink/table/runtime/harness/JoinHarnessTest.scala>
> show how to verify the emitted watermarks.
>
> Hope this helps.
>
> Best,
> Xingcan
>
> > On 21 Feb 2018, at 2:09 PM, Thomas Weise <[hidden email]> wrote:
> >
> > Hi,
> >
> > I have a streaming integration test with two operators. A source that
> emits
> > records and watermarks, and a sink that collects the records. The
> topology
> > runs in embedded mode and the results are collected in memory. Now, in
> > addition to the records, I also want to verify that watermarks have been
> > emitted. What's the recommended way of doing that?
> >
> > Thanks
>
>
Reply | Threaded
Open this post in threaded view
|

Re: Verifying watermarks in integration test

xingcan
Hi Thomas,

generally speaking, if you want to test a whole job, just run the pipeline in your test case with a collection-based source and a result collecting sink. If your single operator tests passes while the integration test fails, maybe you should first check the timestamp / watermark assigners or the partitioning mechanisms used.

Best,
Xingcan

> On 28 Feb 2018, at 5:46 AM, Thomas Weise <[hidden email]> wrote:
>
> Hi Xingcan,
>
> thanks, this is a good way of testing an individual operator. I had written
> my own mock code to intercept source context and collect the results, this
> is a much better approach for operator testing.
>
> I wonder how I can verify with an embedded Flink cluster though. Even
> though my single operator test passes, the results are not emitted as
> expected within a topology (not observed in the attached sink). What's the
> test approach there?
>
> Thanks,
> Thomas
>
>
> On Wed, Feb 21, 2018 at 12:43 AM, Xingcan Cui <[hidden email]> wrote:
>
>> Hi Thomas,
>>
>> some test cases in JoinHarnessTest <https://github.com/apache/
>> flink/blob/release-1.4/flink-libraries/flink-table/src/
>> test/scala/org/apache/flink/table/runtime/harness/JoinHarnessTest.scala>
>> show how to verify the emitted watermarks.
>>
>> Hope this helps.
>>
>> Best,
>> Xingcan
>>
>>> On 21 Feb 2018, at 2:09 PM, Thomas Weise <[hidden email]> wrote:
>>>
>>> Hi,
>>>
>>> I have a streaming integration test with two operators. A source that
>> emits
>>> records and watermarks, and a sink that collects the records. The
>> topology
>>> runs in embedded mode and the results are collected in memory. Now, in
>>> addition to the records, I also want to verify that watermarks have been
>>> emitted. What's the recommended way of doing that?
>>>
>>> Thanks
>>
>>



Reply | Threaded
Open this post in threaded view
|

Re: Verifying watermarks in integration test

xingcanc
In reply to this post by Thomas Weise
Hi Thomas,

generally speaking, if you want to test a whole job, just run the pipeline in your test case with a collection-based source and a result collecting sink. If your single operator tests passes while the integration test fails, maybe you should first check the timestamp / watermark assigners or the partitioning mechanisms used.

Best,
Xingcan

> On 28 Feb 2018, at 5:46 AM, Thomas Weise <[hidden email]> wrote:
>
> Hi Xingcan,
>
> thanks, this is a good way of testing an individual operator. I had written
> my own mock code to intercept source context and collect the results, this
> is a much better approach for operator testing.
>
> I wonder how I can verify with an embedded Flink cluster though. Even
> though my single operator test passes, the results are not emitted as
> expected within a topology (not observed in the attached sink). What's the
> test approach there?
>
> Thanks,
> Thomas
>
>
> On Wed, Feb 21, 2018 at 12:43 AM, Xingcan Cui <[hidden email]> wrote:
>
>> Hi Thomas,
>>
>> some test cases in JoinHarnessTest <https://github.com/apache/
>> flink/blob/release-1.4/flink-libraries/flink-table/src/
>> test/scala/org/apache/flink/table/runtime/harness/JoinHarnessTest.scala>
>> show how to verify the emitted watermarks.
>>
>> Hope this helps.
>>
>> Best,
>> Xingcan
>>
>>> On 21 Feb 2018, at 2:09 PM, Thomas Weise <[hidden email]> wrote:
>>>
>>> Hi,
>>>
>>> I have a streaming integration test with two operators. A source that
>> emits
>>> records and watermarks, and a sink that collects the records. The
>> topology
>>> runs in embedded mode and the results are collected in memory. Now, in
>>> addition to the records, I also want to verify that watermarks have been
>>> emitted. What's the recommended way of doing that?
>>>
>>> Thanks
>>
>>

Reply | Threaded
Open this post in threaded view
|

Re: Verifying watermarks in integration test

Thomas Weise
Hi,

I had sorted out how to run the topology in embedded mode. What wasn't
clear to me is how I can verify the watermark, but as per following thread
that can be done by inserting a process function:

https://lists.apache.org/thread.html/9a47cb2284032527c1b63d35beb9bd2d4bdc36197849b84f5f69b768@%3Cdev.flink.apache.org%3E

Thanks,
Thomas



On Wed, Feb 28, 2018 at 4:35 AM, Xingcan Cui <[hidden email]> wrote:

> Hi Thomas,
>
> generally speaking, if you want to test a whole job, just run the pipeline
> in your test case with a collection-based source and a result collecting
> sink. If your single operator tests passes while the integration test
> fails, maybe you should first check the timestamp / watermark assigners or
> the partitioning mechanisms used.
>
> Best,
> Xingcan
>
> > On 28 Feb 2018, at 5:46 AM, Thomas Weise <[hidden email]> wrote:
> >
> > Hi Xingcan,
> >
> > thanks, this is a good way of testing an individual operator. I had
> written
> > my own mock code to intercept source context and collect the results,
> this
> > is a much better approach for operator testing.
> >
> > I wonder how I can verify with an embedded Flink cluster though. Even
> > though my single operator test passes, the results are not emitted as
> > expected within a topology (not observed in the attached sink). What's
> the
> > test approach there?
> >
> > Thanks,
> > Thomas
> >
> >
> > On Wed, Feb 21, 2018 at 12:43 AM, Xingcan Cui <[hidden email]>
> wrote:
> >
> >> Hi Thomas,
> >>
> >> some test cases in JoinHarnessTest <https://github.com/apache/
> >> flink/blob/release-1.4/flink-libraries/flink-table/src/
> >> test/scala/org/apache/flink/table/runtime/harness/
> JoinHarnessTest.scala>
> >> show how to verify the emitted watermarks.
> >>
> >> Hope this helps.
> >>
> >> Best,
> >> Xingcan
> >>
> >>> On 21 Feb 2018, at 2:09 PM, Thomas Weise <[hidden email]> wrote:
> >>>
> >>> Hi,
> >>>
> >>> I have a streaming integration test with two operators. A source that
> >> emits
> >>> records and watermarks, and a sink that collects the records. The
> >> topology
> >>> runs in embedded mode and the results are collected in memory. Now, in
> >>> addition to the records, I also want to verify that watermarks have
> been
> >>> emitted. What's the recommended way of doing that?
> >>>
> >>> Thanks
> >>
> >>
>
>