Re: Multi-stream question

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Re: Multi-stream question

TechnoMage
In my case I have more elaborate logic to select data from the streams.  They are not all the same logical type, though I may be able to represent them as the same Java type.  My main question is whether it is technically feasible to have a single operator that takes multiple streams as input.  For example Operator(stream1, stream2, stream3) and produces an output stream.  Can the checkpointing and other logic accomodate this if I write sufficient custom code in the operator?

Michael

> On Apr 7, 2018, at 10:42 AM, Ken Krugler <[hidden email]> wrote:
>
> When you say “join” are you talking about a real join (so one or more fields can be used as a joining key), or some other operation?
>
> For more than two streams, you can do cascading window joins via multiple join()s that reduce your source streams down to a single stream.
>
> If the fields are the same across these streams, then a union() followed by say a ProcessFunction that implements your joining logic could work.
>
> Or you can convert all the streams to a common tuple format that consists of a unions the fields, so you can do a union() and then follow that with whatever logic is needed to actually do the join.
>
> Though I’m sure there are more elegant approaches :)
>
> — Ken
>
>
>
>> On Apr 6, 2018, at 5:04 PM, Michael Latta <[hidden email]> wrote:
>>
>> I would like to “join” several streams (>3) in a custom operator. Is this feasible in Flink?
>>
>>
>> Michael
>
> --------------------------------------------
> http://about.me/kkrugler
> +1 530-210-6378
>

Reply | Threaded
Open this post in threaded view
|

Re: Multi-stream question

Ken Krugler
Hi Michael,

There isn’t an operator that takes three (or more) streams, AFAIK.

There is a CoFlatMapFunction that takes two different streams in, which could be used for some types of joins.

Streaming joins are (typically) windowed (bounded), by time/count/something, so if you can maintain the required windowed state in a ProcessFunction then you can implement whatever custom logic is required for your join case.

And for creating a unioned stream of multiple data types, one easy way is via (e.g.) Tuple3<POJO1, POJO2, POJO3>, where only one of the three fields is non-null for each tuple.

-- Ken

PS - I think the [hidden email] <mailto:[hidden email]> list is probably a better forum for this question.

> On Apr 7, 2018, at 10:47 AM, TechnoMage <[hidden email]> wrote:
>
> In my case I have more elaborate logic to select data from the streams.  They are not all the same logical type, though I may be able to represent them as the same Java type.  My main question is whether it is technically feasible to have a single operator that takes multiple streams as input.  For example Operator(stream1, stream2, stream3) and produces an output stream.  Can the checkpointing and other logic accomodate this if I write sufficient custom code in the operator?
>
> Michael
>
>> On Apr 7, 2018, at 10:42 AM, Ken Krugler <[hidden email]> wrote:
>>
>> When you say “join” are you talking about a real join (so one or more fields can be used as a joining key), or some other operation?
>>
>> For more than two streams, you can do cascading window joins via multiple join()s that reduce your source streams down to a single stream.
>>
>> If the fields are the same across these streams, then a union() followed by say a ProcessFunction that implements your joining logic could work.
>>
>> Or you can convert all the streams to a common tuple format that consists of a unions the fields, so you can do a union() and then follow that with whatever logic is needed to actually do the join.
>>
>> Though I’m sure there are more elegant approaches :)
>>
>> — Ken
>>
>>
>>
>>> On Apr 6, 2018, at 5:04 PM, Michael Latta <[hidden email]> wrote:
>>>
>>> I would like to “join” several streams (>3) in a custom operator. Is this feasible in Flink?
>>>
>>>
>>> Michael
>>
>> --------------------------------------------
>> http://about.me/kkrugler
>> +1 530-210-6378
>>
>

--------------------------------------------
http://about.me/kkrugler
+1 530-210-6378

Reply | Threaded
Open this post in threaded view
|

Re: Multi-stream question

TechnoMage
Thanks for the Tuple suggestion, I may use that.  I was asking about building a custom operator (just an idea).  I have since decided I can decompose the problem into pairs of streams and emit a stream to the next CoFlatMap to get the result I need.  Now to see if the idea works ...

Michael

> On Apr 7, 2018, at 1:10 PM, Ken Krugler <[hidden email]> wrote:
>
> Hi Michael,
>
> There isn’t an operator that takes three (or more) streams, AFAIK.
>
> There is a CoFlatMapFunction that takes two different streams in, which could be used for some types of joins.
>
> Streaming joins are (typically) windowed (bounded), by time/count/something, so if you can maintain the required windowed state in a ProcessFunction then you can implement whatever custom logic is required for your join case.
>
> And for creating a unioned stream of multiple data types, one easy way is via (e.g.) Tuple3<POJO1, POJO2, POJO3>, where only one of the three fields is non-null for each tuple.
>
> -- Ken
>
> PS - I think the [hidden email] <mailto:[hidden email]> list is probably a better forum for this question.
>
>> On Apr 7, 2018, at 10:47 AM, TechnoMage <[hidden email]> wrote:
>>
>> In my case I have more elaborate logic to select data from the streams.  They are not all the same logical type, though I may be able to represent them as the same Java type.  My main question is whether it is technically feasible to have a single operator that takes multiple streams as input.  For example Operator(stream1, stream2, stream3) and produces an output stream.  Can the checkpointing and other logic accomodate this if I write sufficient custom code in the operator?
>>
>> Michael
>>
>>> On Apr 7, 2018, at 10:42 AM, Ken Krugler <[hidden email]> wrote:
>>>
>>> When you say “join” are you talking about a real join (so one or more fields can be used as a joining key), or some other operation?
>>>
>>> For more than two streams, you can do cascading window joins via multiple join()s that reduce your source streams down to a single stream.
>>>
>>> If the fields are the same across these streams, then a union() followed by say a ProcessFunction that implements your joining logic could work.
>>>
>>> Or you can convert all the streams to a common tuple format that consists of a unions the fields, so you can do a union() and then follow that with whatever logic is needed to actually do the join.
>>>
>>> Though I’m sure there are more elegant approaches :)
>>>
>>> — Ken
>>>
>>>
>>>
>>>> On Apr 6, 2018, at 5:04 PM, Michael Latta <[hidden email]> wrote:
>>>>
>>>> I would like to “join” several streams (>3) in a custom operator. Is this feasible in Flink?
>>>>
>>>>
>>>> Michael
>>>
>>> --------------------------------------------
>>> http://about.me/kkrugler
>>> +1 530-210-6378
>>>
>>
>
> --------------------------------------------
> http://about.me/kkrugler
> +1 530-210-6378
>

Reply | Threaded
Open this post in threaded view
|

Re: Multi-stream question

Fabian Hueske-2
Hi,

Ken's approach of having a joint data type and unioning the streams is
good. This will work seamlessly with checkpoints. Timo (in CC) used the
same approach to implement a prototype of a multi-way join.

A Tuple won't work though because the Tuple serializer does not support
null fields. You can use a Row or implement a custom, Either-like type.

Best, Fabian


TechnoMage <[hidden email]> schrieb am Sa., 7. Apr. 2018, 17:25:

> Thanks for the Tuple suggestion, I may use that.  I was asking about
> building a custom operator (just an idea).  I have since decided I can
> decompose the problem into pairs of streams and emit a stream to the next
> CoFlatMap to get the result I need.  Now to see if the idea works ...
>
> Michael
>
> > On Apr 7, 2018, at 1:10 PM, Ken Krugler <[hidden email]>
> wrote:
> >
> > Hi Michael,
> >
> > There isn’t an operator that takes three (or more) streams, AFAIK.
> >
> > There is a CoFlatMapFunction that takes two different streams in, which
> could be used for some types of joins.
> >
> > Streaming joins are (typically) windowed (bounded), by
> time/count/something, so if you can maintain the required windowed state in
> a ProcessFunction then you can implement whatever custom logic is required
> for your join case.
> >
> > And for creating a unioned stream of multiple data types, one easy way
> is via (e.g.) Tuple3<POJO1, POJO2, POJO3>, where only one of the three
> fields is non-null for each tuple.
> >
> > -- Ken
> >
> > PS - I think the [hidden email] <mailto:[hidden email]>
> list is probably a better forum for this question.
> >
> >> On Apr 7, 2018, at 10:47 AM, TechnoMage <[hidden email]> wrote:
> >>
> >> In my case I have more elaborate logic to select data from the
> streams.  They are not all the same logical type, though I may be able to
> represent them as the same Java type.  My main question is whether it is
> technically feasible to have a single operator that takes multiple streams
> as input.  For example Operator(stream1, stream2, stream3) and produces an
> output stream.  Can the checkpointing and other logic accomodate this if I
> write sufficient custom code in the operator?
> >>
> >> Michael
> >>
> >>> On Apr 7, 2018, at 10:42 AM, Ken Krugler <[hidden email]>
> wrote:
> >>>
> >>> When you say “join” are you talking about a real join (so one or more
> fields can be used as a joining key), or some other operation?
> >>>
> >>> For more than two streams, you can do cascading window joins via
> multiple join()s that reduce your source streams down to a single stream.
> >>>
> >>> If the fields are the same across these streams, then a union()
> followed by say a ProcessFunction that implements your joining logic could
> work.
> >>>
> >>> Or you can convert all the streams to a common tuple format that
> consists of a unions the fields, so you can do a union() and then follow
> that with whatever logic is needed to actually do the join.
> >>>
> >>> Though I’m sure there are more elegant approaches :)
> >>>
> >>> — Ken
> >>>
> >>>
> >>>
> >>>> On Apr 6, 2018, at 5:04 PM, Michael Latta <[hidden email]>
> wrote:
> >>>>
> >>>> I would like to “join” several streams (>3) in a custom operator. Is
> this feasible in Flink?
> >>>>
> >>>>
> >>>> Michael
> >>>
> >>> --------------------------------------------
> >>> http://about.me/kkrugler
> >>> +1 530-210-6378
> >>>
> >>
> >
> > --------------------------------------------
> > http://about.me/kkrugler
> > +1 530-210-6378
> >
>
>