No error on dangling dataflow parts

classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

No error on dangling dataflow parts

Matthias J. Sax-2
Hi,

I just hit a problem in Storm Compatibility:
https://issues.apache.org/jira/browse/FLINK-2837

If a bolt has multiple inputs, the topology is not translated correctly
into a Flink streaming program. The point is, that the Flink program can
be executed without an error, even if the assembled data flow has
dangling parts...

For example:

Source1 --+--+--> Bolt --> SinkBolt
          |  |
Source2 --+  |
             |
Source3 -----+

Is translated to the following Flink program

Source1 --> Bolt --> SinkBolt

Source2 --> Bolt

Source3 --> Bolt

with Source2 and Source3 being added to the environment but not
connected correctly to the overall program because the Bolt is
instantiated three times and only a single bolt is connect to the sink.
It is clear, that Flink just drops the dangling parts, as it builds the
JobGraph starting from the sink and traversing backwards. I was just
wondering, if an error should actually occur.


-Matthias


signature.asc (836 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: No error on dangling dataflow parts

Aljoscha Krettek-2
Hi,
I think Flink does in fact not drop the dangling parts. In streaming it is allowed to have dangling operators that are not sinks. They are executed and the output is just discarded.

Cheers,
Aljoscha

> On 08 Oct 2015, at 16:18, Matthias J. Sax <[hidden email]> wrote:
>
> Hi,
>
> I just hit a problem in Storm Compatibility:
> https://issues.apache.org/jira/browse/FLINK-2837
>
> If a bolt has multiple inputs, the topology is not translated correctly
> into a Flink streaming program. The point is, that the Flink program can
> be executed without an error, even if the assembled data flow has
> dangling parts...
>
> For example:
>
> Source1 --+--+--> Bolt --> SinkBolt
>          |  |
> Source2 --+  |
>             |
> Source3 -----+
>
> Is translated to the following Flink program
>
> Source1 --> Bolt --> SinkBolt
>
> Source2 --> Bolt
>
> Source3 --> Bolt
>
> with Source2 and Source3 being added to the environment but not
> connected correctly to the overall program because the Bolt is
> instantiated three times and only a single bolt is connect to the sink.
> It is clear, that Flink just drops the dangling parts, as it builds the
> JobGraph starting from the sink and traversing backwards. I was just
> wondering, if an error should actually occur.
>
>
> -Matthias
>

Reply | Threaded
Open this post in threaded view
|

Re: No error on dangling dataflow parts

Matthias J. Sax-2
Well. This behavior would also be kind of strange... (at least to me)

On 10/08/2015 04:22 PM, Aljoscha Krettek wrote:

> Hi,
> I think Flink does in fact not drop the dangling parts. In streaming it is allowed to have dangling operators that are not sinks. They are executed and the output is just discarded.
>
> Cheers,
> Aljoscha
>> On 08 Oct 2015, at 16:18, Matthias J. Sax <[hidden email]> wrote:
>>
>> Hi,
>>
>> I just hit a problem in Storm Compatibility:
>> https://issues.apache.org/jira/browse/FLINK-2837
>>
>> If a bolt has multiple inputs, the topology is not translated correctly
>> into a Flink streaming program. The point is, that the Flink program can
>> be executed without an error, even if the assembled data flow has
>> dangling parts...
>>
>> For example:
>>
>> Source1 --+--+--> Bolt --> SinkBolt
>>          |  |
>> Source2 --+  |
>>             |
>> Source3 -----+
>>
>> Is translated to the following Flink program
>>
>> Source1 --> Bolt --> SinkBolt
>>
>> Source2 --> Bolt
>>
>> Source3 --> Bolt
>>
>> with Source2 and Source3 being added to the environment but not
>> connected correctly to the overall program because the Bolt is
>> instantiated three times and only a single bolt is connect to the sink.
>> It is clear, that Flink just drops the dangling parts, as it builds the
>> JobGraph starting from the sink and traversing backwards. I was just
>> wondering, if an error should actually occur.
>>
>>
>> -Matthias
>>
>


signature.asc (836 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: No error on dangling dataflow parts

Aljoscha Krettek-2
What do you mean? The current behavior is strange or the other way round would be strange?

I think it is in line with what other Stream Processing Systems provide. For example Storm and Google Dataflow behave similarly.

> On 08 Oct 2015, at 16:25, Matthias J. Sax <[hidden email]> wrote:
>
> Well. This behavior would also be kind of strange... (at least to me)
>
> On 10/08/2015 04:22 PM, Aljoscha Krettek wrote:
>> Hi,
>> I think Flink does in fact not drop the dangling parts. In streaming it is allowed to have dangling operators that are not sinks. They are executed and the output is just discarded.
>>
>> Cheers,
>> Aljoscha
>>> On 08 Oct 2015, at 16:18, Matthias J. Sax <[hidden email]> wrote:
>>>
>>> Hi,
>>>
>>> I just hit a problem in Storm Compatibility:
>>> https://issues.apache.org/jira/browse/FLINK-2837
>>>
>>> If a bolt has multiple inputs, the topology is not translated correctly
>>> into a Flink streaming program. The point is, that the Flink program can
>>> be executed without an error, even if the assembled data flow has
>>> dangling parts...
>>>
>>> For example:
>>>
>>> Source1 --+--+--> Bolt --> SinkBolt
>>>         |  |
>>> Source2 --+  |
>>>            |
>>> Source3 -----+
>>>
>>> Is translated to the following Flink program
>>>
>>> Source1 --> Bolt --> SinkBolt
>>>
>>> Source2 --> Bolt
>>>
>>> Source3 --> Bolt
>>>
>>> with Source2 and Source3 being added to the environment but not
>>> connected correctly to the overall program because the Bolt is
>>> instantiated three times and only a single bolt is connect to the sink.
>>> It is clear, that Flink just drops the dangling parts, as it builds the
>>> JobGraph starting from the sink and traversing backwards. I was just
>>> wondering, if an error should actually occur.
>>>
>>>
>>> -Matthias
>>>
>>
>

Reply | Threaded
Open this post in threaded view
|

Re: No error on dangling dataflow parts

Matthias J. Sax-2
Dropping would be less strange.

However, raising an exception would be natural (at least to me)


On 10/08/2015 04:30 PM, Aljoscha Krettek wrote:

> What do you mean? The current behavior is strange or the other way round would be strange?
>
> I think it is in line with what other Stream Processing Systems provide. For example Storm and Google Dataflow behave similarly.
>
>> On 08 Oct 2015, at 16:25, Matthias J. Sax <[hidden email]> wrote:
>>
>> Well. This behavior would also be kind of strange... (at least to me)
>>
>> On 10/08/2015 04:22 PM, Aljoscha Krettek wrote:
>>> Hi,
>>> I think Flink does in fact not drop the dangling parts. In streaming it is allowed to have dangling operators that are not sinks. They are executed and the output is just discarded.
>>>
>>> Cheers,
>>> Aljoscha
>>>> On 08 Oct 2015, at 16:18, Matthias J. Sax <[hidden email]> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I just hit a problem in Storm Compatibility:
>>>> https://issues.apache.org/jira/browse/FLINK-2837
>>>>
>>>> If a bolt has multiple inputs, the topology is not translated correctly
>>>> into a Flink streaming program. The point is, that the Flink program can
>>>> be executed without an error, even if the assembled data flow has
>>>> dangling parts...
>>>>
>>>> For example:
>>>>
>>>> Source1 --+--+--> Bolt --> SinkBolt
>>>>         |  |
>>>> Source2 --+  |
>>>>            |
>>>> Source3 -----+
>>>>
>>>> Is translated to the following Flink program
>>>>
>>>> Source1 --> Bolt --> SinkBolt
>>>>
>>>> Source2 --> Bolt
>>>>
>>>> Source3 --> Bolt
>>>>
>>>> with Source2 and Source3 being added to the environment but not
>>>> connected correctly to the overall program because the Bolt is
>>>> instantiated three times and only a single bolt is connect to the sink.
>>>> It is clear, that Flink just drops the dangling parts, as it builds the
>>>> JobGraph starting from the sink and traversing backwards. I was just
>>>> wondering, if an error should actually occur.
>>>>
>>>>
>>>> -Matthias
>>>>
>>>
>>
>


signature.asc (836 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: No error on dangling dataflow parts

Aljoscha Krettek-2
We had some discussion on the Jira Issue when I changed the translation
from operators to StreamGraph:
https://issues.apache.org/jira/browse/FLINK-2398

There are arguments for both ways of doing it, I'm not partial towards any
solution but if you want this changed you can start a discussion. (If we
change it, however, this will probably not come in time for 0.10)

On Thu, 8 Oct 2015 at 16:42 Matthias J. Sax <[hidden email]> wrote:

> Dropping would be less strange.
>
> However, raising an exception would be natural (at least to me)
>
>
> On 10/08/2015 04:30 PM, Aljoscha Krettek wrote:
> > What do you mean? The current behavior is strange or the other way round
> would be strange?
> >
> > I think it is in line with what other Stream Processing Systems provide.
> For example Storm and Google Dataflow behave similarly.
> >
> >> On 08 Oct 2015, at 16:25, Matthias J. Sax <[hidden email]> wrote:
> >>
> >> Well. This behavior would also be kind of strange... (at least to me)
> >>
> >> On 10/08/2015 04:22 PM, Aljoscha Krettek wrote:
> >>> Hi,
> >>> I think Flink does in fact not drop the dangling parts. In streaming
> it is allowed to have dangling operators that are not sinks. They are
> executed and the output is just discarded.
> >>>
> >>> Cheers,
> >>> Aljoscha
> >>>> On 08 Oct 2015, at 16:18, Matthias J. Sax <[hidden email]> wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> I just hit a problem in Storm Compatibility:
> >>>> https://issues.apache.org/jira/browse/FLINK-2837
> >>>>
> >>>> If a bolt has multiple inputs, the topology is not translated
> correctly
> >>>> into a Flink streaming program. The point is, that the Flink program
> can
> >>>> be executed without an error, even if the assembled data flow has
> >>>> dangling parts...
> >>>>
> >>>> For example:
> >>>>
> >>>> Source1 --+--+--> Bolt --> SinkBolt
> >>>>         |  |
> >>>> Source2 --+  |
> >>>>            |
> >>>> Source3 -----+
> >>>>
> >>>> Is translated to the following Flink program
> >>>>
> >>>> Source1 --> Bolt --> SinkBolt
> >>>>
> >>>> Source2 --> Bolt
> >>>>
> >>>> Source3 --> Bolt
> >>>>
> >>>> with Source2 and Source3 being added to the environment but not
> >>>> connected correctly to the overall program because the Bolt is
> >>>> instantiated three times and only a single bolt is connect to the
> sink.
> >>>> It is clear, that Flink just drops the dangling parts, as it builds
> the
> >>>> JobGraph starting from the sink and traversing backwards. I was just
> >>>> wondering, if an error should actually occur.
> >>>>
> >>>>
> >>>> -Matthias
> >>>>
> >>>
> >>
> >
>
>
Reply | Threaded
Open this post in threaded view
|

Re: No error on dangling dataflow parts

Matthias J. Sax-2
I just skipped over the discussion. If I got it right, the question was
about if a dedicated sink operator must be the last in a flow or not. In
the case I described below, it is about distinct flows in a single
program. This was not discussed in the PR (or did I miss it?).

For sink/no-sink I don't care. However, having two data flow components
that are not connected to each other is an issue for me. I also would
raise an exception for a program like this:

Source1 --> Bolt --> SinkBolt
Source2 --> Bolt --> SinkBolt




-Matthias


On 10/08/2015 04:55 PM, Aljoscha Krettek wrote:

> We had some discussion on the Jira Issue when I changed the translation
> from operators to StreamGraph:
> https://issues.apache.org/jira/browse/FLINK-2398
>
> There are arguments for both ways of doing it, I'm not partial towards any
> solution but if you want this changed you can start a discussion. (If we
> change it, however, this will probably not come in time for 0.10)
>
> On Thu, 8 Oct 2015 at 16:42 Matthias J. Sax <[hidden email]> wrote:
>
>> Dropping would be less strange.
>>
>> However, raising an exception would be natural (at least to me)
>>
>>
>> On 10/08/2015 04:30 PM, Aljoscha Krettek wrote:
>>> What do you mean? The current behavior is strange or the other way round
>> would be strange?
>>>
>>> I think it is in line with what other Stream Processing Systems provide.
>> For example Storm and Google Dataflow behave similarly.
>>>
>>>> On 08 Oct 2015, at 16:25, Matthias J. Sax <[hidden email]> wrote:
>>>>
>>>> Well. This behavior would also be kind of strange... (at least to me)
>>>>
>>>> On 10/08/2015 04:22 PM, Aljoscha Krettek wrote:
>>>>> Hi,
>>>>> I think Flink does in fact not drop the dangling parts. In streaming
>> it is allowed to have dangling operators that are not sinks. They are
>> executed and the output is just discarded.
>>>>>
>>>>> Cheers,
>>>>> Aljoscha
>>>>>> On 08 Oct 2015, at 16:18, Matthias J. Sax <[hidden email]> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I just hit a problem in Storm Compatibility:
>>>>>> https://issues.apache.org/jira/browse/FLINK-2837
>>>>>>
>>>>>> If a bolt has multiple inputs, the topology is not translated
>> correctly
>>>>>> into a Flink streaming program. The point is, that the Flink program
>> can
>>>>>> be executed without an error, even if the assembled data flow has
>>>>>> dangling parts...
>>>>>>
>>>>>> For example:
>>>>>>
>>>>>> Source1 --+--+--> Bolt --> SinkBolt
>>>>>>         |  |
>>>>>> Source2 --+  |
>>>>>>            |
>>>>>> Source3 -----+
>>>>>>
>>>>>> Is translated to the following Flink program
>>>>>>
>>>>>> Source1 --> Bolt --> SinkBolt
>>>>>>
>>>>>> Source2 --> Bolt
>>>>>>
>>>>>> Source3 --> Bolt
>>>>>>
>>>>>> with Source2 and Source3 being added to the environment but not
>>>>>> connected correctly to the overall program because the Bolt is
>>>>>> instantiated three times and only a single bolt is connect to the
>> sink.
>>>>>> It is clear, that Flink just drops the dangling parts, as it builds
>> the
>>>>>> JobGraph starting from the sink and traversing backwards. I was just
>>>>>> wondering, if an error should actually occur.
>>>>>>
>>>>>>
>>>>>> -Matthias
>>>>>>
>>>>>
>>>>
>>>
>>
>>
>


signature.asc (836 bytes) Download Attachment