Hi,
I just hit a problem in Storm Compatibility: https://issues.apache.org/jira/browse/FLINK-2837 If a bolt has multiple inputs, the topology is not translated correctly into a Flink streaming program. The point is, that the Flink program can be executed without an error, even if the assembled data flow has dangling parts... For example: Source1 --+--+--> Bolt --> SinkBolt | | Source2 --+ | | Source3 -----+ Is translated to the following Flink program Source1 --> Bolt --> SinkBolt Source2 --> Bolt Source3 --> Bolt with Source2 and Source3 being added to the environment but not connected correctly to the overall program because the Bolt is instantiated three times and only a single bolt is connect to the sink. It is clear, that Flink just drops the dangling parts, as it builds the JobGraph starting from the sink and traversing backwards. I was just wondering, if an error should actually occur. -Matthias |
Hi,
I think Flink does in fact not drop the dangling parts. In streaming it is allowed to have dangling operators that are not sinks. They are executed and the output is just discarded. Cheers, Aljoscha > On 08 Oct 2015, at 16:18, Matthias J. Sax <[hidden email]> wrote: > > Hi, > > I just hit a problem in Storm Compatibility: > https://issues.apache.org/jira/browse/FLINK-2837 > > If a bolt has multiple inputs, the topology is not translated correctly > into a Flink streaming program. The point is, that the Flink program can > be executed without an error, even if the assembled data flow has > dangling parts... > > For example: > > Source1 --+--+--> Bolt --> SinkBolt > | | > Source2 --+ | > | > Source3 -----+ > > Is translated to the following Flink program > > Source1 --> Bolt --> SinkBolt > > Source2 --> Bolt > > Source3 --> Bolt > > with Source2 and Source3 being added to the environment but not > connected correctly to the overall program because the Bolt is > instantiated three times and only a single bolt is connect to the sink. > It is clear, that Flink just drops the dangling parts, as it builds the > JobGraph starting from the sink and traversing backwards. I was just > wondering, if an error should actually occur. > > > -Matthias > |
Well. This behavior would also be kind of strange... (at least to me)
On 10/08/2015 04:22 PM, Aljoscha Krettek wrote: > Hi, > I think Flink does in fact not drop the dangling parts. In streaming it is allowed to have dangling operators that are not sinks. They are executed and the output is just discarded. > > Cheers, > Aljoscha >> On 08 Oct 2015, at 16:18, Matthias J. Sax <[hidden email]> wrote: >> >> Hi, >> >> I just hit a problem in Storm Compatibility: >> https://issues.apache.org/jira/browse/FLINK-2837 >> >> If a bolt has multiple inputs, the topology is not translated correctly >> into a Flink streaming program. The point is, that the Flink program can >> be executed without an error, even if the assembled data flow has >> dangling parts... >> >> For example: >> >> Source1 --+--+--> Bolt --> SinkBolt >> | | >> Source2 --+ | >> | >> Source3 -----+ >> >> Is translated to the following Flink program >> >> Source1 --> Bolt --> SinkBolt >> >> Source2 --> Bolt >> >> Source3 --> Bolt >> >> with Source2 and Source3 being added to the environment but not >> connected correctly to the overall program because the Bolt is >> instantiated three times and only a single bolt is connect to the sink. >> It is clear, that Flink just drops the dangling parts, as it builds the >> JobGraph starting from the sink and traversing backwards. I was just >> wondering, if an error should actually occur. >> >> >> -Matthias >> > |
What do you mean? The current behavior is strange or the other way round would be strange?
I think it is in line with what other Stream Processing Systems provide. For example Storm and Google Dataflow behave similarly. > On 08 Oct 2015, at 16:25, Matthias J. Sax <[hidden email]> wrote: > > Well. This behavior would also be kind of strange... (at least to me) > > On 10/08/2015 04:22 PM, Aljoscha Krettek wrote: >> Hi, >> I think Flink does in fact not drop the dangling parts. In streaming it is allowed to have dangling operators that are not sinks. They are executed and the output is just discarded. >> >> Cheers, >> Aljoscha >>> On 08 Oct 2015, at 16:18, Matthias J. Sax <[hidden email]> wrote: >>> >>> Hi, >>> >>> I just hit a problem in Storm Compatibility: >>> https://issues.apache.org/jira/browse/FLINK-2837 >>> >>> If a bolt has multiple inputs, the topology is not translated correctly >>> into a Flink streaming program. The point is, that the Flink program can >>> be executed without an error, even if the assembled data flow has >>> dangling parts... >>> >>> For example: >>> >>> Source1 --+--+--> Bolt --> SinkBolt >>> | | >>> Source2 --+ | >>> | >>> Source3 -----+ >>> >>> Is translated to the following Flink program >>> >>> Source1 --> Bolt --> SinkBolt >>> >>> Source2 --> Bolt >>> >>> Source3 --> Bolt >>> >>> with Source2 and Source3 being added to the environment but not >>> connected correctly to the overall program because the Bolt is >>> instantiated three times and only a single bolt is connect to the sink. >>> It is clear, that Flink just drops the dangling parts, as it builds the >>> JobGraph starting from the sink and traversing backwards. I was just >>> wondering, if an error should actually occur. >>> >>> >>> -Matthias >>> >> > |
Dropping would be less strange.
However, raising an exception would be natural (at least to me) On 10/08/2015 04:30 PM, Aljoscha Krettek wrote: > What do you mean? The current behavior is strange or the other way round would be strange? > > I think it is in line with what other Stream Processing Systems provide. For example Storm and Google Dataflow behave similarly. > >> On 08 Oct 2015, at 16:25, Matthias J. Sax <[hidden email]> wrote: >> >> Well. This behavior would also be kind of strange... (at least to me) >> >> On 10/08/2015 04:22 PM, Aljoscha Krettek wrote: >>> Hi, >>> I think Flink does in fact not drop the dangling parts. In streaming it is allowed to have dangling operators that are not sinks. They are executed and the output is just discarded. >>> >>> Cheers, >>> Aljoscha >>>> On 08 Oct 2015, at 16:18, Matthias J. Sax <[hidden email]> wrote: >>>> >>>> Hi, >>>> >>>> I just hit a problem in Storm Compatibility: >>>> https://issues.apache.org/jira/browse/FLINK-2837 >>>> >>>> If a bolt has multiple inputs, the topology is not translated correctly >>>> into a Flink streaming program. The point is, that the Flink program can >>>> be executed without an error, even if the assembled data flow has >>>> dangling parts... >>>> >>>> For example: >>>> >>>> Source1 --+--+--> Bolt --> SinkBolt >>>> | | >>>> Source2 --+ | >>>> | >>>> Source3 -----+ >>>> >>>> Is translated to the following Flink program >>>> >>>> Source1 --> Bolt --> SinkBolt >>>> >>>> Source2 --> Bolt >>>> >>>> Source3 --> Bolt >>>> >>>> with Source2 and Source3 being added to the environment but not >>>> connected correctly to the overall program because the Bolt is >>>> instantiated three times and only a single bolt is connect to the sink. >>>> It is clear, that Flink just drops the dangling parts, as it builds the >>>> JobGraph starting from the sink and traversing backwards. I was just >>>> wondering, if an error should actually occur. >>>> >>>> >>>> -Matthias >>>> >>> >> > |
We had some discussion on the Jira Issue when I changed the translation
from operators to StreamGraph: https://issues.apache.org/jira/browse/FLINK-2398 There are arguments for both ways of doing it, I'm not partial towards any solution but if you want this changed you can start a discussion. (If we change it, however, this will probably not come in time for 0.10) On Thu, 8 Oct 2015 at 16:42 Matthias J. Sax <[hidden email]> wrote: > Dropping would be less strange. > > However, raising an exception would be natural (at least to me) > > > On 10/08/2015 04:30 PM, Aljoscha Krettek wrote: > > What do you mean? The current behavior is strange or the other way round > would be strange? > > > > I think it is in line with what other Stream Processing Systems provide. > For example Storm and Google Dataflow behave similarly. > > > >> On 08 Oct 2015, at 16:25, Matthias J. Sax <[hidden email]> wrote: > >> > >> Well. This behavior would also be kind of strange... (at least to me) > >> > >> On 10/08/2015 04:22 PM, Aljoscha Krettek wrote: > >>> Hi, > >>> I think Flink does in fact not drop the dangling parts. In streaming > it is allowed to have dangling operators that are not sinks. They are > executed and the output is just discarded. > >>> > >>> Cheers, > >>> Aljoscha > >>>> On 08 Oct 2015, at 16:18, Matthias J. Sax <[hidden email]> wrote: > >>>> > >>>> Hi, > >>>> > >>>> I just hit a problem in Storm Compatibility: > >>>> https://issues.apache.org/jira/browse/FLINK-2837 > >>>> > >>>> If a bolt has multiple inputs, the topology is not translated > correctly > >>>> into a Flink streaming program. The point is, that the Flink program > can > >>>> be executed without an error, even if the assembled data flow has > >>>> dangling parts... > >>>> > >>>> For example: > >>>> > >>>> Source1 --+--+--> Bolt --> SinkBolt > >>>> | | > >>>> Source2 --+ | > >>>> | > >>>> Source3 -----+ > >>>> > >>>> Is translated to the following Flink program > >>>> > >>>> Source1 --> Bolt --> SinkBolt > >>>> > >>>> Source2 --> Bolt > >>>> > >>>> Source3 --> Bolt > >>>> > >>>> with Source2 and Source3 being added to the environment but not > >>>> connected correctly to the overall program because the Bolt is > >>>> instantiated three times and only a single bolt is connect to the > sink. > >>>> It is clear, that Flink just drops the dangling parts, as it builds > the > >>>> JobGraph starting from the sink and traversing backwards. I was just > >>>> wondering, if an error should actually occur. > >>>> > >>>> > >>>> -Matthias > >>>> > >>> > >> > > > > |
I just skipped over the discussion. If I got it right, the question was
about if a dedicated sink operator must be the last in a flow or not. In the case I described below, it is about distinct flows in a single program. This was not discussed in the PR (or did I miss it?). For sink/no-sink I don't care. However, having two data flow components that are not connected to each other is an issue for me. I also would raise an exception for a program like this: Source1 --> Bolt --> SinkBolt Source2 --> Bolt --> SinkBolt -Matthias On 10/08/2015 04:55 PM, Aljoscha Krettek wrote: > We had some discussion on the Jira Issue when I changed the translation > from operators to StreamGraph: > https://issues.apache.org/jira/browse/FLINK-2398 > > There are arguments for both ways of doing it, I'm not partial towards any > solution but if you want this changed you can start a discussion. (If we > change it, however, this will probably not come in time for 0.10) > > On Thu, 8 Oct 2015 at 16:42 Matthias J. Sax <[hidden email]> wrote: > >> Dropping would be less strange. >> >> However, raising an exception would be natural (at least to me) >> >> >> On 10/08/2015 04:30 PM, Aljoscha Krettek wrote: >>> What do you mean? The current behavior is strange or the other way round >> would be strange? >>> >>> I think it is in line with what other Stream Processing Systems provide. >> For example Storm and Google Dataflow behave similarly. >>> >>>> On 08 Oct 2015, at 16:25, Matthias J. Sax <[hidden email]> wrote: >>>> >>>> Well. This behavior would also be kind of strange... (at least to me) >>>> >>>> On 10/08/2015 04:22 PM, Aljoscha Krettek wrote: >>>>> Hi, >>>>> I think Flink does in fact not drop the dangling parts. In streaming >> it is allowed to have dangling operators that are not sinks. They are >> executed and the output is just discarded. >>>>> >>>>> Cheers, >>>>> Aljoscha >>>>>> On 08 Oct 2015, at 16:18, Matthias J. Sax <[hidden email]> wrote: >>>>>> >>>>>> Hi, >>>>>> >>>>>> I just hit a problem in Storm Compatibility: >>>>>> https://issues.apache.org/jira/browse/FLINK-2837 >>>>>> >>>>>> If a bolt has multiple inputs, the topology is not translated >> correctly >>>>>> into a Flink streaming program. The point is, that the Flink program >> can >>>>>> be executed without an error, even if the assembled data flow has >>>>>> dangling parts... >>>>>> >>>>>> For example: >>>>>> >>>>>> Source1 --+--+--> Bolt --> SinkBolt >>>>>> | | >>>>>> Source2 --+ | >>>>>> | >>>>>> Source3 -----+ >>>>>> >>>>>> Is translated to the following Flink program >>>>>> >>>>>> Source1 --> Bolt --> SinkBolt >>>>>> >>>>>> Source2 --> Bolt >>>>>> >>>>>> Source3 --> Bolt >>>>>> >>>>>> with Source2 and Source3 being added to the environment but not >>>>>> connected correctly to the overall program because the Bolt is >>>>>> instantiated three times and only a single bolt is connect to the >> sink. >>>>>> It is clear, that Flink just drops the dangling parts, as it builds >> the >>>>>> JobGraph starting from the sink and traversing backwards. I was just >>>>>> wondering, if an error should actually occur. >>>>>> >>>>>> >>>>>> -Matthias >>>>>> >>>>> >>>> >>> >> >> > |
Free forum by Nabble | Edit this page |