Storm Compatibility Improvement

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

Storm Compatibility Improvement

Matthias J. Sax
Hi,

I started to work on a missing feature for the Storm compatibility
layer: named attribute access

In Storm, each attribute of an input tuple can be accessed via index or
by name. Currently, only index access is supported. In order to support
this feature in Flink (embedded Bolt in Flink program), I see two
(independent and complementary) ways to support this feature:

 1) the input type is a POJO
 2) Flink's Tuple type is extended to support named attributes

Right now I started a prototype for POJOs. I would like to extend Tuple
type with named attributes. However, I am not sure how the community
likes this idea.

I would like to get some feedback for the POJO prototype, too. I use
reflections and I am not sure if my code is elegant enough. You can find
it here: https://github.com/mjsax/flink/tree/flink-storm-compatibility


-Matthias





signature.asc (836 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Storm Compatibility Improvement

Gyula Fóra
Hey,
I didn't look through the whole code so I probably don't get something but
why don't you just do what storm does? Keep a map from the field names to
indexes somewhere (make this accessible from the tuple) and then you can
just use a simple Flink tuple.

I think this is what's happening in storm, they get the index from the
context, which knows the declared output fields.

Gyula

Matthias J. Sax <[hidden email]> ezt írta (időpont: 2015.
jún. 29., H, 18:08):

> Hi,
>
> I started to work on a missing feature for the Storm compatibility
> layer: named attribute access
>
> In Storm, each attribute of an input tuple can be accessed via index or
> by name. Currently, only index access is supported. In order to support
> this feature in Flink (embedded Bolt in Flink program), I see two
> (independent and complementary) ways to support this feature:
>
>  1) the input type is a POJO
>  2) Flink's Tuple type is extended to support named attributes
>
> Right now I started a prototype for POJOs. I would like to extend Tuple
> type with named attributes. However, I am not sure how the community
> likes this idea.
>
> I would like to get some feedback for the POJO prototype, too. I use
> reflections and I am not sure if my code is elegant enough. You can find
> it here: https://github.com/mjsax/flink/tree/flink-storm-compatibility
>
>
> -Matthias
>
>
>
>
>
Reply | Threaded
Open this post in threaded view
|

Re: Storm Compatibility Improvement

Matthias J. Sax
Well. If a whole Storm topology is executed, this is of course the way
to got. However, I want to have named-attribute access in the case of an
embedded bolt (as a single operator) in a Flink program. And is this
case, fields are not declared and do not have a name (eg, if the bolt's
consumers emits a stream of type Tuple3)

-Matthias


On 06/29/2015 11:42 PM, Gyula Fóra wrote:

> Hey,
> I didn't look through the whole code so I probably don't get something but
> why don't you just do what storm does? Keep a map from the field names to
> indexes somewhere (make this accessible from the tuple) and then you can
> just use a simple Flink tuple.
>
> I think this is what's happening in storm, they get the index from the
> context, which knows the declared output fields.
>
> Gyula
>
> Matthias J. Sax <[hidden email]> ezt írta (időpont: 2015.
> jún. 29., H, 18:08):
>
>> Hi,
>>
>> I started to work on a missing feature for the Storm compatibility
>> layer: named attribute access
>>
>> In Storm, each attribute of an input tuple can be accessed via index or
>> by name. Currently, only index access is supported. In order to support
>> this feature in Flink (embedded Bolt in Flink program), I see two
>> (independent and complementary) ways to support this feature:
>>
>>  1) the input type is a POJO
>>  2) Flink's Tuple type is extended to support named attributes
>>
>> Right now I started a prototype for POJOs. I would like to extend Tuple
>> type with named attributes. However, I am not sure how the community
>> likes this idea.
>>
>> I would like to get some feedback for the POJO prototype, too. I use
>> reflections and I am not sure if my code is elegant enough. You can find
>> it here: https://github.com/mjsax/flink/tree/flink-storm-compatibility
>>
>>
>> -Matthias
>>
>>
>>
>>
>>
>


signature.asc (836 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Storm Compatibility Improvement

Gyula Fóra
Ah ok, now I get what I didn't get before :)

So you want to take some input stream , and execute a bolt implementation
on it. And the question is what input type to assume when the user wants to
use field name based access.

Can't we force the user to declare the names of the inputs/outputs even in
this case? Otherwise there is not much we can do. Maybe stick to either
public fields or getter setters.


Matthias J. Sax <[hidden email]> ezt írta (időpont: 2015.
jún. 29., H, 23:51):

> Well. If a whole Storm topology is executed, this is of course the way
> to got. However, I want to have named-attribute access in the case of an
> embedded bolt (as a single operator) in a Flink program. And is this
> case, fields are not declared and do not have a name (eg, if the bolt's
> consumers emits a stream of type Tuple3)
>
> -Matthias
>
>
> On 06/29/2015 11:42 PM, Gyula Fóra wrote:
> > Hey,
> > I didn't look through the whole code so I probably don't get something
> but
> > why don't you just do what storm does? Keep a map from the field names to
> > indexes somewhere (make this accessible from the tuple) and then you can
> > just use a simple Flink tuple.
> >
> > I think this is what's happening in storm, they get the index from the
> > context, which knows the declared output fields.
> >
> > Gyula
> >
> > Matthias J. Sax <[hidden email]> ezt írta (időpont: 2015.
> > jún. 29., H, 18:08):
> >
> >> Hi,
> >>
> >> I started to work on a missing feature for the Storm compatibility
> >> layer: named attribute access
> >>
> >> In Storm, each attribute of an input tuple can be accessed via index or
> >> by name. Currently, only index access is supported. In order to support
> >> this feature in Flink (embedded Bolt in Flink program), I see two
> >> (independent and complementary) ways to support this feature:
> >>
> >>  1) the input type is a POJO
> >>  2) Flink's Tuple type is extended to support named attributes
> >>
> >> Right now I started a prototype for POJOs. I would like to extend Tuple
> >> type with named attributes. However, I am not sure how the community
> >> likes this idea.
> >>
> >> I would like to get some feedback for the POJO prototype, too. I use
> >> reflections and I am not sure if my code is elegant enough. You can find
> >> it here: https://github.com/mjsax/flink/tree/flink-storm-compatibility
> >>
> >>
> >> -Matthias
> >>
> >>
> >>
> >>
> >>
> >
>
>
Reply | Threaded
Open this post in threaded view
|

Re: Storm Compatibility Improvement

Gyula Fóra
By declare I mean we assume a Flink Tuple datatype and the user declares
the name mapping (sorry its getting late).

Gyula Fóra <[hidden email]> ezt írta (időpont: 2015. jún. 29., H,
23:57):

> Ah ok, now I get what I didn't get before :)
>
> So you want to take some input stream , and execute a bolt implementation
> on it. And the question is what input type to assume when the user wants to
> use field name based access.
>
> Can't we force the user to declare the names of the inputs/outputs even in
> this case? Otherwise there is not much we can do. Maybe stick to either
> public fields or getter setters.
>
>
> Matthias J. Sax <[hidden email]> ezt írta (időpont: 2015.
> jún. 29., H, 23:51):
>
>> Well. If a whole Storm topology is executed, this is of course the way
>> to got. However, I want to have named-attribute access in the case of an
>> embedded bolt (as a single operator) in a Flink program. And is this
>> case, fields are not declared and do not have a name (eg, if the bolt's
>> consumers emits a stream of type Tuple3)
>>
>> -Matthias
>>
>>
>> On 06/29/2015 11:42 PM, Gyula Fóra wrote:
>> > Hey,
>> > I didn't look through the whole code so I probably don't get something
>> but
>> > why don't you just do what storm does? Keep a map from the field names
>> to
>> > indexes somewhere (make this accessible from the tuple) and then you can
>> > just use a simple Flink tuple.
>> >
>> > I think this is what's happening in storm, they get the index from the
>> > context, which knows the declared output fields.
>> >
>> > Gyula
>> >
>> > Matthias J. Sax <[hidden email]> ezt írta (időpont:
>> 2015.
>> > jún. 29., H, 18:08):
>> >
>> >> Hi,
>> >>
>> >> I started to work on a missing feature for the Storm compatibility
>> >> layer: named attribute access
>> >>
>> >> In Storm, each attribute of an input tuple can be accessed via index or
>> >> by name. Currently, only index access is supported. In order to support
>> >> this feature in Flink (embedded Bolt in Flink program), I see two
>> >> (independent and complementary) ways to support this feature:
>> >>
>> >>  1) the input type is a POJO
>> >>  2) Flink's Tuple type is extended to support named attributes
>> >>
>> >> Right now I started a prototype for POJOs. I would like to extend Tuple
>> >> type with named attributes. However, I am not sure how the community
>> >> likes this idea.
>> >>
>> >> I would like to get some feedback for the POJO prototype, too. I use
>> >> reflections and I am not sure if my code is elegant enough. You can
>> find
>> >> it here: https://github.com/mjsax/flink/tree/flink-storm-compatibility
>> >>
>> >>
>> >> -Matthias
>> >>
>> >>
>> >>
>> >>
>> >>
>> >
>>
>>
Reply | Threaded
Open this post in threaded view
|

Re: Storm Compatibility Improvement

Matthias J. Sax
That would also work. I thought about it already, too. Thanks for the
feedback. If two people have similar idea, it might be the right way to
got. I will just include all this stuff and open an PR. Than we can
evaluate it again.

-Matthias

On 06/30/2015 12:01 AM, Gyula Fóra wrote:

> By declare I mean we assume a Flink Tuple datatype and the user declares
> the name mapping (sorry its getting late).
>
> Gyula Fóra <[hidden email]> ezt írta (időpont: 2015. jún. 29., H,
> 23:57):
>
>> Ah ok, now I get what I didn't get before :)
>>
>> So you want to take some input stream , and execute a bolt implementation
>> on it. And the question is what input type to assume when the user wants to
>> use field name based access.
>>
>> Can't we force the user to declare the names of the inputs/outputs even in
>> this case? Otherwise there is not much we can do. Maybe stick to either
>> public fields or getter setters.
>>
>>
>> Matthias J. Sax <[hidden email]> ezt írta (időpont: 2015.
>> jún. 29., H, 23:51):
>>
>>> Well. If a whole Storm topology is executed, this is of course the way
>>> to got. However, I want to have named-attribute access in the case of an
>>> embedded bolt (as a single operator) in a Flink program. And is this
>>> case, fields are not declared and do not have a name (eg, if the bolt's
>>> consumers emits a stream of type Tuple3)
>>>
>>> -Matthias
>>>
>>>
>>> On 06/29/2015 11:42 PM, Gyula Fóra wrote:
>>>> Hey,
>>>> I didn't look through the whole code so I probably don't get something
>>> but
>>>> why don't you just do what storm does? Keep a map from the field names
>>> to
>>>> indexes somewhere (make this accessible from the tuple) and then you can
>>>> just use a simple Flink tuple.
>>>>
>>>> I think this is what's happening in storm, they get the index from the
>>>> context, which knows the declared output fields.
>>>>
>>>> Gyula
>>>>
>>>> Matthias J. Sax <[hidden email]> ezt írta (időpont:
>>> 2015.
>>>> jún. 29., H, 18:08):
>>>>
>>>>> Hi,
>>>>>
>>>>> I started to work on a missing feature for the Storm compatibility
>>>>> layer: named attribute access
>>>>>
>>>>> In Storm, each attribute of an input tuple can be accessed via index or
>>>>> by name. Currently, only index access is supported. In order to support
>>>>> this feature in Flink (embedded Bolt in Flink program), I see two
>>>>> (independent and complementary) ways to support this feature:
>>>>>
>>>>>  1) the input type is a POJO
>>>>>  2) Flink's Tuple type is extended to support named attributes
>>>>>
>>>>> Right now I started a prototype for POJOs. I would like to extend Tuple
>>>>> type with named attributes. However, I am not sure how the community
>>>>> likes this idea.
>>>>>
>>>>> I would like to get some feedback for the POJO prototype, too. I use
>>>>> reflections and I am not sure if my code is elegant enough. You can
>>> find
>>>>> it here: https://github.com/mjsax/flink/tree/flink-storm-compatibility
>>>>>
>>>>>
>>>>> -Matthias
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>


signature.asc (836 bytes) Download Attachment