Hi,
I started to work on a missing feature for the Storm compatibility layer: named attribute access In Storm, each attribute of an input tuple can be accessed via index or by name. Currently, only index access is supported. In order to support this feature in Flink (embedded Bolt in Flink program), I see two (independent and complementary) ways to support this feature: 1) the input type is a POJO 2) Flink's Tuple type is extended to support named attributes Right now I started a prototype for POJOs. I would like to extend Tuple type with named attributes. However, I am not sure how the community likes this idea. I would like to get some feedback for the POJO prototype, too. I use reflections and I am not sure if my code is elegant enough. You can find it here: https://github.com/mjsax/flink/tree/flink-storm-compatibility -Matthias |
Hey,
I didn't look through the whole code so I probably don't get something but why don't you just do what storm does? Keep a map from the field names to indexes somewhere (make this accessible from the tuple) and then you can just use a simple Flink tuple. I think this is what's happening in storm, they get the index from the context, which knows the declared output fields. Gyula Matthias J. Sax <[hidden email]> ezt írta (időpont: 2015. jún. 29., H, 18:08): > Hi, > > I started to work on a missing feature for the Storm compatibility > layer: named attribute access > > In Storm, each attribute of an input tuple can be accessed via index or > by name. Currently, only index access is supported. In order to support > this feature in Flink (embedded Bolt in Flink program), I see two > (independent and complementary) ways to support this feature: > > 1) the input type is a POJO > 2) Flink's Tuple type is extended to support named attributes > > Right now I started a prototype for POJOs. I would like to extend Tuple > type with named attributes. However, I am not sure how the community > likes this idea. > > I would like to get some feedback for the POJO prototype, too. I use > reflections and I am not sure if my code is elegant enough. You can find > it here: https://github.com/mjsax/flink/tree/flink-storm-compatibility > > > -Matthias > > > > > |
Well. If a whole Storm topology is executed, this is of course the way
to got. However, I want to have named-attribute access in the case of an embedded bolt (as a single operator) in a Flink program. And is this case, fields are not declared and do not have a name (eg, if the bolt's consumers emits a stream of type Tuple3) -Matthias On 06/29/2015 11:42 PM, Gyula Fóra wrote: > Hey, > I didn't look through the whole code so I probably don't get something but > why don't you just do what storm does? Keep a map from the field names to > indexes somewhere (make this accessible from the tuple) and then you can > just use a simple Flink tuple. > > I think this is what's happening in storm, they get the index from the > context, which knows the declared output fields. > > Gyula > > Matthias J. Sax <[hidden email]> ezt írta (időpont: 2015. > jún. 29., H, 18:08): > >> Hi, >> >> I started to work on a missing feature for the Storm compatibility >> layer: named attribute access >> >> In Storm, each attribute of an input tuple can be accessed via index or >> by name. Currently, only index access is supported. In order to support >> this feature in Flink (embedded Bolt in Flink program), I see two >> (independent and complementary) ways to support this feature: >> >> 1) the input type is a POJO >> 2) Flink's Tuple type is extended to support named attributes >> >> Right now I started a prototype for POJOs. I would like to extend Tuple >> type with named attributes. However, I am not sure how the community >> likes this idea. >> >> I would like to get some feedback for the POJO prototype, too. I use >> reflections and I am not sure if my code is elegant enough. You can find >> it here: https://github.com/mjsax/flink/tree/flink-storm-compatibility >> >> >> -Matthias >> >> >> >> >> > |
Ah ok, now I get what I didn't get before :)
So you want to take some input stream , and execute a bolt implementation on it. And the question is what input type to assume when the user wants to use field name based access. Can't we force the user to declare the names of the inputs/outputs even in this case? Otherwise there is not much we can do. Maybe stick to either public fields or getter setters. Matthias J. Sax <[hidden email]> ezt írta (időpont: 2015. jún. 29., H, 23:51): > Well. If a whole Storm topology is executed, this is of course the way > to got. However, I want to have named-attribute access in the case of an > embedded bolt (as a single operator) in a Flink program. And is this > case, fields are not declared and do not have a name (eg, if the bolt's > consumers emits a stream of type Tuple3) > > -Matthias > > > On 06/29/2015 11:42 PM, Gyula Fóra wrote: > > Hey, > > I didn't look through the whole code so I probably don't get something > but > > why don't you just do what storm does? Keep a map from the field names to > > indexes somewhere (make this accessible from the tuple) and then you can > > just use a simple Flink tuple. > > > > I think this is what's happening in storm, they get the index from the > > context, which knows the declared output fields. > > > > Gyula > > > > Matthias J. Sax <[hidden email]> ezt írta (időpont: 2015. > > jún. 29., H, 18:08): > > > >> Hi, > >> > >> I started to work on a missing feature for the Storm compatibility > >> layer: named attribute access > >> > >> In Storm, each attribute of an input tuple can be accessed via index or > >> by name. Currently, only index access is supported. In order to support > >> this feature in Flink (embedded Bolt in Flink program), I see two > >> (independent and complementary) ways to support this feature: > >> > >> 1) the input type is a POJO > >> 2) Flink's Tuple type is extended to support named attributes > >> > >> Right now I started a prototype for POJOs. I would like to extend Tuple > >> type with named attributes. However, I am not sure how the community > >> likes this idea. > >> > >> I would like to get some feedback for the POJO prototype, too. I use > >> reflections and I am not sure if my code is elegant enough. You can find > >> it here: https://github.com/mjsax/flink/tree/flink-storm-compatibility > >> > >> > >> -Matthias > >> > >> > >> > >> > >> > > > > |
By declare I mean we assume a Flink Tuple datatype and the user declares
the name mapping (sorry its getting late). Gyula Fóra <[hidden email]> ezt írta (időpont: 2015. jún. 29., H, 23:57): > Ah ok, now I get what I didn't get before :) > > So you want to take some input stream , and execute a bolt implementation > on it. And the question is what input type to assume when the user wants to > use field name based access. > > Can't we force the user to declare the names of the inputs/outputs even in > this case? Otherwise there is not much we can do. Maybe stick to either > public fields or getter setters. > > > Matthias J. Sax <[hidden email]> ezt írta (időpont: 2015. > jún. 29., H, 23:51): > >> Well. If a whole Storm topology is executed, this is of course the way >> to got. However, I want to have named-attribute access in the case of an >> embedded bolt (as a single operator) in a Flink program. And is this >> case, fields are not declared and do not have a name (eg, if the bolt's >> consumers emits a stream of type Tuple3) >> >> -Matthias >> >> >> On 06/29/2015 11:42 PM, Gyula Fóra wrote: >> > Hey, >> > I didn't look through the whole code so I probably don't get something >> but >> > why don't you just do what storm does? Keep a map from the field names >> to >> > indexes somewhere (make this accessible from the tuple) and then you can >> > just use a simple Flink tuple. >> > >> > I think this is what's happening in storm, they get the index from the >> > context, which knows the declared output fields. >> > >> > Gyula >> > >> > Matthias J. Sax <[hidden email]> ezt írta (időpont: >> 2015. >> > jún. 29., H, 18:08): >> > >> >> Hi, >> >> >> >> I started to work on a missing feature for the Storm compatibility >> >> layer: named attribute access >> >> >> >> In Storm, each attribute of an input tuple can be accessed via index or >> >> by name. Currently, only index access is supported. In order to support >> >> this feature in Flink (embedded Bolt in Flink program), I see two >> >> (independent and complementary) ways to support this feature: >> >> >> >> 1) the input type is a POJO >> >> 2) Flink's Tuple type is extended to support named attributes >> >> >> >> Right now I started a prototype for POJOs. I would like to extend Tuple >> >> type with named attributes. However, I am not sure how the community >> >> likes this idea. >> >> >> >> I would like to get some feedback for the POJO prototype, too. I use >> >> reflections and I am not sure if my code is elegant enough. You can >> find >> >> it here: https://github.com/mjsax/flink/tree/flink-storm-compatibility >> >> >> >> >> >> -Matthias >> >> >> >> >> >> >> >> >> >> >> > >> >> |
That would also work. I thought about it already, too. Thanks for the
feedback. If two people have similar idea, it might be the right way to got. I will just include all this stuff and open an PR. Than we can evaluate it again. -Matthias On 06/30/2015 12:01 AM, Gyula Fóra wrote: > By declare I mean we assume a Flink Tuple datatype and the user declares > the name mapping (sorry its getting late). > > Gyula Fóra <[hidden email]> ezt írta (időpont: 2015. jún. 29., H, > 23:57): > >> Ah ok, now I get what I didn't get before :) >> >> So you want to take some input stream , and execute a bolt implementation >> on it. And the question is what input type to assume when the user wants to >> use field name based access. >> >> Can't we force the user to declare the names of the inputs/outputs even in >> this case? Otherwise there is not much we can do. Maybe stick to either >> public fields or getter setters. >> >> >> Matthias J. Sax <[hidden email]> ezt írta (időpont: 2015. >> jún. 29., H, 23:51): >> >>> Well. If a whole Storm topology is executed, this is of course the way >>> to got. However, I want to have named-attribute access in the case of an >>> embedded bolt (as a single operator) in a Flink program. And is this >>> case, fields are not declared and do not have a name (eg, if the bolt's >>> consumers emits a stream of type Tuple3) >>> >>> -Matthias >>> >>> >>> On 06/29/2015 11:42 PM, Gyula Fóra wrote: >>>> Hey, >>>> I didn't look through the whole code so I probably don't get something >>> but >>>> why don't you just do what storm does? Keep a map from the field names >>> to >>>> indexes somewhere (make this accessible from the tuple) and then you can >>>> just use a simple Flink tuple. >>>> >>>> I think this is what's happening in storm, they get the index from the >>>> context, which knows the declared output fields. >>>> >>>> Gyula >>>> >>>> Matthias J. Sax <[hidden email]> ezt írta (időpont: >>> 2015. >>>> jún. 29., H, 18:08): >>>> >>>>> Hi, >>>>> >>>>> I started to work on a missing feature for the Storm compatibility >>>>> layer: named attribute access >>>>> >>>>> In Storm, each attribute of an input tuple can be accessed via index or >>>>> by name. Currently, only index access is supported. In order to support >>>>> this feature in Flink (embedded Bolt in Flink program), I see two >>>>> (independent and complementary) ways to support this feature: >>>>> >>>>> 1) the input type is a POJO >>>>> 2) Flink's Tuple type is extended to support named attributes >>>>> >>>>> Right now I started a prototype for POJOs. I would like to extend Tuple >>>>> type with named attributes. However, I am not sure how the community >>>>> likes this idea. >>>>> >>>>> I would like to get some feedback for the POJO prototype, too. I use >>>>> reflections and I am not sure if my code is elegant enough. You can >>> find >>>>> it here: https://github.com/mjsax/flink/tree/flink-storm-compatibility >>>>> >>>>> >>>>> -Matthias >>>>> >>>>> >>>>> >>>>> >>>>> >>>> >>> >>> > |
Free forum by Nabble | Edit this page |