Hej,
I have a Flink job with a filter step. I now have a list of exceptions for which I need to do some extra work (300k data). I thought I would just use a broadcast set and then, for each line, check whether it is in the exception set.

What is the best way to implement this in Flink? Is there an efficient way of checking whether a certain element is in the broadcast set? Or can I somehow dump the broadcast set into a set?

cheers Martin
Hej!
Yes, getRuntimeContext().getBroadcastVariable() returns a list, which you can add to a set:

    // in the function:

    private Set<T> specials;

    public void open(Configuration conf) {
        List<T> bc = getRuntimeContext().getBroadcastVariable("the-bc-var-name");
        // copy the list into a hash set once, for fast contains() checks
        specials = new HashSet<T>(bc);
    }

Is that what you had in mind?

Stephan

On Mon, Oct 6, 2014 at 11:57 AM, Martin Neumann <[hidden email]> wrote:
> Is there an efficient way of checking if a certain element is in the
> Broadcastset? Or can I somehow dump the Broadcastset into a set?
Yes, thanks.
I was wondering whether I could check directly in the broadcast set, but since I have to get it local anyway, it should not be too much overhead.

cheers Martin

On Mon, Oct 6, 2014 at 12:34 PM, Stephan Ewen <[hidden email]> wrote:
> Is that what you had in mind?
You actually can check directly in the broadcast set. But since it is a list, searching in it is slower than in a hash set: a list has to be scanned element by element, while a hash set lookup takes roughly constant time. That is all...

On Mon, Oct 6, 2014 at 1:09 PM, Martin Neumann <[hidden email]> wrote:
> I was wondering if I can check directly in the broadcast set but since I
> have to get it local anyway it should be not to much overhead.
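
For readers who want to try the idea outside of Flink, here is a minimal plain-Java sketch of the pattern discussed in this thread: copy the list once into a HashSet, then test each record against the set. It uses only java.util. The class name ExceptionLookup, the helper methods toLookupSet and selectSpecials, and the sample ids are made up for illustration and are not part of Flink. In a real job the list would come from getBroadcastVariable() in open().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class, not part of Flink.
class ExceptionLookup {

    // Copy the (broadcast) list into a HashSet once, so that later
    // contains() checks do not have to scan the whole list.
    static <T> Set<T> toLookupSet(List<T> broadcastList) {
        return new HashSet<T>(broadcastList);
    }

    // Return the records that are in the exception set, in input order.
    static <T> List<T> selectSpecials(List<T> records, Set<T> specials) {
        List<T> result = new ArrayList<T>();
        for (T record : records) {
            if (specials.contains(record)) {
                result.add(record);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-in for the list returned by getBroadcastVariable(...)
        List<String> broadcastList = Arrays.asList("id-7", "id-42", "id-99");
        Set<String> specials = toLookupSet(broadcastList);

        List<String> records = Arrays.asList("id-1", "id-42", "id-5", "id-99");
        System.out.println(selectSpecials(records, specials)); // prints [id-42, id-99]
    }
}
```

Calling broadcastList.contains(record) directly would give the same result, but every check would walk the list, which adds up with 300k exceptions. The set also requires the element type to have consistent equals() and hashCode().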