A little help on (FLINK-1725) New Partitioner for better load balancing for skewed data
Hi,
I am trying to implement a variation of the issue FLINK-1725 [1].
Basically, I want to have the frequency of the keys, and if there is
skewness I will split this key into different key-groups.
My first tests (word count) are working. I introduced this new operator
(keyByPartial) [2][3], then I decide at the KeyGroupStreamPartitioner which
strategy to pick [4].
My problem is that I am using two "IFrequency frequency" objects in two
different places. The first is at the AbstractKeyedStateBackend [5] and the
second is in the StreamPartitioner [6]. Then I compute their result at
the KeyGroupRangeAssignment [7]. I would like to have just one instance of
my "IFrequency frequency" for the keyByPartial transformation. Is that
possible? Where do you suggest to put it?
If I was not very clear on my question, please let me know.