[Discuss] Questions about SortedMapState in Stream API

Sean Z
Hi Flink Community,

I'm new to this community but have been using Flink for a year or so. My
team has built applications on top of Flink's stateful stream processing.
We have a use case that needs a sorted map data structure to store our
state, something like Java's TreeMap, so that we can query higher/lower
keys, run range queries, etc. However, Flink currently offers only a
MapState interface. We found that RocksDB is essentially a sorted store by
nature, so to support our use case we had to use some hacky tricks to
bypass the current limitations and rely on lower-level RocksDB features to
implement our own SortedMapState. I assume many other users have the same
use case, so I have a few questions.

1. Is a feature like SortedMapState already in place, in development, or
on the future roadmap?

2. If not, would it be a good feature to have? Are there any other
concerns?
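For readers less familiar with the operations mentioned above (higher/lower keys and range queries), here is a minimal sketch of them on a plain in-memory java.util.TreeMap. This is ordinary JDK code, not Flink state; the class name, the timestamp-to-count data, and the query bounds are made up for illustration.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

public class TreeMapQueries {
    // Hypothetical sample data: event timestamp -> count.
    static TreeMap<Long, Integer> sampleCounts() {
        TreeMap<Long, Integer> counts = new TreeMap<>();
        counts.put(100L, 3);
        counts.put(200L, 5);
        counts.put(300L, 1);
        counts.put(400L, 7);
        return counts;
    }

    public static void main(String[] args) {
        TreeMap<Long, Integer> counts = sampleCounts();

        // Smallest key strictly greater than 200.
        System.out.println(counts.higherKey(200L)); // 300
        // Largest key strictly less than 200.
        System.out.println(counts.lowerKey(200L));  // 100

        // Range query over keys in [150, 350): a view backed by the map.
        NavigableMap<Long, Integer> range = counts.subMap(150L, true, 350L, false);
        System.out.println(range);                  // {200=5, 300=1}
    }
}
```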

I just want to start a discussion here from a user's perspective. If
everything goes well, we are also interested in contributing back to the
community.


Best regards,
Sean

Re: [Discuss] Questions about SortedMapState in Stream API

Congxian Qiu
Hi Sean,
   AFAIK, Flink does not currently support SortedMapState. There was an
earlier issue, FLINK-6219, about a sorted MapState; you could follow up on
that issue if you need a sorted MapState.
  A sorted MapState on RocksDB can be challenging with a Java-based
comparator, because RocksDB supports only the bytes-comparator.
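To illustrate the bytes-comparator constraint: when a store orders keys by their raw bytes (as RocksDB's default comparator does), the serialized keys must already sort in the intended order. The sketch below shows one common technique for signed long keys only: write the value big-endian with the sign bit flipped. It emulates a byte-ordered store with a TreeMap<byte[], String> and an unsigned lexicographic comparator; it does not use the RocksDB or Flink APIs, and the names BYTEWISE, naive, encode, and decode are hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.Comparator;
import java.util.TreeMap;

public class OrderPreservingKeys {
    // Unsigned lexicographic byte comparison (memcmp-style ordering).
    static final Comparator<byte[]> BYTEWISE = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    };

    // Naive big-endian two's-complement encoding: negative values have the
    // top bit set, so they sort AFTER positive values under bytewise order.
    static byte[] naive(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    // Order-preserving encoding (assumption: fixed-width signed longs only):
    // flipping the sign bit makes bytewise order match numeric order.
    static byte[] encode(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v ^ Long.MIN_VALUE).array();
    }

    // Inverse of encode: read big-endian and flip the sign bit back.
    static long decode(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong() ^ Long.MIN_VALUE;
    }

    public static void main(String[] args) {
        // Naive encoding misorders -5 and 3.
        System.out.println(BYTEWISE.compare(naive(-5L), naive(3L)) > 0);   // true
        // Sign-flipped encoding keeps numeric order.
        System.out.println(BYTEWISE.compare(encode(-5L), encode(3L)) < 0); // true

        // Emulated byte-ordered key-value store.
        TreeMap<byte[], String> store = new TreeMap<>(BYTEWISE);
        for (long k : new long[] {-20L, -5L, 3L, 42L}) {
            store.put(encode(k), "v" + k);
        }

        // Range query over [-10, 10), iterating encoded keys in byte order.
        for (byte[] key : store.subMap(encode(-10L), true, encode(10L), false).keySet()) {
            System.out.println(decode(key)); // prints -5, then 3
        }
    }
}
```

Flipping the sign bit only covers fixed-width signed integers. Strings, floating-point values, and composite keys each need their own order-preserving encoding, which is part of why an arbitrary Java comparator is hard to map onto a bytes-only ordering.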

Best,
Congxian


On Fri, Jul 24, 2020 at 3:36 AM Sean Z <[hidden email]> wrote:

Re: [Discuss] Questions about SortedMapState in Stream API

Sean Z
Thanks for the reply.

Yes, it's true that RocksDB only supports the bytes-comparator, which makes
this challenging.
I checked FLINK-6219: it has already been closed with the reason
"The thing is this issue will be addressed after blink code gets merged,
and we will open another Jira to track the requirements. "
However, it looks like this has not been resolved yet, and I'm not sure
whether the Blink merge is still ongoing.
Could we bring this back to the table?

Best,
Sean

On Sat, Jul 25, 2020 at 4:56 AM Congxian Qiu <[hidden email]> wrote: