[DISCUSS] Refactor StateBackend into Partitioned State and Non-Partitioned State Backends

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

[DISCUSS] Refactor StateBackend into Partitioned State and Non-Partitioned State Backends

Aljoscha Krettek-2
Hi,
I’m currently examining ways to 1) change the window operators to use the partitioned state abstraction for window state and 2) implement state backends for managed memory/out-of-core state.

I think it would be helpful to pull the state backend apart. Right now, for example, the DbStateBackend has a custom way of specifying another state backend that should be used for non-partitioned state since a data base really only makes sense for partitioned state. I was thinking about adding a state backend based on RocksDB, which would also only make sense for partitioned state. Pulling the two ways of state apart would allow the implementation to focus on the important parts and give the user flexibility without requiring every state backend to implement this.

What do you think about pulling the back ends apart?


Aljoscha

P.S. I have a prototype WindowOperator on partitioned state that does not regress in performance compared to the current WindowOperator. Also, I have a prototype RocksDB state backend. Here, the performance is about 1/10th compared to using the in-memory state backend (with 100.000 keys) but it scales way better (with the in-memory state backend performance goes down when increasing the number of keys while it stays constant with RocksDB). This is quite nice since it allows to use the same windowing code while exchanging the state backend based on the job requirements.
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] Refactor StateBackend into Partitioned State and Non-Partitioned State Backends

Gyula Fóra-2
Hi,

+1

I think it would be a good idea to separate the 2 state backends. I think
you are right in most cases the new partitioned state implementations will
benefit from this as it removes a lot of additional overhead (although
sometimes it's nice to have the 2 together, for instance if they both use
the filesystem) :)

Cheers,
Gyula

Aljoscha Krettek <[hidden email]> ezt írta (időpont: 2016. jan. 7.,
Cs, 11:02):

> Hi,
> I’m currently examining ways to 1) change the window operators to use the
> partitioned state abstraction for window state and 2) implement state
> backends for managed memory/out-of-core state.
>
> I think it would be helpful to pull the state backend apart. Right now,
> for example, the DbStateBackend has a custom way of specifying another
> state backend that should be used for non-partitioned state since a data
> base really only makes sense for partitioned state. I was thinking about
> adding a state backend based on RocksDB, which would also only make sense
> for partitioned state. Pulling the two ways of state apart would allow the
> implementation to focus on the important parts and give the user
> flexibility without requiring every state backend to implement this.
>
> What do you think about pulling the back ends apart?
>
> —
> Aljoscha
>
> P.S. I have a prototype WindowOperator on partitioned state that does not
> regress in performance compared to the current WindowOperator. Also, I have
> a prototype RocksDB state backend. Here, the performance is about 1/10th
> compared to using the in-memory state backend (with 100.000 keys) but it
> scales way better (with the in-memory state backend performance goes down
> when increasing the number of keys while it stays constant with RocksDB).
> This is quite nice since it allows to use the same windowing code while
> exchanging the state backend based on the job requirements.