[jira] [Created] (FLINK-11141) Key generation for RocksDBMapState can theoretically be ambiguous

Shang Yuanchun (Jira)
Stefan Richter created FLINK-11141:
--------------------------------------

             Summary: Key generation for RocksDBMapState can theoretically be ambiguous
                 Key: FLINK-11141
                 URL: https://issues.apache.org/jira/browse/FLINK-11141
             Project: Flink
          Issue Type: Bug
          Components: State Backends, Checkpointing
    Affects Versions: 1.7.0, 1.6.2, 1.5.5
            Reporter: Stefan Richter


RocksDBMapState stores values in RocksDB under a composite key built from the serialized bytes of {{key-group-id|key|namespace|user-key}}. In this composition, key, namespace, and user-key can each have either a fixed-size or a variable-size serialization format. If at least two of them use a variable-size format, the composite key can be ambiguous, because different splits can concatenate to the same bytes, e.g.:

abcd <-> efg
abc <-> defg
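
The collision above can be reproduced with a minimal sketch. It assumes the two parts are plain UTF-8 strings concatenated without any length information; {{CompositeKeyAmbiguity}} and {{concat}} are hypothetical names for illustration and are not part of Flink.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class CompositeKeyAmbiguity {

    // Hypothetical helper: concatenates the serialized parts back to back,
    // without recording where one part ends and the next begins.
    static byte[] concat(String... parts) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String part : parts) {
            byte[] bytes = part.getBytes(StandardCharsets.UTF_8);
            out.write(bytes, 0, bytes.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] first = concat("abcd", "efg");
        byte[] second = concat("abc", "defg");
        // Both pairs serialize to the bytes of "abcdefg".
        System.out.println(Arrays.equals(first, second)); // prints "true"
    }
}
```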

Our code takes care of this for all other states, where composite keys consist only of key and namespace, by checking whether both use a variable-size format and, if so, appending the serialized length to each byte sequence.

However, for map state the user-key is included neither in the check for potential ambiguity nor in the appending of the size. This means that, in theory, some combinations can produce colliding composite keys in RocksDB. What is required is to include the user-key serializer in the check and to append the length for the user-key as well.
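
A rough sketch of the proposed direction, under stated assumptions: every part (key, namespace, user-key) is treated as variable-size and is followed by its length as a fixed 4-byte big-endian integer. The class and method names are hypothetical, the key-group prefix is omitted, and the length encoding is chosen for illustration only; it is not necessarily the encoding Flink uses.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class LengthSuffixedMapKey {

    // Hypothetical helper: writes each part followed by its length, so the
    // part boundaries can be recovered by reading the composite key backwards.
    static byte[] compose(String key, String namespace, String userKey) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String part : new String[] {key, namespace, userKey}) {
            byte[] bytes = part.getBytes(StandardCharsets.UTF_8);
            out.write(bytes, 0, bytes.length);
            // Assumption: 4-byte big-endian length suffix.
            byte[] length = ByteBuffer.allocate(4).putInt(bytes.length).array();
            out.write(length, 0, length.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Namespace and user-key would collide if simply concatenated.
        byte[] first = compose("k", "abcd", "efg");
        byte[] second = compose("k", "abc", "defg");
        System.out.println(Arrays.equals(first, second)); // prints "false"
    }
}
```

In this sketch the length is always appended; the check described above would only do so when at least two of the three serializers are variable-size.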

Please note that this cannot simply be changed, because it has implications for backwards compatibility and requires some form of migration of the state keys on restore.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)