My assumption is that this was a sanity check that actually just stuck in
the code.

It can probably be removed.

PS: Moving this to the [hidden email] list...

On Fri, Jul 15, 2016 at 11:05 AM, 刘彪 <[hidden email]> wrote:

> In AbstractRocksDBState.writeKeyAndNamespace():
>
>     protected void writeKeyAndNamespace(DataOutputView out) throws IOException {
>         backend.keySerializer().serialize(backend.currentKey(), out);
>         out.writeByte(42);
>         namespaceSerializer.serialize(currentNamespace, out);
>     }
>
> Why write a byte 42 between key and namespace? The keySerializer and
> namespaceSerializer know their lengths. It seems we don't need this byte.
>
> Could anybody tell me what it is for? Is there any situation that we must
> have this separator?

I left that in on purpose to protect against cases where the combination of
key and namespace can be ambiguous. For example, these two combinations of
key and namespace have the same written representation:

  key [0 1 2] namespace [3 4 5]   (values in brackets are byte arrays)
  key [0 1]   namespace [2 3 4 5]

Having the "magic number" in there protects against such cases.

On Fri, 15 Jul 2016 at 16:31 Stephan Ewen <[hidden email]> wrote:

> My assumption is that this was a sanity check that actually just stuck in
> the code.
>
> It can probably be removed.
>
> PS: Moving this to the [hidden email] list...

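The ambiguity described in the message above can be reproduced with a minimal sketch. It assumes the serialized key and namespace are plain byte arrays written back to back, as in the bracketed example; `SeparatorDemo` and `write` are hypothetical names and not part of Flink.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

class SeparatorDemo {
    // Hypothetical helper: writes the key bytes, optionally the byte 42,
    // then the namespace bytes.
    static byte[] write(byte[] key, byte[] namespace, boolean withSeparator) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(key, 0, key.length);
        if (withSeparator) {
            out.write(42);
        }
        out.write(namespace, 0, namespace.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] k1 = {0, 1, 2}, n1 = {3, 4, 5};
        byte[] k2 = {0, 1}, n2 = {2, 3, 4, 5};

        // Without a separator, both pairs serialize to [0 1 2 3 4 5].
        System.out.println(Arrays.equals(write(k1, n1, false), write(k2, n2, false))); // true

        // With the separator, the 42 lands at a different offset in each.
        System.out.println(Arrays.equals(write(k1, n1, true), write(k2, n2, true))); // false
    }
}
```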
Every serializer should know how many bytes to consume. The key serializer
should not need to look for 42 to know where to terminate.

Otherwise this would be a problem case:

  key [42, 42]     - 42 - namespace [42, 42, 42]
  key [42, 42, 42] - 42 - namespace [42, 42]

On Fri, Jul 15, 2016 at 5:38 PM, Aljoscha Krettek <[hidden email]> wrote:

> I left that in on purpose to protect against cases where the combination
> of key and namespace can be ambiguous. For example, these two combinations
> of key and namespace have the same written representation:
>
>   key [0 1 2] namespace [3 4 5]   (values in brackets are byte arrays)
>   key [0 1]   namespace [2 3 4 5]
>
> Having the "magic number" in there protects against such cases.

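The problem case in the message above can be checked with a short sketch. It again assumes raw byte arrays for the serialized key and namespace; `EdgeCaseDemo` and `writeWithSeparator` are hypothetical names used only for illustration.

```java
import java.util.Arrays;

class EdgeCaseDemo {
    // Hypothetical helper: key bytes, then the separator byte 42, then
    // namespace bytes.
    static byte[] writeWithSeparator(byte[] key, byte[] namespace) {
        byte[] out = new byte[key.length + 1 + namespace.length];
        System.arraycopy(key, 0, out, 0, key.length);
        out[key.length] = 42;
        System.arraycopy(namespace, 0, out, key.length + 1, namespace.length);
        return out;
    }

    public static void main(String[] args) {
        byte[] a = writeWithSeparator(new byte[] {42, 42}, new byte[] {42, 42, 42});
        byte[] b = writeWithSeparator(new byte[] {42, 42, 42}, new byte[] {42, 42});

        // Both results are six bytes of 42, so the separator does not help here.
        System.out.println(Arrays.equals(a, b)); // true
    }
}
```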
I've faced a similar issue when serializing data to a key-value store. Not
sure how helpful it is for this case, but two possible solutions I've used
for persisting keys and values under different namespaces to the same
key-value store are:

- Have all namespaces be the same number of bytes and prefix each key with
  its namespace.
- Include the number of bytes in the namespace and key. So the bytes would
  look like this:

  [namespace num bytes] [namespace] [key num bytes] [key]

Thanks,
Tim

On Fri, Jul 15, 2016 at 9:45 AM, Stephan Ewen <[hidden email]> wrote:

> Every serializer should know how many bytes to consume. The key serializer
> should not need to look for 42 to know where to terminate.
>
> Otherwise this would be a problem case:
>
>   key [42, 42]     - 42 - namespace [42, 42, 42]
>   key [42, 42, 42] - 42 - namespace [42, 42]

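The second layout suggested above, with explicit lengths, might look like the following sketch. It assumes raw byte arrays and a fixed 4-byte big-endian length prefix; `LengthPrefixDemo` and `write` are hypothetical names, and this is not how Flink's RocksDB backend encodes its keys.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

class LengthPrefixDemo {
    // Hypothetical helper producing:
    // [namespace num bytes] [namespace] [key num bytes] [key]
    static byte[] write(byte[] namespace, byte[] key) {
        return ByteBuffer.allocate(4 + namespace.length + 4 + key.length)
                .putInt(namespace.length)
                .put(namespace)
                .putInt(key.length)
                .put(key)
                .array();
    }

    public static void main(String[] args) {
        byte[] a = write(new byte[] {42, 42, 42}, new byte[] {42, 42});
        byte[] b = write(new byte[] {42, 42}, new byte[] {42, 42, 42});

        // The length prefixes differ, so the two encodings differ even
        // though every payload byte is 42.
        System.out.println(Arrays.equals(a, b)); // false
    }
}
```

Because each length field has a fixed width, a reader can always tell where the namespace ends and the key begins, at the cost of eight extra bytes per entry in this sketch.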
@Stephan It's not about the serializers not being able to read the key. The
key/namespace are never read again. It's just about the serialized form
possibly being ambiguous, since we don't control the TypeSerializers and
there might be wonky var-length encoding schemes and whatnot.

On Fri, 15 Jul 2016 at 19:20 Timothy Farkas <[hidden email]> wrote:

> I've faced a similar issue when serializing data to a key-value store. Not
> sure how helpful it is for this case, but two possible solutions I've used
> for persisting keys and values under different namespaces to the same
> key-value store are:
>
> - Have all namespaces be the same number of bytes and prefix each key with
>   its namespace.
> - Include the number of bytes in the namespace and key. So the bytes would
>   look like this:
>
>   [namespace num bytes] [namespace] [key num bytes] [key]
>
> Thanks,
> Tim

Got it. But the ambiguity is not really solved by that, just lessened.

On Sun, Jul 17, 2016 at 2:10 PM, Aljoscha Krettek <[hidden email]> wrote:

> @Stephan It's not about the serializers not being able to read the key. The
> key/namespace are never read again. It's just about the serialized form
> possibly being ambiguous, since we don't control the TypeSerializers and
> there might be wonky var-length encoding schemes and whatnot.

In which cases is it not solved? Because then we should make sure to solve
it.

On Mon, 18 Jul 2016 at 10:33 Stephan Ewen <[hidden email]> wrote:

> Got it. But the ambiguity is not really solved by that, just lessened.

Ah I see, Stephan and I had a quick chat and it's for cases where there are
42s around the edges of the key/namespace.

On Mon, 18 Jul 2016 at 11:51 Aljoscha Krettek <[hidden email]> wrote:

> In which cases is it not solved? Because then we should make sure to solve
> it.

Is there a JIRA issue for this?

On Mon, Jul 18, 2016 at 12:15 PM, Aljoscha Krettek <[hidden email]> wrote:

> Ah I see, Stephan and I had a quick chat and it's for cases where there are
> 42s around the edges of the key/namespace.

No, there is no issue for now. It's just not theoretically 100% safe, but
the way we use it for now is not problematic.

On Wed, 20 Jul 2016 at 16:07 Maximilian Michels <[hidden email]> wrote:

> Is there a JIRA issue for this?