[GitHub] incubator-flink pull request: Serialized String comparison, Unicod...


zentol
Github user zentol commented on the pull request:

    https://github.com/apache/incubator-flink/pull/4#issuecomment-45841376
 
    I reworked the serializer/comparator again. It now uses the first bit of every byte to indicate whether there is at least one more byte coming.
   
    This has the bonus that all letters are serialized as one byte, as opposed to the previous version, which could only do this for the digits and a few special characters (and that actually made the variable-length encoding pointless...).
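
A minimal sketch of such a flag-per-byte encoding, not the code from the pull request. It assumes 7 payload bits per byte, the highest bit of each byte as the "more bytes follow" flag, and the most significant group written first; the class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration; the actual serializer in the pull request may differ.
final class FlaggedCharEncoding {

    // Encodes one char as 1 to 3 bytes. Each byte carries 7 payload bits;
    // the highest bit is set when at least one more byte follows.
    static byte[] encode(char c) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(3);
        int v = c;
        if (v >= (1 << 14)) {
            out.write(0x80 | (v >>> 14));          // top 2 bits, flag set
        }
        if (v >= (1 << 7)) {
            out.write(0x80 | ((v >>> 7) & 0x7F));  // middle 7 bits, flag set
        }
        out.write(v & 0x7F);                       // lowest 7 bits, flag clear (last byte)
        return out.toByteArray();
    }

    // Reverses encode(): strips the flag bit and concatenates the 7-bit groups.
    static char decode(byte[] bytes) {
        int v = 0;
        for (byte b : bytes) {
            v = (v << 7) | (b & 0x7F);
        }
        return (char) v;
    }
}
```

Under these assumptions every char below 128, which includes all ASCII letters, takes exactly one byte, and a 16-bit char never needs more than three.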
   
    I currently do a selective shift starting at the flag positions to make space for them. I wonder if there is a more efficient way to do that. Here's an example of how I do it:
   
    char to send:
    ```0010 0110 1111 1001```
    1) move the lowest 7 bits into a tmp variable (by doing & with 0000 0000 0111 1111)
    ```0000 0000 0111 1001```
    2) shift char to the right by 7 positions to omit the lower part
    ```0000 0000 0100 1101```
    3) shift char to the left by 8 positions (finalizing the shifting of the upper part)
    ```0100 1101 0000 0000```
    4) char |= tmp
    ```0100 1101 0111 1001```
    (this would be done recursively for every flag position needed, starting from the right, so up to 3 times)
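
The four steps for a single flag position can be sketched in Java as follows, next to a single-expression variant that yields the same value. This is an illustrative sketch, not the code from the pull request: the class and method names are hypothetical, and it works on an `int` so that bits pushed past position 15 are not lost.

```java
// Hypothetical illustration of opening one flag position (bit 7).
final class FlagShiftSketch {

    // Steps 1-4 from the example, applied once.
    static int openFlagStepwise(char c) {
        int value = c;
        int tmp = value & 0x7F;  // 1) keep the lowest 7 bits
        value >>>= 7;            // 2) drop the lower part
        value <<= 8;             // 3) move the upper part above the flag position
        value |= tmp;            // 4) merge the lower part back in
        return value;
    }

    // Same result in one expression: mask off everything above bit 6,
    // shift that part left by one, and OR the untouched low 7 bits back in.
    static int openFlagCompact(char c) {
        return ((c & ~0x7F) << 1) | (c & 0x7F);
    }
}
```

For the char in the example, `Integer.toHexString(FlagShiftSketch.openFlagStepwise((char) 0x26F9))` gives `4d79`, which is `0100 1101 0111 1001`, and `openFlagCompact` returns the same value. The compact form avoids the temporary variable and one shift per flag position; whether it is measurably faster would need benchmarking.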

