[jira] [Resolved] (FLINK-801) Serialized String comparison, Unicode support

Shang Yuanchun (Jira)

     [ https://issues.apache.org/jira/browse/FLINK-801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Metzger resolved FLINK-801.
----------------------------------

    Resolution: Invalid

There is a new pull request for that: https://github.com/apache/incubator-flink/pull/4

> Serialized String comparison, Unicode support
> ---------------------------------------------
>
>                 Key: FLINK-801
>                 URL: https://issues.apache.org/jira/browse/FLINK-801
>             Project: Flink
>          Issue Type: Bug
>            Reporter: GitHub Import
>              Labels: github-import
>             Fix For: pre-apache
>
>         Attachments: pull-request-801-3431874524946732791.patch
>
>
> The StringComparator now works on serialized data.
> To this end, new string read/write/copy/compare methods were introduced, which use a variable-length encoding for the characters.
> Key points:
> - The most significant bits are written/read first.
> - The first 2 bits of the character are used to encode the size of the character.
> - A character is at most 3 bytes long.
> Additionally, the StringSerializer now has full Unicode support. I couldn't find a Unicode character that uses more than 22 bits, so 3 bytes should be sufficient.
> ---------------- Imported from GitHub ----------------
> Url: https://github.com/stratosphere/stratosphere/pull/801
> Created by: [zentol|https://github.com/zentol]
> Labels:
> Created at: Tue May 13 18:06:22 CEST 2014
> State: open
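
For illustration, the variable-length encoding described in the quoted issue could look roughly like the sketch below. This is not the code from the pull request. The class name VarLenCharSketch, its method names, and the exact bit layout are assumptions: here the top 2 bits of the first byte hold the number of bytes that follow (0, 1 or 2), and the remaining 6, 14 or 22 bits hold the code point with the most significant bits first. Under that assumed layout, an unsigned byte-by-byte comparison of two serialized strings gives the same order as comparing them code point by code point, so no deserialization is needed.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch, not the FLINK-801 patch itself.
final class VarLenCharSketch {

    // Writes one code point in 1 to 3 bytes, most significant bits first.
    // Assumed layout: 00xxxxxx | 01xxxxxx xxxxxxxx | 10xxxxxx xxxxxxxx xxxxxxxx
    static void writeCodePoint(int cp, ByteArrayOutputStream out) {
        if (cp < 0 || cp >= (1 << 22)) {
            throw new IllegalArgumentException("value does not fit in 22 bits: " + cp);
        }
        if (cp < (1 << 6)) {
            out.write(cp);                      // size bits 00, 6 payload bits
        } else if (cp < (1 << 14)) {
            out.write(0x40 | (cp >>> 8));       // size bits 01, 14 payload bits
            out.write(cp & 0xFF);
        } else {
            out.write(0x80 | (cp >>> 16));      // size bits 10, 22 payload bits
            out.write((cp >>> 8) & 0xFF);
            out.write(cp & 0xFF);
        }
    }

    // Serializes a string as the concatenation of its encoded code points.
    static byte[] encode(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        s.codePoints().forEach(cp -> writeCodePoint(cp, out));
        return out.toByteArray();
    }

    // Reads the serialized form back into a string.
    static String decode(byte[] data) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < data.length) {
            int first = data[i] & 0xFF;
            int extra = first >>> 6;            // bytes following the first one
            if (extra > 2) {
                throw new IllegalArgumentException("invalid size bits at offset " + i);
            }
            int cp = first & 0x3F;
            for (int k = 1; k <= extra; k++) {
                cp = (cp << 8) | (data[i + k] & 0xFF);
            }
            sb.appendCodePoint(cp);
            i += extra + 1;
        }
        return sb.toString();
    }

    // Compares two serialized strings byte by byte, treating bytes as unsigned.
    static int compareSerialized(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }
}
```

The comparison works on raw bytes because the size bits sit at the most significant end of the first byte: a character that needs more bytes always starts with a larger first byte than one that needs fewer, and characters of equal size compare correctly because their payload is written most significant bits first. Note that this orders strings by code point, which can differ from `String.compareTo` for characters outside the Basic Multilingual Plane.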



--
This message was sent by Atlassian JIRA
(v6.2#6252)