http://deprecated-apache-flink-mailing-list-archive.368.s1.nabble.com/jira-Resolved-FLINK-801-Serialized-String-comparison-Unicode-support-tp459.html
Robert Metzger resolved FLINK-801.
> Serialized String comparison, Unicode support
> ---------------------------------------------
>
> Key: FLINK-801
> URL: https://issues.apache.org/jira/browse/FLINK-801
> Project: Flink
> Issue Type: Bug
> Reporter: GitHub Import
> Labels: github-import
> Fix For: pre-apache
>
> Attachments: pull-request-801-3431874524946732791.patch
>
>
> The StringComparator now works on serialized data.
> To this end, new string read/write/copy/compare methods were introduced that use a variable-length encoding for the characters.
> Key points:
> - The most significant bits are written/read first.
> - The first 2 bits of the character are used to encode the size of the character.
> - A character occupies at most 3 bytes.
> Additionally, the StringSerializer now has full Unicode support. I couldn't find a Unicode character that uses more than 22 bits, so 3 bytes should be sufficient.
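
For readers following the description above, here is a minimal Java sketch of a variable-length character encoding of the kind the key points describe. It is not the code from the attached patch: the class name VarLenCharCodec, the method names, and the choice of size tags (00, 01, 10 for 1, 2, 3 bytes) are assumptions made for illustration. It is meant to show why writing the most significant bits first allows a comparator to work directly on the serialized bytes.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration only; not the StringSerializer/StringComparator from the patch.
public class VarLenCharCodec {

    /** Writes one code point (0 .. 0x3FFFFF) in 1 to 3 bytes, most significant bits first. */
    public static void writeCodePoint(int cp, ByteArrayOutputStream out) {
        if (cp < 0 || cp > 0x3FFFFF) {
            throw new IllegalArgumentException("value needs more than 22 bits: " + cp);
        }
        if (cp < (1 << 6)) {
            out.write(cp);                    // tag 00 + 6 payload bits
        } else if (cp < (1 << 14)) {
            out.write(0x40 | (cp >>> 8));     // tag 01 + top 6 of 14 payload bits
            out.write(cp & 0xFF);
        } else {
            out.write(0x80 | (cp >>> 16));    // tag 10 + top 6 of 22 payload bits
            out.write((cp >>> 8) & 0xFF);
            out.write(cp & 0xFF);
        }
    }

    /** Encodes a string code point by code point (assumed layout, no length prefix). */
    public static byte[] encode(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        s.codePoints().forEach(cp -> writeCodePoint(cp, out));
        return out.toByteArray();
    }

    /** Decodes bytes produced by encode(). */
    public static String decode(byte[] data) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < data.length) {
            int first = data[i++] & 0xFF;
            int extra = first >>> 6;          // tag = number of bytes that follow
            if (extra == 3) {
                throw new IllegalArgumentException("tag 11 is unused in this sketch");
            }
            int cp = first & 0x3F;
            for (int k = 0; k < extra; k++) {
                cp = (cp << 8) | (data[i++] & 0xFF);
            }
            sb.appendCodePoint(cp);
        }
        return sb.toString();
    }

    /** Compares two encoded strings byte by byte (unsigned), without decoding them. */
    public static int compareSerialized(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;           // a shorter prefix sorts first
    }
}
```

Under these assumptions, code points below 64 (digits, space, most ASCII punctuation) take one byte, code points up to 16383 take two, and everything else takes three, which leaves 22 payload bits in the largest form. The largest Unicode code point, U+10FFFF, needs 21 bits, so it fits. Because the size tag sits in the leading bits and each character uses its shortest form, an unsigned byte-wise comparison orders strings by code point. That order can differ from String.compareTo for characters outside the Basic Multilingual Plane, since String.compareTo compares UTF-16 code units.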
> ---------------- Imported from GitHub ----------------
> Url: https://github.com/stratosphere/stratosphere/pull/801
> Created by: [zentol|https://github.com/zentol]
> Labels:
> Created at: Tue May 13 18:06:22 CEST 2014
> State: open