(DEPRECATED) Apache Flink Mailing List archive.

Re: Let Flink SQL PlannerExpressionParserImpl#FieldRefrence use Unicode as its default charset

刘首维

Hi,
    What I am talking about is `PlannerExpressionParserImpl`, which is written with Scala parser combinators. Every time we call `StreamTableEnvironment#fromDataStream`, the field string (or a scala.Symbol in the Scala API) is parsed by `PlannerExpressionParserImpl` into an `Expression`.
As the parser grammar in `PlannerExpressionParserImpl` shows, `fieldRefrence` is defined as either `*` or `ident`. The `ident` in `PlannerExpressionParserImpl` is simply the one from [[scala.util.parsing.combinator.JavaTokenParsers]], that is, a Java identifier.
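
To make the restriction concrete, here is a minimal sketch in plain Java (not Flink or Scala parser code). It assumes that `ident` accepts a Java identifier start character followed by Java identifier part characters; the class and method names are hypothetical.

```java
public class IdentCheck {
    // Assumption: this mirrors the character classes used by JavaTokenParsers.ident.
    // The first character must be a Java identifier start,
    // every following character a Java identifier part.
    public static boolean isJavaIdentifier(String s) {
        if (s.isEmpty() || !Character.isJavaIdentifierStart(s.charAt(0))) {
            return false;
        }
        for (int i = 1; i < s.length(); i++) {
            if (!Character.isJavaIdentifierPart(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isJavaIdentifier("user_name"));  // true
        System.out.println(isJavaIdentifier("@timestamp")); // false: '@' cannot start a Java identifier
    }
}
```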

   After discussing this with Jark (云邪), I also discovered that `PlannerExpressionParserImpl` currently does not even support quoting with backticks ('`'). I didn't know what you just told me about Calcite, but that doesn't matter here. Maybe, as a first step, we can let PlannerExpressionParserImpl#FieldRefrence use Unicode as its default charset and support '`', and then make the whole project support the Unicode charset once the Calcite-related part is available.
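
The first step could look roughly like the sketch below. It is only an illustration in plain Java, not the actual patch or the Scala grammar: the class `FieldRefSketch`, the method `parseFieldReference`, and the rule that a quoted name may contain any character except a backtick are all assumptions.

```java
import java.util.Optional;

public class FieldRefSketch {
    // Hypothetical grammar: "*" | `quoted name` | Java identifier.
    public static Optional<String> parseFieldReference(String input) {
        String s = input.trim();
        if (s.equals("*")) {
            return Optional.of("*");
        }
        int last = s.length() - 1;
        // Backtick-quoted form: at least one character between the quotes,
        // and no backtick inside the name (assumed rule, no escaping).
        if (s.length() >= 3 && s.charAt(0) == '`' && s.charAt(last) == '`') {
            String inner = s.substring(1, last);
            return inner.indexOf('`') < 0 ? Optional.of(inner) : Optional.empty();
        }
        return isJavaIdentifier(s) ? Optional.of(s) : Optional.empty();
    }

    // Unquoted names keep today's behaviour: they must be Java identifiers.
    private static boolean isJavaIdentifier(String s) {
        if (s.isEmpty() || !Character.isJavaIdentifierStart(s.charAt(0))) {
            return false;
        }
        for (int i = 1; i < s.length(); i++) {
            if (!Character.isJavaIdentifierPart(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(parseFieldReference("`@timestamp`")); // Optional[@timestamp]
        System.out.println(parseFieldReference("@timestamp"));   // Optional.empty
    }
}
```

Every name that parses today still parses in this sketch, so only previously rejected input changes behaviour.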

btw, I attended your talk on Calcite at FFA Asia, which really inspired me a lot~
 

Best Regards
刘首维 (Shoi Liu)
Dalian University of Technology

 

------------------ Original Message ------------------
From: "Danny Chan"<[hidden email]>;
Sent: Thursday, January 16, 2020, 12:45 PM
To: "刘首维"<[hidden email]>;

Subject: Re: Let Flink SQL PlannerExpressionParserImpl#FieldRefrence use Unicode as its default charset

A user-defined charset for DB/session/table/column is not yet supported in Flink. Specifically, Flink uses Calcite as its planner engine, which also does not support configurable charsets well; there is a design doc [1], but it has never been implemented. Apache Calcite's default system charset is "ISO-8859-1".
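
As a rough illustration of what an ISO-8859-1 default implies, the sketch below checks whether a string can be encoded in that charset. It uses only the JDK, not any Calcite API, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class CharsetCheck {
    // True if every character of s can be represented in ISO-8859-1 (Latin-1).
    public static boolean fitsLatin1(String s) {
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(s);
    }

    public static void main(String[] args) {
        System.out.println(fitsLatin1("@timestamp")); // true: plain ASCII
        System.out.println(fitsLatin1("用户名"));      // false: CJK characters are outside Latin-1
    }
}
```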

Actually, I'm a little confused by your description: do you mean the charset of SqlIdentifier or of string literals? They are different topics.

[1] https://docs.google.com/document/d/1wo5byn_6K_YOKiPdXNav1zgzt9IBC3SbPvpPnIShtXk/edit#heading=h.g4bnumde4dl5

Best, Danny Chan

On January 15, 2020, at 11:08 PM +0800, 刘首维 <[hidden email]> wrote:
Hi all,
 The related issue: https://issues.apache.org/jira/browse/FLINK-15573

  As the title says, what I want to do is let `FieldRefrence` use Unicode as its default charset (or perhaps as an optional charset that can be configured).
According to `PlannerExpressionParserImpl`, Flink currently uses JavaIdentifier as the default charset of `FieldRefrence`. From my perspective, that is not enough. Consider users who use Elasticsearch as a sink: as we all know, ES has a field called `@timestamp`, which is not a valid JavaIdentifier.

  So in my team, we simply let `PlannerExpressionParserImpl#FieldRefrence` use Unicode as its default charset, which solves this kind of problem. (Please refer to the issue I mentioned above.)

In my opinion, the change serves a general purpose:
 Firstly, MySQL supports Unicode as its default field charset (see the field named `@@`), so shouldn't we support Unicode as well?

  What's more, my team has benefited a lot from this change. I also believe it can benefit other users without doing any harm!
  Fortunately, the change is fully backward compatible, because Unicode is a superset of JavaIdentifier. Only a few code changes are needed to achieve this goal.
  I look forward to any opinions.
 
 btw, thanks to tison~

Best Regards
刘首维 Shoi Liu