Re: Let Flink SQL PlannerExpressionParserImpl#FieldRefrence use Unicode as its default charset


刘首维
Hi,
    What I am talking about is `PlannerExpressionParserImpl`, which is written with Scala's parser combinators. Every time we call StreamTableEnvironment#fromDataStream, the field string (or scala.Symbol in the Scala API) is parsed by `PlannerExpressionParserImpl` into an `Expression`.
As the parser grammar in `PlannerExpressionParserImpl` shows, `fieldRefrence` is defined as either `*` or `ident`. The `ident` in `PlannerExpressionParserImpl` is simply the one from [[scala.util.parsing.combinator.JavaTokenParsers]], which matches a Java identifier.
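As a rough illustration of what the current rule accepts: a Java identifier is a first character satisfying `Character.isJavaIdentifierStart` followed by characters satisfying `Character.isJavaIdentifierPart`. The Java sketch below approximates that check outside the parser. `JavaIdentRule` is a hypothetical helper, not Flink code, and it only approximates `JavaTokenParsers.ident` (for example, it walks code points rather than `char`s).

```java
import java.util.List;

/** Hypothetical helper approximating what JavaTokenParsers.ident accepts. Not Flink code. */
final class JavaIdentRule {
    private JavaIdentRule() {}

    /** True if name is a non-empty Java identifier (keywords are not checked). */
    static boolean accepts(String name) {
        if (name == null || name.isEmpty()) {
            return false;
        }
        if (!Character.isJavaIdentifierStart(name.codePointAt(0))) {
            return false;
        }
        // Every remaining code point must be a legal identifier part.
        return name.codePoints().skip(1).allMatch(Character::isJavaIdentifierPart);
    }

    public static void main(String[] args) {
        // The third sample is "user name" in Chinese, written with Unicode escapes.
        List<String> samples =
                List.of("user_name", "$col", "\u7528\u6237\u540d", "@timestamp", "event-time");
        for (String name : samples) {
            System.out.println(name + " -> " + accepts(name));
        }
    }
}
```

Under this rule CJK letters already pass, while names containing symbols such as `@` or `-` are rejected, which is exactly the `@timestamp` case raised in the original message.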


   After discussing this with Jark (云邪), I also discovered that `PlannerExpressionParserImpl` currently does not even support quoting with backticks ('`'). I didn't know what you just told me about Calcite before, but it doesn't matter. Maybe, as a first step, we can let PlannerExpressionParserImpl#FieldRefrence use Unicode as its default charset and support '`', and then make the whole project support the Unicode charset once the Calcite-related part is available.
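A minimal sketch of the backtick part of that first step, assuming a field reference may be `*`, a bare Java identifier, or a backtick-quoted name in which any character is allowed and a literal backtick is written as two backticks. The doubling rule is an assumption borrowed from SQL dialects that quote with backticks (such as MySQL); this thread does not specify it. `QuotedFieldRef` and `parse` are hypothetical names, and this is not the actual `PlannerExpressionParserImpl` grammar.

```java
import java.util.Optional;

/** Hypothetical field-reference rule: '*', a Java identifier, or a backtick-quoted name. */
final class QuotedFieldRef {
    private QuotedFieldRef() {}

    /** Returns the field name for a valid reference, or empty if the input is rejected. */
    static Optional<String> parse(String input) {
        if (input == null || input.isEmpty()) {
            return Optional.empty();
        }
        if (input.equals("*")) {
            return Optional.of("*");
        }
        if (input.charAt(0) == '`') {
            return parseQuoted(input);
        }
        return isJavaIdentifier(input) ? Optional.of(input) : Optional.empty();
    }

    // Inside backticks any character is allowed; a doubled backtick (assumed escape) means one backtick.
    private static Optional<String> parseQuoted(String input) {
        StringBuilder name = new StringBuilder();
        int i = 1;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '`') {
                if (i + 1 < input.length() && input.charAt(i + 1) == '`') {
                    name.append('`'); // escaped backtick
                    i += 2;
                    continue;
                }
                // Closing backtick: must be the last character, and the name must be non-empty.
                boolean atEnd = i == input.length() - 1;
                return atEnd && name.length() > 0 ? Optional.of(name.toString()) : Optional.empty();
            }
            name.append(c);
            i++;
        }
        return Optional.empty(); // no closing backtick
    }

    private static boolean isJavaIdentifier(String s) {
        return Character.isJavaIdentifierStart(s.codePointAt(0))
                && s.codePoints().skip(1).allMatch(Character::isJavaIdentifierPart);
    }

    public static void main(String[] args) {
        String[] inputs = {"*", "user_name", "@timestamp", "`@timestamp`", "`a``b`", "`open"};
        for (String in : inputs) {
            System.out.println(in + " -> " + parse(in));
        }
    }
}
```

With quoting, `` `@timestamp` `` resolves to the field `@timestamp` while the bare-name rule stays unchanged, so existing field strings would parse exactly as before.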




btw I attended your lecture on Calcite at FFA Asia, which really inspired me a lot~
 





Best Regards
刘首维 Shoi Liu
Dalian University of Technology




 




------------------ Original Message ------------------
From: "Danny Chan" <[hidden email]>;
Sent: Thursday, January 16, 2020, 12:45 PM
To: "刘首维" <[hidden email]>;

Subject: Re: Let Flink SQL PlannerExpressionParserImpl#FieldRefrence use Unicode as its default charset



  User-defined charsets for DB/session/table/column are not supported in Flink yet. Specifically, Flink uses Calcite as its planner engine, and Calcite does not support configurable charsets well either; there is a design doc [1], but it has never been implemented. Apache Calcite's default system charset is "ISO-8859-1".
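To make the charset point concrete: ISO-8859-1 is a single-byte charset covering only U+0000 to U+00FF, so ASCII names such as `@timestamp` are representable in it while CJK characters are not. The sketch below checks this with the JDK's `CharsetEncoder.canEncode`; it only illustrates the charset itself and does not call Calcite. `Latin1Check` is a hypothetical class name.

```java
import java.nio.charset.StandardCharsets;

/** Checks whether a string can be represented in ISO-8859-1 (Latin-1). Illustration only. */
final class Latin1Check {
    private Latin1Check() {}

    static boolean representable(String s) {
        // A fresh encoder per call, because CharsetEncoder instances are stateful.
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(s);
    }

    public static void main(String[] args) {
        // Samples: an ASCII name, "cafe" with e-acute, and "user name" in Chinese (Unicode escapes).
        String[] samples = {"@timestamp", "caf\u00e9", "\u7528\u6237\u540d"};
        for (String s : samples) {
            System.out.println(s + " -> " + representable(s));
        }
    }
}
```

Note that `@timestamp` fits in ISO-8859-1, so for that particular name the limiting factor is the identifier grammar rather than the charset.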

 Actually, I'm a little confused by your description: do you mean the charset of SqlIdentifier or of string literals? They are different topics.
 

 [1] https://docs.google.com/document/d/1wo5byn_6K_YOKiPdXNav1zgzt9IBC3SbPvpPnIShtXk/edit#heading=h.g4bnumde4dl5
 
 
 
 
 Best, Danny Chan
 
 
 On Jan 15, 2020, at 11:08 PM +0800, 刘首维 <[hidden email]> wrote:
  Hi all,
 The related issue: https://issues.apache.org/jira/browse/FLINK-15573
 

  As the title says, what I want to do is let `FieldRefrence` use Unicode as its default charset (or perhaps as an optional, configurable charset).
 According to `PlannerExpressionParserImpl`, Flink currently uses JavaIdentifier as the default charset of `FieldRefrence`. From my perspective, that is not enough. Consider users who use Elasticsearch as a sink: as we all know, ES has a field called `@timestamp`, which a JavaIdentifier cannot express.
 

  So in my team, we let `PlannerExpressionParserImpl#FieldRefrence` use Unicode as its default charset, which solves this kind of problem. (Please refer to the issue mentioned above.)
 

 In my opinion, the change serves a general purpose:
 Firstly, MySQL supports Unicode as its default field charset (see the field named `@@`), so shall we support Unicode as well?
 

  What's more, my team really benefits from this change, and I believe it can benefit other users too, without any harm.
  Fortunately, the change is fully backward compatible, because Unicode is a superset of the JavaIdentifier charset. Only a small code change is needed to achieve this goal.
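The compatibility argument can be checked mechanically once a concrete relaxed rule is chosen. The sketch below assumes a hypothetical bare-name rule that accepts any code point except whitespace and a few delimiter characters (`,` `.` `(` `)` `` ` `` `*`) that the field-expression syntax is assumed to need for itself; the rule actually used in FLINK-15573 may differ. It scans every Unicode code point and reports any character the Java-identifier rule accepts but the relaxed rule rejects. `CompatCheck` and `DELIMITERS` are hypothetical names.

```java
/** Compares the Java-identifier rule with a hypothetical relaxed "Unicode" rule. Illustration only. */
final class CompatCheck {
    private CompatCheck() {}

    // Hypothetical: characters assumed to be reserved by the field-expression syntax itself.
    private static final String DELIMITERS = ",.()`*";

    /** Hypothetical relaxed rule: anything except whitespace and the assumed delimiters. */
    static boolean relaxedAllows(int cp) {
        return !Character.isWhitespace(cp) && DELIMITERS.indexOf(cp) < 0;
    }

    /** Returns the first code point the old rule accepts but the relaxed rule rejects, or -1. */
    static int firstIncompatibleCodePoint() {
        for (int cp = Character.MIN_CODE_POINT; cp <= Character.MAX_CODE_POINT; cp++) {
            boolean oldAllows =
                    Character.isJavaIdentifierStart(cp) || Character.isJavaIdentifierPart(cp);
            if (oldAllows && !relaxedAllows(cp)) {
                return cp;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int cp = firstIncompatibleCodePoint();
        System.out.println(cp < 0
                ? "relaxed rule accepts every Java identifier character"
                : "incompatible code point: U+" + Integer.toHexString(cp));
        System.out.println("'@' allowed by relaxed rule: " + relaxedAllows('@'));
    }
}
```

If the scan finds no such code point, every field name the current grammar accepts is also accepted by the relaxed rule, which is the property the compatibility claim relies on.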
  Looking forward to any opinions.

 btw, thanks to tison~
 

 
 

 Best Regards
 刘首维 Shoi Liu
 

 
 