Hi,
I have encountered a weird issue. When constructing a Table Query with CASE Statements, like below: .append("CASE ") .append("WHEN aggrcat = '0' AND botcode='r4' THEN 'monitoring' ") .append("WHEN aggrcat = '1' AND botcode='r4' THEN 'aggregator' ") .append("WHEN aggrcat = '2' AND botcode='r4' THEN 'social network' ") .append("WHEN botcode='r0' OR botcode='r8' THEN 'crawler' ") Followed by converting to a stream: DataStream<Row> ds = tableEnv.toAppendStream(table_name, Row.Class) ds.print() I can see that while I print this, it takes the length of the longest string of all the four categories and appends spaces till that length. Eg. Crawler is of length of the biggest string [social network mayb]. This affects when my data is sinked to ES. The fields are appended with empty spaces, which affects the querying. I know i can do a simple split to eliminate these spaces in my code, but just curious to understand why this behavior. It becomes cumbersome to maintain code this way and for readability as well. ~Ramya. |
Hi,
Table API queries are executed with SQL semantics. In SQL, the strings are padded because the result data type is not a VARCHAR but a CHAR with as many character as the longest string. You can use the RTRIM function to remove the padding whitespaces. Best, Fabian Am Di., 29. Jan. 2019 um 15:08 Uhr schrieb Ramya Ramamurthy < [hidden email]>: > Hi, > > I have encountered a weird issue. When constructing a Table Query with CASE > Statements, like below: > .append("CASE ") > .append("WHEN aggrcat = '0' AND botcode='r4' THEN 'monitoring' ") > .append("WHEN aggrcat = '1' AND botcode='r4' THEN 'aggregator' ") > .append("WHEN aggrcat = '2' AND botcode='r4' THEN 'social network' ") > .append("WHEN botcode='r0' OR botcode='r8' THEN 'crawler' ") > > Followed by converting to a stream: > DataStream<Row> ds = tableEnv.toAppendStream(table_name, Row.Class) > ds.print() > > I can see that while I print this, it takes the length of the longest > string of all the four categories and appends spaces till that length. > Eg. Crawler is of length of the biggest string [social network mayb]. > > This affects when my data is sinked to ES. The fields are appended with > empty spaces, which affects the querying. > > I know i can do a simple split to eliminate these spaces in my code, but > just curious to understand why this behavior. It becomes cumbersome to > maintain code this way and for readability as well. > > ~Ramya. > |
Hi Ramya,
Fabian is right. The behavior in strict SQL standard mode(SQL:2003) returns a CHAR(N) type with blank-padded. Best, Hequn On Wed, Jan 30, 2019 at 1:59 AM Fabian Hueske <[hidden email]> wrote: > Hi, > > Table API queries are executed with SQL semantics. > In SQL, the strings are padded because the result data type is not a > VARCHAR but a CHAR with as many character as the longest string. > You can use the RTRIM function to remove the padding whitespaces. > > Best, Fabian > > > Am Di., 29. Jan. 2019 um 15:08 Uhr schrieb Ramya Ramamurthy < > [hidden email]>: > > > Hi, > > > > I have encountered a weird issue. When constructing a Table Query with > CASE > > Statements, like below: > > .append("CASE ") > > .append("WHEN aggrcat = '0' AND botcode='r4' THEN 'monitoring' ") > > .append("WHEN aggrcat = '1' AND botcode='r4' THEN 'aggregator' ") > > .append("WHEN aggrcat = '2' AND botcode='r4' THEN 'social network' ") > > .append("WHEN botcode='r0' OR botcode='r8' THEN 'crawler' ") > > > > Followed by converting to a stream: > > DataStream<Row> ds = tableEnv.toAppendStream(table_name, Row.Class) > > ds.print() > > > > I can see that while I print this, it takes the length of the longest > > string of all the four categories and appends spaces till that length. > > Eg. Crawler is of length of the biggest string [social network mayb]. > > > > This affects when my data is sinked to ES. The fields are appended with > > empty spaces, which affects the querying. > > > > I know i can do a simple split to eliminate these spaces in my code, but > > just curious to understand why this behavior. It becomes cumbersome to > > maintain code this way and for readability as well. > > > > ~Ramya. > > > |
Hi Ramya,
Fabian and Hequn are partially correct. Return type of `CASE WHEN` statement in this case should be CHAR(N), with N being the max of the WHEN branches. However the problem that you are facing is that Flink doesn’t handle CHAR(N) type correctly. There is an open issue for that: https://issues.apache.org/jira/browse/FLINK-10257 <https://issues.apache.org/jira/browse/FLINK-10257> One of it’s side effects is that CHAR(x) strings written back to the connectors are incorrectly padded with spaces (this should only happen if the connector itself support CHAR(x) data type and the output table's column is defined as CHAR(x), which is very uncommon). Unfortunately currently the only solution os as Fabian suggested to use RTRIM function. Piotrek > On 30 Jan 2019, at 04:38, Hequn Cheng <[hidden email]> wrote: > > Hi Ramya, > > Fabian is right. The behavior in strict SQL standard mode(SQL:2003) returns > a CHAR(N) type with blank-padded. > > Best, > Hequn > > > On Wed, Jan 30, 2019 at 1:59 AM Fabian Hueske <[hidden email]> wrote: > >> Hi, >> >> Table API queries are executed with SQL semantics. >> In SQL, the strings are padded because the result data type is not a >> VARCHAR but a CHAR with as many character as the longest string. >> You can use the RTRIM function to remove the padding whitespaces. >> >> Best, Fabian >> >> >> Am Di., 29. Jan. 2019 um 15:08 Uhr schrieb Ramya Ramamurthy < >> [hidden email]>: >> >>> Hi, >>> >>> I have encountered a weird issue. When constructing a Table Query with >> CASE >>> Statements, like below: >>> .append("CASE ") >>> .append("WHEN aggrcat = '0' AND botcode='r4' THEN 'monitoring' ") >>> .append("WHEN aggrcat = '1' AND botcode='r4' THEN 'aggregator' ") >>> .append("WHEN aggrcat = '2' AND botcode='r4' THEN 'social network' ") >>> .append("WHEN botcode='r0' OR botcode='r8' THEN 'crawler' ") >>> >>> Followed by converting to a stream: >>> DataStream<Row> ds = tableEnv.toAppendStream(table_name, Row.Class) >>> ds.print() >>> >>> I can see that while I print this, it takes the length of the longest >>> string of all the four categories and appends spaces till that length. >>> Eg. Crawler is of length of the biggest string [social network mayb]. >>> >>> This affects when my data is sinked to ES. The fields are appended with >>> empty spaces, which affects the querying. >>> >>> I know i can do a simple split to eliminate these spaces in my code, but >>> just curious to understand why this behavior. It becomes cumbersome to >>> maintain code this way and for readability as well. >>> >>> ~Ramya. >>> >> |
RTRIM is not the only solution. It is also possible to cast all literals
of a CASE WHEN statement to VARCHAR, right? WHEN aggrcat = '0' AND botcode='r4' THEN CAST('monitoring' AS VARCHAR) The common datatype of all branches should then be VARCHAR. Regards, Timo Am 04.02.19 um 11:06 schrieb Piotr Nowojski: > Hi Ramya, > > Fabian and Hequn are partially correct. Return type of `CASE WHEN` statement in this case should be CHAR(N), with N being the max of the WHEN branches. However the problem that you are facing is that Flink doesn’t handle CHAR(N) type correctly. There is an open issue for that: > > https://issues.apache.org/jira/browse/FLINK-10257 <https://issues.apache.org/jira/browse/FLINK-10257> > > One of it’s side effects is that CHAR(x) strings written back to the connectors are incorrectly padded with spaces (this should only happen if the connector itself support CHAR(x) data type and the output table's column is defined as CHAR(x), which is very uncommon). Unfortunately currently the only solution os as Fabian suggested to use RTRIM function. > > Piotrek > >> On 30 Jan 2019, at 04:38, Hequn Cheng <[hidden email]> wrote: >> >> Hi Ramya, >> >> Fabian is right. The behavior in strict SQL standard mode(SQL:2003) returns >> a CHAR(N) type with blank-padded. >> >> Best, >> Hequn >> >> >> On Wed, Jan 30, 2019 at 1:59 AM Fabian Hueske <[hidden email]> wrote: >> >>> Hi, >>> >>> Table API queries are executed with SQL semantics. >>> In SQL, the strings are padded because the result data type is not a >>> VARCHAR but a CHAR with as many character as the longest string. >>> You can use the RTRIM function to remove the padding whitespaces. >>> >>> Best, Fabian >>> >>> >>> Am Di., 29. Jan. 2019 um 15:08 Uhr schrieb Ramya Ramamurthy < >>> [hidden email]>: >>> >>>> Hi, >>>> >>>> I have encountered a weird issue. When constructing a Table Query with >>> CASE >>>> Statements, like below: >>>> .append("CASE ") >>>> .append("WHEN aggrcat = '0' AND botcode='r4' THEN 'monitoring' ") >>>> .append("WHEN aggrcat = '1' AND botcode='r4' THEN 'aggregator' ") >>>> .append("WHEN aggrcat = '2' AND botcode='r4' THEN 'social network' ") >>>> .append("WHEN botcode='r0' OR botcode='r8' THEN 'crawler' ") >>>> >>>> Followed by converting to a stream: >>>> DataStream<Row> ds = tableEnv.toAppendStream(table_name, Row.Class) >>>> ds.print() >>>> >>>> I can see that while I print this, it takes the length of the longest >>>> string of all the four categories and appends spaces till that length. >>>> Eg. Crawler is of length of the biggest string [social network mayb]. >>>> >>>> This affects when my data is sinked to ES. The fields are appended with >>>> empty spaces, which affects the querying. >>>> >>>> I know i can do a simple split to eliminate these spaces in my code, but >>>> just curious to understand why this behavior. It becomes cumbersome to >>>> maintain code this way and for readability as well. >>>> >>>> ~Ramya. >>>> > |
Free forum by Nabble | Edit this page |