Flink Table Case Statements appending spaces

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

Flink Table Case Statements appending spaces

Ramya Ramamurthy
Hi,

I have encountered a weird issue. When constructing a Table Query with CASE
Statements, like below:
.append("CASE ")
.append("WHEN aggrcat = '0' AND botcode='r4'  THEN 'monitoring' ")
.append("WHEN aggrcat = '1' AND botcode='r4' THEN 'aggregator' ")
.append("WHEN aggrcat = '2' AND botcode='r4' THEN 'social network' ")
 .append("WHEN botcode='r0' OR botcode='r8' THEN 'crawler' ")

Followed by converting to a stream:
DataStream<Row> ds = tableEnv.toAppendStream(table_name, Row.Class)
ds.print()

I can see that while I print this, it takes the length of the longest
string of all the four categories and appends spaces till that length.
Eg. Crawler is of length of the biggest string [social network mayb].

This affects when my data is sinked to ES. The fields are appended with
empty spaces, which affects the querying.

I know i can do a simple split to eliminate these spaces in my code, but
just curious to understand why this behavior. It becomes cumbersome to
maintain code this way and for readability as well.

~Ramya.
Reply | Threaded
Open this post in threaded view
|

Re: Flink Table Case Statements appending spaces

Fabian Hueske-2
Hi,

Table API queries are executed with SQL semantics.
In SQL, the strings are padded because the result data type is not a
VARCHAR but a CHAR with as many character as the longest string.
You can use the RTRIM function to remove the padding whitespaces.

Best, Fabian


Am Di., 29. Jan. 2019 um 15:08 Uhr schrieb Ramya Ramamurthy <
[hidden email]>:

> Hi,
>
> I have encountered a weird issue. When constructing a Table Query with CASE
> Statements, like below:
> .append("CASE ")
> .append("WHEN aggrcat = '0' AND botcode='r4'  THEN 'monitoring' ")
> .append("WHEN aggrcat = '1' AND botcode='r4' THEN 'aggregator' ")
> .append("WHEN aggrcat = '2' AND botcode='r4' THEN 'social network' ")
>  .append("WHEN botcode='r0' OR botcode='r8' THEN 'crawler' ")
>
> Followed by converting to a stream:
> DataStream<Row> ds = tableEnv.toAppendStream(table_name, Row.Class)
> ds.print()
>
> I can see that while I print this, it takes the length of the longest
> string of all the four categories and appends spaces till that length.
> Eg. Crawler is of length of the biggest string [social network mayb].
>
> This affects when my data is sinked to ES. The fields are appended with
> empty spaces, which affects the querying.
>
> I know i can do a simple split to eliminate these spaces in my code, but
> just curious to understand why this behavior. It becomes cumbersome to
> maintain code this way and for readability as well.
>
> ~Ramya.
>
Reply | Threaded
Open this post in threaded view
|

Re: Flink Table Case Statements appending spaces

Hequn Cheng
Hi Ramya,

Fabian is right. The behavior in strict SQL standard mode(SQL:2003) returns
a CHAR(N) type with blank-padded.

Best,
Hequn


On Wed, Jan 30, 2019 at 1:59 AM Fabian Hueske <[hidden email]> wrote:

> Hi,
>
> Table API queries are executed with SQL semantics.
> In SQL, the strings are padded because the result data type is not a
> VARCHAR but a CHAR with as many character as the longest string.
> You can use the RTRIM function to remove the padding whitespaces.
>
> Best, Fabian
>
>
> Am Di., 29. Jan. 2019 um 15:08 Uhr schrieb Ramya Ramamurthy <
> [hidden email]>:
>
> > Hi,
> >
> > I have encountered a weird issue. When constructing a Table Query with
> CASE
> > Statements, like below:
> > .append("CASE ")
> > .append("WHEN aggrcat = '0' AND botcode='r4'  THEN 'monitoring' ")
> > .append("WHEN aggrcat = '1' AND botcode='r4' THEN 'aggregator' ")
> > .append("WHEN aggrcat = '2' AND botcode='r4' THEN 'social network' ")
> >  .append("WHEN botcode='r0' OR botcode='r8' THEN 'crawler' ")
> >
> > Followed by converting to a stream:
> > DataStream<Row> ds = tableEnv.toAppendStream(table_name, Row.Class)
> > ds.print()
> >
> > I can see that while I print this, it takes the length of the longest
> > string of all the four categories and appends spaces till that length.
> > Eg. Crawler is of length of the biggest string [social network mayb].
> >
> > This affects when my data is sinked to ES. The fields are appended with
> > empty spaces, which affects the querying.
> >
> > I know i can do a simple split to eliminate these spaces in my code, but
> > just curious to understand why this behavior. It becomes cumbersome to
> > maintain code this way and for readability as well.
> >
> > ~Ramya.
> >
>
Reply | Threaded
Open this post in threaded view
|

Re: Flink Table Case Statements appending spaces

Piotr Nowojski-2
Hi Ramya,

Fabian and Hequn are partially correct. Return type of `CASE WHEN` statement in this case should be CHAR(N), with N being the max of the WHEN branches. However the problem that you are facing is that Flink doesn’t handle CHAR(N) type correctly. There is an open issue for that:

https://issues.apache.org/jira/browse/FLINK-10257 <https://issues.apache.org/jira/browse/FLINK-10257>

One of it’s side effects is that CHAR(x) strings written back to the connectors are incorrectly padded with spaces (this should only happen if the connector itself support CHAR(x) data type and the output table's column is defined as CHAR(x), which is very uncommon). Unfortunately currently the only solution os as Fabian suggested to use RTRIM function.

Piotrek

> On 30 Jan 2019, at 04:38, Hequn Cheng <[hidden email]> wrote:
>
> Hi Ramya,
>
> Fabian is right. The behavior in strict SQL standard mode(SQL:2003) returns
> a CHAR(N) type with blank-padded.
>
> Best,
> Hequn
>
>
> On Wed, Jan 30, 2019 at 1:59 AM Fabian Hueske <[hidden email]> wrote:
>
>> Hi,
>>
>> Table API queries are executed with SQL semantics.
>> In SQL, the strings are padded because the result data type is not a
>> VARCHAR but a CHAR with as many character as the longest string.
>> You can use the RTRIM function to remove the padding whitespaces.
>>
>> Best, Fabian
>>
>>
>> Am Di., 29. Jan. 2019 um 15:08 Uhr schrieb Ramya Ramamurthy <
>> [hidden email]>:
>>
>>> Hi,
>>>
>>> I have encountered a weird issue. When constructing a Table Query with
>> CASE
>>> Statements, like below:
>>> .append("CASE ")
>>> .append("WHEN aggrcat = '0' AND botcode='r4'  THEN 'monitoring' ")
>>> .append("WHEN aggrcat = '1' AND botcode='r4' THEN 'aggregator' ")
>>> .append("WHEN aggrcat = '2' AND botcode='r4' THEN 'social network' ")
>>> .append("WHEN botcode='r0' OR botcode='r8' THEN 'crawler' ")
>>>
>>> Followed by converting to a stream:
>>> DataStream<Row> ds = tableEnv.toAppendStream(table_name, Row.Class)
>>> ds.print()
>>>
>>> I can see that while I print this, it takes the length of the longest
>>> string of all the four categories and appends spaces till that length.
>>> Eg. Crawler is of length of the biggest string [social network mayb].
>>>
>>> This affects when my data is sinked to ES. The fields are appended with
>>> empty spaces, which affects the querying.
>>>
>>> I know i can do a simple split to eliminate these spaces in my code, but
>>> just curious to understand why this behavior. It becomes cumbersome to
>>> maintain code this way and for readability as well.
>>>
>>> ~Ramya.
>>>
>>

Reply | Threaded
Open this post in threaded view
|

Re: Flink Table Case Statements appending spaces

Timo Walther-2
RTRIM is not the only solution. It is also possible to cast all literals
of a CASE WHEN statement to VARCHAR, right?

WHEN aggrcat = '0' AND botcode='r4'  THEN CAST('monitoring' AS VARCHAR)

The common datatype of all branches should then be VARCHAR.

Regards,

Timo



Am 04.02.19 um 11:06 schrieb Piotr Nowojski:

> Hi Ramya,
>
> Fabian and Hequn are partially correct. Return type of `CASE WHEN` statement in this case should be CHAR(N), with N being the max of the WHEN branches. However the problem that you are facing is that Flink doesn’t handle CHAR(N) type correctly. There is an open issue for that:
>
> https://issues.apache.org/jira/browse/FLINK-10257 <https://issues.apache.org/jira/browse/FLINK-10257>
>
> One of it’s side effects is that CHAR(x) strings written back to the connectors are incorrectly padded with spaces (this should only happen if the connector itself support CHAR(x) data type and the output table's column is defined as CHAR(x), which is very uncommon). Unfortunately currently the only solution os as Fabian suggested to use RTRIM function.
>
> Piotrek
>
>> On 30 Jan 2019, at 04:38, Hequn Cheng <[hidden email]> wrote:
>>
>> Hi Ramya,
>>
>> Fabian is right. The behavior in strict SQL standard mode(SQL:2003) returns
>> a CHAR(N) type with blank-padded.
>>
>> Best,
>> Hequn
>>
>>
>> On Wed, Jan 30, 2019 at 1:59 AM Fabian Hueske <[hidden email]> wrote:
>>
>>> Hi,
>>>
>>> Table API queries are executed with SQL semantics.
>>> In SQL, the strings are padded because the result data type is not a
>>> VARCHAR but a CHAR with as many character as the longest string.
>>> You can use the RTRIM function to remove the padding whitespaces.
>>>
>>> Best, Fabian
>>>
>>>
>>> Am Di., 29. Jan. 2019 um 15:08 Uhr schrieb Ramya Ramamurthy <
>>> [hidden email]>:
>>>
>>>> Hi,
>>>>
>>>> I have encountered a weird issue. When constructing a Table Query with
>>> CASE
>>>> Statements, like below:
>>>> .append("CASE ")
>>>> .append("WHEN aggrcat = '0' AND botcode='r4'  THEN 'monitoring' ")
>>>> .append("WHEN aggrcat = '1' AND botcode='r4' THEN 'aggregator' ")
>>>> .append("WHEN aggrcat = '2' AND botcode='r4' THEN 'social network' ")
>>>> .append("WHEN botcode='r0' OR botcode='r8' THEN 'crawler' ")
>>>>
>>>> Followed by converting to a stream:
>>>> DataStream<Row> ds = tableEnv.toAppendStream(table_name, Row.Class)
>>>> ds.print()
>>>>
>>>> I can see that while I print this, it takes the length of the longest
>>>> string of all the four categories and appends spaces till that length.
>>>> Eg. Crawler is of length of the biggest string [social network mayb].
>>>>
>>>> This affects when my data is sinked to ES. The fields are appended with
>>>> empty spaces, which affects the querying.
>>>>
>>>> I know i can do a simple split to eliminate these spaces in my code, but
>>>> just curious to understand why this behavior. It becomes cumbersome to
>>>> maintain code this way and for readability as well.
>>>>
>>>> ~Ramya.
>>>>
>