[jira] [Created] (FLINK-20898) Code of BatchExpand & LocalNoGroupingAggregateWithoutKeys grows beyond 64 KB


Shang Yuanchun (Jira)
Sebastian Liu created FLINK-20898:
-------------------------------------

             Summary: Code of BatchExpand & LocalNoGroupingAggregateWithoutKeys grows beyond 64 KB
                 Key: FLINK-20898
                 URL: https://issues.apache.org/jira/browse/FLINK-20898
             Project: Flink
          Issue Type: Bug
          Components: Table SQL / Planner
            Reporter: Sebastian Liu


When a batch SQL query contains many aggregations, the code generated for the BatchExpand and LocalNoGroupingAggregateWithoutKeys operators can easily exceed the JVM's 64 KB limit on the bytecode size of a single method. This happens especially often in the analyze table scenario.

For example, a simple statement such as
{code:sql}
analyze table tpc_ds.call_center compute statistics for all columns{code}
is translated into the following underlying query:
{code:sql}
SELECT CAST(COUNT(1) AS BIGINT),
    CAST(COUNT(DISTINCT `cc_call_center_sk`) AS BIGINT),
    CAST(
        (COUNT(1) - COUNT(`cc_call_center_sk`)) AS BIGINT
    ),
    CAST(8.0 AS DOUBLE),
    CAST(8.0 AS INTEGER),
    CAST(MAX(`cc_call_center_sk`) AS BIGINT),
    CAST(MIN(`cc_call_center_sk`) AS BIGINT),
    CAST(COUNT(DISTINCT `cc_call_center_id`) AS BIGINT),
    CAST(
        (COUNT(1) - COUNT(`cc_call_center_id`)) AS BIGINT
    ),
    CAST(
        AVG(CAST(CHAR_LENGTH(`cc_call_center_id`) AS DOUBLE)) AS DOUBLE
    ),
    CAST(MAX(CHAR_LENGTH(`cc_call_center_id`)) AS INTEGER),
    CAST(MAX(`cc_call_center_id`) AS VARCHAR),
    CAST(MIN(`cc_call_center_id`) AS VARCHAR),
    CAST(COUNT(DISTINCT `cc_rec_start_date`) AS BIGINT),
    CAST(
        (COUNT(1) - COUNT(`cc_rec_start_date`)) AS BIGINT
    ),
    CAST(12.0 AS DOUBLE),
    CAST(12.0 AS INTEGER),
    CAST(MAX(`cc_rec_start_date`) AS DATE),
    CAST(MIN(`cc_rec_start_date`) AS DATE),
    CAST(COUNT(DISTINCT `cc_rec_end_date`) AS BIGINT),
    CAST((COUNT(1) - COUNT(`cc_rec_end_date`)) AS BIGINT),
    CAST(12.0 AS DOUBLE),
    CAST(12.0 AS INTEGER),
    CAST(MAX(`cc_rec_end_date`) AS DATE),
    CAST(MIN(`cc_rec_end_date`) AS DATE),
    CAST(COUNT(DISTINCT `cc_closed_date_sk`) AS BIGINT),
    CAST(
        (COUNT(1) - COUNT(`cc_closed_date_sk`)) AS BIGINT
    ),
    CAST(8.0 AS DOUBLE),
    CAST(8.0 AS INTEGER),
    CAST(MAX(`cc_closed_date_sk`) AS BIGINT),
    CAST(MIN(`cc_closed_date_sk`) AS BIGINT),
    CAST(COUNT(DISTINCT `cc_open_date_sk`) AS BIGINT),
    CAST((COUNT(1) - COUNT(`cc_open_date_sk`)) AS BIGINT),
    CAST(8.0 AS DOUBLE),
    CAST(8.0 AS INTEGER),
    CAST(MAX(`cc_open_date_sk`) AS BIGINT),
    CAST(MIN(`cc_open_date_sk`) AS BIGINT),
    CAST(COUNT(DISTINCT `cc_name`) AS BIGINT),
    CAST((COUNT(1) - COUNT(`cc_name`)) AS BIGINT),
    CAST(
        AVG(CAST(CHAR_LENGTH(`cc_name`) AS DOUBLE)) AS DOUBLE
    ),
    CAST(MAX(CHAR_LENGTH(`cc_name`)) AS INTEGER),
    CAST(MAX(`cc_name`) AS VARCHAR),
    CAST(MIN(`cc_name`) AS VARCHAR),
    CAST(COUNT(DISTINCT `cc_class`) AS BIGINT),
    CAST((COUNT(1) - COUNT(`cc_class`)) AS BIGINT),
    CAST(
        AVG(CAST(CHAR_LENGTH(`cc_class`) AS DOUBLE)) AS DOUBLE
    ),
    CAST(MAX(CHAR_LENGTH(`cc_class`)) AS INTEGER),
    CAST(MAX(`cc_class`) AS VARCHAR),
    CAST(MIN(`cc_class`) AS VARCHAR),
    CAST(COUNT(DISTINCT `cc_employees`) AS BIGINT),
    CAST((COUNT(1) - COUNT(`cc_employees`)) AS BIGINT),
    CAST(4.0 AS DOUBLE),
    CAST(4.0 AS INTEGER),
    CAST(MAX(`cc_employees`) AS INTEGER),
    CAST(MIN(`cc_employees`) AS INTEGER),
    CAST(COUNT(DISTINCT `cc_sq_ft`) AS BIGINT),
    CAST((COUNT(1) - COUNT(`cc_sq_ft`)) AS BIGINT),
    CAST(4.0 AS DOUBLE),
    CAST(4.0 AS INTEGER),
    CAST(MAX(`cc_sq_ft`) AS INTEGER),
    CAST(MIN(`cc_sq_ft`) AS INTEGER),
    CAST(COUNT(DISTINCT `cc_hours`) AS BIGINT),
    CAST((COUNT(1) - COUNT(`cc_hours`)) AS BIGINT),
    CAST(
        AVG(CAST(CHAR_LENGTH(`cc_hours`) AS DOUBLE)) AS DOUBLE
    ),
    CAST(MAX(CHAR_LENGTH(`cc_hours`)) AS INTEGER),
    CAST(MAX(`cc_hours`) AS VARCHAR),
    CAST(MIN(`cc_hours`) AS VARCHAR),
    CAST(COUNT(DISTINCT `cc_manager`) AS BIGINT),
    CAST((COUNT(1) - COUNT(`cc_manager`)) AS BIGINT),
    CAST(
        AVG(CAST(CHAR_LENGTH(`cc_manager`) AS DOUBLE)) AS DOUBLE
    ),
    CAST(MAX(CHAR_LENGTH(`cc_manager`)) AS INTEGER),
    CAST(MAX(`cc_manager`) AS VARCHAR),
    CAST(MIN(`cc_manager`) AS VARCHAR),
    CAST(COUNT(DISTINCT `cc_mkt_id`) AS BIGINT),
    CAST((COUNT(1) - COUNT(`cc_mkt_id`)) AS BIGINT),
    CAST(4.0 AS DOUBLE),
    CAST(4.0 AS INTEGER),
    CAST(MAX(`cc_mkt_id`) AS INTEGER),
    CAST(MIN(`cc_mkt_id`) AS INTEGER),
    CAST(COUNT(DISTINCT `cc_mkt_class`) AS BIGINT),
    CAST((COUNT(1) - COUNT(`cc_mkt_class`)) AS BIGINT),
    CAST(
        AVG(CAST(CHAR_LENGTH(`cc_mkt_class`) AS DOUBLE)) AS DOUBLE
    ),
    CAST(MAX(CHAR_LENGTH(`cc_mkt_class`)) AS INTEGER),
    CAST(MAX(`cc_mkt_class`) AS VARCHAR),
    CAST(MIN(`cc_mkt_class`) AS VARCHAR),
    CAST(COUNT(DISTINCT `cc_mkt_desc`) AS BIGINT),
    CAST((COUNT(1) - COUNT(`cc_mkt_desc`)) AS BIGINT),
    CAST(
        AVG(CAST(CHAR_LENGTH(`cc_mkt_desc`) AS DOUBLE)) AS DOUBLE
    ),
    CAST(MAX(CHAR_LENGTH(`cc_mkt_desc`)) AS INTEGER),
    CAST(MAX(`cc_mkt_desc`) AS VARCHAR),
    CAST(MIN(`cc_mkt_desc`) AS VARCHAR),
    CAST(COUNT(DISTINCT `cc_market_manager`) AS BIGINT),
    CAST(
        (COUNT(1) - COUNT(`cc_market_manager`)) AS BIGINT
    ),
    CAST(
        AVG(CAST(CHAR_LENGTH(`cc_market_manager`) AS DOUBLE)) AS DOUBLE
    ),
    CAST(MAX(CHAR_LENGTH(`cc_market_manager`)) AS INTEGER),
    CAST(MAX(`cc_market_manager`) AS VARCHAR),
    CAST(MIN(`cc_market_manager`) AS VARCHAR),
    CAST(COUNT(DISTINCT `cc_division`) AS BIGINT),
    CAST((COUNT(1) - COUNT(`cc_division`)) AS BIGINT),
    CAST(4.0 AS DOUBLE),
    CAST(4.0 AS INTEGER),
    CAST(MAX(`cc_division`) AS INTEGER),
    CAST(MIN(`cc_division`) AS INTEGER),
    CAST(COUNT(DISTINCT `cc_division_name`) AS BIGINT),
    CAST((COUNT(1) - COUNT(`cc_division_name`)) AS BIGINT),
    CAST(
        AVG(CAST(CHAR_LENGTH(`cc_division_name`) AS DOUBLE)) AS DOUBLE
    ),
    CAST(MAX(CHAR_LENGTH(`cc_division_name`)) AS INTEGER),
    CAST(MAX(`cc_division_name`) AS VARCHAR),
    CAST(MIN(`cc_division_name`) AS VARCHAR),
    CAST(COUNT(DISTINCT `cc_company`) AS BIGINT),
    CAST((COUNT(1) - COUNT(`cc_company`)) AS BIGINT),
    CAST(4.0 AS DOUBLE),
    CAST(4.0 AS INTEGER),
    CAST(MAX(`cc_company`) AS INTEGER),
    CAST(MIN(`cc_company`) AS INTEGER),
    CAST(COUNT(DISTINCT `cc_company_name`) AS BIGINT),
    CAST((COUNT(1) - COUNT(`cc_company_name`)) AS BIGINT),
    CAST(
        AVG(CAST(CHAR_LENGTH(`cc_company_name`) AS DOUBLE)) AS DOUBLE
    ),
    CAST(MAX(CHAR_LENGTH(`cc_company_name`)) AS INTEGER),
    CAST(MAX(`cc_company_name`) AS VARCHAR),
    CAST(MIN(`cc_company_name`) AS VARCHAR),
    CAST(COUNT(DISTINCT `cc_street_number`) AS BIGINT),
    CAST((COUNT(1) - COUNT(`cc_street_number`)) AS BIGINT),
    CAST(
        AVG(CAST(CHAR_LENGTH(`cc_street_number`) AS DOUBLE)) AS DOUBLE
    ),
    CAST(MAX(CHAR_LENGTH(`cc_street_number`)) AS INTEGER),
    CAST(MAX(`cc_street_number`) AS VARCHAR),
    CAST(MIN(`cc_street_number`) AS VARCHAR),
    CAST(COUNT(DISTINCT `cc_street_name`) AS BIGINT),
    CAST((COUNT(1) - COUNT(`cc_street_name`)) AS BIGINT),
    CAST(
        AVG(CAST(CHAR_LENGTH(`cc_street_name`) AS DOUBLE)) AS DOUBLE
    ),
    CAST(MAX(CHAR_LENGTH(`cc_street_name`)) AS INTEGER),
    CAST(MAX(`cc_street_name`) AS VARCHAR),
    CAST(MIN(`cc_street_name`) AS VARCHAR),
    CAST(COUNT(DISTINCT `cc_street_type`) AS BIGINT),
    CAST((COUNT(1) - COUNT(`cc_street_type`)) AS BIGINT),
    CAST(
        AVG(CAST(CHAR_LENGTH(`cc_street_type`) AS DOUBLE)) AS DOUBLE
    ),
    CAST(MAX(CHAR_LENGTH(`cc_street_type`)) AS INTEGER),
    CAST(MAX(`cc_street_type`) AS VARCHAR),
    CAST(MIN(`cc_street_type`) AS VARCHAR),
    CAST(COUNT(DISTINCT `cc_suite_number`) AS BIGINT),
    CAST((COUNT(1) - COUNT(`cc_suite_number`)) AS BIGINT),
    CAST(
        AVG(CAST(CHAR_LENGTH(`cc_suite_number`) AS DOUBLE)) AS DOUBLE
    ),
    CAST(MAX(CHAR_LENGTH(`cc_suite_number`)) AS INTEGER),
    CAST(MAX(`cc_suite_number`) AS VARCHAR),
    CAST(MIN(`cc_suite_number`) AS VARCHAR),
    CAST(COUNT(DISTINCT `cc_city`) AS BIGINT),
    CAST((COUNT(1) - COUNT(`cc_city`)) AS BIGINT),
    CAST(
        AVG(CAST(CHAR_LENGTH(`cc_city`) AS DOUBLE)) AS DOUBLE
    ),
    CAST(MAX(CHAR_LENGTH(`cc_city`)) AS INTEGER),
    CAST(MAX(`cc_city`) AS VARCHAR),
    CAST(MIN(`cc_city`) AS VARCHAR),
    CAST(COUNT(DISTINCT `cc_county`) AS BIGINT),
    CAST((COUNT(1) - COUNT(`cc_county`)) AS BIGINT),
    CAST(
        AVG(CAST(CHAR_LENGTH(`cc_county`) AS DOUBLE)) AS DOUBLE
    ),
    CAST(MAX(CHAR_LENGTH(`cc_county`)) AS INTEGER),
    CAST(MAX(`cc_county`) AS VARCHAR),
    CAST(MIN(`cc_county`) AS VARCHAR),
    CAST(COUNT(DISTINCT `cc_state`) AS BIGINT),
    CAST((COUNT(1) - COUNT(`cc_state`)) AS BIGINT),
    CAST(
        AVG(CAST(CHAR_LENGTH(`cc_state`) AS DOUBLE)) AS DOUBLE
    ),
    CAST(MAX(CHAR_LENGTH(`cc_state`)) AS INTEGER),
    CAST(MAX(`cc_state`) AS VARCHAR),
    CAST(MIN(`cc_state`) AS VARCHAR),
    CAST(COUNT(DISTINCT `cc_zip`) AS BIGINT),
    CAST((COUNT(1) - COUNT(`cc_zip`)) AS BIGINT),
    CAST(
        AVG(CAST(CHAR_LENGTH(`cc_zip`) AS DOUBLE)) AS DOUBLE
    ),
    CAST(MAX(CHAR_LENGTH(`cc_zip`)) AS INTEGER),
    CAST(MAX(`cc_zip`) AS VARCHAR),
    CAST(MIN(`cc_zip`) AS VARCHAR),
    CAST(COUNT(DISTINCT `cc_country`) AS BIGINT),
    CAST((COUNT(1) - COUNT(`cc_country`)) AS BIGINT),
    CAST(
        AVG(CAST(CHAR_LENGTH(`cc_country`) AS DOUBLE)) AS DOUBLE
    ),
    CAST(MAX(CHAR_LENGTH(`cc_country`)) AS INTEGER),
    CAST(MAX(`cc_country`) AS VARCHAR),
    CAST(MIN(`cc_country`) AS VARCHAR),
    CAST(COUNT(DISTINCT `cc_gmt_offset`) AS BIGINT),
    CAST((COUNT(1) - COUNT(`cc_gmt_offset`)) AS BIGINT),
    CAST(8.0 AS DOUBLE),
    CAST(8.0 AS INTEGER),
    CAST(MAX(`cc_gmt_offset`) AS DOUBLE),
    CAST(MIN(`cc_gmt_offset`) AS DOUBLE),
    CAST(COUNT(DISTINCT `cc_tax_percentage`) AS BIGINT),
    CAST(
        (COUNT(1) - COUNT(`cc_tax_percentage`)) AS BIGINT
    ),
    CAST(8.0 AS DOUBLE),
    CAST(8.0 AS INTEGER),
    CAST(MAX(`cc_tax_percentage`) AS DOUBLE),
    CAST(MIN(`cc_tax_percentage`) AS DOUBLE)
FROM `bytedance_hive`.`tpc_ds`.`call_center`
{code}
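
The query above follows a regular pattern: one global COUNT(1) plus six statistic expressions per column, which for the 31 columns of call_center gives 187 select items. The sketch below reconstructs that per-column pattern to show why the query (and the code generated for it) grows linearly with the number of columns. It is only an illustration derived from the query text; the class name, method name and parameters are hypothetical and are not Flink's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

class AnalyzeQuerySketch {
    /**
     * Hypothetical helper: builds the six statistic expressions that appear
     * per column in the query above. For string columns the average and
     * maximum length are computed with CHAR_LENGTH; for fixed-width columns
     * a constant width (fixedLen) is emitted instead.
     */
    static List<String> columnStats(String col, String type, boolean isString, double fixedLen) {
        String q = "`" + col + "`";
        List<String> exprs = new ArrayList<>();
        exprs.add("CAST(COUNT(DISTINCT " + q + ") AS BIGINT)");        // number of distinct values
        exprs.add("CAST((COUNT(1) - COUNT(" + q + ")) AS BIGINT)");    // number of NULLs
        if (isString) {
            exprs.add("CAST(AVG(CAST(CHAR_LENGTH(" + q + ") AS DOUBLE)) AS DOUBLE)");
            exprs.add("CAST(MAX(CHAR_LENGTH(" + q + ")) AS INTEGER)");
        } else {
            exprs.add("CAST(" + fixedLen + " AS DOUBLE)");
            exprs.add("CAST(" + fixedLen + " AS INTEGER)");
        }
        exprs.add("CAST(MAX(" + q + ") AS " + type + ")");
        exprs.add("CAST(MIN(" + q + ") AS " + type + ")");
        return exprs;
    }
}
```

Calling columnStats once per column and prepending CAST(COUNT(1) AS BIGINT) yields a select list of the same shape as the one above, so a wider table directly produces a longer processElement method.
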
For this query, compiling the generated code on the TaskManager (TM) side fails with the following root-cause exceptions:
{code:java}
Caused by: org.codehaus.janino.InternalCompilerException: Code of method "processElement(Lorg/apache/flink/streaming/runtime/streamrecord/StreamRecord;)V" of class "BatchExpand$36757" grows beyond 64 KB
{code}
and
{code:java}
Caused by: org.codehaus.janino.InternalCompilerException: Code of method "processElement(Lorg/apache/flink/streaming/runtime/streamrecord/StreamRecord;)V" of class "LocalNoGroupingAggregateWithoutKeys$34429" grows beyond 64 KB
{code}
We need to split the generated code of BatchExpand and LocalNoGroupingAggregateWithoutKeys for complex SQL, just as ExprCodeGenerator and AggsHandlerCodeGenerator do.
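
As a rough illustration of what splitting generated code means, the sketch below takes the statements of one oversized method and distributes them over several small helper methods, leaving an entry method that only calls the helpers. This is a minimal sketch under stated assumptions, not how ExprCodeGenerator or AggsHandlerCodeGenerator are implemented: the class SplitMethodSketch, the method split, the "_split" naming scheme and the statement-count threshold are all hypothetical.

```java
import java.util.List;

class SplitMethodSketch {
    /**
     * Hypothetical splitter: emits an entry method named methodName that
     * delegates to helper methods, each holding at most maxStatements of
     * the original statements, in their original order.
     */
    static String split(String methodName, List<String> statements, int maxStatements) {
        if (maxStatements <= 0) {
            throw new IllegalArgumentException("maxStatements must be positive");
        }
        StringBuilder entry = new StringBuilder();
        StringBuilder helpers = new StringBuilder();
        entry.append("public void ").append(methodName)
             .append("(Object element) throws Exception {\n");
        int part = 0;
        for (int start = 0; start < statements.size(); start += maxStatements) {
            int end = Math.min(start + maxStatements, statements.size());
            String helperName = methodName + "_split" + part++;
            // The entry method shrinks to one call per chunk.
            entry.append("  ").append(helperName).append("(element);\n");
            helpers.append("private void ").append(helperName)
                   .append("(Object element) throws Exception {\n");
            for (String statement : statements.subList(start, end)) {
                helpers.append("  ").append(statement).append("\n");
            }
            helpers.append("}\n");
        }
        entry.append("}\n");
        return entry.append(helpers).toString();
    }
}
```

For example, split("processElement", statements, 100) would turn a processElement body of 187 statements into two helpers plus a two-line entry method. Because the helpers cannot see each other's local variables, a real splitter also has to promote any values shared between chunks to member fields of the generated class; that step is omitted here.
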

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)