Sebastian Liu created FLINK-20898:
-------------------------------------

             Summary: Code of BatchExpand & LocalNoGroupingAggregateWithoutKeys grows beyond 64 KB
                 Key: FLINK-20898
                 URL: https://issues.apache.org/jira/browse/FLINK-20898
             Project: Flink
          Issue Type: Bug
          Components: Table SQL / Planner
            Reporter: Sebastian Liu


When we write a complex batch aggregation SQL query, the generated code can easily exceed the 64 KB size limit for the BatchExpand and LocalNoGroupingAggregateWithoutKeys operators, especially in the analyze table scenario.

For a simple statement such as
{code:java}
analyze table tpc_ds.call_center compute statistics for all columns
{code}
the underlying SQL to execute will be:
{code:java}
SELECT CAST(COUNT(1) AS BIGINT), CAST(COUNT(DISTINCT `cc_call_center_sk`) AS BIGINT), CAST( (COUNT(1) - COUNT(`cc_call_center_sk`)) AS BIGINT ), CAST(8.0 AS DOUBLE), CAST(8.0 AS INTEGER), CAST(MAX(`cc_call_center_sk`) AS BIGINT), CAST(MIN(`cc_call_center_sk`) AS BIGINT), CAST(COUNT(DISTINCT `cc_call_center_id`) AS BIGINT), CAST( (COUNT(1) - COUNT(`cc_call_center_id`)) AS BIGINT ), CAST( AVG(CAST(CHAR_LENGTH(`cc_call_center_id`) AS DOUBLE)) AS DOUBLE ), CAST(MAX(CHAR_LENGTH(`cc_call_center_id`)) AS INTEGER), CAST(MAX(`cc_call_center_id`) AS VARCHAR), CAST(MIN(`cc_call_center_id`) AS VARCHAR), CAST(COUNT(DISTINCT `cc_rec_start_date`) AS BIGINT), CAST( (COUNT(1) - COUNT(`cc_rec_start_date`)) AS BIGINT ), CAST(12.0 AS DOUBLE), CAST(12.0 AS INTEGER), CAST(MAX(`cc_rec_start_date`) AS DATE), CAST(MIN(`cc_rec_start_date`) AS DATE), CAST(COUNT(DISTINCT `cc_rec_end_date`) AS BIGINT), CAST((COUNT(1) - COUNT(`cc_rec_end_date`)) AS BIGINT), CAST(12.0 AS DOUBLE), CAST(12.0 AS INTEGER), CAST(MAX(`cc_rec_end_date`) AS DATE), CAST(MIN(`cc_rec_end_date`) AS DATE), CAST(COUNT(DISTINCT `cc_closed_date_sk`) AS BIGINT), CAST( (COUNT(1) - COUNT(`cc_closed_date_sk`)) AS BIGINT ), CAST(8.0 AS DOUBLE), CAST(8.0 AS INTEGER), CAST(MAX(`cc_closed_date_sk`) AS BIGINT), CAST(MIN(`cc_closed_date_sk`) AS BIGINT), CAST(COUNT(DISTINCT `cc_open_date_sk`) 
AS BIGINT), CAST((COUNT(1) - COUNT(`cc_open_date_sk`)) AS BIGINT), CAST(8.0 AS DOUBLE), CAST(8.0 AS INTEGER), CAST(MAX(`cc_open_date_sk`) AS BIGINT), CAST(MIN(`cc_open_date_sk`) AS BIGINT), CAST(COUNT(DISTINCT `cc_name`) AS BIGINT), CAST((COUNT(1) - COUNT(`cc_name`)) AS BIGINT), CAST( AVG(CAST(CHAR_LENGTH(`cc_name`) AS DOUBLE)) AS DOUBLE ), CAST(MAX(CHAR_LENGTH(`cc_name`)) AS INTEGER), CAST(MAX(`cc_name`) AS VARCHAR), CAST(MIN(`cc_name`) AS VARCHAR), CAST(COUNT(DISTINCT `cc_class`) AS BIGINT), CAST((COUNT(1) - COUNT(`cc_class`)) AS BIGINT), CAST( AVG(CAST(CHAR_LENGTH(`cc_class`) AS DOUBLE)) AS DOUBLE ), CAST(MAX(CHAR_LENGTH(`cc_class`)) AS INTEGER), CAST(MAX(`cc_class`) AS VARCHAR), CAST(MIN(`cc_class`) AS VARCHAR), CAST(COUNT(DISTINCT `cc_employees`) AS BIGINT), CAST((COUNT(1) - COUNT(`cc_employees`)) AS BIGINT), CAST(4.0 AS DOUBLE), CAST(4.0 AS INTEGER), CAST(MAX(`cc_employees`) AS INTEGER), CAST(MIN(`cc_employees`) AS INTEGER), CAST(COUNT(DISTINCT `cc_sq_ft`) AS BIGINT), CAST((COUNT(1) - COUNT(`cc_sq_ft`)) AS BIGINT), CAST(4.0 AS DOUBLE), CAST(4.0 AS INTEGER), CAST(MAX(`cc_sq_ft`) AS INTEGER), CAST(MIN(`cc_sq_ft`) AS INTEGER), CAST(COUNT(DISTINCT `cc_hours`) AS BIGINT), CAST((COUNT(1) - COUNT(`cc_hours`)) AS BIGINT), CAST( AVG(CAST(CHAR_LENGTH(`cc_hours`) AS DOUBLE)) AS DOUBLE ), CAST(MAX(CHAR_LENGTH(`cc_hours`)) AS INTEGER), CAST(MAX(`cc_hours`) AS VARCHAR), CAST(MIN(`cc_hours`) AS VARCHAR), CAST(COUNT(DISTINCT `cc_manager`) AS BIGINT), CAST((COUNT(1) - COUNT(`cc_manager`)) AS BIGINT), CAST( AVG(CAST(CHAR_LENGTH(`cc_manager`) AS DOUBLE)) AS DOUBLE ), CAST(MAX(CHAR_LENGTH(`cc_manager`)) AS INTEGER), CAST(MAX(`cc_manager`) AS VARCHAR), CAST(MIN(`cc_manager`) AS VARCHAR), CAST(COUNT(DISTINCT `cc_mkt_id`) AS BIGINT), CAST((COUNT(1) - COUNT(`cc_mkt_id`)) AS BIGINT), CAST(4.0 AS DOUBLE), CAST(4.0 AS INTEGER), CAST(MAX(`cc_mkt_id`) AS INTEGER), CAST(MIN(`cc_mkt_id`) AS INTEGER), CAST(COUNT(DISTINCT `cc_mkt_class`) AS BIGINT), CAST((COUNT(1) - COUNT(`cc_mkt_class`)) AS 
BIGINT), CAST( AVG(CAST(CHAR_LENGTH(`cc_mkt_class`) AS DOUBLE)) AS DOUBLE ), CAST(MAX(CHAR_LENGTH(`cc_mkt_class`)) AS INTEGER), CAST(MAX(`cc_mkt_class`) AS VARCHAR), CAST(MIN(`cc_mkt_class`) AS VARCHAR), CAST(COUNT(DISTINCT `cc_mkt_desc`) AS BIGINT), CAST((COUNT(1) - COUNT(`cc_mkt_desc`)) AS BIGINT), CAST( AVG(CAST(CHAR_LENGTH(`cc_mkt_desc`) AS DOUBLE)) AS DOUBLE ), CAST(MAX(CHAR_LENGTH(`cc_mkt_desc`)) AS INTEGER), CAST(MAX(`cc_mkt_desc`) AS VARCHAR), CAST(MIN(`cc_mkt_desc`) AS VARCHAR), CAST(COUNT(DISTINCT `cc_market_manager`) AS BIGINT), CAST( (COUNT(1) - COUNT(`cc_market_manager`)) AS BIGINT ), CAST( AVG(CAST(CHAR_LENGTH(`cc_market_manager`) AS DOUBLE)) AS DOUBLE ), CAST(MAX(CHAR_LENGTH(`cc_market_manager`)) AS INTEGER), CAST(MAX(`cc_market_manager`) AS VARCHAR), CAST(MIN(`cc_market_manager`) AS VARCHAR), CAST(COUNT(DISTINCT `cc_division`) AS BIGINT), CAST((COUNT(1) - COUNT(`cc_division`)) AS BIGINT), CAST(4.0 AS DOUBLE), CAST(4.0 AS INTEGER), CAST(MAX(`cc_division`) AS INTEGER), CAST(MIN(`cc_division`) AS INTEGER), CAST(COUNT(DISTINCT `cc_division_name`) AS BIGINT), CAST((COUNT(1) - COUNT(`cc_division_name`)) AS BIGINT), CAST( AVG(CAST(CHAR_LENGTH(`cc_division_name`) AS DOUBLE)) AS DOUBLE ), CAST(MAX(CHAR_LENGTH(`cc_division_name`)) AS INTEGER), CAST(MAX(`cc_division_name`) AS VARCHAR), CAST(MIN(`cc_division_name`) AS VARCHAR), CAST(COUNT(DISTINCT `cc_company`) AS BIGINT), CAST((COUNT(1) - COUNT(`cc_company`)) AS BIGINT), CAST(4.0 AS DOUBLE), CAST(4.0 AS INTEGER), CAST(MAX(`cc_company`) AS INTEGER), CAST(MIN(`cc_company`) AS INTEGER), CAST(COUNT(DISTINCT `cc_company_name`) AS BIGINT), CAST((COUNT(1) - COUNT(`cc_company_name`)) AS BIGINT), CAST( AVG(CAST(CHAR_LENGTH(`cc_company_name`) AS DOUBLE)) AS DOUBLE ), CAST(MAX(CHAR_LENGTH(`cc_company_name`)) AS INTEGER), CAST(MAX(`cc_company_name`) AS VARCHAR), CAST(MIN(`cc_company_name`) AS VARCHAR), CAST(COUNT(DISTINCT `cc_street_number`) AS BIGINT), CAST((COUNT(1) - COUNT(`cc_street_number`)) AS BIGINT), CAST( 
AVG(CAST(CHAR_LENGTH(`cc_street_number`) AS DOUBLE)) AS DOUBLE ), CAST(MAX(CHAR_LENGTH(`cc_street_number`)) AS INTEGER), CAST(MAX(`cc_street_number`) AS VARCHAR), CAST(MIN(`cc_street_number`) AS VARCHAR), CAST(COUNT(DISTINCT `cc_street_name`) AS BIGINT), CAST((COUNT(1) - COUNT(`cc_street_name`)) AS BIGINT), CAST( AVG(CAST(CHAR_LENGTH(`cc_street_name`) AS DOUBLE)) AS DOUBLE ), CAST(MAX(CHAR_LENGTH(`cc_street_name`)) AS INTEGER), CAST(MAX(`cc_street_name`) AS VARCHAR), CAST(MIN(`cc_street_name`) AS VARCHAR), CAST(COUNT(DISTINCT `cc_street_type`) AS BIGINT), CAST((COUNT(1) - COUNT(`cc_street_type`)) AS BIGINT), CAST( AVG(CAST(CHAR_LENGTH(`cc_street_type`) AS DOUBLE)) AS DOUBLE ), CAST(MAX(CHAR_LENGTH(`cc_street_type`)) AS INTEGER), CAST(MAX(`cc_street_type`) AS VARCHAR), CAST(MIN(`cc_street_type`) AS VARCHAR), CAST(COUNT(DISTINCT `cc_suite_number`) AS BIGINT), CAST((COUNT(1) - COUNT(`cc_suite_number`)) AS BIGINT), CAST( AVG(CAST(CHAR_LENGTH(`cc_suite_number`) AS DOUBLE)) AS DOUBLE ), CAST(MAX(CHAR_LENGTH(`cc_suite_number`)) AS INTEGER), CAST(MAX(`cc_suite_number`) AS VARCHAR), CAST(MIN(`cc_suite_number`) AS VARCHAR), CAST(COUNT(DISTINCT `cc_city`) AS BIGINT), CAST((COUNT(1) - COUNT(`cc_city`)) AS BIGINT), CAST( AVG(CAST(CHAR_LENGTH(`cc_city`) AS DOUBLE)) AS DOUBLE ), CAST(MAX(CHAR_LENGTH(`cc_city`)) AS INTEGER), CAST(MAX(`cc_city`) AS VARCHAR), CAST(MIN(`cc_city`) AS VARCHAR), CAST(COUNT(DISTINCT `cc_county`) AS BIGINT), CAST((COUNT(1) - COUNT(`cc_county`)) AS BIGINT), CAST( AVG(CAST(CHAR_LENGTH(`cc_county`) AS DOUBLE)) AS DOUBLE ), CAST(MAX(CHAR_LENGTH(`cc_county`)) AS INTEGER), CAST(MAX(`cc_county`) AS VARCHAR), CAST(MIN(`cc_county`) AS VARCHAR), CAST(COUNT(DISTINCT `cc_state`) AS BIGINT), CAST((COUNT(1) - COUNT(`cc_state`)) AS BIGINT), CAST( AVG(CAST(CHAR_LENGTH(`cc_state`) AS DOUBLE)) AS DOUBLE ), CAST(MAX(CHAR_LENGTH(`cc_state`)) AS INTEGER), CAST(MAX(`cc_state`) AS VARCHAR), CAST(MIN(`cc_state`) AS VARCHAR), CAST(COUNT(DISTINCT `cc_zip`) AS BIGINT), 
CAST((COUNT(1) - COUNT(`cc_zip`)) AS BIGINT), CAST( AVG(CAST(CHAR_LENGTH(`cc_zip`) AS DOUBLE)) AS DOUBLE ), CAST(MAX(CHAR_LENGTH(`cc_zip`)) AS INTEGER), CAST(MAX(`cc_zip`) AS VARCHAR), CAST(MIN(`cc_zip`) AS VARCHAR), CAST(COUNT(DISTINCT `cc_country`) AS BIGINT), CAST((COUNT(1) - COUNT(`cc_country`)) AS BIGINT), CAST( AVG(CAST(CHAR_LENGTH(`cc_country`) AS DOUBLE)) AS DOUBLE ), CAST(MAX(CHAR_LENGTH(`cc_country`)) AS INTEGER), CAST(MAX(`cc_country`) AS VARCHAR), CAST(MIN(`cc_country`) AS VARCHAR), CAST(COUNT(DISTINCT `cc_gmt_offset`) AS BIGINT), CAST((COUNT(1) - COUNT(`cc_gmt_offset`)) AS BIGINT), CAST(8.0 AS DOUBLE), CAST(8.0 AS INTEGER), CAST(MAX(`cc_gmt_offset`) AS DOUBLE), CAST(MIN(`cc_gmt_offset`) AS DOUBLE), CAST(COUNT(DISTINCT `cc_tax_percentage`) AS BIGINT), CAST( (COUNT(1) - COUNT(`cc_tax_percentage`)) AS BIGINT ), CAST(8.0 AS DOUBLE), CAST(8.0 AS INTEGER), CAST(MAX(`cc_tax_percentage`) AS DOUBLE), CAST(MIN(`cc_tax_percentage`) AS DOUBLE) FROM `bytedance_hive`.`tpc_ds`.`call_center`
{code}

For this SQL, compiling the generated code on the TaskManager side fails with the following root-cause exceptions:
{code:java}
Caused by: org.codehaus.janino.InternalCompilerException: Code of method "processElement(Lorg/apache/flink/streaming/runtime/streamrecord/StreamRecord;)V" of class "BatchExpand$36757" grows beyond 64 KB
{code}
and
{code:java}
Caused by: org.codehaus.janino.InternalCompilerException: Code of method "processElement(Lorg/apache/flink/streaming/runtime/streamrecord/StreamRecord;)V" of class "LocalNoGroupingAggregateWithoutKeys$34429" grows beyond 64 KB
{code}

We need to split the generated code for BatchExpand and LocalNoGroupingAggregateWithoutKeys for complex SQL, just like the ExprCodeGenerator and AggsHandlerCodeGenerator.
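To illustrate the kind of splitting meant here, the following is a minimal sketch and not Flink's actual code generator. It assumes the body of the generated method can be treated as a list of independent statements; the class `SplitSketch`, the method `generate`, and the helper name prefix `processElement_split` are hypothetical names chosen for this example. The idea is to move the statements into small helper methods and let the entry method only call them, so that no single method approaches the 64 KB bytecode limit.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitSketch {

    /**
     * Emits Java source in which the given statements are distributed over
     * helper methods holding at most maxPerMethod statements each. The entry
     * method only calls the helpers in order.
     *
     * Assumption: the statements share no local variables. A real generator
     * would first have to promote shared locals to member fields.
     */
    public static String generate(List<String> statements, int maxPerMethod) {
        if (maxPerMethod <= 0) {
            throw new IllegalArgumentException("maxPerMethod must be positive");
        }
        StringBuilder entry = new StringBuilder("public void processElement() {\n");
        StringBuilder helpers = new StringBuilder();
        int helperCount = 0;
        for (int start = 0; start < statements.size(); start += maxPerMethod) {
            int end = Math.min(start + maxPerMethod, statements.size());
            String name = "processElement_split" + helperCount++;
            // The entry method delegates to the helper ...
            entry.append("  ").append(name).append("();\n");
            // ... and the helper carries this chunk of the original body.
            helpers.append("private void ").append(name).append("() {\n");
            for (String statement : statements.subList(start, end)) {
                helpers.append("  ").append(statement).append("\n");
            }
            helpers.append("}\n");
        }
        entry.append("}\n");
        return entry.toString() + helpers;
    }

    public static void main(String[] args) {
        List<String> statements = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            statements.add("agg" + i + " += 1;");
        }
        // Five statements with at most two per helper yield three helpers.
        System.out.println(generate(statements, 2));
    }
}
```

A real fix would more likely split by estimated code size than by statement count; the count-based threshold is used here only to keep the sketch short.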