[jira] [Created] (FLINK-12671) Summarizer: summary statistics for Table

Shang Yuanchun (Jira)
Xu Yang created FLINK-12671:
-------------------------------

             Summary: Summarizer: summary statistics for Table
                 Key: FLINK-12671
                 URL: https://issues.apache.org/jira/browse/FLINK-12671
             Project: Flink
          Issue Type: Sub-task
            Reporter: Xu Yang
            Assignee: Xu Yang


We provide summary statistics for Table through Summarizer. Users can easily get the total count and the basic column-wise metrics: max, min, mean, variance, standardDeviation, normL1, normL2, the number of missing values, and the number of valid values.

Spark ML provides the same functionality: [http://spark.apache.org/docs/latest/ml-statistics.html#summarizer]

Example:

Table input = …
TableSummary summary = new Summarizer(input).collectResult();
System.out.println(summary.mean("age"));  // print the mean of the column named "age"
System.out.println(summary);
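
To make the listed column-wise metrics concrete, here is a minimal single-pass sketch for one numeric column. It is an illustration only, not the proposed Summarizer implementation: the class name ColumnSummary and all of its method names are hypothetical, null entries are assumed to represent missing values, and variance is assumed to be the unbiased sample variance (dividing by n - 1).

```java
import java.util.List;

/** Hypothetical one-column summary; an illustration, not Flink's Summarizer. */
final class ColumnSummary {
    private long count;        // all rows, including missing ones
    private long numValid;     // rows holding a non-null value
    private double sum;        // sum of valid values
    private double sumSquares; // sum of squared valid values
    private double sumAbs;     // sum of absolute valid values
    private double min = Double.POSITIVE_INFINITY;
    private double max = Double.NEGATIVE_INFINITY;

    /** Builds the summary in one pass; null is treated as missing (assumption). */
    static ColumnSummary of(List<Double> values) {
        ColumnSummary s = new ColumnSummary();
        for (Double v : values) {
            s.count++;
            if (v == null) {
                continue; // counted in the total, skipped for every metric
            }
            s.numValid++;
            s.sum += v;
            s.sumSquares += v * v;
            s.sumAbs += Math.abs(v);
            s.min = Math.min(s.min, v);
            s.max = Math.max(s.max, v);
        }
        return s;
    }

    long count() { return count; }
    long numValid() { return numValid; }
    long numMissing() { return count - numValid; }

    // With no valid values, min/max stay at +/- infinity and mean is NaN.
    double min() { return min; }
    double max() { return max; }
    double mean() { return sum / numValid; }

    /** Sample variance (n - 1 denominator, an assumption); NaN for fewer than 2 values. */
    double variance() {
        if (numValid < 2) {
            return Double.NaN;
        }
        double centered = sumSquares - sum * sum / numValid;
        // Clamp at zero to absorb small negative results from rounding.
        return Math.max(0.0, centered) / (numValid - 1);
    }

    double standardDeviation() { return Math.sqrt(variance()); }
    double normL1() { return sumAbs; }
    double normL2() { return Math.sqrt(sumSquares); }
}
```

For the values 1.0, 2.0, null, 3.0, ColumnSummary.of(...) reports a count of 4 with one missing value, a mean of 2.0, and a sample variance of 1.0. The sum-of-squares form keeps the sketch short; a production implementation would more likely use a numerically stabler update and a mergeable state so that partial summaries can be combined across parallel tasks.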


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)