[jira] [Created] (FLINK-12983) Replace descriptive histogram's storage back-end

Shang Yuanchun (Jira)
Nico Kruber created FLINK-12983:
-----------------------------------

             Summary: Replace descriptive histogram's storage back-end
                 Key: FLINK-12983
                 URL: https://issues.apache.org/jira/browse/FLINK-12983
             Project: Flink
          Issue Type: Sub-task
          Components: Runtime / Metrics
            Reporter: Nico Kruber
            Assignee: Nico Kruber


{{DescriptiveStatistics}} relies on its {{ResizableDoubleArray}} to store the double values of our histograms. However, this class constantly resizes an internal array, which seems to cause considerable overhead.

Additionally, we are not using {{SynchronizedDescriptiveStatistics}}, which, according to its docs, we should. Currently, we seem to be somewhat safe because {{ResizableDoubleArray}} has some synchronized parts, but these are scheduled to be removed in Commons Math version 4.
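To illustrate the synchronization concern, here is a minimal, hypothetical sketch (not Flink or Commons Math code): a bounded window in which {{add()}} and the read path synchronize on the same monitor, so a reporter thread never observes the window halfway through an update. The class {{LockedWindow}} and its methods are made up for this example, and the window is backed by a boxed {{ArrayDeque<Double>}} purely for brevity.

```java
import java.util.ArrayDeque;
import java.util.Arrays;

/**
 * Hypothetical sketch, not Flink or Commons Math code: a bounded sliding window
 * in which writers and readers synchronize on the same monitor (this object).
 */
class LockedWindow {
    private final int capacity;
    // Boxed values for brevity only; a performance-oriented version would use a double[].
    private final ArrayDeque<Double> values = new ArrayDeque<>();

    LockedWindow(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Adds a value, evicting the oldest one once the window is full. */
    synchronized void add(double value) {
        if (values.size() == capacity) {
            values.removeFirst();
        }
        values.addLast(value);
    }

    /** Copies the window (oldest first) while holding the lock. */
    synchronized double[] snapshot() {
        double[] copy = new double[values.size()];
        int i = 0;
        for (double v : values) {
            copy[i++] = v;
        }
        return copy;
    }

    /** Mean of a consistent snapshot, computed outside the lock; NaN if the window is empty. */
    double mean() {
        return Arrays.stream(snapshot()).average().orElse(Double.NaN);
    }
}
```

Copying the window inside the lock and computing statistics on the copy keeps the critical section short; the trade-off is one array copy per read.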

Internal benchmarks compared the current implementation with two alternatives: one based on a linear array of twice the histogram size (moving values back to the start once the window reaches the end), and one using a circular array (wrapping around with a flexible start position). They showed the following numbers (rows below the "--- with ... ---" separator use the optimised code from FLINK-10236, FLINK-12981, and FLINK-12982):
# only adding values to the histogram:
{code}
Benchmark                                       Mode  Cnt      Score        Error   Units
HistogramBenchmarks.dropwizardHistogramAdd     thrpt   30   47985.359 ±    25.847  ops/ms
HistogramBenchmarks.descriptiveHistogramAdd    thrpt   30   70158.792 ±   276.858  ops/ms
--- with FLINK-10236, FLINK-12981, and FLINK-12982 ---
HistogramBenchmarks.descriptiveHistogramAdd    thrpt   30   75303.040 ±   475.355  ops/ms
HistogramBenchmarks.histogramCircularArrayAdd  thrpt   30  790123.475 ± 48420.672  ops/ms
HistogramBenchmarks.histogramLinearArrayAdd    thrpt   30  385126.074 ±  3038.773  ops/ms
{code}
# after adding each value, also retrieving a common set of metrics:
{code}
Benchmark                                       Mode  Cnt      Score        Error   Units
HistogramBenchmarks.dropwizardHistogram        thrpt   30     400.274 ±     4.930  ops/ms
HistogramBenchmarks.descriptiveHistogram       thrpt   30     124.533 ±     1.060  ops/ms
--- with FLINK-10236, FLINK-12981, and FLINK-12982 ---
HistogramBenchmarks.descriptiveHistogram       thrpt   30     251.895 ±     1.809  ops/ms
HistogramBenchmarks.histogramCircularArray     thrpt   30     298.881 ±    10.027  ops/ms
HistogramBenchmarks.histogramLinearArray       thrpt   30     234.380 ±     5.014  ops/ms
{code}
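The two alternative storage layouts from the benchmarks can be sketched roughly as follows. This is an assumed, simplified reconstruction based only on the description above, not the code that was benchmarked; the class names {{CircularArrayWindow}} and {{LinearArrayWindow}} are hypothetical. Both keep the most recent {{capacity}} values in a pre-allocated {{double[]}}, so adding a value never reallocates.

```java
import java.util.Arrays;

/** Hypothetical sketch: fixed-size window stored in a circular double[]. */
class CircularArrayWindow {
    private final double[] values;
    private int start; // index of the oldest value
    private int size;  // number of values currently in the window

    CircularArrayWindow(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        values = new double[capacity];
    }

    void add(double value) {
        if (size < values.length) {
            // Not full yet: write right after the newest value.
            values[(start + size) % values.length] = value;
            size++;
        } else {
            // Full: overwrite the oldest value and advance the start position.
            values[start] = value;
            start = (start + 1) % values.length;
        }
    }

    /** Returns the window oldest-first; it may wrap around, so this takes up to two copies. */
    double[] toArray() {
        double[] out = new double[size];
        int firstPart = Math.min(size, values.length - start);
        System.arraycopy(values, start, out, 0, firstPart);
        System.arraycopy(values, 0, out, firstPart, size - firstPart);
        return out;
    }
}

/** Hypothetical sketch: window stored contiguously in a double[] of twice its capacity. */
class LinearArrayWindow {
    private final int capacity;
    private final double[] values;
    private int start; // index of the oldest value
    private int end;   // index one past the newest value

    LinearArrayWindow(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
        this.values = new double[2 * capacity];
    }

    void add(double value) {
        if (end - start == capacity) {
            start++; // evict the oldest value
        }
        if (end == values.length) {
            // Reached the end of the array: move the remaining window back to the start.
            int size = end - start;
            System.arraycopy(values, start, values, 0, size);
            start = 0;
            end = size;
        }
        values[end++] = value;
    }

    /** The window is always contiguous, so reading it is a single copy. */
    double[] toArray() {
        return Arrays.copyOfRange(values, start, end);
    }
}
```

The circular variant never moves data on {{add()}}, but its window may wrap around the end of the array, so reading it takes two copies. The linear variant keeps the window contiguous at the cost of moving up to {{capacity - 1}} values back to the start roughly once every {{capacity}} additions.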



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)