[jira] [Created] (FLINK-22915) Extend Flink ML API to support Estimator/Transformer DAG


Shang Yuanchun (Jira)
Dong Lin created FLINK-22915:
--------------------------------

             Summary: Extend Flink ML API to support Estimator/Transformer DAG
                 Key: FLINK-22915
                 URL: https://issues.apache.org/jira/browse/FLINK-22915
             Project: Flink
          Issue Type: Improvement
            Reporter: Dong Lin


Currently, the Flink ML API allows users to compose an Estimator/Transformer only from a pipeline (i.e. a linear sequence) of Estimators/Transformers. We propose extending the Flink ML API so that users can compose an Estimator/Transformer from a directed acyclic graph (DAG) of Estimators/Transformers.

This feature is useful for the following use-cases:

1) The preprocessing workflow (shared between the training and inference workflows) may involve joining multiple tables, where the join of two tables can be expressed as a Transformer with 2 inputs and 1 output. The preprocessing workflow may also involve a split operation, which has 1 input (e.g. the original table) and 2 outputs (e.g. the two parts of the original table).

A preprocessing workflow that involves join/split operations needs to be expressed as a DAG of Transformers.

2) A graph-embedding algorithm can be expressed as an Estimator that takes two tables as input (e.g. a node table and an edge table). The corresponding Transformer has 1 input (i.e. the nodes) and 1 output (i.e. the nodes after embedding).

A training workflow that involves the graph-embedding Estimator needs to be expressed as a DAG of Transformers/Estimators.
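
To make the two use cases concrete, here is a minimal plain-Java sketch of what multi-input/multi-output stages could look like. Everything in it is an assumption for illustration: the `Transformer` and `Estimator` interfaces, the `JOIN`, `SPLIT` and `DEGREE_EMBEDDING` stages, and the use of a list of strings as a "table" are hypothetical stand-ins, not the existing or proposed Flink ML API. The "embedding" is simply a node's degree, chosen only to keep the example short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DagSketch {
    // Hypothetical interfaces: a "table" is a List<String> of rows,
    // and every stage takes a list of tables and returns a list of tables.
    public interface Transformer {
        List<List<String>> transform(List<List<String>> inputs);
    }

    public interface Estimator {
        Transformer fit(List<List<String>> inputs);
    }

    // Use case 1: join, 2 inputs -> 1 output. Rows are "key,value"; inner join on key.
    public static final Transformer JOIN = inputs -> {
        Map<String, String> right = new HashMap<>();
        for (String row : inputs.get(1)) {
            String[] kv = row.split(",", 2);
            right.put(kv[0], kv[1]);
        }
        List<String> joined = new ArrayList<>();
        for (String row : inputs.get(0)) {
            String key = row.split(",", 2)[0];
            if (right.containsKey(key)) {
                joined.add(row + "," + right.get(key));
            }
        }
        return Arrays.asList(joined);
    };

    // Use case 1: split, 1 input -> 2 outputs (even-indexed rows, odd-indexed rows).
    public static final Transformer SPLIT = inputs -> {
        List<String> first = new ArrayList<>();
        List<String> second = new ArrayList<>();
        List<String> table = inputs.get(0);
        for (int i = 0; i < table.size(); i++) {
            (i % 2 == 0 ? first : second).add(table.get(i));
        }
        return Arrays.asList(first, second);
    };

    // Use case 2: an Estimator fitted on 2 tables (nodes, edges) that yields a
    // Transformer with 1 input and 1 output. The toy "embedding" is the node degree.
    public static final Estimator DEGREE_EMBEDDING = inputs -> {
        Map<String, Integer> degree = new HashMap<>();
        for (String node : inputs.get(0)) {
            degree.put(node, 0);
        }
        for (String edge : inputs.get(1)) {
            for (String end : edge.split(",")) {
                degree.merge(end, 1, Integer::sum);
            }
        }
        return nodeInputs -> {
            List<String> out = new ArrayList<>();
            for (String node : nodeInputs.get(0)) {
                out.add(node + "," + degree.getOrDefault(node, 0));
            }
            return Arrays.asList(out);
        };
    };

    // A tiny DAG: two source tables feed JOIN, whose single output feeds SPLIT.
    public static List<List<String>> preprocess(List<String> left, List<String> right) {
        List<List<String>> joined = JOIN.transform(Arrays.asList(left, right));
        return SPLIT.transform(joined);
    }

    public static void main(String[] args) {
        System.out.println(preprocess(
                Arrays.asList("1,alice", "2,bob", "3,carol"),
                Arrays.asList("1,NYC", "3,Paris")));
        Transformer model = DEGREE_EMBEDDING.fit(Arrays.asList(
                Arrays.asList("a", "b", "c"),
                Arrays.asList("a,b", "a,c")));
        System.out.println(model.transform(Arrays.asList(Arrays.asList("a", "b"))));
    }
}
```

Neither `preprocess` nor the fit-then-transform step can be written as a chain of one-input/one-output stages, which is the gap a DAG-based composition would fill.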

--
This message was sent by Atlassian Jira
(v8.3.4#803005)