Chesnay Schepler created FLINK-3751:
---------------------------------------
Summary: default Operator names are inconsistent
Key: FLINK-3751
URL: https://issues.apache.org/jira/browse/FLINK-3751
Project: Flink
Issue Type: Bug
Components: DataSet API, DataStream API
Affects Versions: 1.0.1
Reporter: Chesnay Schepler
Priority: Minor
h3. The Problem
If a user doesn't name an operator explicitly (generally using the name() method), then Flink auto-generates a name. These generated names are really (like, _really_) inconsistent within and across APIs.
In the batch API, non-source/-sink operator names are _generally_ formed like this:
{code}FlatMap (FlatMap at main(WordCount.java:81)){code}
We have
* FlatMap, describing the runtime operator type
* another FlatMap, describing which user-call created this operator
* main(WordCount.java:81), describing the call location
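To make the three parts concrete, here is a minimal sketch that assembles a name in the batch format shown above. The class OperatorNames and the method batchName are hypothetical and only illustrate the pattern; this is not how Flink builds the name internally.

```java
// Hypothetical helper, NOT part of Flink: it only mirrors the batch naming
// pattern "<runtime type> (<user call> at <call location>)" described above.
final class OperatorNames {
    private OperatorNames() {}

    // runtimeType:  the runtime operator type, e.g. "FlatMap"
    // userCall:     the user call that created the operator, e.g. "FlatMap"
    // callLocation: where the call happened, e.g. "main(WordCount.java:81)"
    static String batchName(String runtimeType, String userCall, String callLocation) {
        return runtimeType + " (" + userCall + " at " + callLocation + ")";
    }

    public static void main(String[] args) {
        // Prints: FlatMap (FlatMap at main(WordCount.java:81))
        System.out.println(batchName("FlatMap", "FlatMap", "main(WordCount.java:81)"));
    }
}
```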
This already falls apart when you have a DataSource, which looks like this:
{code}DataSource (at getDefaultTextLineDataSet(WordCountData.java:70) (org.apache.flink.CollectionInputFormat){code}
It is missing the call that created the source (fromElements()) and suddenly includes the inputFormat name.
Sinks are a different story yet again, since collect() is displayed as
{code} DataSink (collect()) {code}
which is missing the call location.
Then we have the Streaming API, where things are named completely differently as well:
The fromElements source is displayed as
{code} Source: Collection Source {code}
Non-source/-sink operators are displayed simply as their runtime operator type
{code} FlatMap {code}
and sinks, at times, do not have a name at all.
{code} Sink: Unnamed {code}
To put the cherry on top, chains are displayed in the Batch API as
{code} CHAIN <operator> -> <operator> {code}
while in the Streaming API the CHAIN keyword is dropped
{code} <operator> -> <operator> {code}
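The difference between the two chain formats can be sketched as follows. ChainNames, batchChain and streamingChain are hypothetical names made up for illustration, and the operator names "FlatMap" and "Map" are just sample values; none of this is Flink code.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, NOT part of Flink: it only reproduces the two
// chain display formats described above.
final class ChainNames {
    private ChainNames() {}

    // Batch API style: operator names joined by " -> " with a CHAIN prefix.
    static String batchChain(List<String> operators) {
        return "CHAIN " + String.join(" -> ", operators);
    }

    // Streaming API style: the same join, without the CHAIN prefix.
    static String streamingChain(List<String> operators) {
        return String.join(" -> ", operators);
    }

    public static void main(String[] args) {
        List<String> ops = Arrays.asList("FlatMap", "Map"); // sample operator names
        System.out.println(batchChain(ops));     // CHAIN FlatMap -> Map
        System.out.println(streamingChain(ops)); // FlatMap -> Map
    }
}
```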
Considering that these names are right in the user's face via the Dashboard, we should try to homogenize them a bit.