(DEPRECATED) Apache Flink Mailing List archive.

Object reuse documentation should be improved

Gábor Gévay

Object reuse documentation should be improved

Hello,

I find the documentation about object reuse [1] very confusing. I
started a Google Doc [2] about clarifying/rewriting it.

First, it states four questions that I think should have answers
stated explicitly in the documentation, and then lists some concrete
problems (ambiguities) in the current text.

[1] https://ci.apache.org/projects/flink/flink-docs-master/apis/programming_guide.html#object-reuse-behavior
[2] https://docs.google.com/document/d/1cgkuttvmj4jUonG7E2RdFVjKlfQDm_hE6gvFcgAfzXg/edit?usp=sharing

Best,
Gabor

Aljoscha Krettek

Re: Object reuse documentation should be improved

Good write up. You could extend the Table of 1) a/b 2) a/b at the top with “chaining” (but you already know this, I guess). Chaining changes all of these and I think it can be tricky to know whether stuff is chained or not (for users, and even for us developers…).

> On 13 Dec 2015, at 19:24, Gábor Gévay <[hidden email]> wrote:
>
> Hello,
>
> I find the documentation about object reuse [1] very confusing. I
> started a Google Doc [2] about clarifying/rewriting it.
>
> First, it states four questions that I think should have answers
> stated explicitly in the documentation, and then lists some concrete
> problems (ambiguities) in the current text.
>
> [1] https://ci.apache.org/projects/flink/flink-docs-master/apis/programming_guide.html#object-reuse-behavior
> [2] https://docs.google.com/document/d/1cgkuttvmj4jUonG7E2RdFVjKlfQDm_hE6gvFcgAfzXg/edit?usp=sharing
>
> Best,
> Gabor

Gábor Gévay

Re: Object reuse documentation should be improved

I guess chaining happens so often that we should just write this doc
assuming that there is chaining, and not even describe the rules for
the non-chaining case. I mean, I would never risk writing a UDF that
only works when there is no chaining, and then constantly worry about
accidentally introducing chaining. Or what do you think?

Best,
Gábor

2015-12-14 11:33 GMT+01:00 Aljoscha Krettek <[hidden email]>:

> Good write up. You could extend the Table of 1) a/b 2) a/b at the top with “chaining” (but you already know this, I guess). Chaining changes all of these and I think it can be tricky to know whether stuff is chained or not (for users, and even for us developers…).

Márton Balassi

Re: Object reuse documentation should be improved

Thanks for writing this up, Gábor. As Aljoscha suggested, chaining changes
all of these cases and makes them very tricky to work with, which should be
clearly documented. That was the reason why, some time ago, the streaming
API always copied the output of a UDF by default: to avoid these ambiguous
cases. Now this copying is omitted for performance reasons.

On Mon, Dec 14, 2015 at 1:15 PM, Gábor Gévay <[hidden email]> wrote:

> I guess chaining happens so often that we should just write this doc
> assuming that there is chaining, and not even describe the rules for
> the non-chaining case. I mean, I would never risk writing a UDF that
> only works when there is no chaining, and then constantly worry about
> accidentally introducing chaining. Or what do you think?
>
> Best,
> Gábor

Aljoscha Krettek

Re: Object reuse documentation should be improved

If I’m not mistaken, copying is still performed in the streaming API by default.

> On 14 Dec 2015, at 13:20, Márton Balassi <[hidden email]> wrote:
>
> Thanks for writing this up, Gábor. As Aljoscha suggested, chaining changes
> all of these cases and makes them very tricky to work with, which should be
> clearly documented. That was the reason why, some time ago, the streaming
> API always copied the output of a UDF by default: to avoid these ambiguous
> cases. Now this copying is omitted for performance reasons.
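
The hazard discussed in this thread can be sketched without Flink at all. The following is a minimal plain-Java simulation, not Flink code: `Event`, `emit`, and `run` are hypothetical names, and the direct method call from one function to the next only stands in for the handover between two chained UDFs. It assumes an upstream function that reuses one mutable output object and a downstream function that keeps references to the records it receives. With a copy at the handover the downstream buffer holds distinct records; without it, every buffered reference points at the same, last-mutated object.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class ObjectReuseSketch {

    /** A mutable record, standing in for a user-defined type (hypothetical). */
    static final class Event {
        int value;
        Event(int value) { this.value = value; }
        Event copy() { return new Event(value); }
    }

    /**
     * Hypothetical upstream UDF: emits the values 1..n but reuses a single
     * Event instance. When copyOnHandover is true, each record is copied
     * before it is passed on, as a defensive runtime might do.
     */
    static void emit(int n, boolean copyOnHandover, Consumer<Event> downstream) {
        Event reused = new Event(0);
        for (int i = 1; i <= n; i++) {
            reused.value = i;
            downstream.accept(copyOnHandover ? reused.copy() : reused);
        }
    }

    /**
     * Hypothetical downstream UDF: buffers the records it receives and
     * reports the values it sees in the buffer once the input is finished.
     */
    public static List<Integer> run(int n, boolean copyOnHandover) {
        List<Event> buffer = new ArrayList<>();
        emit(n, copyOnHandover, buffer::add);
        List<Integer> seen = new ArrayList<>();
        for (Event e : buffer) {
            seen.add(e.value);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("with copy:    " + run(3, true));   // [1, 2, 3]
        System.out.println("without copy: " + run(3, false));  // [3, 3, 3]
    }
}
```

A UDF written like the downstream function here is only correct when something copies the records for it, which is why it matters whether the rules are documented for the chained case.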