[DISCUSS] Generating java code?

classic Classic list List threaded Threaded
10 messages Options
Reply | Threaded
Open this post in threaded view
|

[DISCUSS] Generating java code?

Niels Basjes
Hi,

I'm working on https://issues.apache.org/jira/browse/FLINK-16871 to make
more build time variables (like the scala version) into the code available
at runtime.

During the review process there was discussion around a basic question: *Is
generating java code during the build ok?*
See

   - https://github.com/apache/flink/pull/11245#discussion_r400035133
   - https://github.com/apache/flink/pull/11592
   - https://github.com/apache/flink/pull/11592#issuecomment-610282450

As suggested by Chesnay Schepler I'm putting this question to the mailing
list.
  https://github.com/apache/flink/pull/11592#issuecomment-610963947

The main discussion was around the ease of use when running in an IDE like
IntelliJ.

So essentially we have two solution directions available:

   1. *Generate a properties file and then use the classloader to load this
   file as a resource and then parse it as a property file.*
   This is the currently used solution direction for this part of the code.
   A rough version of this (to be improved) :
   https://github.com/apache/flink/commit/47099f663b7644056e9d87b262cd4dba034f513e
   This method has several effects:
      1. The developer can run the project immediately from within the IDE
      as fallback values are provided if the 'actual' values are missing.
      2. This property file (with stuff that should never be overwritten)
      can be modified by placing a different one in the classpath. In
fact it IS
      modified in the flink-dist as it generates a new file with the same name
      into the binary distribution (I consider this to be bad).
      3. Loading resources means loading, parsing and a lot of error
      handling. Lots of things "can be null" or  be a default value. So the
      values are unreliable and lots of code needs to handle this. In fact when
      running from IntelliJ the properties file is generated poorly most of the
      time, only during a normal maven build will it work correctly.
   2. *Generate a Java source file and then simply compile this and make it
   part of the project.*
   Essentially the same model as you would have when using Apache Avro,
   Protobuf, Antlr 4 and javacc (several of those are used in Flink!).
   A rough version of this (to be improved) :
   https://github.com/apache/flink/commit/d215e4df60dc9d647dcee1aa9a2114cbf49d0566
   This method has several effects:
   1. The developer MUST run 'mvn generate-sources' before the actual the
      project immediately from within the IDE as fallback values are
provided if
      the 'actual' values are missing.
      2. The code/test will not run until this step is done.
      3. Because the file is generated by a plugin it is always correct. As
      a consequence all variables are always available and the downstream users
      no longer have to handle the "can be null" or "default value" situations.

So is generating code similar to what I created a desired change?
My opinion is that it is the better solution, the data available is more
reliable and as a consequence the rest of the code is simpler.
It would probably mean that during development of flink you should be aware
of this and do an 'extra step' to get it running.

What do you guys think?

--
Best regards / Met vriendelijke groeten,

Niels Basjes
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] Generating java code?

Yang Wang
Hi Niels,

Thanks a lot for starting this discussion. Although the latter option is
more stable,
i think it is not acceptable for all the developers to execute `mvn
generate-sources`
first. Otherwise, the Flink project is just broken and could not run tests,
Flink jobs
in IDE.

So i think the version properties file is enough for now. +1 for the first
option.

Best,
Yang

Niels Basjes <[hidden email]> 于2020年4月9日周四 下午4:47写道:

> Hi,
>
> I'm working on https://issues.apache.org/jira/browse/FLINK-16871 to make
> more build time variables (like the scala version) into the code available
> at runtime.
>
> During the review process there was discussion around a basic question: *Is
> generating java code during the build ok?*
> See
>
>    - https://github.com/apache/flink/pull/11245#discussion_r400035133
>    - https://github.com/apache/flink/pull/11592
>    - https://github.com/apache/flink/pull/11592#issuecomment-610282450
>
> As suggested by Chesnay Schepler I'm putting this question to the mailing
> list.
>   https://github.com/apache/flink/pull/11592#issuecomment-610963947
>
> The main discussion was around the ease of use when running in an IDE like
> IntelliJ.
>
> So essentially we have two solution directions available:
>
>    1. *Generate a properties file and then use the classloader to load this
>    file as a resource and then parse it as a property file.*
>    This is the currently used solution direction for this part of the code.
>    A rough version of this (to be improved) :
>
> https://github.com/apache/flink/commit/47099f663b7644056e9d87b262cd4dba034f513e
>    This method has several effects:
>       1. The developer can run the project immediately from within the IDE
>       as fallback values are provided if the 'actual' values are missing.
>       2. This property file (with stuff that should never be overwritten)
>       can be modified by placing a different one in the classpath. In
> fact it IS
>       modified in the flink-dist as it generates a new file with the same
> name
>       into the binary distribution (I consider this to be bad).
>       3. Loading resources means loading, parsing and a lot of error
>       handling. Lots of things "can be null" or  be a default value. So the
>       values are unreliable and lots of code needs to handle this. In fact
> when
>       running from IntelliJ the properties file is generated poorly most
> of the
>       time, only during a normal maven build will it work correctly.
>    2. *Generate a Java source file and then simply compile this and make it
>    part of the project.*
>    Essentially the same model as you would have when using Apache Avro,
>    Protobuf, Antlr 4 and javacc (several of those are used in Flink!).
>    A rough version of this (to be improved) :
>
> https://github.com/apache/flink/commit/d215e4df60dc9d647dcee1aa9a2114cbf49d0566
>    This method has several effects:
>    1. The developer MUST run 'mvn generate-sources' before the actual the
>       project immediately from within the IDE as fallback values are
> provided if
>       the 'actual' values are missing.
>       2. The code/test will not run until this step is done.
>       3. Because the file is generated by a plugin it is always correct. As
>       a consequence all variables are always available and the downstream
> users
>       no longer have to handle the "can be null" or "default value"
> situations.
>
> So is generating code similar to what I created a desired change?
> My opinion is that it is the better solution, the data available is more
> reliable and as a consequence the rest of the code is simpler.
> It would probably mean that during development of flink you should be aware
> of this and do an 'extra step' to get it running.
>
> What do you guys think?
>
> --
> Best regards / Met vriendelijke groeten,
>
> Niels Basjes
>
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] Generating java code?

Niels Basjes
Hi,

On Mon, Apr 13, 2020 at 8:30 AM Yang Wang <[hidden email]> wrote:

> Although the latter option is more stable,
> i think it is not acceptable for all the developers to execute `mvn
> generate-sources` first.

Otherwise, the Flink project is just broken and could not run tests, Flink
> jobs in IDE.
>
>
It is important to realize that this discussion is essentially around the
fact that systems like IntelliJ (what I use) execute various ways of
generating code that have been written in a maven pom.xml in different
ways. Simply put: They don't execute all instructions that have been
defined in the pom.xml automatically when needed.

   - With the properties file variant (which uses resource filtering)
   IntelliJ does resource filtering automatically yet no maven plugins are
   run that affect the properties that are replaced.
   So the git variables are NOT populated when running in IntelliJ and I
   have seen several values in the propersites file are nonsense.
   As a consequence the Java code reading those needs to catch things like
   "missing properties file" and various kinds of "nonsense values" and
   replace them with a workable default.
   So when running this code you are actually running something different
   from what will be run when doing a `mvn clean install`, yet the developer
   is under the illusion it all works because of the code that catches all of
   those problems.
   Depending on the way these variables are used this may lead to "It fails
   in production but it works fine in IntelliJ" situations.
   In my mind the code that catches all of those exceptional situations in
   poorly generated properties only exists so that developers can run the
   (otherwise perfectly fine) software in a "broken" build/development
   environment.


   - With the way I propose to generate the code the effect is indeed
   slightly harder: If you do not run the pom.xml based code generation it
   will not work.
   I understand that this requires the developers to think a bit more about
   the code they are working on.
   The upside is that the code either works perfectly or does not compile.
   There is no "it compiles but is really nonsense".
   I prefer this.
   Also at this point in time this is already true for Flink: There are
   quite a few modules where the developer MUST run mvn generate-sources for
   all tests to run successfully.
   Like I indicated there is a javacc SQL parser and there are a lot of
   tests that rely on generating Avro code before they are able to run.
   I have several projects where systems like Avro and Antlr force me in
   this direction, there is simply no other way to do this.

So i think the version properties file is enough for now. +1 for the first
> option.
>

Like I said. I'm fine with either choice by the committers.
I would choose the `mvn generate-sources` code variant as it is much
simpler and much more reliable.

Niels

Niels Basjes <[hidden email]> 于2020年4月9日周四 下午4:47写道:

>
> > Hi,
> >
> > I'm working on https://issues.apache.org/jira/browse/FLINK-16871 to make
> > more build time variables (like the scala version) into the code
> available
> > at runtime.
> >
> > During the review process there was discussion around a basic question:
> *Is
> > generating java code during the build ok?*
> > See
> >
> >    - https://github.com/apache/flink/pull/11245#discussion_r400035133
> >    - https://github.com/apache/flink/pull/11592
> >    - https://github.com/apache/flink/pull/11592#issuecomment-610282450
> >
> > As suggested by Chesnay Schepler I'm putting this question to the mailing
> > list.
> >   https://github.com/apache/flink/pull/11592#issuecomment-610963947
> >
> > The main discussion was around the ease of use when running in an IDE
> like
> > IntelliJ.
> >
> > So essentially we have two solution directions available:
> >
> >    1. *Generate a properties file and then use the classloader to load
> this
> >    file as a resource and then parse it as a property file.*
> >    This is the currently used solution direction for this part of the
> code.
> >    A rough version of this (to be improved) :
> >
> >
> https://github.com/apache/flink/commit/47099f663b7644056e9d87b262cd4dba034f513e
> >    This method has several effects:
> >       1. The developer can run the project immediately from within the
> IDE
> >       as fallback values are provided if the 'actual' values are missing.
> >       2. This property file (with stuff that should never be overwritten)
> >       can be modified by placing a different one in the classpath. In
> > fact it IS
> >       modified in the flink-dist as it generates a new file with the same
> > name
> >       into the binary distribution (I consider this to be bad).
> >       3. Loading resources means loading, parsing and a lot of error
> >       handling. Lots of things "can be null" or  be a default value. So
> the
> >       values are unreliable and lots of code needs to handle this. In
> fact
> > when
> >       running from IntelliJ the properties file is generated poorly most
> > of the
> >       time, only during a normal maven build will it work correctly.
> >    2. *Generate a Java source file and then simply compile this and make
> it
> >    part of the project.*
> >    Essentially the same model as you would have when using Apache Avro,
> >    Protobuf, Antlr 4 and javacc (several of those are used in Flink!).
> >    A rough version of this (to be improved) :
> >
> >
> https://github.com/apache/flink/commit/d215e4df60dc9d647dcee1aa9a2114cbf49d0566
> >    This method has several effects:
> >    1. The developer MUST run 'mvn generate-sources' before the actual the
> >       project immediately from within the IDE as fallback values are
> > provided if
> >       the 'actual' values are missing.
> >       2. The code/test will not run until this step is done.
> >       3. Because the file is generated by a plugin it is always correct.
> As
> >       a consequence all variables are always available and the downstream
> > users
> >       no longer have to handle the "can be null" or "default value"
> > situations.
> >
> > So is generating code similar to what I created a desired change?
> > My opinion is that it is the better solution, the data available is more
> > reliable and as a consequence the rest of the code is simpler.
> > It would probably mean that during development of flink you should be
> aware
> > of this and do an 'extra step' to get it running.
> >
> > What do you guys think?
> >
> > --
> > Best regards / Met vriendelijke groeten,
> >
> > Niels Basjes
> >
>


--
Best regards / Met vriendelijke groeten,

Niels Basjes
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] Generating java code?

Till Rohrmann
Hi everyone,

thanks for starting this discussion Niels.

I like the idea of getting rid of parsing a Properties instance. On the
other hand, I also understand that people are concerned about disrupting
the workflows of our devs.

Maybe we can find a compromise between both approaches. For example, one
could define an EnvironmentInformationProvider interface which is loaded at
runtime. The implementation of this interface could be generated by `mvn
generate-sources`. Moreover, if we cannot find it, then we fail with a
descriptive exception saying to run `mvn generate-sources`. That way only
devs who rely on the EnvironmentInformation have to run `mvn
generate-sources` when testing/developing and we could still get rid of all
the parsing and logic of a properties file. What do you think?

Cheers,
Till

On Mon, Apr 13, 2020 at 10:53 AM Niels Basjes <[hidden email]> wrote:

> Hi,
>
> On Mon, Apr 13, 2020 at 8:30 AM Yang Wang <[hidden email]> wrote:
>
> > Although the latter option is more stable,
> > i think it is not acceptable for all the developers to execute `mvn
> > generate-sources` first.
>
> Otherwise, the Flink project is just broken and could not run tests, Flink
> > jobs in IDE.
> >
> >
> It is important to realize that this discussion is essentially around the
> fact that systems like IntelliJ (what I use) execute various ways of
> generating code that have been written in a maven pom.xml in different
> ways. Simply put: They don't execute all instructions that have been
> defined in the pom.xml automatically when needed.
>
>    - With the properties file variant (which uses resource filtering)
>    IntelliJ does resource filtering automatically yet no maven plugins are
>    run that affect the properties that are replaced.
>    So the git variables are NOT populated when running in IntelliJ and I
>    have seen several values in the propersites file are nonsense.
>    As a consequence the Java code reading those needs to catch things like
>    "missing properties file" and various kinds of "nonsense values" and
>    replace them with a workable default.
>    So when running this code you are actually running something different
>    from what will be run when doing a `mvn clean install`, yet the
> developer
>    is under the illusion it all works because of the code that catches all
> of
>    those problems.
>    Depending on the way these variables are used this may lead to "It fails
>    in production but it works fine in IntelliJ" situations.
>    In my mind the code that catches all of those exceptional situations in
>    poorly generated properties only exists so that developers can run the
>    (otherwise perfectly fine) software in a "broken" build/development
>    environment.
>
>
>    - With the way I propose to generate the code the effect is indeed
>    slightly harder: If you do not run the pom.xml based code generation it
>    will not work.
>    I understand that this requires the developers to think a bit more about
>    the code they are working on.
>    The upside is that the code either works perfectly or does not compile.
>    There is no "it compiles but is really nonsense".
>    I prefer this.
>    Also at this point in time this is already true for Flink: There are
>    quite a few modules where the developer MUST run mvn generate-sources
> for
>    all tests to run successfully.
>    Like I indicated there is a javacc SQL parser and there are a lot of
>    tests that rely on generating Avro code before they are able to run.
>    I have several projects where systems like Avro and Antlr force me in
>    this direction, there is simply no other way to do this.
>
> So i think the version properties file is enough for now. +1 for the first
> > option.
> >
>
> Like I said. I'm fine with either choice by the committers.
> I would choose the `mvn generate-sources` code variant as it is much
> simpler and much more reliable.
>
> Niels
>
> Niels Basjes <[hidden email]> 于2020年4月9日周四 下午4:47写道:
> >
> > > Hi,
> > >
> > > I'm working on https://issues.apache.org/jira/browse/FLINK-16871 to
> make
> > > more build time variables (like the scala version) into the code
> > available
> > > at runtime.
> > >
> > > During the review process there was discussion around a basic question:
> > *Is
> > > generating java code during the build ok?*
> > > See
> > >
> > >    - https://github.com/apache/flink/pull/11245#discussion_r400035133
> > >    - https://github.com/apache/flink/pull/11592
> > >    - https://github.com/apache/flink/pull/11592#issuecomment-610282450
> > >
> > > As suggested by Chesnay Schepler I'm putting this question to the
> mailing
> > > list.
> > >   https://github.com/apache/flink/pull/11592#issuecomment-610963947
> > >
> > > The main discussion was around the ease of use when running in an IDE
> > like
> > > IntelliJ.
> > >
> > > So essentially we have two solution directions available:
> > >
> > >    1. *Generate a properties file and then use the classloader to load
> > this
> > >    file as a resource and then parse it as a property file.*
> > >    This is the currently used solution direction for this part of the
> > code.
> > >    A rough version of this (to be improved) :
> > >
> > >
> >
> https://github.com/apache/flink/commit/47099f663b7644056e9d87b262cd4dba034f513e
> > >    This method has several effects:
> > >       1. The developer can run the project immediately from within the
> > IDE
> > >       as fallback values are provided if the 'actual' values are
> missing.
> > >       2. This property file (with stuff that should never be
> overwritten)
> > >       can be modified by placing a different one in the classpath. In
> > > fact it IS
> > >       modified in the flink-dist as it generates a new file with the
> same
> > > name
> > >       into the binary distribution (I consider this to be bad).
> > >       3. Loading resources means loading, parsing and a lot of error
> > >       handling. Lots of things "can be null" or  be a default value. So
> > the
> > >       values are unreliable and lots of code needs to handle this. In
> > fact
> > > when
> > >       running from IntelliJ the properties file is generated poorly
> most
> > > of the
> > >       time, only during a normal maven build will it work correctly.
> > >    2. *Generate a Java source file and then simply compile this and
> make
> > it
> > >    part of the project.*
> > >    Essentially the same model as you would have when using Apache Avro,
> > >    Protobuf, Antlr 4 and javacc (several of those are used in Flink!).
> > >    A rough version of this (to be improved) :
> > >
> > >
> >
> https://github.com/apache/flink/commit/d215e4df60dc9d647dcee1aa9a2114cbf49d0566
> > >    This method has several effects:
> > >    1. The developer MUST run 'mvn generate-sources' before the actual
> the
> > >       project immediately from within the IDE as fallback values are
> > > provided if
> > >       the 'actual' values are missing.
> > >       2. The code/test will not run until this step is done.
> > >       3. Because the file is generated by a plugin it is always
> correct.
> > As
> > >       a consequence all variables are always available and the
> downstream
> > > users
> > >       no longer have to handle the "can be null" or "default value"
> > > situations.
> > >
> > > So is generating code similar to what I created a desired change?
> > > My opinion is that it is the better solution, the data available is
> more
> > > reliable and as a consequence the rest of the code is simpler.
> > > It would probably mean that during development of flink you should be
> > aware
> > > of this and do an 'extra step' to get it running.
> > >
> > > What do you guys think?
> > >
> > > --
> > > Best regards / Met vriendelijke groeten,
> > >
> > > Niels Basjes
> > >
> >
>
>
> --
> Best regards / Met vriendelijke groeten,
>
> Niels Basjes
>
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] Generating java code?

Jingsong Li
Hi Till,

+1 to define an interface and load it at runtime if we can do.
No disrupting the workflows of devs and throw an exception with good
description look good to me.
This also force us to do a good dependent class abstract.

Best,
Jingsong Lee

On Wed, Apr 15, 2020 at 10:31 PM Till Rohrmann <[hidden email]> wrote:

> Hi everyone,
>
> thanks for starting this discussion Niels.
>
> I like the idea of getting rid of parsing a Properties instance. On the
> other hand, I also understand that people are concerned about disrupting
> the workflows of our devs.
>
> Maybe we can find a compromise between both approaches. For example, one
> could define an EnvironmentInformationProvider interface which is loaded at
> runtime. The implementation of this interface could be generated by `mvn
> generate-sources`. Moreover, if we cannot find it, then we fail with a
> descriptive exception saying to run `mvn generate-sources`. That way only
> devs who rely on the EnvironmentInformation have to run `mvn
> generate-sources` when testing/developing and we could still get rid of all
> the parsing and logic of a properties file. What do you think?
>
> Cheers,
> Till
>
> On Mon, Apr 13, 2020 at 10:53 AM Niels Basjes <[hidden email]> wrote:
>
> > Hi,
> >
> > On Mon, Apr 13, 2020 at 8:30 AM Yang Wang <[hidden email]> wrote:
> >
> > > Although the latter option is more stable,
> > > i think it is not acceptable for all the developers to execute `mvn
> > > generate-sources` first.
> >
> > Otherwise, the Flink project is just broken and could not run tests,
> Flink
> > > jobs in IDE.
> > >
> > >
> > It is important to realize that this discussion is essentially around the
> > fact that systems like IntelliJ (what I use) execute various ways of
> > generating code that have been written in a maven pom.xml in different
> > ways. Simply put: They don't execute all instructions that have been
> > defined in the pom.xml automatically when needed.
> >
> >    - With the properties file variant (which uses resource filtering)
> >    IntelliJ does resource filtering automatically yet no maven plugins
> are
> >    run that affect the properties that are replaced.
> >    So the git variables are NOT populated when running in IntelliJ and I
> >    have seen several values in the propersites file are nonsense.
> >    As a consequence the Java code reading those needs to catch things
> like
> >    "missing properties file" and various kinds of "nonsense values" and
> >    replace them with a workable default.
> >    So when running this code you are actually running something different
> >    from what will be run when doing a `mvn clean install`, yet the
> > developer
> >    is under the illusion it all works because of the code that catches
> all
> > of
> >    those problems.
> >    Depending on the way these variables are used this may lead to "It
> fails
> >    in production but it works fine in IntelliJ" situations.
> >    In my mind the code that catches all of those exceptional situations
> in
> >    poorly generated properties only exists so that developers can run the
> >    (otherwise perfectly fine) software in a "broken" build/development
> >    environment.
> >
> >
> >    - With the way I propose to generate the code the effect is indeed
> >    slightly harder: If you do not run the pom.xml based code generation
> it
> >    will not work.
> >    I understand that this requires the developers to think a bit more
> about
> >    the code they are working on.
> >    The upside is that the code either works perfectly or does not
> compile.
> >    There is no "it compiles but is really nonsense".
> >    I prefer this.
> >    Also at this point in time this is already true for Flink: There are
> >    quite a few modules where the developer MUST run mvn generate-sources
> > for
> >    all tests to run successfully.
> >    Like I indicated there is a javacc SQL parser and there are a lot of
> >    tests that rely on generating Avro code before they are able to run.
> >    I have several projects where systems like Avro and Antlr force me in
> >    this direction, there is simply no other way to do this.
> >
> > So i think the version properties file is enough for now. +1 for the
> first
> > > option.
> > >
> >
> > Like I said. I'm fine with either choice by the committers.
> > I would choose the `mvn generate-sources` code variant as it is much
> > simpler and much more reliable.
> >
> > Niels
> >
> > Niels Basjes <[hidden email]> 于2020年4月9日周四 下午4:47写道:
> > >
> > > > Hi,
> > > >
> > > > I'm working on https://issues.apache.org/jira/browse/FLINK-16871 to
> > make
> > > > more build time variables (like the scala version) into the code
> > > available
> > > > at runtime.
> > > >
> > > > During the review process there was discussion around a basic
> question:
> > > *Is
> > > > generating java code during the build ok?*
> > > > See
> > > >
> > > >    -
> https://github.com/apache/flink/pull/11245#discussion_r400035133
> > > >    - https://github.com/apache/flink/pull/11592
> > > >    -
> https://github.com/apache/flink/pull/11592#issuecomment-610282450
> > > >
> > > > As suggested by Chesnay Schepler I'm putting this question to the
> > mailing
> > > > list.
> > > >   https://github.com/apache/flink/pull/11592#issuecomment-610963947
> > > >
> > > > The main discussion was around the ease of use when running in an IDE
> > > like
> > > > IntelliJ.
> > > >
> > > > So essentially we have two solution directions available:
> > > >
> > > >    1. *Generate a properties file and then use the classloader to
> load
> > > this
> > > >    file as a resource and then parse it as a property file.*
> > > >    This is the currently used solution direction for this part of the
> > > code.
> > > >    A rough version of this (to be improved) :
> > > >
> > > >
> > >
> >
> https://github.com/apache/flink/commit/47099f663b7644056e9d87b262cd4dba034f513e
> > > >    This method has several effects:
> > > >       1. The developer can run the project immediately from within
> the
> > > IDE
> > > >       as fallback values are provided if the 'actual' values are
> > missing.
> > > >       2. This property file (with stuff that should never be
> > overwritten)
> > > >       can be modified by placing a different one in the classpath. In
> > > > fact it IS
> > > >       modified in the flink-dist as it generates a new file with the
> > same
> > > > name
> > > >       into the binary distribution (I consider this to be bad).
> > > >       3. Loading resources means loading, parsing and a lot of error
> > > >       handling. Lots of things "can be null" or  be a default value.
> So
> > > the
> > > >       values are unreliable and lots of code needs to handle this. In
> > > fact
> > > > when
> > > >       running from IntelliJ the properties file is generated poorly
> > most
> > > > of the
> > > >       time, only during a normal maven build will it work correctly.
> > > >    2. *Generate a Java source file and then simply compile this and
> > make
> > > it
> > > >    part of the project.*
> > > >    Essentially the same model as you would have when using Apache
> Avro,
> > > >    Protobuf, Antlr 4 and javacc (several of those are used in
> Flink!).
> > > >    A rough version of this (to be improved) :
> > > >
> > > >
> > >
> >
> https://github.com/apache/flink/commit/d215e4df60dc9d647dcee1aa9a2114cbf49d0566
> > > >    This method has several effects:
> > > >    1. The developer MUST run 'mvn generate-sources' before the actual
> > the
> > > >       project immediately from within the IDE as fallback values are
> > > > provided if
> > > >       the 'actual' values are missing.
> > > >       2. The code/test will not run until this step is done.
> > > >       3. Because the file is generated by a plugin it is always
> > correct.
> > > As
> > > >       a consequence all variables are always available and the
> > downstream
> > > > users
> > > >       no longer have to handle the "can be null" or "default value"
> > > > situations.
> > > >
> > > > So is generating code similar to what I created a desired change?
> > > > My opinion is that it is the better solution, the data available is
> > more
> > > > reliable and as a consequence the rest of the code is simpler.
> > > > It would probably mean that during development of flink you should be
> > > aware
> > > > of this and do an 'extra step' to get it running.
> > > >
> > > > What do you guys think?
> > > >
> > > > --
> > > > Best regards / Met vriendelijke groeten,
> > > >
> > > > Niels Basjes
> > > >
> > >
> >
> >
> > --
> > Best regards / Met vriendelijke groeten,
> >
> > Niels Basjes
> >
>


--
Best, Jingsong Lee
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] Generating java code?

Chesnay Schepler-3
In reply to this post by Till Rohrmann
It doesn't have to be a properties file, nor do we necessarily have to
do any manual parsing.
It could just be a JSON file that we point Jackson at. Naturally we
could also generate it with Jackson.
You'd have a POJO for all the fields with sane defaults (an analogue to
the proposed generated POJO) and 5-6 lines at most for loading the file
and handling errors.

I don't think this would be such a problem.

If I understand Till correctly he's proposing a service-loader approach,
which seems overkill to me. It also introduces some inconsistency in
what one needs to do before working on stuff which I'm apprehensive about.

On 15/04/2020 16:31, Till Rohrmann wrote:

> Hi everyone,
>
> thanks for starting this discussion Niels.
>
> I like the idea of getting rid of parsing a Properties instance. On the
> other hand, I also understand that people are concerned about disrupting
> the workflows of our devs.
>
> Maybe we can find a compromise between both approaches. For example, one
> could define an EnvironmentInformationProvider interface which is loaded at
> runtime. The implementation of this interface could be generated by `mvn
> generate-sources`. Moreover, if we cannot find it, then we fail with a
> descriptive exception saying to run `mvn generate-sources`. That way only
> devs who rely on the EnvironmentInformation have to run `mvn
> generate-sources` when testing/developing and we could still get rid of all
> the parsing and logic of a properties file. What do you think?
>
> Cheers,
> Till
>
> On Mon, Apr 13, 2020 at 10:53 AM Niels Basjes <[hidden email]> wrote:
>
>> Hi,
>>
>> On Mon, Apr 13, 2020 at 8:30 AM Yang Wang <[hidden email]> wrote:
>>
>>> Although the latter option is more stable,
>>> i think it is not acceptable for all the developers to execute `mvn
>>> generate-sources` first.
>> Otherwise, the Flink project is just broken and could not run tests, Flink
>>> jobs in IDE.
>>>
>>>
>> It is important to realize that this discussion is essentially around the
>> fact that systems like IntelliJ (what I use) execute various ways of
>> generating code that have been written in a maven pom.xml in different
>> ways. Simply put: They don't execute all instructions that have been
>> defined in the pom.xml automatically when needed.
>>
>>     - With the properties file variant (which uses resource filtering)
>>     IntelliJ does resource filtering automatically yet no maven plugins are
>>     run that affect the properties that are replaced.
>>     So the git variables are NOT populated when running in IntelliJ and I
>>     have seen several values in the propersites file are nonsense.
>>     As a consequence the Java code reading those needs to catch things like
>>     "missing properties file" and various kinds of "nonsense values" and
>>     replace them with a workable default.
>>     So when running this code you are actually running something different
>>     from what will be run when doing a `mvn clean install`, yet the
>> developer
>>     is under the illusion it all works because of the code that catches all
>> of
>>     those problems.
>>     Depending on the way these variables are used this may lead to "It fails
>>     in production but it works fine in IntelliJ" situations.
>>     In my mind the code that catches all of those exceptional situations in
>>     poorly generated properties only exists so that developers can run the
>>     (otherwise perfectly fine) software in a "broken" build/development
>>     environment.
>>
>>
>>     - With the way I propose to generate the code the effect is indeed
>>     slightly harder: If you do not run the pom.xml based code generation it
>>     will not work.
>>     I understand that this requires the developers to think a bit more about
>>     the code they are working on.
>>     The upside is that the code either works perfectly or does not compile.
>>     There is no "it compiles but is really nonsense".
>>     I prefer this.
>>     Also at this point in time this is already true for Flink: There are
>>     quite a few modules where the developer MUST run mvn generate-sources
>> for
>>     all tests to run successfully.
>>     Like I indicated there is a javacc SQL parser and there are a lot of
>>     tests that rely on generating Avro code before they are able to run.
>>     I have several projects where systems like Avro and Antlr force me in
>>     this direction, there is simply no other way to do this.
>>
>> So i think the version properties file is enough for now. +1 for the first
>>> option.
>>>
>> Like I said. I'm fine with either choice by the committers.
>> I would choose the `mvn generate-sources` code variant as it is much
>> simpler and much more reliable.
>>
>> Niels
>>
>> Niels Basjes <[hidden email]> 于2020年4月9日周四 下午4:47写道:
>>>> Hi,
>>>>
>>>> I'm working on https://issues.apache.org/jira/browse/FLINK-16871 to
>> make
>>>> more build time variables (like the scala version) into the code
>>> available
>>>> at runtime.
>>>>
>>>> During the review process there was discussion around a basic question:
>>> *Is
>>>> generating java code during the build ok?*
>>>> See
>>>>
>>>>     - https://github.com/apache/flink/pull/11245#discussion_r400035133
>>>>     - https://github.com/apache/flink/pull/11592
>>>>     - https://github.com/apache/flink/pull/11592#issuecomment-610282450
>>>>
>>>> As suggested by Chesnay Schepler I'm putting this question to the
>> mailing
>>>> list.
>>>>    https://github.com/apache/flink/pull/11592#issuecomment-610963947
>>>>
>>>> The main discussion was around the ease of use when running in an IDE
>>> like
>>>> IntelliJ.
>>>>
>>>> So essentially we have two solution directions available:
>>>>
>>>>     1. *Generate a properties file and then use the classloader to load
>>> this
>>>>     file as a resource and then parse it as a property file.*
>>>>     This is the currently used solution direction for this part of the
>>> code.
>>>>     A rough version of this (to be improved) :
>>>>
>>>>
>> https://github.com/apache/flink/commit/47099f663b7644056e9d87b262cd4dba034f513e
>>>>     This method has several effects:
>>>>        1. The developer can run the project immediately from within the
>>> IDE
>>>>        as fallback values are provided if the 'actual' values are
>> missing.
>>>>        2. This property file (with stuff that should never be
>> overwritten)
>>>>        can be modified by placing a different one in the classpath. In
>>>> fact it IS
>>>>        modified in the flink-dist as it generates a new file with the
>> same
>>>> name
>>>>        into the binary distribution (I consider this to be bad).
>>>>        3. Loading resources means loading, parsing and a lot of error
>>>>        handling. Lots of things "can be null" or  be a default value. So
>>> the
>>>>        values are unreliable and lots of code needs to handle this. In
>>> fact
>>>> when
>>>>        running from IntelliJ the properties file is generated poorly
>> most
>>>> of the
>>>>        time, only during a normal maven build will it work correctly.
>>>>     2. *Generate a Java source file and then simply compile this and
>> make
>>> it
>>>>     part of the project.*
>>>>     Essentially the same model as you would have when using Apache Avro,
>>>>     Protobuf, Antlr 4 and javacc (several of those are used in Flink!).
>>>>     A rough version of this (to be improved) :
>>>>
>>>>
>> https://github.com/apache/flink/commit/d215e4df60dc9d647dcee1aa9a2114cbf49d0566
>>>>     This method has several effects:
>>>>     1. The developer MUST run 'mvn generate-sources' before the actual
>> the
>>>>        project immediately from within the IDE as fallback values are
>>>> provided if
>>>>        the 'actual' values are missing.
>>>>        2. The code/test will not run until this step is done.
>>>>        3. Because the file is generated by a plugin it is always
>> correct.
>>> As
>>>>        a consequence all variables are always available and the
>> downstream
>>>> users
>>>>        no longer have to handle the "can be null" or "default value"
>>>> situations.
>>>>
>>>> So is generating code similar to what I created a desired change?
>>>> My opinion is that it is the better solution, the data available is
>> more
>>>> reliable and as a consequence the rest of the code is simpler.
>>>> It would probably mean that during development of flink you should be
>>> aware
>>>> of this and do an 'extra step' to get it running.
>>>>
>>>> What do you guys think?
>>>>
>>>> --
>>>> Best regards / Met vriendelijke groeten,
>>>>
>>>> Niels Basjes
>>>>
>>
>> --
>> Best regards / Met vriendelijke groeten,
>>
>> Niels Basjes
>>

Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] Generating java code?

Till Rohrmann
I'm not advocating for a specific approach. The point I wanted to make is
that there are solutions which allow us to get rid of the problematic
parsing and not disrupting the workflow. If the Jackson JSON file approach
works for Niels, then I'm fine with that as well.

Cheers,
Till

On Wed, Apr 15, 2020 at 5:21 PM Chesnay Schepler <[hidden email]> wrote:

> It doesn't have to be a properties file, nor do we necessarily have to
> do any manual parsing.
> It could just be a JSON file that we point Jackson at. Naturally we
> could also generate it with Jackson.
> You'd have a POJO for all the fields with sane defaults (an analogue to
> the proposed generated POJO) and 5-6 lines at most for loading the file
> and handling errors.
>
> I don't think this would be such a problem.
>
> If I understand Till correctly he's proposing a service-loader approach,
> which seems overkill to me. It also introduces some inconsistency in
> what one needs to do before working on stuff which I'm apprehensive about.
>
> On 15/04/2020 16:31, Till Rohrmann wrote:
> > Hi everyone,
> >
> > thanks for starting this discussion Niels.
> >
> > I like the idea of getting rid of parsing a Properties instance. On the
> > other hand, I also understand that people are concerned about disrupting
> > the workflows of our devs.
> >
> > Maybe we can find a compromise between both approaches. For example, one
> > could define an EnvironmentInformationProvider interface which is loaded
> at
> > runtime. The implementation of this interface could be generated by `mvn
> > generate-sources`. Moreover, if we cannot find it, then we fail with a
> > descriptive exception saying to run `mvn generate-sources`. That way only
> > devs who rely on the EnvironmentInformation have to run `mvn
> > generate-sources` when testing/developing and we could still get rid of
> all
> > the parsing and logic of a properties file. What do you think?
> >
> > Cheers,
> > Till
> >
> > On Mon, Apr 13, 2020 at 10:53 AM Niels Basjes <[hidden email]> wrote:
> >
> >> Hi,
> >>
> >> On Mon, Apr 13, 2020 at 8:30 AM Yang Wang <[hidden email]>
> wrote:
> >>
> >>> Although the latter option is more stable,
> >>> i think it is not acceptable for all the developers to execute `mvn
> >>> generate-sources` first.
> >> Otherwise, the Flink project is just broken and could not run tests,
> Flink
> >>> jobs in IDE.
> >>>
> >>>
> >> It is important to realize that this discussion is essentially around
> the
> >> fact that systems like IntelliJ (what I use) execute various ways of
> >> generating code that have been written in a maven pom.xml in different
> >> ways. Simply put: They don't execute all instructions that have been
> >> defined in the pom.xml automatically when needed.
> >>
> >>     - With the properties file variant (which uses resource filtering)
> >>     IntelliJ does resource filtering automatically yet no maven plugins
> are
> >>     run that affect the properties that are replaced.
> >>     So the git variables are NOT populated when running in IntelliJ and
> I
> >>     have seen several values in the propersites file are nonsense.
> >>     As a consequence the Java code reading those needs to catch things
> like
> >>     "missing properties file" and various kinds of "nonsense values" and
> >>     replace them with a workable default.
> >>     So when running this code you are actually running something
> different
> >>     from what will be run when doing a `mvn clean install`, yet the
> >> developer
> >>     is under the illusion it all works because of the code that catches
> all
> >> of
> >>     those problems.
> >>     Depending on the way these variables are used this may lead to "It
> fails
> >>     in production but it works fine in IntelliJ" situations.
> >>     In my mind the code that catches all of those exceptional
> situations in
> >>     poorly generated properties only exists so that developers can run
> the
> >>     (otherwise perfectly fine) software in a "broken" build/development
> >>     environment.
> >>
> >>
> >>     - With the way I propose to generate the code the effect is indeed
> >>     slightly harder: If you do not run the pom.xml based code
> generation it
> >>     will not work.
> >>     I understand that this requires the developers to think a bit more
> about
> >>     the code they are working on.
> >>     The upside is that the code either works perfectly or does not
> compile.
> >>     There is no "it compiles but is really nonsense".
> >>     I prefer this.
> >>     Also at this point in time this is already true for Flink: There are
> >>     quite a few modules where the developer MUST run mvn
> generate-sources
> >> for
> >>     all tests to run successfully.
> >>     Like I indicated there is a javacc SQL parser and there are a lot of
> >>     tests that rely on generating Avro code before they are able to run.
> >>     I have several projects where systems like Avro and Antlr force me
> in
> >>     this direction, there is simply no other way to do this.
> >>
> >> So i think the version properties file is enough for now. +1 for the
> first
> >>> option.
> >>>
> >> Like I said. I'm fine with either choice by the committers.
> >> I would choose the `mvn generate-sources` code variant as it is much
> >> simpler and much more reliable.
> >>
> >> Niels
> >>
> >> Niels Basjes <[hidden email]> 于2020年4月9日周四 下午4:47写道:
> >>>> Hi,
> >>>>
> >>>> I'm working on https://issues.apache.org/jira/browse/FLINK-16871 to
> >> make
> >>>> more build time variables (like the scala version) into the code
> >>> available
> >>>> at runtime.
> >>>>
> >>>> During the review process there was discussion around a basic
> question:
> >>> *Is
> >>>> generating java code during the build ok?*
> >>>> See
> >>>>
> >>>>     -
> https://github.com/apache/flink/pull/11245#discussion_r400035133
> >>>>     - https://github.com/apache/flink/pull/11592
> >>>>     -
> https://github.com/apache/flink/pull/11592#issuecomment-610282450
> >>>>
> >>>> As suggested by Chesnay Schepler I'm putting this question to the
> >> mailing
> >>>> list.
> >>>>    https://github.com/apache/flink/pull/11592#issuecomment-610963947
> >>>>
> >>>> The main discussion was around the ease of use when running in an IDE
> >>> like
> >>>> IntelliJ.
> >>>>
> >>>> So essentially we have two solution directions available:
> >>>>
> >>>>     1. *Generate a properties file and then use the classloader to
> load
> >>> this
> >>>>     file as a resource and then parse it as a property file.*
> >>>>     This is the currently used solution direction for this part of the
> >>> code.
> >>>>     A rough version of this (to be improved) :
> >>>>
> >>>>
> >>
> https://github.com/apache/flink/commit/47099f663b7644056e9d87b262cd4dba034f513e
> >>>>     This method has several effects:
> >>>>        1. The developer can run the project immediately from within
> the
> >>> IDE
> >>>>        as fallback values are provided if the 'actual' values are
> >> missing.
> >>>>        2. This property file (with stuff that should never be
> >> overwritten)
> >>>>        can be modified by placing a different one in the classpath. In
> >>>> fact it IS
> >>>>        modified in the flink-dist as it generates a new file with the
> >> same
> >>>> name
> >>>>        into the binary distribution (I consider this to be bad).
> >>>>        3. Loading resources means loading, parsing and a lot of error
> >>>>        handling. Lots of things "can be null" or  be a default value.
> So
> >>> the
> >>>>        values are unreliable and lots of code needs to handle this. In
> >>> fact
> >>>> when
> >>>>        running from IntelliJ the properties file is generated poorly
> >> most
> >>>> of the
> >>>>        time, only during a normal maven build will it work correctly.
> >>>>     2. *Generate a Java source file and then simply compile this and
> >> make
> >>> it
> >>>>     part of the project.*
> >>>>     Essentially the same model as you would have when using Apache
> Avro,
> >>>>     Protobuf, Antlr 4 and javacc (several of those are used in
> Flink!).
> >>>>     A rough version of this (to be improved) :
> >>>>
> >>>>
> >>
> https://github.com/apache/flink/commit/d215e4df60dc9d647dcee1aa9a2114cbf49d0566
> >>>>     This method has several effects:
> >>>>     1. The developer MUST run 'mvn generate-sources' before the actual
> >> the
> >>>>        project immediately from within the IDE as fallback values are
> >>>> provided if
> >>>>        the 'actual' values are missing.
> >>>>        2. The code/test will not run until this step is done.
> >>>>        3. Because the file is generated by a plugin it is always
> >> correct.
> >>> As
> >>>>        a consequence all variables are always available and the
> >> downstream
> >>>> users
> >>>>        no longer have to handle the "can be null" or "default value"
> >>>> situations.
> >>>>
> >>>> So is generating code similar to what I created a desired change?
> >>>> My opinion is that it is the better solution, the data available is
> >> more
> >>>> reliable and as a consequence the rest of the code is simpler.
> >>>> It would probably mean that during development of flink you should be
> >>> aware
> >>>> of this and do an 'extra step' to get it running.
> >>>>
> >>>> What do you guys think?
> >>>>
> >>>> --
> >>>> Best regards / Met vriendelijke groeten,
> >>>>
> >>>> Niels Basjes
> >>>>
> >>
> >> --
> >> Best regards / Met vriendelijke groeten,
> >>
> >> Niels Basjes
> >>
>
>
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] Generating java code?

Niels Basjes
Hi,

Apparently the IDEs (like IntelliJ) are imperfect at supporting the DEVs in
these kinds of use cases.
The solutions you see are like you have in IntelliJ: a "generate-sources"
button.
It is a way of working I see in almost all projects I'm involved with:
Almost all either use a parser generator (like javacc or Antlr) or generate
code from a specification (like Avro or Protobuf).
In all of these cases there is no "default" value you can use so this
discussion never arose in those projects.

All other options (other than 'hard' generating a file/class without having
defaults) involve having some kind of "default" (regardless if it is a
property file, a Json file or a Java interface).
This will lead to the illusion of "it works on my machine" or "why doesn't
it work on my machine".
What I also see is that these options which include "defaults" also have
quite a bit of extra code to handle all the situations related to loading
the settings (either through resource loading or classloading).

The goal I was working on is that the docker image that is loaded is tagged
with a flink+scala version.
If the DEV does not generate the right values or has values from a
different branch then a different docker image may be used then what is
expected leading to "why doesn't it work on my machine" .

So yes it is an additional learning step for the DEVs and yes it takes a
bit of getting used to.
Yet I truly believe it is the best direction.

Niels


On Wed, Apr 15, 2020 at 5:26 PM Till Rohrmann <[hidden email]> wrote:

> I'm not advocating for a specific approach. The point I wanted to make is
> that there are solutions which allow us to get rid of the problematic
> parsing and not disrupting the workflow. If the Jackson JSON file approach
> works for Niels, then I'm fine with that as well.
>
> Cheers,
> Till
>
> On Wed, Apr 15, 2020 at 5:21 PM Chesnay Schepler <[hidden email]>
> wrote:
>
> > It doesn't have to be a properties file, nor do we necessarily have to
> > do any manual parsing.
> > It could just be a JSON file that we point Jackson at. Naturally we
> > could also generate it with Jackson.
> > You'd have a POJO for all the fields with sane defaults (an analogue to
> > the proposed generated POJO) and 5-6 lines at most for loading the file
> > and handling errors.
> >
> > I don't think this would be such a problem.
> >
> > If I understand Till correctly he's proposing a service-loader approach,
> > which seems overkill to me. It also introduces some inconsistency in
> > what one needs to do before working on stuff which I'm apprehensive
> about.
> >
> > On 15/04/2020 16:31, Till Rohrmann wrote:
> > > Hi everyone,
> > >
> > > thanks for starting this discussion Niels.
> > >
> > > I like the idea of getting rid of parsing a Properties instance. On the
> > > other hand, I also understand that people are concerned about
> disrupting
> > > the workflows of our devs.
> > >
> > > Maybe we can find a compromise between both approaches. For example,
> one
> > > could define an EnvironmentInformationProvider interface which is
> loaded
> > at
> > > runtime. The implementation of this interface could be generated by
> `mvn
> > > generate-sources`. Moreover, if we cannot find it, then we fail with a
> > > descriptive exception saying to run `mvn generate-sources`. That way
> only
> > > devs who rely on the EnvironmentInformation have to run `mvn
> > > generate-sources` when testing/developing and we could still get rid of
> > all
> > > the parsing and logic of a properties file. What do you think?
> > >
> > > Cheers,
> > > Till
> > >
> > > On Mon, Apr 13, 2020 at 10:53 AM Niels Basjes <[hidden email]> wrote:
> > >
> > >> Hi,
> > >>
> > >> On Mon, Apr 13, 2020 at 8:30 AM Yang Wang <[hidden email]>
> > wrote:
> > >>
> > >>> Although the latter option is more stable,
> > >>> i think it is not acceptable for all the developers to execute `mvn
> > >>> generate-sources` first.
> > >> Otherwise, the Flink project is just broken and could not run tests,
> > Flink
> > >>> jobs in IDE.
> > >>>
> > >>>
> > >> It is important to realize that this discussion is essentially around
> > the
> > >> fact that systems like IntelliJ (what I use) execute various ways of
> > >> generating code that have been written in a maven pom.xml in different
> > >> ways. Simply put: They don't execute all instructions that have been
> > >> defined in the pom.xml automatically when needed.
> > >>
> > >>     - With the properties file variant (which uses resource filtering)
> > >>     IntelliJ does resource filtering automatically yet no maven
> plugins
> > are
> > >>     run that affect the properties that are replaced.
> > >>     So the git variables are NOT populated when running in IntelliJ
> and
> > I
> > >>     have seen several values in the propersites file are nonsense.
> > >>     As a consequence the Java code reading those needs to catch things
> > like
> > >>     "missing properties file" and various kinds of "nonsense values"
> and
> > >>     replace them with a workable default.
> > >>     So when running this code you are actually running something
> > different
> > >>     from what will be run when doing a `mvn clean install`, yet the
> > >> developer
> > >>     is under the illusion it all works because of the code that
> catches
> > all
> > >> of
> > >>     those problems.
> > >>     Depending on the way these variables are used this may lead to "It
> > fails
> > >>     in production but it works fine in IntelliJ" situations.
> > >>     In my mind the code that catches all of those exceptional
> > situations in
> > >>     poorly generated properties only exists so that developers can run
> > the
> > >>     (otherwise perfectly fine) software in a "broken"
> build/development
> > >>     environment.
> > >>
> > >>
> > >>     - With the way I propose to generate the code the effect is indeed
> > >>     slightly harder: If you do not run the pom.xml based code
> > generation it
> > >>     will not work.
> > >>     I understand that this requires the developers to think a bit more
> > about
> > >>     the code they are working on.
> > >>     The upside is that the code either works perfectly or does not
> > compile.
> > >>     There is no "it compiles but is really nonsense".
> > >>     I prefer this.
> > >>     Also at this point in time this is already true for Flink: There
> are
> > >>     quite a few modules where the developer MUST run mvn
> > generate-sources
> > >> for
> > >>     all tests to run successfully.
> > >>     Like I indicated there is a javacc SQL parser and there are a lot
> of
> > >>     tests that rely on generating Avro code before they are able to
> run.
> > >>     I have several projects where systems like Avro and Antlr force me
> > in
> > >>     this direction, there is simply no other way to do this.
> > >>
> > >> So i think the version properties file is enough for now. +1 for the
> > first
> > >>> option.
> > >>>
> > >> Like I said. I'm fine with either choice by the committers.
> > >> I would choose the `mvn generate-sources` code variant as it is much
> > >> simpler and much more reliable.
> > >>
> > >> Niels
> > >>
> > >> Niels Basjes <[hidden email]> 于2020年4月9日周四 下午4:47写道:
> > >>>> Hi,
> > >>>>
> > >>>> I'm working on https://issues.apache.org/jira/browse/FLINK-16871 to
> > >> make
> > >>>> more build time variables (like the scala version) into the code
> > >>> available
> > >>>> at runtime.
> > >>>>
> > >>>> During the review process there was discussion around a basic
> > question:
> > >>> *Is
> > >>>> generating java code during the build ok?*
> > >>>> See
> > >>>>
> > >>>>     -
> > https://github.com/apache/flink/pull/11245#discussion_r400035133
> > >>>>     - https://github.com/apache/flink/pull/11592
> > >>>>     -
> > https://github.com/apache/flink/pull/11592#issuecomment-610282450
> > >>>>
> > >>>> As suggested by Chesnay Schepler I'm putting this question to the
> > >> mailing
> > >>>> list.
> > >>>>
> https://github.com/apache/flink/pull/11592#issuecomment-610963947
> > >>>>
> > >>>> The main discussion was around the ease of use when running in an
> IDE
> > >>> like
> > >>>> IntelliJ.
> > >>>>
> > >>>> So essentially we have two solution directions available:
> > >>>>
> > >>>>     1. *Generate a properties file and then use the classloader to
> > load
> > >>> this
> > >>>>     file as a resource and then parse it as a property file.*
> > >>>>     This is the currently used solution direction for this part of
> the
> > >>> code.
> > >>>>     A rough version of this (to be improved) :
> > >>>>
> > >>>>
> > >>
> >
> https://github.com/apache/flink/commit/47099f663b7644056e9d87b262cd4dba034f513e
> > >>>>     This method has several effects:
> > >>>>        1. The developer can run the project immediately from within
> > the
> > >>> IDE
> > >>>>        as fallback values are provided if the 'actual' values are
> > >> missing.
> > >>>>        2. This property file (with stuff that should never be
> > >> overwritten)
> > >>>>        can be modified by placing a different one in the classpath.
> In
> > >>>> fact it IS
> > >>>>        modified in the flink-dist as it generates a new file with
> the
> > >> same
> > >>>> name
> > >>>>        into the binary distribution (I consider this to be bad).
> > >>>>        3. Loading resources means loading, parsing and a lot of
> error
> > >>>>        handling. Lots of things "can be null" or  be a default
> value.
> > So
> > >>> the
> > >>>>        values are unreliable and lots of code needs to handle this.
> In
> > >>> fact
> > >>>> when
> > >>>>        running from IntelliJ the properties file is generated poorly
> > >> most
> > >>>> of the
> > >>>>        time, only during a normal maven build will it work
> correctly.
> > >>>>     2. *Generate a Java source file and then simply compile this and
> > >> make
> > >>> it
> > >>>>     part of the project.*
> > >>>>     Essentially the same model as you would have when using Apache
> > Avro,
> > >>>>     Protobuf, Antlr 4 and javacc (several of those are used in
> > Flink!).
> > >>>>     A rough version of this (to be improved) :
> > >>>>
> > >>>>
> > >>
> >
> https://github.com/apache/flink/commit/d215e4df60dc9d647dcee1aa9a2114cbf49d0566
> > >>>>     This method has several effects:
> > >>>>     1. The developer MUST run 'mvn generate-sources' before the
> actual
> > >> the
> > >>>>        project immediately from within the IDE as fallback values
> are
> > >>>> provided if
> > >>>>        the 'actual' values are missing.
> > >>>>        2. The code/test will not run until this step is done.
> > >>>>        3. Because the file is generated by a plugin it is always
> > >> correct.
> > >>> As
> > >>>>        a consequence all variables are always available and the
> > >> downstream
> > >>>> users
> > >>>>        no longer have to handle the "can be null" or "default value"
> > >>>> situations.
> > >>>>
> > >>>> So is generating code similar to what I created a desired change?
> > >>>> My opinion is that it is the better solution, the data available is
> > >> more
> > >>>> reliable and as a consequence the rest of the code is simpler.
> > >>>> It would probably mean that during development of flink you should
> be
> > >>> aware
> > >>>> of this and do an 'extra step' to get it running.
> > >>>>
> > >>>> What do you guys think?
> > >>>>
> > >>>> --
> > >>>> Best regards / Met vriendelijke groeten,
> > >>>>
> > >>>> Niels Basjes
> > >>>>
> > >>
> > >> --
> > >> Best regards / Met vriendelijke groeten,
> > >>
> > >> Niels Basjes
> > >>
> >
> >
>


--
Best regards / Met vriendelijke groeten,

Niels Basjes
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] Generating java code?

Chesnay Schepler-3
I've had an offline discussion with Till and we came to the following
conclusion:

All out-lined approaches work perfectly when Flink is built in it's
entirety with maven.
None of the out-lined approach work perfectly when Flink is _not_ built
in it's entirety.

The source-file generation approach has similar problems as the
properties-file one; if I build flink-runtime, switch to another branch,
build flink-dist, then the information that will be displayed in the UI
when I start the cluster may be incorrect.
Put another way, you can still easily run into the issue of "doesn't
work on /my/ machine", the solution to which always is to rebuilt Flink
completely.
Frustratingly enough, with both approaches you can even run into the
issue where the information _seems_ correct, when it actually isn't; but
this isn't something that we can really solve.

As such, this is no longer really about correctness, but one about
convenience and maintenance overhead.

As for maintenance, I think a good argument can be made that the
generation approach would be easier to maintain, in parts because it is
the more standard approach for this kind of thing.

The properties-file approach however clearly wins in convenience as it
neither requires an additional command to be run and doesn't cause
compile errors. The compile error is quite a problem indeed since it
provides no information as to what the actual problem is; or rather how
to solve it.

Overall we concluded that we would like to stick to the existing
approach of having a properties file (or JSON or whatever).

I saw that you updated the PR, and will take a look shortly.

On a personal note, I'm sympathetic to your idea, but people are quite
sensitive to things interrupting their work-flow (which I ran into a few
times already :/ ), so we put a lot of emphasis on that aspect.

On 16/04/2020 16:05, Niels Basjes wrote:

> Hi,
>
> Apparently the IDEs (like IntelliJ) are imperfect at supporting the DEVs in
> these kinds of use cases.
> The solutions you see are like you have in IntelliJ: a "generate-sources"
> button.
> It is a way of working I see in almost all projects I'm involved with:
> Almost all either use a parser generator (like javacc or Antlr) or generate
> code from a specification (like Avro or Protobuf).
> In all of these cases there is no "default" value you can use so this
> discussion never arose in those projects.
>
> All other options (other than 'hard' generating a file/class without having
> defaults) involve having some kind of "default" (regardless if it is a
> property file, a Json file or a Java interface).
> This will lead to the illusion of "it works on my machine" or "why doesn't
> it work on my machine".
> What I also see is that these options which include "defaults" also have
> quite a bit of extra code to handle all the situations related to loading
> the settings (either through resource loading or classloading).
>
> The goal I was working on is that the docker image that is loaded is tagged
> with a flink+scala version.
> If the DEV does not generate the right values or has values from a
> different branch then a different docker image may be used then what is
> expected leading to "why doesn't it work on my machine" .
>
> So yes it is an additional learning step for the DEVs and yes it takes a
> bit of getting used to.
> Yet I truly believe it is the best direction.
>
> Niels
>
>
> On Wed, Apr 15, 2020 at 5:26 PM Till Rohrmann <[hidden email]> wrote:
>
>> I'm not advocating for a specific approach. The point I wanted to make is
>> that there are solutions which allow us to get rid of the problematic
>> parsing and not disrupting the workflow. If the Jackson JSON file approach
>> works for Niels, then I'm fine with that as well.
>>
>> Cheers,
>> Till
>>
>> On Wed, Apr 15, 2020 at 5:21 PM Chesnay Schepler <[hidden email]>
>> wrote:
>>
>>> It doesn't have to be a properties file, nor do we necessarily have to
>>> do any manual parsing.
>>> It could just be a JSON file that we point Jackson at. Naturally we
>>> could also generate it with Jackson.
>>> You'd have a POJO for all the fields with sane defaults (an analogue to
>>> the proposed generated POJO) and 5-6 lines at most for loading the file
>>> and handling errors.
>>>
>>> I don't think this would be such a problem.
>>>
>>> If I understand Till correctly he's proposing a service-loader approach,
>>> which seems overkill to me. It also introduces some inconsistency in
>>> what one needs to do before working on stuff which I'm apprehensive
>> about.
>>> On 15/04/2020 16:31, Till Rohrmann wrote:
>>>> Hi everyone,
>>>>
>>>> thanks for starting this discussion Niels.
>>>>
>>>> I like the idea of getting rid of parsing a Properties instance. On the
>>>> other hand, I also understand that people are concerned about
>> disrupting
>>>> the workflows of our devs.
>>>>
>>>> Maybe we can find a compromise between both approaches. For example,
>> one
>>>> could define an EnvironmentInformationProvider interface which is
>> loaded
>>> at
>>>> runtime. The implementation of this interface could be generated by
>> `mvn
>>>> generate-sources`. Moreover, if we cannot find it, then we fail with a
>>>> descriptive exception saying to run `mvn generate-sources`. That way
>> only
>>>> devs who rely on the EnvironmentInformation have to run `mvn
>>>> generate-sources` when testing/developing and we could still get rid of
>>> all
>>>> the parsing and logic of a properties file. What do you think?
>>>>
>>>> Cheers,
>>>> Till
>>>>
>>>> On Mon, Apr 13, 2020 at 10:53 AM Niels Basjes <[hidden email]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> On Mon, Apr 13, 2020 at 8:30 AM Yang Wang <[hidden email]>
>>> wrote:
>>>>>> Although the latter option is more stable,
>>>>>> i think it is not acceptable for all the developers to execute `mvn
>>>>>> generate-sources` first.
>>>>> Otherwise, the Flink project is just broken and could not run tests,
>>> Flink
>>>>>> jobs in IDE.
>>>>>>
>>>>>>
>>>>> It is important to realize that this discussion is essentially around
>>> the
>>>>> fact that systems like IntelliJ (what I use) execute various ways of
>>>>> generating code that have been written in a maven pom.xml in different
>>>>> ways. Simply put: They don't execute all instructions that have been
>>>>> defined in the pom.xml automatically when needed.
>>>>>
>>>>>      - With the properties file variant (which uses resource filtering)
>>>>>      IntelliJ does resource filtering automatically yet no maven
>> plugins
>>> are
>>>>>      run that affect the properties that are replaced.
>>>>>      So the git variables are NOT populated when running in IntelliJ
>> and
>>> I
>>>>>      have seen several values in the propersites file are nonsense.
>>>>>      As a consequence the Java code reading those needs to catch things
>>> like
>>>>>      "missing properties file" and various kinds of "nonsense values"
>> and
>>>>>      replace them with a workable default.
>>>>>      So when running this code you are actually running something
>>> different
>>>>>      from what will be run when doing a `mvn clean install`, yet the
>>>>> developer
>>>>>      is under the illusion it all works because of the code that
>> catches
>>> all
>>>>> of
>>>>>      those problems.
>>>>>      Depending on the way these variables are used this may lead to "It
>>> fails
>>>>>      in production but it works fine in IntelliJ" situations.
>>>>>      In my mind the code that catches all of those exceptional
>>> situations in
>>>>>      poorly generated properties only exists so that developers can run
>>> the
>>>>>      (otherwise perfectly fine) software in a "broken"
>> build/development
>>>>>      environment.
>>>>>
>>>>>
>>>>>      - With the way I propose to generate the code the effect is indeed
>>>>>      slightly harder: If you do not run the pom.xml based code
>>> generation it
>>>>>      will not work.
>>>>>      I understand that this requires the developers to think a bit more
>>> about
>>>>>      the code they are working on.
>>>>>      The upside is that the code either works perfectly or does not
>>> compile.
>>>>>      There is no "it compiles but is really nonsense".
>>>>>      I prefer this.
>>>>>      Also at this point in time this is already true for Flink: There
>> are
>>>>>      quite a few modules where the developer MUST run mvn
>>> generate-sources
>>>>> for
>>>>>      all tests to run successfully.
>>>>>      Like I indicated there is a javacc SQL parser and there are a lot
>> of
>>>>>      tests that rely on generating Avro code before they are able to
>> run.
>>>>>      I have several projects where systems like Avro and Antlr force me
>>> in
>>>>>      this direction, there is simply no other way to do this.
>>>>>
>>>>> So i think the version properties file is enough for now. +1 for the
>>> first
>>>>>> option.
>>>>>>
>>>>> Like I said. I'm fine with either choice by the committers.
>>>>> I would choose the `mvn generate-sources` code variant as it is much
>>>>> simpler and much more reliable.
>>>>>
>>>>> Niels
>>>>>
>>>>> Niels Basjes <[hidden email]> 于2020年4月9日周四 下午4:47写道:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I'm working on https://issues.apache.org/jira/browse/FLINK-16871 to
>>>>> make
>>>>>>> more build time variables (like the scala version) into the code
>>>>>> available
>>>>>>> at runtime.
>>>>>>>
>>>>>>> During the review process there was discussion around a basic
>>> question:
>>>>>> *Is
>>>>>>> generating java code during the build ok?*
>>>>>>> See
>>>>>>>
>>>>>>>      -
>>> https://github.com/apache/flink/pull/11245#discussion_r400035133
>>>>>>>      - https://github.com/apache/flink/pull/11592
>>>>>>>      -
>>> https://github.com/apache/flink/pull/11592#issuecomment-610282450
>>>>>>> As suggested by Chesnay Schepler I'm putting this question to the
>>>>> mailing
>>>>>>> list.
>>>>>>>
>> https://github.com/apache/flink/pull/11592#issuecomment-610963947
>>>>>>> The main discussion was around the ease of use when running in an
>> IDE
>>>>>> like
>>>>>>> IntelliJ.
>>>>>>>
>>>>>>> So essentially we have two solution directions available:
>>>>>>>
>>>>>>>      1. *Generate a properties file and then use the classloader to
>>> load
>>>>>> this
>>>>>>>      file as a resource and then parse it as a property file.*
>>>>>>>      This is the currently used solution direction for this part of
>> the
>>>>>> code.
>>>>>>>      A rough version of this (to be improved) :
>>>>>>>
>>>>>>>
>> https://github.com/apache/flink/commit/47099f663b7644056e9d87b262cd4dba034f513e
>>>>>>>      This method has several effects:
>>>>>>>         1. The developer can run the project immediately from within
>>> the
>>>>>> IDE
>>>>>>>         as fallback values are provided if the 'actual' values are
>>>>> missing.
>>>>>>>         2. This property file (with stuff that should never be
>>>>> overwritten)
>>>>>>>         can be modified by placing a different one in the classpath.
>> In
>>>>>>> fact it IS
>>>>>>>         modified in the flink-dist as it generates a new file with
>> the
>>>>> same
>>>>>>> name
>>>>>>>         into the binary distribution (I consider this to be bad).
>>>>>>>         3. Loading resources means loading, parsing and a lot of
>> error
>>>>>>>         handling. Lots of things "can be null" or  be a default
>> value.
>>> So
>>>>>> the
>>>>>>>         values are unreliable and lots of code needs to handle this.
>> In
>>>>>> fact
>>>>>>> when
>>>>>>>         running from IntelliJ the properties file is generated poorly
>>>>> most
>>>>>>> of the
>>>>>>>         time, only during a normal maven build will it work
>> correctly.
>>>>>>>      2. *Generate a Java source file and then simply compile this and
>>>>> make
>>>>>> it
>>>>>>>      part of the project.*
>>>>>>>      Essentially the same model as you would have when using Apache
>>> Avro,
>>>>>>>      Protobuf, Antlr 4 and javacc (several of those are used in
>>> Flink!).
>>>>>>>      A rough version of this (to be improved) :
>>>>>>>
>>>>>>>
>> https://github.com/apache/flink/commit/d215e4df60dc9d647dcee1aa9a2114cbf49d0566
>>>>>>>      This method has several effects:
>>>>>>>      1. The developer MUST run 'mvn generate-sources' before the
>> actual
>>>>> the
>>>>>>>         project immediately from within the IDE as fallback values
>> are
>>>>>>> provided if
>>>>>>>         the 'actual' values are missing.
>>>>>>>         2. The code/test will not run until this step is done.
>>>>>>>         3. Because the file is generated by a plugin it is always
>>>>> correct.
>>>>>> As
>>>>>>>         a consequence all variables are always available and the
>>>>> downstream
>>>>>>> users
>>>>>>>         no longer have to handle the "can be null" or "default value"
>>>>>>> situations.
>>>>>>>
>>>>>>> So is generating code similar to what I created a desired change?
>>>>>>> My opinion is that it is the better solution, the data available is
>>>>> more
>>>>>>> reliable and as a consequence the rest of the code is simpler.
>>>>>>> It would probably mean that during development of flink you should
>> be
>>>>>> aware
>>>>>>> of this and do an 'extra step' to get it running.
>>>>>>>
>>>>>>> What do you guys think?
>>>>>>>
>>>>>>> --
>>>>>>> Best regards / Met vriendelijke groeten,
>>>>>>>
>>>>>>> Niels Basjes
>>>>>>>
>>>>> --
>>>>> Best regards / Met vriendelijke groeten,
>>>>>
>>>>> Niels Basjes
>>>>>
>>>
>

Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] Generating java code?

Niels Basjes
Hi,

Thank you for clarifying the tradeoffs and choices.
I've updated my pull request for your review, as far as I can tell it meets
the choices.

Now there are 3 scenarios:
1) There is no properties file --> everything returns a "default" value.
These are the defaults I have chosen:

Version                : <unknown>
ScalaVersion           : <unknown>
BuildTime              : 1970-01-01T00:00:00Z
BuildTimeString        : 1970-01-01T00:00:00+0000
GitCommitId            : DecafC0ffeeD0d0F00d
GitCommitIdAbbrev      : DeadD0d0
GitCommitTime          : 1970-01-01T00:00:00Z
GitCommitTimeString    : 1970-01-01T00:00:00+0000


2) A poorly generated properties file exists --> Only the bad values return
a default.
This happens because the maven git plugin is not run by the IDE.
Example:

project.version=1.11-SNAPSHOT
scala.binary.version=2.11

git.commit.id=${git.commit.id}
git.commit.id.abbrev=${git.commit.id.abbrev}
git.commit.time=${git.commit.time}
git.build.time=${git.build.time}


3) A full generated properties file exists --> Everything returns the right
value.
Example:

project.version=1.11-SNAPSHOT
scala.binary.version=2.11

git.commit.id=8e2642ec0046bb02fe9c1648b589051e66c2f956
git.commit.id.abbrev=8e2642e
git.commit.time=2020-04-28T09:22:20+0200
git.build.time=2020-04-28T10:06:42+0200


Niels

On Tue, Apr 21, 2020 at 2:00 PM Chesnay Schepler <[hidden email]> wrote:

> I've had an offline discussion with Till and we came to the following
> conclusion:
>
> All out-lined approaches work perfectly when Flink is built in it's
> entirety with maven.
> None of the out-lined approach work perfectly when Flink is _not_ built in
> it's entirety.
>
> The source-file generation approach has similar problems as the
> properties-file one; if I build flink-runtime, switch to another branch,
> build flink-dist, then the information that will be displayed in the UI
> when I start the cluster may be incorrect.
> Put another way, you can still easily run into the issue of "doesn't work
> on *my* machine", the solution to which always is to rebuilt Flink
> completely.
> Frustratingly enough, with both approaches you can even run into the issue
> where the information _seems_ correct, when it actually isn't; but this
> isn't something that we can really solve.
>
> As such, this is no longer really about correctness, but one about
> convenience and maintenance overhead.
>
> As for maintenance, I think a good argument can be made that the
> generation approach would be easier to maintain, in parts because it is the
> more standard approach for this kind of thing.
>
> The properties-file approach however clearly wins in convenience as it
> neither requires an additional command to be run and doesn't cause compile
> errors. The compile error is quite a problem indeed since it provides no
> information as to what the actual problem is; or rather how to solve it.
>
> Overall we concluded that we would like to stick to the existing approach
> of having a properties file (or JSON or whatever).
>
> I saw that you updated the PR, and will take a look shortly.
>
> On a personal note, I'm sympathetic to your idea, but people are quite
> sensitive to things interrupting their work-flow (which I ran into a few
> times already :/ ), so we put a lot of emphasis on that aspect.
>
> On 16/04/2020 16:05, Niels Basjes wrote:
>
> Hi,
>
> Apparently the IDEs (like IntelliJ) are imperfect at supporting the DEVs in
> these kinds of use cases.
> The solutions you see are like you have in IntelliJ: a "generate-sources"
> button.
> It is a way of working I see in almost all projects I'm involved with:
> Almost all either use a parser generator (like javacc or Antlr) or generate
> code from a specification (like Avro or Protobuf).
> In all of these cases there is no "default" value you can use so this
> discussion never arose in those projects.
>
> All other options (other than 'hard' generating a file/class without having
> defaults) involve having some kind of "default" (regardless if it is a
> property file, a Json file or a Java interface).
> This will lead to the illusion of "it works on my machine" or "why doesn't
> it work on my machine".
> What I also see is that these options which include "defaults" also have
> quite a bit of extra code to handle all the situations related to loading
> the settings (either through resource loading or classloading).
>
> The goal I was working on is that the docker image that is loaded is tagged
> with a flink+scala version.
> If the DEV does not generate the right values or has values from a
> different branch then a different docker image may be used then what is
> expected leading to "why doesn't it work on my machine" .
>
> So yes it is an additional learning step for the DEVs and yes it takes a
> bit of getting used to.
> Yet I truly believe it is the best direction.
>
> Niels
>
>
> On Wed, Apr 15, 2020 at 5:26 PM Till Rohrmann <[hidden email]> <[hidden email]> wrote:
>
>
> I'm not advocating for a specific approach. The point I wanted to make is
> that there are solutions which allow us to get rid of the problematic
> parsing and not disrupting the workflow. If the Jackson JSON file approach
> works for Niels, then I'm fine with that as well.
>
> Cheers,
> Till
>
> On Wed, Apr 15, 2020 at 5:21 PM Chesnay Schepler <[hidden email]> <[hidden email]>
> wrote:
>
>
> It doesn't have to be a properties file, nor do we necessarily have to
> do any manual parsing.
> It could just be a JSON file that we point Jackson at. Naturally we
> could also generate it with Jackson.
> You'd have a POJO for all the fields with sane defaults (an analogue to
> the proposed generated POJO) and 5-6 lines at most for loading the file
> and handling errors.
>
> I don't think this would be such a problem.
>
> If I understand Till correctly he's proposing a service-loader approach,
> which seems overkill to me. It also introduces some inconsistency in
> what one needs to do before working on stuff which I'm apprehensive
>
> about.
>
> On 15/04/2020 16:31, Till Rohrmann wrote:
>
> Hi everyone,
>
> thanks for starting this discussion Niels.
>
> I like the idea of getting rid of parsing a Properties instance. On the
> other hand, I also understand that people are concerned about
>
> disrupting
>
> the workflows of our devs.
>
> Maybe we can find a compromise between both approaches. For example,
>
> one
>
> could define an EnvironmentInformationProvider interface which is
>
> loaded
>
> at
>
> runtime. The implementation of this interface could be generated by
>
> `mvn
>
> generate-sources`. Moreover, if we cannot find it, then we fail with a
> descriptive exception saying to run `mvn generate-sources`. That way
>
> only
>
> devs who rely on the EnvironmentInformation have to run `mvn
> generate-sources` when testing/developing and we could still get rid of
>
> all
>
> the parsing and logic of a properties file. What do you think?
>
> Cheers,
> Till
>
> On Mon, Apr 13, 2020 at 10:53 AM Niels Basjes <[hidden email]> <[hidden email]> wrote:
>
>
> Hi,
>
> On Mon, Apr 13, 2020 at 8:30 AM Yang Wang <[hidden email]> <[hidden email]>
>
> wrote:
>
> Although the latter option is more stable,
> i think it is not acceptable for all the developers to execute `mvn
> generate-sources` first.
>
> Otherwise, the Flink project is just broken and could not run tests,
>
> Flink
>
> jobs in IDE.
>
>
>
> It is important to realize that this discussion is essentially around
>
> the
>
> fact that systems like IntelliJ (what I use) execute various ways of
> generating code that have been written in a maven pom.xml in different
> ways. Simply put: They don't execute all instructions that have been
> defined in the pom.xml automatically when needed.
>
>     - With the properties file variant (which uses resource filtering)
>     IntelliJ does resource filtering automatically yet no maven
>
> plugins
>
> are
>
>     run that affect the properties that are replaced.
>     So the git variables are NOT populated when running in IntelliJ
>
> and
>
> I
>
>     have seen several values in the propersites file are nonsense.
>     As a consequence the Java code reading those needs to catch things
>
> like
>
>     "missing properties file" and various kinds of "nonsense values"
>
> and
>
>     replace them with a workable default.
>     So when running this code you are actually running something
>
> different
>
>     from what will be run when doing a `mvn clean install`, yet the
> developer
>     is under the illusion it all works because of the code that
>
> catches
>
> all
>
> of
>     those problems.
>     Depending on the way these variables are used this may lead to "It
>
> fails
>
>     in production but it works fine in IntelliJ" situations.
>     In my mind the code that catches all of those exceptional
>
> situations in
>
>     poorly generated properties only exists so that developers can run
>
> the
>
>     (otherwise perfectly fine) software in a "broken"
>
> build/development
>
>     environment.
>
>
>     - With the way I propose to generate the code the effect is indeed
>     slightly harder: If you do not run the pom.xml based code
>
> generation it
>
>     will not work.
>     I understand that this requires the developers to think a bit more
>
> about
>
>     the code they are working on.
>     The upside is that the code either works perfectly or does not
>
> compile.
>
>     There is no "it compiles but is really nonsense".
>     I prefer this.
>     Also at this point in time this is already true for Flink: There
>
> are
>
>     quite a few modules where the developer MUST run mvn
>
> generate-sources
>
> for
>     all tests to run successfully.
>     Like I indicated there is a javacc SQL parser and there are a lot
>
> of
>
>     tests that rely on generating Avro code before they are able to
>
> run.
>
>     I have several projects where systems like Avro and Antlr force me
>
> in
>
>     this direction, there is simply no other way to do this.
>
> So i think the version properties file is enough for now. +1 for the
>
> first
>
> option.
>
>
> Like I said. I'm fine with either choice by the committers.
> I would choose the `mvn generate-sources` code variant as it is much
> simpler and much more reliable.
>
> Niels
>
> Niels Basjes <[hidden email]> <[hidden email]> 于2020年4月9日周四 下午4:47写道:
>
> Hi,
>
> I'm working on https://issues.apache.org/jira/browse/FLINK-16871 to
>
> make
>
> more build time variables (like the scala version) into the code
>
> available
>
> at runtime.
>
> During the review process there was discussion around a basic
>
> question:
>
> *Is
>
> generating java code during the build ok?*
> See
>
>     -
>
> https://github.com/apache/flink/pull/11245#discussion_r400035133
>
>     - https://github.com/apache/flink/pull/11592
>     -
>
> https://github.com/apache/flink/pull/11592#issuecomment-610282450
>
> As suggested by Chesnay Schepler I'm putting this question to the
>
> mailing
>
> list.
>
>
> https://github.com/apache/flink/pull/11592#issuecomment-610963947
>
> The main discussion was around the ease of use when running in an
>
> IDE
>
> like
>
> IntelliJ.
>
> So essentially we have two solution directions available:
>
>     1. *Generate a properties file and then use the classloader to
>
> load
>
> this
>
>     file as a resource and then parse it as a property file.*
>     This is the currently used solution direction for this part of
>
> the
>
> code.
>
>     A rough version of this (to be improved) :
>
>
>
> https://github.com/apache/flink/commit/47099f663b7644056e9d87b262cd4dba034f513e
>
>     This method has several effects:
>        1. The developer can run the project immediately from within
>
> the
>
> IDE
>
>        as fallback values are provided if the 'actual' values are
>
> missing.
>
>        2. This property file (with stuff that should never be
>
> overwritten)
>
>        can be modified by placing a different one in the classpath.
>
> In
>
> fact it IS
>        modified in the flink-dist as it generates a new file with
>
> the
>
> same
>
> name
>        into the binary distribution (I consider this to be bad).
>        3. Loading resources means loading, parsing and a lot of
>
> error
>
>        handling. Lots of things "can be null" or  be a default
>
> value.
>
> So
>
> the
>
>        values are unreliable and lots of code needs to handle this.
>
> In
>
> fact
>
> when
>        running from IntelliJ the properties file is generated poorly
>
> most
>
> of the
>        time, only during a normal maven build will it work
>
> correctly.
>
>     2. *Generate a Java source file and then simply compile this and
>
> make
>
> it
>
>     part of the project.*
>     Essentially the same model as you would have when using Apache
>
> Avro,
>
>     Protobuf, Antlr 4 and javacc (several of those are used in
>
> Flink!).
>
>     A rough version of this (to be improved) :
>
>
>
> https://github.com/apache/flink/commit/d215e4df60dc9d647dcee1aa9a2114cbf49d0566
>
>     This method has several effects:
>     1. The developer MUST run 'mvn generate-sources' before the
>
> actual
>
> the
>
>        project immediately from within the IDE as fallback values
>
> are
>
> provided if
>        the 'actual' values are missing.
>        2. The code/test will not run until this step is done.
>        3. Because the file is generated by a plugin it is always
>
> correct.
>
> As
>
>        a consequence all variables are always available and the
>
> downstream
>
> users
>        no longer have to handle the "can be null" or "default value"
> situations.
>
> So is generating code similar to what I created a desired change?
> My opinion is that it is the better solution, the data available is
>
> more
>
> reliable and as a consequence the rest of the code is simpler.
> It would probably mean that during development of flink you should
>
> be
>
> aware
>
> of this and do an 'extra step' to get it running.
>
> What do you guys think?
>
> --
> Best regards / Met vriendelijke groeten,
>
> Niels Basjes
>
>
> --
> Best regards / Met vriendelijke groeten,
>
> Niels Basjes
>
>
>
>

--
Best regards / Met vriendelijke groeten,

Niels Basjes