Hi all,
With the effort of FLIP-38 [1], the Python Table API (without UDF support for now) will be supported in the coming release-1.9. As described in "Build PyFlink" [2], users who want to use the Python Table API currently have to install it manually with:

"cd flink-python && python3 setup.py sdist && pip install dist/*.tar.gz"

This is non-trivial for users, and it would be better if we could follow the usual Python way and publish PyFlink to PyPI, the package repository for the Python programming language. Users could then install PyFlink with the standard Python package manager: "pip install pyflink". This raises a few topics that need to be discussed:

1. How to publish PyFlink to PyPI

1.1 Project Name

We need to decide which PyPI project name to use, for example apache-flink, pyflink, etc.

The name "pyflink" has already been registered by @ueqt, and a package '1.0' has already been released under that project, based on flink-libraries/flink-python.

@ueqt has kindly agreed to give the project back to the community. He has requested that the released package '1.0' not be removed, as it is already in use at his company.

So we need to decide whether to use the name 'pyflink'. If yes, we need to figure out how to handle the existing '1.0' package under this project.

From my point of view, "pyflink" is the better project name, and we can keep the 1.0 release; more people may want to use it.

1.2 PyPI account for release

We also need to decide which account to use for publishing packages to PyPI. PyPI has two permission levels, owner and maintainer:

1) The owner can upload releases, and can also delete files, releases, or the entire project.
2) The maintainer can also upload releases, but cannot delete files, releases, or the project.

So there are two options in my mind:

1) Create an account such as 'pyflink' as the owner, share it with all the release managers, and have release managers publish packages to PyPI with this shared account.
2) Create an account such as 'pyflink' as the owner (managed only by the PMC) and add each release manager's personal account as a maintainer of the project. Release managers then publish packages to PyPI with their own accounts.

As far as I know, PySpark takes option 1) and Apache Beam takes option 2).

From my point of view, I prefer option 2) as it is safer: it eliminates the risk of accidentally deleting old releases, and at the same time keeps a trace of who performed each operation.

2. How to handle Scala 2.11 and Scala 2.12

The PyFlink package bundles the Flink jars. As we know, there are two versions of the jars for each module: one for Scala 2.11 and the other for Scala 2.12, so in theory there would be two PyFlink packages. We need to decide whether to publish one of them to PyPI or both. If both are published, we may need two separate projects, such as pyflink_211 and pyflink_212, and perhaps more in the future, such as pyflink_213.

(BTW, I think we should bring up a separate discussion about dropping Scala 2.11 in the Flink 1.10 release, since Scala 2.13 became available in early June.)

From my point of view, for now we should release only the Scala 2.11 version, since Scala 2.11 is the default version in Flink.

3. Legal aspects of publishing to PyPI

As @Chesnay Schepler <[hidden email]> pointed out in FLINK-13011 [3], publishing PyFlink to PyPI means publishing binaries to a distribution channel not owned by Apache. We need to figure out whether there are any legal problems. From my point of view there are none, as several Apache projects such as Spark and Beam have already done this. Frankly speaking, I am not familiar with this area, so any feedback from somebody more familiar with it is welcome.
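[Editor's note: for concreteness, publishing the flink-python module to PyPI under either candidate name would boil down to a conventional setuptools project. The sketch below is purely illustrative; the name, version, and package list are placeholders pending the outcome of this discussion, not part of any actual Flink release tooling.]

```python
# Hypothetical minimal setup.py for the flink-python module.
PACKAGE_METADATA = dict(
    name="apache-flink",      # or "pyflink"; see topic 1.1
    version="1.9.dev0",       # would track the Flink release being published
    packages=["pyflink"],
    description="Apache Flink Python Table API",
)

def main() -> None:
    # Entry point invoked via `python3 setup.py sdist`.
    from setuptools import setup
    setup(**PACKAGE_METADATA)
```

Uploading would then be `python3 setup.py sdist` followed by `twine upload dist/*`, run under whichever account topic 1.2 settles on.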
Great thanks to @ueqt for being willing to dedicate the PyPI project name `pyflink` to the Apache Flink community! Great thanks to @Dian for the offline effort!

Best,
Jincheng

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-38%3A+Python+Table+API
[2] https://ci.apache.org/projects/flink/flink-docs-master/flinkDev/building.html#build-pyflink
[3] https://issues.apache.org/jira/browse/FLINK-13011
The existing artifact in the pyflink project was neither released by the
Flink project / anyone affiliated with it nor approved by the Flink PMC. As such, if we were to use this account, I believe we should delete the artifact so as not to mislead users into thinking it is in any way an Apache-provided distribution. Since this goes against the user's wishes, I would be in favor of creating a separate account and giving back control over the pyflink account.

My take on the raised points:
1.1) "apache-flink"
1.2) option 2
2) Given that we only distribute Python code, there should be no reason to differentiate between Scala versions. We should not be distributing any java/scala code and/or modules to PyPI. Currently, I'm a bit confused about this question and wonder what exactly we are trying to publish here.
3) This should be treated as any other source release; i.e., it needs a LICENSE and NOTICE file, signatures, and a PMC vote. My suggestion would be to make this part of our normal release process. There will be _one_ source release on dist.apache.org encompassing everything, and a separate Python-focused source release that we push to PyPI. The LICENSE and NOTICE contained in the Python source release must also be present in the source release of Flink; so basically the Python source release is just the contents of the flink-python module plus the Maven pom.xml, with no other special sauce added during the release process.

On 02/07/2019 05:42, jincheng sun wrote:
> [original proposal quoted in full; trimmed]
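[Editor's note: Chesnay's point 3) — that the PyPI artifact needs a LICENSE and NOTICE file like any other source release — could be enforced mechanically before upload. A minimal sketch; the helper name and its use in a release script are hypothetical, not part of Flink's actual release process.]

```python
import pathlib

# Files required in every Apache source release.
REQUIRED_FILES = ("LICENSE", "NOTICE")

def missing_release_files(root):
    """Return the required release files missing under `root`, as a list."""
    base = pathlib.Path(root)
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]
```

A release script could refuse to run `twine upload` whenever `missing_release_files("flink-python")` is non-empty.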
Hi Chesnay,
Thanks a lot for the suggestions.

Regarding "distributing java/scala code to PyPI": the Python Table API is just a wrapper of the Java Table API, and without the java/scala code two steps are needed to set up an environment for executing a Python Table API program:
1) Install pyflink using "pip install apache-flink"
2) Download the Flink distribution and point FLINK_HOME to it.
Besides, users have to make sure that the manually installed Flink is compatible with the pip-installed pyflink.

Bundling the java/scala code inside the Python package eliminates step 2) and makes installing pyflink simpler for users. There was a short discussion on this in the Spark community <https://issues.apache.org/jira/browse/SPARK-1267>, and they finally decided to package the java/scala code in the Python package. (BTW, PySpark bundles only the Scala 2.11 jars.)

Regards,
Dian

> 在 2019年7月3日,下午7:13,Chesnay Schepler <[hidden email]> 写道:
> [Chesnay's reply quoted in full; trimmed]
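[Editor's note: Dian's two setup paths could coexist inside the package itself — prefer jars bundled in the wheel, and fall back to a user-provided FLINK_HOME. A minimal sketch; the function name, the `deps/lib` layout, and the parameter are hypothetical illustrations, not the actual pyflink code.]

```python
import os
import pathlib

def find_flink_home(package_dir: str = ".") -> str:
    """Locate the Flink distribution the Python API should run against.

    Prefer jars bundled inside the installed Python package (eliminating
    step 2 above); otherwise fall back to a user-provided FLINK_HOME.
    """
    bundled = pathlib.Path(package_dir) / "deps"
    if (bundled / "lib").is_dir():          # jars shipped inside the package
        return str(bundled)
    flink_home = os.environ.get("FLINK_HOME")
    if flink_home and pathlib.Path(flink_home, "lib").is_dir():
        return flink_home                   # user-managed Flink distribution
    raise RuntimeError("no bundled Flink jars found and FLINK_HOME is not "
                       "set to a valid Flink distribution")
```

With bundled jars, a plain `pip install` suffices; without them, the FLINK_HOME fallback preserves today's manual-setup workflow.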
So this would not be a source release then, but a full-blown binary release.
Maybe it is just me, but I find it a bit suspect to ship an entire Java application via PyPI just because there's a Python API for it.

We definitely need input from more people here.

On 03/07/2019 14:09, Dian Fu wrote:
> [Dian's reply quoted in full; trimmed]
Hi All,
Thanks for the feedback @Chesnay Schepler <[hidden email]> @Dian!

Using `apache-flink` as the project name also makes sense to me, since we should always keep in mind that Flink is owned by Apache. (Beam also uses this pattern, `apache-beam`, for its Python API.)

Regarding releasing the Python API together with the Java JARs, I think the guiding principle should be the convenience of the user. So, thanks for the explanation @Dian!

And you're right @Chesnay Schepler <[hidden email]>, we can't make a hasty decision and we need more people's opinions!

So, I'd appreciate it if anyone could give us feedback and suggestions!

Best,
Jincheng

Chesnay Schepler <[hidden email]> 于2019年7月3日周三 下午8:46写道:
> [Chesnay's reply quoted in full; trimmed]
Hi!
Sorry for the late involvement. Here are some thoughts from my side:

Definitely +1 to publishing to PyPI, even if it is a binary release. Community growth into other communities is great, and if this is the natural way to reach developers in the Python community, let's do it. This is not about our convenience, but about reaching users.

I think the way to look at this is that it is a convenience distribution channel, courtesy of the Flink community. It is not an Apache release, and we make this clear in the README. Of course, this doesn't mean we don't try to uphold standards similar to those of our official releases (like proper license information).

Concerning credentials sharing, I would be fine with either option. The PMC doesn't own it (it is an initiative by some community members), but the PMC needs to ensure trademark compliance, so I have a slight preference for option #1 (the PMC would have the means to correct problems).

I believe there is no need to differentiate between Scala versions, because this is merely a convenience for pure Python users. Users that mix Python and Scala (and thus depend on specific Scala versions) can still download from Apache or build themselves.

Best,
Stephan


On Thu, Jul 4, 2019 at 9:51 AM jincheng sun <[hidden email]> wrote:

> Hi All,
>
> Thanks for the feedback @Chesnay Schepler <[hidden email]> @Dian!
>
> Using `apache-flink` as the project name also makes sense to me, since we
> should always keep in mind that Flink is owned by Apache. (Beam also uses
> this pattern, `apache-beam`, for its Python API.)
>
> Regarding releasing the Python API together with the Java JARs, I think the
> guiding principle should be the convenience of the user. So, thanks for the
> explanation @Dian!
>
> And you're right @Chesnay Schepler <[hidden email]>, we can't make a
> hasty decision and we need more people's opinions!
>
> So, I would appreciate it if anyone can give us feedback and suggestions!
> Best,
> Jincheng
>
>
> Chesnay Schepler <[hidden email]> 于2019年7月3日周三 下午8:46写道:
>
> > So this would not be a source release then, but a full-blown binary
> > release.
> >
> > Maybe it is just me, but I find it a bit suspect to ship an entire Java
> > application via PyPI just because there's a Python API for it.
> >
> > We definitely need input from more people here.
> >
> > On 03/07/2019 14:09, Dian Fu wrote:
> > > Hi Chesnay,
> > >
> > > Thanks a lot for the suggestions.
> > >
> > > Regarding “distributing java/scala code to PyPI”:
> > > The Python Table API is just a wrapper of the Java Table API, and
> > > without the java/scala code, two steps are needed to set up an
> > > environment to execute a Python Table API program:
> > > 1) Install pyflink using "pip install apache-flink"
> > > 2) Download the Flink distribution and set FLINK_HOME to it.
> > > Besides, users have to make sure that the manually installed Flink is
> > > compatible with the pip-installed pyflink.
> > >
> > > Bundling the java/scala code inside the Python package will eliminate
> > > step 2) and make it simpler for users to install pyflink. There was a
> > > short discussion <https://issues.apache.org/jira/browse/SPARK-1267> on
> > > this in the Spark community and they finally decided to package the
> > > java/scala code in the Python package. (BTW, PySpark only bundles the
> > > jars of Scala 2.11.)
> > >
> > > Regards,
> > > Dian
> > >
> > >> 在 2019年7月3日,下午7:13,Chesnay Schepler <[hidden email]> 写道:
> > >>
> > >> The existing artifact in the pyflink project was neither released by
> > >> the Flink project / anyone affiliated with it, nor approved by the
> > >> Flink PMC.
> > >>
> > >> As such, if we were to use this account, I believe we should delete
> > >> that artifact so as not to mislead users into thinking it is in any
> > >> way an Apache-provided distribution. Since this goes against the
> > >> user's wishes, I would be in favor of creating a separate account
> > >> and giving back control over the pyflink account.
> > >>
> > >> My take on the raised points:
> > >> 1.1) "apache-flink"
> > >> 1.2) option 2
> > >> 2) Given that we only distribute Python code, there should be no
> > >> reason to differentiate between Scala versions. We should not be
> > >> distributing any java/scala code and/or modules to PyPI. Currently,
> > >> I'm a bit confused about this question and wonder what exactly we
> > >> are trying to publish here.
> > >> 3) This should be treated as any other source release; i.e., it needs
> > >> a LICENSE and NOTICE file, signatures, and a PMC vote. My suggestion
> > >> would be to make this part of our normal release process. There would
> > >> be _one_ source release on dist.apache.org encompassing everything,
> > >> and a separate Python-focused source release that we push to PyPI.
> > >> The LICENSE and NOTICE contained in the Python source release must
> > >> also be present in the source release of Flink; so basically the
> > >> Python source release is just the contents of the flink-python module
> > >> plus the Maven pom.xml, with no other special sauce added during the
> > >> release process.
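Dian's jar-bundling approach quoted above could look roughly like this in flink-python's setup.py — a sketch under assumed layout (the `pyflink` package name is real, but the `lib/*.jar` paths and version are illustrative, not necessarily the actual flink-python configuration):

```python
# Sketch: bundling the Flink jars inside the Python package so that
# "pip install" alone yields a runnable environment, eliminating the
# separate "download Flink + set FLINK_HOME" step Dian describes.
# All paths here are illustrative assumptions.
from setuptools import find_packages


def setup_kwargs(version="1.9.0"):
    """Keyword arguments that would be passed to setuptools.setup()."""
    return {
        "name": "apache-flink",
        "version": version,
        "packages": find_packages(),
        # Ship the distribution's jars and default config as package data.
        "package_data": {"pyflink": ["lib/*.jar", "opt/*.jar", "conf/*"]},
        "include_package_data": True,
    }
```

With this, the sdist/wheel carries the jars, and "pip install apache-flink" installs them alongside the Python wrapper code.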
+1 for publishing pyflink to PyPI.
Regarding including the jars, I just want to make sure which Flink binary distribution we would ship with pyflink, since we have multiple Flink binary distributions (with/without Hadoop). Personally, I prefer the Hadoop-included binary distribution.

And I just want to confirm whether it is possible for users to use a different Flink binary distribution as long as they set the FLINK_HOME environment variable.

Besides that, I hope that there will be bi-directional link references between the Flink docs and the PyPI docs.

--
Best Regards

Jeff Zhang
Hi Stephan & Jeff,
Thanks a lot for sharing your thoughts!

Regarding the bundled jars: currently, only the jars in the Flink binary distribution are packaged in the pyflink package. It may be a good idea to also bundle other jars such as flink-hadoop-compatibility. We may also need to consider whether to bundle the format jars such as flink-avro, flink-json, and flink-csv, and the connector jars such as flink-connector-kafka, etc.

If FLINK_HOME is set, the binary distribution specified by FLINK_HOME will be used instead.

Regards,
Dian
If we ship a binary, we should ship the binary we usually ship, not some
highly customized version.

On 24/07/2019 05:19, Dian Fu wrote:
> Hi Stephan & Jeff,
>
> Thanks a lot for sharing your thoughts!
>
> Regarding the bundled jars, currently only the jars in the Flink binary
> distribution are packaged in the pyflink package. It may be a good idea to
> also bundle other jars such as flink-hadoop-compatibility. We may also need
> to consider whether to bundle the format jars such as flink-avro,
> flink-json and flink-csv, and the connector jars such as
> flink-connector-kafka, etc.
>
> If FLINK_HOME is set, the binary distribution specified by FLINK_HOME
> will be used instead.
>
> Regards,
> Dian
>
>> On 24 Jul 2019, at 09:47, Jeff Zhang <[hidden email]> wrote:
>>
>> +1 for publishing pyflink to PyPI.
>>
>> Regarding including the jars, I just want to make sure which Flink binary
>> distribution we would ship with pyflink, since we have multiple Flink
>> binary distributions (with/without Hadoop). Personally, I prefer the
>> Hadoop-included binary distribution.
>>
>> And I just want to confirm whether it is possible for users to use a
>> different Flink binary distribution as long as they set the FLINK_HOME
>> environment variable.
>>
>> Besides that, I hope that there will be bi-directional link references
>> between the Flink docs and the PyPI docs.
>>
>> Stephan Ewen <[hidden email]> wrote on Wed, 24 Jul 2019 at 00:07:
>>
>>> Hi!
>>>
>>> Sorry for the late involvement. Here are some thoughts from my side:
>>>
>>> Definitely +1 to publishing to PyPI, even if it is a binary release.
>>> Community growth into other communities is great, and if this is the
>>> natural way to reach developers in the Python community, let's do it.
>>> This is not about our convenience, but about reaching users.
>>>
>>> I think the way to look at this is that it is a convenience distribution
>>> channel, courtesy of the Flink community. It is not an Apache release;
>>> we make this clear in the README. Of course, this doesn't mean we don't
>>> try to uphold similar standards as for our official releases (like
>>> proper license information).
>>>
>>> Concerning credentials sharing, I would be fine with either option. The
>>> PMC doesn't own it (it is an initiative by some community members), but
>>> the PMC needs to ensure trademark compliance, so slight preference for
>>> option #1 (the PMC would have means to correct problems).
>>>
>>> I believe there is no need to differentiate between Scala versions,
>>> because this is merely a convenience thing for pure Python users. Users
>>> that mix Python and Scala (and thus depend on specific Scala versions)
>>> can still download from Apache or build themselves.
>>>
>>> Best,
>>> Stephan
>>>
>>> On Thu, Jul 4, 2019 at 9:51 AM jincheng sun <[hidden email]> wrote:
>>>
>>>> Hi All,
>>>>
>>>> Thanks for the feedback @Chesnay Schepler @Dian!
>>>>
>>>> Using `apache-flink` for the project name also makes sense to me, as
>>>> we should always keep in mind that Flink is owned by Apache. (Beam
>>>> also uses this pattern, `apache-beam`, for its Python API.)
>>>>
>>>> Regarding releasing the Python API with the Java JARs, I think the
>>>> guiding principle is the convenience of the user. So, thanks for the
>>>> explanation @Dian!
>>>>
>>>> And you're right @Chesnay Schepler, we can't make a hasty decision
>>>> and we need more people's opinions!
>>>>
>>>> So, I would appreciate any feedback and suggestions!
>>>>
>>>> Best,
>>>> Jincheng
>>>>
>>>> Chesnay Schepler <[hidden email]> wrote on Wed, 3 Jul 2019 at 20:46:
>>>>
>>>>> So this would not be a source release then, but a full-blown binary
>>>>> release.
>>>>>
>>>>> Maybe it is just me, but I find it a bit suspect to ship an entire
>>>>> Java application via PyPI, just because there's a Python API for it.
>>>>>
>>>>> We definitely need input from more people here.
>>>>>
>>>>> On 03/07/2019 14:09, Dian Fu wrote:
>>>>>> Hi Chesnay,
>>>>>>
>>>>>> Thanks a lot for the suggestions.
>>>>>>
>>>>>> Regarding "distributing java/scala code to PyPI":
>>>>>> The Python Table API is just a wrapper of the Java Table API, and
>>>>>> without the Java/Scala code, two steps are needed to set up an
>>>>>> environment to execute a Python Table API program:
>>>>>> 1) Install pyflink using "pip install apache-flink"
>>>>>> 2) Download the Flink distribution and set FLINK_HOME to it.
>>>>>> Besides, users have to make sure that the manually installed Flink
>>>>>> is compatible with the pip-installed pyflink.
>>>>>> Bundling the Java/Scala code inside the Python package eliminates
>>>>>> step 2) and makes it simpler for users to install pyflink. There was
>>>>>> a short discussion on this in the Spark community
>>>>>> (https://issues.apache.org/jira/browse/SPARK-1267) and they finally
>>>>>> decided to package the Java/Scala code in the Python package. (BTW,
>>>>>> PySpark only bundles the Scala 2.11 jars.)
>>>>>>
>>>>>> Regards,
>>>>>> Dian
>>>>>>
>>>>>>> On 3 Jul 2019, at 19:13, Chesnay Schepler <[hidden email]> wrote:
>>>>>>>
>>>>>>> The existing artifact in the pyflink project was neither released
>>>>>>> by the Flink project / anyone affiliated with it, nor approved by
>>>>>>> the Flink PMC. As such, if we were to use this account, I believe
>>>>>>> we should delete it so as not to mislead users into thinking this
>>>>>>> is in any way an Apache-provided distribution. Since this goes
>>>>>>> against the user's wishes, I would be in favor of creating a
>>>>>>> separate account and giving back control over the pyflink account.
>>>>>>>
>>>>>>> My take on the raised points:
>>>>>>> 1.1) "apache-flink"
>>>>>>> 1.2) option 2
>>>>>>> 2) Given that we only distribute Python code, there should be no
>>>>>>> reason to differentiate between Scala versions. We should not be
>>>>>>> distributing any Java/Scala code and/or modules to PyPI. Currently,
>>>>>>> I'm a bit confused about this question and wonder what exactly we
>>>>>>> are trying to publish here.
>>>>>>> 3) This should be treated as any other source release; i.e., it
>>>>>>> needs a LICENSE and NOTICE file, signatures and a PMC vote. My
>>>>>>> suggestion would be to make this part of our normal release
>>>>>>> process. There will be _one_ source release on dist.apache.org
>>>>>>> encompassing everything, and a separate Python-focused source
>>>>>>> release that we push to PyPI. The LICENSE and NOTICE contained in
>>>>>>> the Python source release must also be present in the source
>>>>>>> release of Flink; so basically the Python source release is just
>>>>>>> the contents of the flink-python module plus the Maven pom.xml,
>>>>>>> with no other special sauce added during the release process.
>>>>>>>
>>>>>>> On 02/07/2019 05:42, jincheng sun wrote:
>>>>>>>> [...]
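To make Dian's point about the bundled jars concrete: the behavior described above is "use the distribution in the pip-installed package, unless FLINK_HOME points at one the user installed themselves." The following is a minimal sketch of that resolution order; the function name, the `deps` layout, and the `lib` check are illustrative assumptions, not the actual pyflink implementation.

```python
import os


def find_flink_home(bundled_root):
    """Return the Flink distribution directory to use.

    bundled_root: directory of the pip-installed package that would
    contain the bundled distribution (an assumed layout).
    """
    env_home = os.environ.get("FLINK_HOME")
    if env_home and os.path.isdir(os.path.join(env_home, "lib")):
        # The user pointed at their own binary distribution; it wins.
        return env_home
    bundled = os.path.join(bundled_root, "deps")
    if os.path.isdir(os.path.join(bundled, "lib")):
        # Fall back to the jars shipped inside the pip package.
        return bundled
    raise RuntimeError(
        "No Flink distribution found: set FLINK_HOME or reinstall pyflink")
```

This also answers Jeff's question: as long as FLINK_HOME is set, a different binary distribution (e.g. a Hadoop-included one) would be picked up instead of the bundled jars.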
Hi all,
Thanks for all of your replies!

Hi Stephan, thanks for the reply and for raising the details we need to pay
attention to, such as the README and trademark compliance. Regarding the
PyPI account for release, #1 has the risk that our release packages can be
deleted by anyone who knows the password of the account, and in that case
the PMC would have no means to correct problems. So, I think #2 is safer
for the Flink community.

Hi Jeff & Dian, thanks for sharing your thoughts. The Python API is just a
language entry point. I think which binaries are contained in the release
should be consistent with the Java release policy. So, currently we do not
add the Hadoop or connector JARs to the release package.

Hi Chesnay, agreed that we should ship the usual common binary in the
future if the Java side has already made that decision.

So, our current consensus is:
1. Should we publish PyFlink to PyPI --> YES
2. PyPI project name --> apache-flink
3. How to handle Scala 2.11 and Scala 2.12 --> We only release one binary,
with the default Scala version matching Flink's default config.

We still need to discuss how to manage the PyPI account for releases:
--------
1) Create an account such as 'pyflink' as the owner, share it with all the
release managers, and then release managers can publish the package to PyPI
using this account.
2) Create an account such as 'pyflink' as owner (only the PMC can manage
it) and add the release managers' accounts as maintainers of the project.
Release managers publish the package to PyPI using their own accounts.
--------
Stephan likes #1 but wants the PMC to be able to correct problems (which
sounds like #2). Can you confirm that, @Stephan?
Chesnay and I prefer #2.

Best,
Jincheng

Chesnay Schepler <[hidden email]> wrote on Wed, 24 Jul 2019 at 15:57:
> [...]
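Chesnay's point 3 means the PyPI artifact must carry LICENSE and NOTICE files like any other source release. A hedged sketch of a pre-upload sanity check a release manager might run on the sdist tarball before publishing; the file layout assumed here (a single top-level directory inside the tarball) is the conventional sdist layout, not something mandated by this thread.

```python
import tarfile


def check_sdist(path, required=("LICENSE", "NOTICE")):
    """Return the list of required files missing from an sdist tarball.

    Assumes the usual sdist layout: members live under one top-level
    directory, e.g. "apache-flink-1.9.0/LICENSE".
    """
    with tarfile.open(path, "r:gz") as tar:
        # Strip the top-level directory so we match bare file names.
        names = {m.name.split("/", 1)[-1] for m in tar.getmembers()}
    return [f for f in required if f not in names]
```

A check like this could run as part of the normal release process, before the release manager uploads the artifact under their own maintainer account.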
Sorry for chiming in so late. I would be in favor of option #2.
I guess that the PMC would need to give the credentials to the release
manager for option #1. Hence, the PMC could also add the release manager as
a maintainer, which ensures that only the PMC can delete artifacts.

Cheers,
Till

On Wed, Jul 24, 2019 at 12:33 PM jincheng sun <[hidden email]> wrote:
> [...]
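Dian's point in the quoted thread above describes a simple resolution order: the jars bundled in the pip-installed package are used by default, and the binary distribution pointed to by FLINK_HOME takes precedence when that variable is set. A minimal sketch of that lookup, with the helper name and the bundled-jars location both illustrative assumptions rather than pyflink's actual internals:

```python
import os
import pathlib


def resolve_flink_home(package_root: str) -> pathlib.Path:
    """Return the Flink distribution to use.

    If FLINK_HOME is set, that binary distribution wins; otherwise
    fall back to the jars bundled inside the pip-installed package.
    (Illustrative sketch only -- the real package layout may differ.)
    """
    env_home = os.environ.get("FLINK_HOME")
    if env_home:
        return pathlib.Path(env_home)
    # Hypothetical location of the bundled distribution inside the wheel.
    return pathlib.Path(package_root) / "deps"
```

This ordering is what lets a plain `pip install` work out of the box while still honouring a manually installed Flink for users who need one.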
Yes Till, I think you are correct that we should make sure that the
published Flink Python API cannot be arbitrarily deleted.

So, it seems that our current consensus is:

1. Should we re-publish PyFlink to PyPI --> YES
2. PyPI project name ---> apache-flink
3. How to handle Scala 2.11 and Scala 2.12 ---> We only release one binary, built with the same default Scala version as Flink's default config.
4. PyPI account for release --> Create an account such as 'pyflink' as owner (only the PMC can manage it) and add the release managers' accounts as maintainers of the project. Release managers publish the package to PyPI using their own accounts but cannot delete releases.

So, if there are no other comments, I think we should initiate a voting thread.

What do you think?

Best, Jincheng


Till Rohrmann <[hidden email]> 于2019年7月24日周三 下午1:17写道:

> Sorry for chiming in so late. I would be in favor of option #2.
>
> I guess that the PMC would need to give the credentials to the release
> manager for option #1. Hence, the PMC could also add the release manager as
> a maintainer, which makes sure that only the PMC can delete artifacts.
>
> Cheers,
> Till
>
> On Wed, Jul 24, 2019 at 12:33 PM jincheng sun <[hidden email]> wrote:
>
> > Hi all,
> >
> > Thanks for all of your replies!
> >
> > Hi Stephan, thanks for the reply and for providing the details we need to
> > pay attention to, such as the Readme and trademark compliance. Regarding
> > the PyPI account for release, #1 carries the risk that our release
> > packages can be deleted by anyone who knows the account password, and in
> > that case the PMC would have no means to correct problems. So, I think #2
> > is pretty safe for the Flink community.
> >
> > Hi Jeff & Dian, thanks for sharing your thoughts. The Python API is just
> > a language entry point. I think which binaries are contained in the
> > release should be consistent with the Java release policy. So, currently
> > we do not add the Hadoop or connector JARs into the release package.
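The owner/maintainer split that consensus item 4 relies on can be pictured as a small permission matrix: both roles may upload releases, but only the owner may delete files, releases, or the project. A toy model for illustration only — the class and method names are invented, and real PyPI roles are managed through the project's collaborator settings, not code:

```python
class PyPIProject:
    """Toy model of PyPI's owner/maintainer roles (illustration only)."""

    def __init__(self, owner: str):
        self.owner = owner
        self.maintainers = set()
        self.releases = []

    def add_maintainer(self, account: str) -> None:
        self.maintainers.add(account)

    def upload(self, account: str, version: str) -> None:
        # Both the owner and maintainers may upload releases.
        if account != self.owner and account not in self.maintainers:
            raise PermissionError(f"{account} may not upload")
        self.releases.append(version)

    def delete(self, account: str, version: str) -> None:
        # Only the owner may delete files, releases, or the project.
        if account != self.owner:
            raise PermissionError(f"{account} may not delete")
        self.releases.remove(version)
```

Under option 2, the 'pyflink' owner account is held by the PMC and each release manager is added as a maintainer, so a release manager can publish but any deletion requires the PMC.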
Sounds good to me. Thanks for driving this discussion.
Cheers, Till
> > > > >>>>>> Regards, > > > > >>>>>> Dian > > > > >>>>>> > > > > >>>>>>> 在 2019年7月3日,下午7:13,Chesnay Schepler <[hidden email]> 写道: > > > > >>>>>>> > > > > >>>>>>> The existing artifact in the pyflink project was neither > > released > > > > by > > > > >>>>> the Flink project / anyone affiliated with it nor approved by > the > > > > Flink > > > > >>>> PMC. > > > > >>>>>>> As such, if we were to use this account I believe we should > > > delete > > > > >>> it > > > > >>>>> to not mislead users that this is in any way an apache-provided > > > > >>>>> distribution. Since this goes against the users wishes, I would > > be > > > in > > > > >>>> favor > > > > >>>>> of creating a separate account, and giving back control over > the > > > > >>> pyflink > > > > >>>>> account. > > > > >>>>>>> My take on the raised points: > > > > >>>>>>> 1.1) "apache-flink" > > > > >>>>>>> 1.2) option 2 > > > > >>>>>>> 2) Given that we only distribute python code there should be > no > > > > >>> reason > > > > >>>>> to differentiate between scala versions. We should not be > > > > distributing > > > > >>>> any > > > > >>>>> java/scala code and/or modules to PyPi. Currently, I'm a bit > > > confused > > > > >>>> about > > > > >>>>> this question and wonder what exactly we are trying to publish > > > here. > > > > >>>>>>> 3) The should be treated as any other source release; i.e., > it > > > > >>> needs a > > > > >>>>> LICENSE and NOTICE file, signatures and a PMC vote. My > suggestion > > > > would > > > > >>>> be > > > > >>>>> to make this part of our normal release process. There will be > > > _one_ > > > > >>>> source > > > > >>>>> release on dist.apache.org encompassing everything, and a > > separate > > > > >>>> python > > > > >>>>> of focused source release that we push to PyPi. 
The LICENSE and > > > > NOTICE > > > > >>>>> contained in the python source release must also be present in > > the > > > > >>> source > > > > >>>>> release of Flink; so basically the python source release is > just > > > the > > > > >>>>> contents of flink-python module the maven pom.xml, with no > other > > > > >>> special > > > > >>>>> sauce added during the release process. > > > > >>>>>>> On 02/07/2019 05:42, jincheng sun wrote: > > > > >>>>>>>> Hi all, > > > > >>>>>>>> > > > > >>>>>>>> With the effort of FLIP-38 [1], the Python Table API(without > > UDF > > > > >>>>> support > > > > >>>>>>>> for now) will be supported in the coming release-1.9. > > > > >>>>>>>> As described in "Build PyFlink"[2], if users want to use the > > > > Python > > > > >>>>> Table > > > > >>>>>>>> API, they can manually install it using the command: > > > > >>>>>>>> "cd flink-python && python3 setup.py sdist && pip install > > > > >>>>> dist/*.tar.gz". > > > > >>>>>>>> This is non-trivial for users and it will be better if we > can > > > > >>> follow > > > > >>>>> the > > > > >>>>>>>> Python way to publish PyFlink to PyPI > > > > >>>>>>>> which is a repository of software for the Python programming > > > > >>>> language. > > > > >>>>> Then > > > > >>>>>>>> users can use the standard Python package > > > > >>>>>>>> manager "pip" to install PyFlink: "pip install pyflink". So, > > > there > > > > >>>> are > > > > >>>>> some > > > > >>>>>>>> topic need to be discussed as follows: > > > > >>>>>>>> > > > > >>>>>>>> 1. How to publish PyFlink to PyPI > > > > >>>>>>>> > > > > >>>>>>>> 1.1 Project Name > > > > >>>>>>>> We need to decide the project name of PyPI to use, for > > > > >>> example, > > > > >>>>>>>> apache-flink, pyflink, etc. 
> > > > >>>>>>>> > > > > >>>>>>>> Regarding to the name "pyflink", it has already been > > > > >>> registered > > > > >>>> by > > > > >>>>>>>> @ueqt and there is already a package '1.0' released under > this > > > > >>>> project > > > > >>>>>>>> which is based on flink-libraries/flink-python. > > > > >>>>>>>> > > > > >>>>>>>> @ueqt has kindly agreed to give this project back to the > > > > >>>>> community. And > > > > >>>>>>>> he has requested that the released package '1.0' should not > be > > > > >>>> removed > > > > >>>>> as > > > > >>>>>>>> it has already been used in their company. > > > > >>>>>>>> > > > > >>>>>>>> So we need to decide whether to use the name 'pyflink'? > > If > > > > >>> yes, > > > > >>>>> we > > > > >>>>>>>> need to figure out how to tackle with the package '1.0' > under > > > this > > > > >>>>> project. > > > > >>>>>>>> From the points of my view, the "pyflink" is better for > > our > > > > >>>>> project > > > > >>>>>>>> name and we can keep the release of 1.0, maybe more people > > want > > > to > > > > >>>> use. > > > > >>>>>>>> 1.2 PyPI account for release > > > > >>>>>>>> We need also decide on which account to use to publish > > > > >>> packages > > > > >>>>> to PyPI. > > > > >>>>>>>> There are two permissions in PyPI: owner and > maintainer: > > > > >>>>>>>> > > > > >>>>>>>> 1) The owner can upload releases, delete files, > releases > > or > > > > >>> the > > > > >>>>> entire > > > > >>>>>>>> project. > > > > >>>>>>>> 2) The maintainer can also upload releases. However, > they > > > > >>> cannot > > > > >>>>> delete > > > > >>>>>>>> files, releases, or the project. > > > > >>>>>>>> > > > > >>>>>>>> So there are two options in my mind: > > > > >>>>>>>> > > > > >>>>>>>> 1) Create an account such as 'pyflink' as the owner > share > > > it > > > > >>>> with > > > > >>>>> all > > > > >>>>>>>> the release managers and then release managers can publish > the > > > > >>>> package > > > > >>>>> to > > > > >>>>>>>> PyPI using this account. 
> > > > >>>>>>>> 2) Create an account such as 'pyflink' as owner(only > PMC > > > can > > > > >>>>> manage it) > > > > >>>>>>>> and adds the release manager's account as maintainers of the > > > > >>> project. > > > > >>>>>>>> Release managers publish the package to PyPI using their own > > > > >>> account. > > > > >>>>>>>> As I know, PySpark takes Option 1) and Apache Beam > takes > > > > >>> Option > > > > >>>>> 2). > > > > >>>>>>>> From the points of my view, I prefer option 2) as it's > > > pretty > > > > >>>>> safer as > > > > >>>>>>>> it eliminate the risk of deleting old releases occasionally > > and > > > at > > > > >>>> the > > > > >>>>> same > > > > >>>>>>>> time keeps the trace of who is operating. > > > > >>>>>>>> > > > > >>>>>>>> 2. How to handle Scala_2.11 and Scala_2.12 > > > > >>>>>>>> > > > > >>>>>>>> The PyFlink package bundles the jars in the package. As we > > know, > > > > >>>> there > > > > >>>>> are > > > > >>>>>>>> two versions of jars for each module: one for Scala 2.11 and > > the > > > > >>>> other > > > > >>>>> for > > > > >>>>>>>> Scala 2.12. So there will be two PyFlink packages > > theoretically. > > > > We > > > > >>>>> need to > > > > >>>>>>>> decide which one to publish to PyPI or both. If both > packages > > > will > > > > >>> be > > > > >>>>>>>> published to PyPI, we may need two projects, such as > > pyflink_211 > > > > >>> and > > > > >>>>>>>> pyflink_212 separately. Maybe more in the future such as > > > > >>> pyflink_213. > > > > >>>>>>>> (BTW, I think we should bring up a discussion for dorp > > > > >>>> Scala_2.11 > > > > >>>>> in > > > > >>>>>>>> Flink 1.10 release due to 2.13 is available in early June.) > > > > >>>>>>>> > > > > >>>>>>>> From the points of my view, for now, we can only > release > > > the > > > > >>>>> scala_2.11 > > > > >>>>>>>> version, due to scala_2.11 is our default version in Flink. > > > > >>>>>>>> > > > > >>>>>>>> 3. 
Legal problems of publishing to PyPI > > > > >>>>>>>> > > > > >>>>>>>> As @Chesnay Schepler <[hidden email]> pointed out in > > > > >>>>> FLINK-13011[3], > > > > >>>>>>>> publishing PyFlink to PyPI means that we will publish > binaries > > > to > > > > a > > > > >>>>>>>> distribution channel not owned by Apache. We need to figure > > out > > > if > > > > >>>>> there > > > > >>>>>>>> are legal problems. From my point of view, there are no > > problems > > > > >>> as a > > > > >>>>> few > > > > >>>>>>>> Apache projects such as Spark, Beam, etc have already done > it. > > > > >>>> Frankly > > > > >>>>>>>> speaking, I am not familiar with this problem, welcome any > > > > feedback > > > > >>>> on > > > > >>>>> this > > > > >>>>>>>> if somebody is more family with this. > > > > >>>>>>>> > > > > >>>>>>>> Great thanks to @ueqt for willing to dedicate PyPI's project > > > name > > > > >>>>> `pyflink` > > > > >>>>>>>> to the Apache Flink community!!! > > > > >>>>>>>> Great thanks to @Dian for the offline effort!!! > > > > >>>>>>>> > > > > >>>>>>>> Best, > > > > >>>>>>>> Jincheng > > > > >>>>>>>> > > > > >>>>>>>> [1] > > > > >>>>>>>> > > > > >>> > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-38%3A+Python+Table+API > > > > >>>>>>>> [2] > > > > >>>>>>>> > > > > >>> > > > > > > > > > > https://ci.apache.org/projects/flink/flink-docs-master/flinkDev/building.html#build-pyflink > > > > >>>>>>>> [3] https://issues.apache.org/jira/browse/FLINK-13011 > > > > >>>>>>>> > > > > >>>>> > > > > >> > > > > >> -- > > > > >> Best Regards > > > > >> > > > > >> Jeff Zhang > > > > > > > > > > > > > > > > > > > |
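Dian's note above, that a user-set FLINK_HOME takes precedence over the jars bundled in the pip-installed package, boils down to a simple resolution order. A minimal illustrative sketch of that order (a hypothetical helper, not pyflink's actual implementation; the bundled location is made up):

```python
import os

# Hypothetical placeholder for where a pip-installed package might keep its
# bundled Flink distribution; the real location is not specified in the thread.
BUNDLED_DIST = "<site-packages>/pyflink"

def resolve_flink_home(env=None):
    """An explicitly set FLINK_HOME wins; otherwise fall back to the
    distribution bundled inside the pip-installed package."""
    if env is None:
        env = os.environ
    return env.get("FLINK_HOME", BUNDLED_DIST)

# A user-provided distribution takes precedence over the bundled one:
print(resolve_flink_home({"FLINK_HOME": "/opt/flink-1.9"}))  # /opt/flink-1.9
# With FLINK_HOME unset, the bundled jars are used:
print(resolve_flink_home({}))  # <site-packages>/pyflink
```

This also illustrates Dian's compatibility caveat: only the first branch can pick up a Flink install whose version drifts from the pip-installed pyflink.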
Thanks for your confirmation, Till!

Publishing PyFlink to PyPI is very important for our users. I have initiated a voting thread.

Best, Jincheng

Till Rohrmann <[hidden email]> wrote on Mon, Jul 29, 2019 at 3:01 PM:

Sounds good to me. Thanks for driving this discussion.

Cheers,
Till

On Mon, Jul 29, 2019 at 9:24 AM jincheng sun <[hidden email]> wrote:

Yes Till, I think you are correct that we should make sure that the published Flink Python API cannot be arbitrarily deleted.

So, it seems that our current consensus is:

1. Should we publish PyFlink to PyPI --> YES
2. PyPI project name ---> apache-flink
3. How to handle Scala 2.11 and Scala 2.12 ---> We only release one binary, with the same default Scala version as Flink's default config.
4. PyPI account for release --> Create an account such as 'pyflink' as the owner (only the PMC can manage it) and add the release managers' accounts as maintainers of the project. Release managers publish the package to PyPI using their own accounts but cannot delete releases.

So, if there are no other comments, I think we should initiate a voting thread.

What do you think?

Best, Jincheng

Till Rohrmann <[hidden email]> wrote on Wed, Jul 24, 2019 at 1:17 PM:

Sorry for chiming in so late. I would be in favor of option #2.

I guess that the PMC would need to give the credentials to the release manager for option #1. Hence, the PMC could also add the release manager as a maintainer, which makes sure that only the PMC can delete artifacts.

Cheers,
Till

On Wed, Jul 24, 2019 at 12:33 PM jincheng sun <[hidden email]> wrote:

Hi all,

Thanks for all of your replies!

Hi Stephan, thanks for the reply and for pointing out the details we need to pay attention to, such as the Readme and trademark compliance. Regarding the PyPI account for release, #1 carries the risk that our release packages can be deleted by anyone who knows the password of the account, and in that case the PMC would have no means to correct problems. So, I think #2 is pretty safe for the Flink community.

Hi Jeff & Dian, thanks for sharing your thoughts. The Python API is just a language entry point. I think which binary is contained in the release should be consistent with the Java release policy, so currently we do not add the Hadoop or connector JARs into the release package.

Best, Jincheng