Let me begin by noting that I obviously have a conflict of interest
since my company is a direct competitor to Cloudera. But as a mentor and Apache member I believe I need to bring this up.

What is the Apache policy towards having a vendor-specific package on a download site? It is strange to me to come to Flink's website and see packages for Flink with CDH (or HDP or MapR or whatever). We should avoid providing vendor-specific packages. It gives the appearance of preferring one vendor over another, which Apache does not want to do.

I have no problem at all with Cloudera hosting a CDH-specific package of Flink, nor with Flink project members working with Cloudera to create such a package. But I do not think they should be hosted at Apache.

Alan.

--
Sent with Postbox <http://www.getpostbox.com>

--
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
I hope not surprisingly, I agree. (Backstory: I am at Cloudera.) I
have, for example, lobbied Spark to remove CDH-specific releases and build profiles. Not just for this reason, but because vendor-specific builds are often unnecessary, and they also increase maintenance overhead for the project.

Matei et al. say they want to make it as easy as possible to consume Spark, and so provide vendor-build-specific artifacts and such here and there. To be fair, Spark tries to support a large range of Hadoop and YARN versions, and getting the combination of profiles and versions right to recreate a vendor release was kind of hard until about Hadoop 2.2 (stable YARN, really).

I haven't heard of any formal policy. I would ask whether there are similar reasons to produce pre-packaged releases like these?
As a mentor, I agree that vendor-specific packages aren't appropriate for the Apache site. (Disclosure: I work at Hortonworks.) Working with the vendors to make packages available is great, but they shouldn't be hosted at Apache.

.. Owen
PS, sorry for being dense, but I don't see vendor packages at
http://flink.incubator.apache.org/downloads.html ?

Is it this page? http://flink.incubator.apache.org/docs/0.6-SNAPSHOT/building.html

That's more benign: it just helps people rebuild for certain distros if desired. Can the example be generified to refer to a fictional "ACME Distribution"? A note here and there about gotchas when building for certain versions and combos seems reasonable, though.

I also find that this bit in the build script, although vendor-specific, is a nice small convenience for users:
https://github.com/apache/incubator-flink/blob/master/pom.xml#L195
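For readers without the pom.xml in front of them, the section Sean links to is presumably the kind of extra Maven repository definition that lets users resolve vendor-specific Hadoop artifacts. A generic, hypothetical sketch (the repository id, name, and URL below are placeholders, not copied from Flink's build):

```xml
<!-- Hypothetical example: id, name, and URL are placeholders,
     not taken from Flink's actual pom.xml. -->
<repositories>
  <repository>
    <id>vendor-releases</id>
    <name>ACME Distribution releases</name>
    <url>https://repo.acme-distribution.example/releases/</url>
  </repository>
</repositories>
```

This matches the "ACME Distribution" generification suggested above: the mechanism can be documented without naming a particular vendor.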
In reply to this post by Owen O'Malley
Hi,
I'm glad you've brought this topic up. (Thank you also for checking the release!) I used Spark's release script as a reference for creating ours (why reinvent the wheel, they have excellent infrastructure), and they had a CDH4 profile, so I thought it was okay for Apache projects to have these special builds.

Let me explain the technical background (I hope all information here is correct; correct me if I'm wrong). There are two components inside Flink that have dependencies on Hadoop: a) HDFS and b) YARN.

Usually, users who have a Hadoop version like 0.2x or 1.x can use our "hadoop1" builds. They contain the hadoop1 HDFS client and no YARN support. Users with old CDH versions (I guess pre 4), Hortonworks or MapR can also use these builds. Users with newer vendor distributions (HDP2, CDH5, ...) can use our "hadoop2" build. It contains the newer HDFS client (protobuf-based RPC) and has support for the new YARN API (2.2.0 onwards). So the "hadoop1" and "hadoop2" builds probably cover most of the cases users have.

Then there is CDH4, which contains an "unreleased" Hadoop 2.0.0 version. It has the new HDFS client (protobuf), but the old YARN API (2.1.0-beta or so), which we don't support. Therefore, users cannot use the "hadoop1" build (wrong HDFS client), and the "hadoop2" build is not compatible with their YARN.

If you have a look at the Spark downloads page, you'll find the following (apache-hosted?) binary builds:

- For Hadoop 1 (HDP1, CDH3): find an Apache mirror <http://www.apache.org/dyn/closer.cgi/spark/spark-1.0.2/spark-1.0.2-bin-hadoop1.tgz> or direct file download <http://d3kbcqa49mib13.cloudfront.net/spark-1.0.2-bin-hadoop1.tgz>
- For CDH4: find an Apache mirror <http://www.apache.org/dyn/closer.cgi/spark/spark-1.0.2/spark-1.0.2-bin-cdh4.tgz> or direct file download <http://d3kbcqa49mib13.cloudfront.net/spark-1.0.2-bin-cdh4.tgz>
- For Hadoop 2 (HDP2, CDH5): find an Apache mirror <http://www.apache.org/dyn/closer.cgi/spark/spark-1.0.2/spark-1.0.2-bin-hadoop2.tgz> or direct file download <http://d3kbcqa49mib13.cloudfront.net/spark-1.0.2-bin-hadoop2.tgz>

I think this choice of binaries reflects what I've explained above.

I'm happy (if the others agree) to remove the cdh4 binary from the release and delay the discussion until after the release.

Best,
Robert
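To make the "hadoop1"/"hadoop2" distinction concrete: build variants like these are typically wired up as Maven profiles that pin different Hadoop dependency versions. A rough, hypothetical sketch (profile ids and version numbers are illustrative, not taken from Flink's actual pom.xml):

```xml
<!-- Hypothetical sketch of hadoop1/hadoop2 build profiles;
     ids and versions are illustrative, not Flink's real build. -->
<profiles>
  <profile>
    <id>hadoop-1</id>
    <properties>
      <!-- pre-protobuf HDFS client, no YARN support -->
      <hadoop.version>1.2.1</hadoop.version>
    </properties>
  </profile>
  <profile>
    <id>hadoop-2</id>
    <properties>
      <!-- protobuf-based HDFS client and the stable YARN API (2.2.0 onwards) -->
      <hadoop.version>2.2.0</hadoop.version>
    </properties>
  </profile>
</profiles>
```

The CDH4 case Robert describes falls between the two: its Hadoop 2.0.0 needs the hadoop2-style HDFS client but predates the stable YARN API, which is why neither standard profile fits it.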
+1 to not holding the release on this. Since the release is only the
source*, if we later decide that CDH-specific packages are OK we can add them in with no extra votes, etc.

Alan.

*Apache releases only source code. This is so that users, distributors, etc. can verify the integrity of the code. Binary packages are a convenience only for users who are willing to trust us without seeing the internals of the code themselves.

> I'm happy (if the others agree) to remove the cdh4 binary from the release
> and delay the discussion after the release.
In reply to this post by Sean Owen
Sorry, apparently this was
unclear, as others asked the same question. Flink hasn't had any
Apache releases yet. I was referring to the proposed release that
Robert sent out,
http://people.apache.org/~rmetzger/flink-0.6-incubating-rc7/
Alan.
In reply to this post by Robert Metzger
Agree with Robert,
ASF only releases source code, so the binary packages are just a convenience from Flink targeting specific Hadoop vendors. If you look at the Apache Spark download page [1], they do the same thing by providing distro-specific binaries.

AFAIK this should NOT be a problem, and it especially should not block the release.

Thanks,
Henry

[1] http://spark.apache.org/downloads.html
In reply to this post by Sean Owen
Hi Sean,

I don't think Flink has done a release yet. We are trying several RCs to get one that is good enough to be voted on.

- Henry
In reply to this post by Alan Gates
Ah sorry Alan, did not see your reply to Owen.
Mea culpa from me.

- Henry
Hi,

I think we all agree that our project benefits from providing pre-compiled binaries for different Hadoop distributions.

I've drafted an extension of the current download page that I would suggest using after the release: http://i.imgur.com/MucW2HD.png
As you can see, users can directly pick the Flink version they want (it's not going to show the CDH4 package there), or they can choose from the table of the most popular (in my opinion) vendor distributions. The different links still point to the "hadoop1" and "hadoop2" binaries, but I don't think this highlights any Hadoop vendor.

What do you think?

On Fri, Aug 15, 2014 at 11:45 PM, Henry Saputra <[hidden email]> wrote:
> Ah sorry Alan, did not see your reply to Owen.
>
> Mea culpa from me.
>
> - Henry
>
> On Fri, Aug 15, 2014 at 2:15 PM, Alan Gates <[hidden email]> wrote:
> > Sorry, apparently this was unclear, as others asked the same question.
> > Flink hasn't had any Apache releases yet. I was referring to the
> > proposed release that Robert sent out,
> > http://people.apache.org/~rmetzger/flink-0.6-incubating-rc7/
> >
> > Alan.
> >
> > Sean Owen <[hidden email]>
> > August 15, 2014 at 11:26
> > PS, sorry for being dense, but I don't see vendor packages at
> > http://flink.incubator.apache.org/downloads.html ?
> >
> > Is it this page?
> > http://flink.incubator.apache.org/docs/0.6-SNAPSHOT/building.html
> >
> > That's more benign, just helping people rebuild for certain distros if
> > desired. Can the example be generified to refer to a fictional "ACME
> > Distribution"? But a note here and there about gotchas building for
> > certain versions and combos seems reasonable.
> >
> > I also find this bit in the build script, although vendor-specific, is
> > a small nice convenience for users:
> > https://github.com/apache/incubator-flink/blob/master/pom.xml#L195
> >
> > Owen O'Malley <[hidden email]>
> > August 15, 2014 at 11:01
> > As a mentor, I agree that vendor-specific packages aren't appropriate
> > for the Apache site. (Disclosure: I work at Hortonworks.) Working with
> > the vendors to make packages available is great, but they shouldn't be
> > hosted at Apache.
> >
> > .. Owen
|
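Sean's point about the build script suggests that a vendor package is often just a particular Hadoop version combination, which users can rebuild themselves. As a rough sketch of what such a rebuild might look like (the Maven property and profile names below are assumptions for illustration, not taken from Flink's actual pom.xml; the helper prints the command rather than running it, since a real build needs Maven and network access):

```shell
# Hypothetical helper: compose a Flink build command for a given Hadoop
# version instead of downloading a vendor-specific binary. The
# -Dhadoop.profile / -Dhadoop.version flags are illustrative assumptions;
# check the project's build documentation for the real ones.
flink_build_cmd() {
  local hadoop_version="$1"
  case "$hadoop_version" in
    1.*) echo "mvn clean package -DskipTests -Dhadoop.profile=1 -Dhadoop.version=$hadoop_version" ;;
    *)   echo "mvn clean package -DskipTests -Dhadoop.version=$hadoop_version" ;;
  esac
}

# Print the commands for a Hadoop 2.x and a Hadoop 1.x target:
flink_build_cmd 2.2.0
flink_build_cmd 1.2.1
```

The point of the sketch is that one generic build path per Hadoop line may cover most distros without naming any vendor.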
The approach seems fair in the way it presents all vendors equally and still offers users a convenient way to get started. I personally like it, but I cannot say to what extent this complies with Apache policies.
|
In reply to this post by Robert Metzger
My concern with this is that it appears to put Apache in the business of picking the right Hadoop vendors. What about IBM, Pivotal, etc.? I get that the actual desire
here is to make things easy for users, and that the original three
packages offered (Hadoop1, CDH4, Hadoop2) will cover 95% of users. I
like that. I just don't know how to do this and avoid the appearance of
favoritism.
Perhaps the next best step is to ask on incubator-general and see if there is an Apache-wide policy or whether there needs to be one.

Alan.
|
Vendor X may be slightly against having two Flink-for-X distributions -- their own and another on a site/project they may not control.

Are all these builds really needed? Meaning, does a generic Hadoop 2.x build not work on some or most of these? I'd hope so. It might keep things simpler for everyone. For example, are the "CDH5" and "HDP2.1" builds not really just roughly "Hadoop 2.4" builds? If 2.4 needs its own profile, so be it, but it need not be so specific to a flavor.

How about some simple steps to at least de-emphasize vendor builds? Like a separate page or drop-down panel?

I can understand wanting to make it as simple as possible to access the right build straight away, since these distros don't have Flink yet, of course.

And hey, we make concessions in OSS to different versions of Java or Linux vs. Windows all the time. The bright line isn't clear.

Perhaps: take steps to treat this more as a special case, and produce these types of builds only where needed? Where a non-trivial number of potential users will have trouble consuming the project without a tweak, create a special release on the side.
|
I think the main problem was that CDH4 is a non-standard build. All others we tried worked with the hadoop-1.2 and 2.2/2.4 builds.

But I understand your points.

So, instead of creating those packages, we can make a guide, "how to pick the right distribution", which points you to the hadoop-1.2 and 2.2/2.4 builds. For some cases, the guide will ask you to "compile your own".
|
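The guide proposed here could be as simple as a lookup from distribution to recommended download. A minimal sketch, where the distribution names and the mapping are illustrative assumptions drawn from this thread, not an official compatibility table:

```shell
# Hypothetical "how to pick the right distribution" guide as a lookup.
# The mapping below is an assumption for illustration only.
pick_flink_download() {
  case "$1" in
    hadoop-1*|HDP1*)        echo "flink-hadoop1 (vanilla Hadoop 1.2 build)" ;;
    CDH4*)                  echo "compile your own (Hadoop 2.0 / YARN beta)" ;;
    hadoop-2*|CDH5*|HDP2*)  echo "flink-hadoop2 (vanilla Hadoop 2.2/2.4 build)" ;;
    *)                      echo "unknown distribution: see the build-from-source guide" ;;
  esac
}

# Examples:
pick_flink_download CDH5.1
pick_flink_download CDH4.7
```

A table like this keeps vendor names out of the release artifacts themselves; only the help page mentions them, and every row points at a vanilla Apache build or at building from source.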
It's probably the same thing as with Spark. Spark doesn't actually work with YARN 'beta'-era releases, but works with 'stable' and specially supports 'alpha'. CDH 4.{2-4} or so == YARN 'beta' (not non-standard, but probably the only distro of it you'll still run into in circulation). (And so it's kind of unhelpful that Spark has build instructions for CDH 4.2 + YARN.) Yeah, that's the thing you may handle as a corner case, or not handle and punt to the vendor. But even that -- if that's the same issue -- is not a question of supporting a vendor but a Hadoop version combo.
|
As for Flink, for now the additional CDH4 packaged binary is there to support a "non-standard" Hadoop version that some customers may already have.

Based on "not a question of supporting a vendor but a Hadoop version combo", would the approach Flink has taken to help customers get up and running quickly seem fair and a good idea?

There has been a lot of discussion about ASF release artifacts, and the consistent answer is that the ASF validates releases of source code, not binaries. Binaries are released only as a convenience to users, which is what Flink is doing with the different Hadoop versions.

- Henry
|
I like Sean's idea very much: creating the three packages (Hadoop 1.x, Hadoop 2.x, Hadoop 2.0 with YARN beta).

Any objections to creating a help site that says "For that vendor with this version, pick the following binary release"?

Stephan
|
Supporting the Hadoop 2.0 (not 2.2) YARN API would be a lot of coding effort. There was a huge API change between the two versions.

Maybe we can find a technical solution to this political/legal problem: I'm going to build a Flink version against "2.1.1-beta" (or similar, an official Apache Hadoop release) and see if it works with CDH4 as well. Then we can provide a non-vendor-specific binary that still solves the problem for our users.

Our problem is not as severe as Spark's, since they have (in my understanding) support for both YARN APIs. So our issue with CDH4 / Hadoop 2.1-beta is only related to the HDFS client, not the whole YARN API.
|
In reply to this post by Stephan Ewen
No objections. That seems like a good way to help our users while avoiding the appearance of favoring one vendor over another.

Alan.
|