[DISCUSS] Dashboard/HistoryServer authentication

Re: [DISCUSS] Dashboard/HistoryServer authentication

Konstantin Knauf-4
Hi everyone,

sorry for joining late and thanks for the insightful discussion.

In general, I'd personally prefer not to increase the surface area of
Apache Flink unless there is a good reason. It seems we all agree that
authx is not part of the core value proposition of Apache Flink, so if we
can delegate this problem to a more specialized tool, I am in favor of
that. Apache Flink is already huge and a lot of work goes into maintenance,
so I personally have become more sensitive to this aspect over time.
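
As a rough illustration of what delegating authentication to a specialized front proxy could look like, here is a minimal sketch. It is not taken from Flink or from any particular proxy: the function `handle_request`, the `authenticate` callback and the `X-Forwarded-User` header name are all assumptions made for this example. The proxy checks the credentials, strips them, and forwards only the verified identity to a backend that is reachable solely through the proxy.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical header the proxy uses to pass the verified identity on.
TRUSTED_USER_HEADER = "x-forwarded-user"


def handle_request(
    headers: Dict[str, str],
    authenticate: Callable[[str], Optional[str]],
) -> Tuple[int, Dict[str, str]]:
    """Return (status, headers to forward) for one incoming request.

    `authenticate` is a placeholder for whatever mechanism the proxy
    supports (Kerberos, OIDC, ...); it maps the Authorization header
    value to a user name, or to None if the credentials are rejected.
    """
    # HTTP header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}

    user = authenticate(lowered.get("authorization", ""))
    if user is None:
        return 401, {}

    # Drop the client's credentials and any identity header the client
    # may have tried to spoof, then set the identity verified above.
    forwarded = {
        name: value
        for name, value in lowered.items()
        if name not in ("authorization", TRUSTED_USER_HEADER)
    }
    forwarded[TRUSTED_USER_HEADER] = user
    return 200, forwarded
```

In such a setup the backend trusts the forwarded header only because it accepts connections from the proxy alone; whether that is acceptable for a given audit is exactly the question discussed in this thread.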

If we add support for Basic Auth and Kerberos now, users will sooner or
later ask for OIDC, LDAP, SAML,... I acknowledge that Kerberos is widely
used in the corporate, on-premises context, but isn't the focus moving more
towards more web-friendly standards like OIDC/OAuth 2.0? If we only want to
support a single protocol, there is an argument to be made that it should
be OIDC and Dex [1,2] as a bridge to everything else. Have OIDC or OAuth2
been considered instead of Kerberos? How do you see the market moving? But
as I said before, in my opinion we can generate more value by investing
into other areas of Apache Flink.

Authorization also has the potential to become more fine-grained and
complex over time: you already mentioned restricting the actions that a
specific user can do in a cluster.

Cheers,

Konstantin

[1] https://github.com/dexidp/dex
[2] https://github.com/dexidp/dex/issues/1903


On Wed, Jun 16, 2021 at 11:44 AM Gabor Somogyi <[hidden email]>
wrote:

> Hi Till,
>
> Did you have the chance to take a look at the doc? I have not seen any update yet.
>
> BR,
> G
>
>
> On Wed, Jun 9, 2021 at 1:43 PM Till Rohrmann <[hidden email]> wrote:
>
> > Thanks for the update Gabor. I'll take a look and respond in the
> > document.
> >
> > Cheers,
> > Till
> >
> > On Wed, Jun 9, 2021 at 12:59 PM Gabor Somogyi <[hidden email]>
> > wrote:
> >
> >> Hi Till,
> >>
> >> Your proxy suggestion has been considered in depth and the FLIP has been
> >> updated accordingly.
> >> We've considered two proxy implementations (Nginx and Squid), but according
> >> to our analysis and testing they are not suitable for the mentioned
> >> use cases.
> >> Please take a look at the rejected alternatives for a detailed
> >> explanation.
> >>
> >> Thanks for your time in advance!
> >>
> >> BR,
> >> G
> >>
> >>
> >> On Fri, Jun 4, 2021 at 3:31 PM Till Rohrmann <[hidden email]>
> >> wrote:
> >>
> >>> As I've said I am not a security expert and that's why I have to ask for
> >>> clarification, Gabor. You are saying that if we configure a truststore for
> >>> the REST endpoint with a single trusted certificate which has been
> >>> generated by the operator of the Flink cluster, then the attacker can
> >>> generate a new certificate, sign it and then talk to the Flink cluster if
> >>> he has access to the node on which the REST endpoint runs? My understanding
> >>> was that you need the corresponding private key which in my proposed setup
> >>> would be under the control of the operator as well (e.g. stored in a
> >>> keystore on the same machine but guarded by some secret). That way (if I am
> >>> not mistaken), only the entity which has access to the keystore is able to
> >>> talk to the Flink cluster.
> >>>
> >>> Maybe we are also getting our wires crossed here and are talking about
> >>> different things.
> >>>
> >>> Thanks for listing the pros and cons of Kerberos. Concerning what other
> >>> authentication mechanisms are used in the industry, I am not 100% sure.
> >>>
> >>> Cheers,
> >>> Till
> >>>
> >>> On Fri, Jun 4, 2021 at 11:09 AM Gabor Somogyi <[hidden email]>
> >>> wrote:
> >>>
> >>>> > I did not mean for the user to sign its own certificates but for the
> >>>> > operator of the cluster. Once the user request hits the proxy, it should no
> >>>> > longer be under his control. I think I do not fully understand yet why this
> >>>> > would not work.
> >>>> I said that it does not solve the authentication problem, with any proxy.
> >>>> Even if the operator signs the certificate, someone can gain access to an
> >>>> internal node.
> >>>> In such a case anybody can craft certificates which are accepted by the
> >>>> server. Once they are accepted, an attacker can cancel jobs, with huge
> >>>> impact.
> >>>>
> >>>> > Also, I am missing a bit the comparison of Kerberos to other
> >>>> > authentication mechanisms and why they were rejected in favour of
> >>>> > Kerberos.
> >>>> PROS:
> >>>> * It does not depend on the cloud provider or on the deployment type
> >>>> (k8s, bare metal, etc.); this is the biggest plus
> >>>> * Centralized, with existing tooling, so there is no need to write lots
> >>>> of tools around it
> >>>> * There are clients/tools for almost all OSes and several languages
> >>>> * Very large users have been using it in production for years without
> >>>> major issues
> >>>> * Provides cross-realm trust, among other features
> >>>> * Several open source components use it, which could increase
> >>>> compatibility
> >>>>
> >>>> CONS:
> >>>> * Not everybody uses Kerberos
> >>>> * It would increase the code footprint, but this is true for many
> >>>> features (as a side note, I'm here to maintain it)
> >>>>
> >>>> Feel free to add your points, because this represents only a single
> >>>> viewpoint.
> >>>> Also, if you have a better option for strong authentication, please
> >>>> share it and we can consider the pros/cons here.
> >>>>
> >>>> BR,
> >>>> G
> >>>>
> >>>>
> >>>> On Fri, Jun 4, 2021 at 10:32 AM Till Rohrmann <[hidden email]>
> >>>> wrote:
> >>>>
> >>>>> I did not mean for the user to sign its own certificates but for the
> >>>>> operator of the cluster. Once the user request hits the proxy, it should no
> >>>>> longer be under his control. I think I do not fully understand yet why this
> >>>>> would not work.
> >>>>>
> >>>>> What I would like to avoid is to add more complexity into Flink if
> >>>>> there is an easy solution which fulfills the requirements. That's why I
> >>>>> would like to exercise thoroughly through the different alternatives. Also,
> >>>>> I am missing a bit the comparison of Kerberos to other authentication
> >>>>> mechanisms and why they were rejected in favour of Kerberos.
> >>>>>
> >>>>> Cheers,
> >>>>> Till
> >>>>>
> >>>>> On Fri, Jun 4, 2021 at 10:26 AM Gyula Fóra <[hidden email]>
> wrote:
> >>>>>
> >>>>>> Hi!
> >>>>>>
> >>>>>> I think there might be possible alternatives but it seems Kerberos
> on
> >>>>>> the rest endpoint ticks all the right boxes and provides a super
> clean and
> >>>>>> simple solution for strong authentication.
> >>>>>>
> >>>>>> I wouldn’t even consider sidecar proxies etc if we can solve it in
> >>>>>> such a simple way as proposed by G.
> >>>>>>
> >>>>>> Cheers
> >>>>>> Gyula
> >>>>>>
> >>>>>> On Fri, 4 Jun 2021 at 10:03, Till Rohrmann <[hidden email]>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> I am not saying that we shouldn't add a strong authentication
> >>>>>>> mechanism if there are good reasons for it. I primarily would like to
> >>>>>>> understand the context a bit better in order to give qualified feedback and
> >>>>>>> come to a good decision. In order to do this, I have the feeling that we
> >>>>>>> haven't fully considered all available options which are on the table, tbh.
> >>>>>>>
> >>>>>>> Does the problem of certificate expiry also apply for self-signed
> >>>>>>> certificates? If yes, then this should then also be a problem for the
> >>>>>>> internal encryption of Flink's communication. If not, then one could use
> >>>>>>> self-signed certificates with a longer validity to solve the mentioned
> >>>>>>> issue.
> >>>>>>>
> >>>>>>> I think you can set up Flink in such a way that you don't have to
> >>>>>>> handle all the different certificates. For example, you could deploy Flink
> >>>>>>> with a "sidecar proxy" which is responsible for the authentication using an
> >>>>>>> arbitrary method (e.g. Kerberos) and then bind the REST endpoint to a local
> >>>>>>> network interface. That way, the REST endpoint would only be available
> >>>>>>> through the sidecar proxy. Additionally, one could enable SSL for this
> >>>>>>> communication. Would this be a solution for the problem?
> >>>>>>>
> >>>>>>> Cheers,
> >>>>>>> Till
> >>>>>>>
> >>>>>>> On Thu, Jun 3, 2021 at 10:46 PM Márton Balassi <
> >>>>>>> [hidden email]> wrote:
> >>>>>>>
> >>>>>>>> That is an interesting idea, Till.
> >>>>>>>>
> >>>>>>>> The main issue with it is that TLS certificates have an expiration
> >>>>>>>> time, usually they get approved for a couple years. Forcing our
> users to
> >>>>>>>> restart jobs to reprovision TLS certificates would be weird when
> we could
> >>>>>>>> just implement a single proper strong authentication mechanism
> instead in a
> >>>>>>>> couple hundred lines of code. :-)
> >>>>>>>>
> >>>>>>>> In many cases it is also impractical to go the TLS mutual route,
> >>>>>>>> because the Flink Dashboard can end up on any node in the
> k8s/Yarn cluster
> >>>>>>>> which means that we need a certificate per node (due to the
> mutual auth),
> >>>>>>>> but if we also want to protect the private key of these from users
> >>>>>>>> accidentally or intentionally leaking them then we need this per
> user. As
> >>>>>>>> in we end up managing user*machine number certificates and having
> to renew
> >>>>>>>> them periodically, which albeit automatable is unfortunately not
> yet
> >>>>>>>> automated in all large organizations.
> >>>>>>>>
> >>>>>>>> I fully agree that TLS certificate mutual authentication has its
> >>>>>>>> nice properties, especially at very large (multiple thousand
> node) clusters
> >>>>>>>> - but it has its own challenges too. Thanks for bringing it up.
> >>>>>>>>
> >>>>>>>> Happy to have this added to the rejected alternative list so that
> >>>>>>>> we have the full picture documented.
> >>>>>>>>
> >>>>>>>> On Thu, Jun 3, 2021 at 5:52 PM Till Rohrmann <
> [hidden email]>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> I guess the idea would then be to let the proxy do the
> >>>>>>>>> authentication job and only forward the request via an SSL
> mutually
> >>>>>>>>> encrypted connection to the Flink cluster. Would this be
> possible? The
> >>>>>>>>> beauty of this setup is in my opinion that this setup should
> work with all
> >>>>>>>>> kinds of authentication mechanisms.
> >>>>>>>>>
> >>>>>>>>> Cheers,
> >>>>>>>>> Till
> >>>>>>>>>
> >>>>>>>>> On Thu, Jun 3, 2021 at 3:12 PM Gabor Somogyi <
> >>>>>>>>> [hidden email]> wrote:
> >>>>>>>>>
> >>>>>>>>>> Thanks for giving options to fulfil the need.
> >>>>>>>>>>
> >>>>>>>>>> Users are looking for a solution where users can be identified
> >>>>>>>>>> across the whole cluster and access to resources/actions can be
> >>>>>>>>>> restricted.
> >>>>>>>>>> A good example of such an action is cancelling other users'
> >>>>>>>>>> running jobs.
> >>>>>>>>>>
> >>>>>>>>>> * SSL does provide mutual authentication, but once authentication
> >>>>>>>>>> has passed there is no user identity on which restrictions can be
> >>>>>>>>>> based.
> >>>>>>>>>> * The less problematic part is that generating/maintaining
> >>>>>>>>>> short-lived certificates would be hard (that's the reason KDC-like
> >>>>>>>>>> servers exist).
> >>>>>>>>>> Long-lived certificates would widen the attack surface, but since
> >>>>>>>>>> the first concern remains anyway, this is just a cosmetic issue.
> >>>>>>>>>>
> >>>>>>>>>> All in all, using TLS certificates is unfortunately not sufficient
> >>>>>>>>>> in these environments.
> >>>>>>>>>>
> >>>>>>>>>> BR,
> >>>>>>>>>> G
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On Thu, Jun 3, 2021 at 12:49 PM Till Rohrmann <
> >>>>>>>>>> [hidden email]> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Thanks for the information Gabor. If it is about securing the
> >>>>>>>>>>> communication between the REST client and the REST server,
> then Flink
> >>>>>>>>>>> already supports enabling mutual SSL authentication [1]. Would
> this be
> >>>>>>>>>>> enough to secure the communication and to pass an audit?
> >>>>>>>>>>>
> >>>>>>>>>>> [1]
> >>>>>>>>>>>
> https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/security/security-ssl/#external--rest-connectivity
> >>>>>>>>>>>
> >>>>>>>>>>> Cheers,
> >>>>>>>>>>> Till
> >>>>>>>>>>>
> >>>>>>>>>>> On Thu, Jun 3, 2021 at 10:33 AM Gabor Somogyi <
> >>>>>>>>>>> [hidden email]> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Hi Till,
> >>>>>>>>>>>>
> >>>>>>>>>>>> Since I have been working in the security area for 10+ years,
> >>>>>>>>>>>> let me share my thoughts.
> >>>>>>>>>>>> I would like to emphasise that there are better experts than me,
> >>>>>>>>>>>> but I know the basics.
> >>>>>>>>>>>> The discussion is open and I am not trying to decide things alone...
> >>>>>>>>>>>>
> >>>>>>>>>>>> > I mean if an attacker can get access to one of the machines,
> >>>>>>>>>>>> > then it should also be possible to obtain the right Kerberos token.
> >>>>>>>>>>>> Not necessarily. For example, if one gets access to a specific
> >>>>>>>>>>>> user's credentials, it is still not possible to compromise other
> >>>>>>>>>>>> users' jobs, data, etc.
> >>>>>>>>>>>> Security is like an onion: the more layers have been added, the
> >>>>>>>>>>>> more time an attacker needs to proceed.
> >>>>>>>>>>>> At the end of the day, if one is in, one can most probably find
> >>>>>>>>>>>> a way, but this time is normally enough for sysadmins or security
> >>>>>>>>>>>> experts to close down the system and minimize the damage.
> >>>>>>>>>>>>
> >>>>>>>>>>>> The other thing is that all tokens have a timeout, and if the
> >>>>>>>>>>>> token is invalid the attacker can't proceed further.
> >>>>>>>>>>>>
> >>>>>>>>>>>> > Is Kerberos also the standard authentication protocol for
> >>>>>>>>>>>> > Kubernetes deployments?
> >>>>>>>>>>>> Kerberos is an industry standard which is cloud/deployment
> >>>>>>>>>>>> agnostic, and it can be used in any deployment, including k8s.
> >>>>>>>>>>>> The main intention is to use Kerberos in k8s deployments too,
> >>>>>>>>>>>> since we're going in this direction as well.
> >>>>>>>>>>>> Please see how Spark does this:
> >>>>>>>>>>>>
> >>>>>>>>>>>> https://spark.apache.org/docs/latest/security.html#secure-interaction-with-kubernetes
> >>>>>>>>>>>>
> >>>>>>>>>>>> Last but not least, the most important reason to add at least
> >>>>>>>>>>>> one strong authentication mechanism is that we have users who
> >>>>>>>>>>>> have hard requirements on this. They're doing security audits and
> >>>>>>>>>>>> if they fail, it's a deal breaker.
> >>>>>>>>>>>> That is why we added Kerberos in the first place.
> >>>>>>>>>>>> Unfortunately we can't name them on this public list; however,
> >>>>>>>>>>>> the customers who specifically asked for this were mainly in
> >>>>>>>>>>>> the banking and telco sectors.
> >>>>>>>>>>>>
> >>>>>>>>>>>> BR,
> >>>>>>>>>>>> G
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Thu, Jun 3, 2021 at 9:20 AM Till Rohrmann <
> >>>>>>>>>>>> [hidden email]> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> > Thanks for updating the document Márton. Why is it that
> banks
> >>>>>>>>>>>> will
> >>>>>>>>>>>> > consider it more secure if Flink comes with Kerberos
> >>>>>>>>>>>> authentication
> >>>>>>>>>>>> > (assuming a properly secured setup)? I mean if an attacker
> >>>>>>>>>>>> can get access
> >>>>>>>>>>>> > to one of the machines, then it should also be possible to
> >>>>>>>>>>>> obtain the right
> >>>>>>>>>>>> > Kerberos token.
> >>>>>>>>>>>> >
> >>>>>>>>>>>> > I am not an authentication expert and that's why I wanted to
> >>>>>>>>>>>> ask what are
> >>>>>>>>>>>> > other authentication protocols other than Kerberos? Why did
> >>>>>>>>>>>> we select
> >>>>>>>>>>>> > Kerberos and not any other authentication protocol? Maybe
> you
> >>>>>>>>>>>> can list the
> >>>>>>>>>>>> > pros and cons for the different protocols. Is Kerberos also
> >>>>>>>>>>>> the standard
> >>>>>>>>>>>> > authentication protocol for Kubernetes deployments? If not,
> >>>>>>>>>>>> what would be
> >>>>>>>>>>>> > the answer when deploying on K8s?
> >>>>>>>>>>>> >
> >>>>>>>>>>>> > Cheers,
> >>>>>>>>>>>> > Till
> >>>>>>>>>>>> >
> >>>>>>>>>>>> > On Wed, Jun 2, 2021 at 12:07 PM Gabor Somogyi <
> >>>>>>>>>>>> [hidden email]>
> >>>>>>>>>>>> > wrote:
> >>>>>>>>>>>> >
> >>>>>>>>>>>> >> Hi team,
> >>>>>>>>>>>> >>
> >>>>>>>>>>>> >> Happy to be here and hope I can provide quality additions
> in
> >>>>>>>>>>>> the future.
> >>>>>>>>>>>> >>
> >>>>>>>>>>>> >> Thank you all for the helpful suggestions!
> >>>>>>>>>>>> >> Taking them into account, the FLIP has been modified and the work
> >>>>>>>>>>>> >> continues on the already existing Jira.
> >>>>>>>>>>>> >>
> >>>>>>>>>>>> >> BR,
> >>>>>>>>>>>> >> G
> >>>>>>>>>>>> >>
> >>>>>>>>>>>> >>
> >>>>>>>>>>>> >> On Wed, Jun 2, 2021 at 11:23 AM Márton Balassi <
> >>>>>>>>>>>> [hidden email]>
> >>>>>>>>>>>> >> wrote:
> >>>>>>>>>>>> >>
> >>>>>>>>>>>> >>> Thanks, Chesnay - I totally missed that. Answered on the
> >>>>>>>>>>>> >>> ticket too, let us continue there then.
> >>>>>>>>>>>> >>>
> >>>>>>>>>>>> >>> Till, I agree that we should keep this codepath as slim as
> >>>>>>>>>>>> >>> possible. It is an important design decision that we aim to keep
> >>>>>>>>>>>> >>> the list of authentication protocols to a minimum. We believe that
> >>>>>>>>>>>> >>> this should not be a primary concern of Flink and a trusted proxy
> >>>>>>>>>>>> >>> service (for example Apache Knox) should be used to enable a
> >>>>>>>>>>>> >>> multitude of end-user authentication mechanisms. The bare minimum
> >>>>>>>>>>>> >>> of authentication mechanisms to support consequently consists of a
> >>>>>>>>>>>> >>> single strong authentication protocol, for which Kerberos is the
> >>>>>>>>>>>> >>> enterprise solution, and HTTP Basic, primarily for development
> >>>>>>>>>>>> >>> and light-weight scenarios.
> >>>>>>>>>>>> >>>
> >>>>>>>>>>>> >>> Added the above wording to G's doc.
> >>>>>>>>>>>> >>>
> >>>>>>>>>>>> >>>
> >>>>>>>>>>>>
> https://docs.google.com/document/d/1NMPeJ9H0G49TGy3AzTVVJVKmYC0okwOtqLTSPnGqzHw/edit
> >>>>>>>>>>>> >>>
> >>>>>>>>>>>> >>>
> >>>>>>>>>>>> >>>
> >>>>>>>>>>>> >>> On Tue, Jun 1, 2021 at 11:47 AM Chesnay Schepler <
> >>>>>>>>>>>> [hidden email]>
> >>>>>>>>>>>> >>> wrote:
> >>>>>>>>>>>> >>>
> >>>>>>>>>>>> >>>> There's a related effort:
> >>>>>>>>>>>> >>>> https://issues.apache.org/jira/browse/FLINK-21108
> >>>>>>>>>>>> >>>>
> >>>>>>>>>>>> >>>> On 6/1/2021 10:14 AM, Till Rohrmann wrote:
> >>>>>>>>>>>> >>>> > Hi Gabor, welcome to the Flink community!
> >>>>>>>>>>>> >>>> >
> >>>>>>>>>>>> >>>> > Thanks for sharing this proposal with the community
> >>>>>>>>>>>> Márton. In
> >>>>>>>>>>>> >>>> general, I
> >>>>>>>>>>>> >>>> > agree that authentication is missing and that this is
> >>>>>>>>>>>> required for
> >>>>>>>>>>>> >>>> using
> >>>>>>>>>>>> >>>> > Flink within an enterprise. The thing I am wondering is
> >>>>>>>>>>>> whether this
> >>>>>>>>>>>> >>>> > feature strictly needs to be implemented inside of
> Flink
> >>>>>>>>>>>> or whether a
> >>>>>>>>>>>> >>>> proxy
> >>>>>>>>>>>> >>>> > setup could do the job? Have you considered this
> option?
> >>>>>>>>>>>> If yes, then
> >>>>>>>>>>>> >>>> it
> >>>>>>>>>>>> >>>> > would be good to list it under the point of rejected
> >>>>>>>>>>>> alternatives.
> >>>>>>>>>>>> >>>> >
> >>>>>>>>>>>> >>>> > I do see the benefit of implementing this feature
> inside
> >>>>>>>>>>>> of Flink if
> >>>>>>>>>>>> >>>> many
> >>>>>>>>>>>> >>>> > users need it. If not, then it might be easier for the
> >>>>>>>>>>>> project to not
> >>>>>>>>>>>> >>>> > increase the surface area since it makes the overall
> >>>>>>>>>>>> maintenance
> >>>>>>>>>>>> >>>> harder.
> >>>>>>>>>>>> >>>> >
> >>>>>>>>>>>> >>>> > Cheers,
> >>>>>>>>>>>> >>>> > Till
> >>>>>>>>>>>> >>>> >
> >>>>>>>>>>>> >>>> > On Mon, May 31, 2021 at 4:57 PM Márton Balassi <
> >>>>>>>>>>>> [hidden email]>
> >>>>>>>>>>>> >>>> wrote:
> >>>>>>>>>>>> >>>> >
> >>>>>>>>>>>> >>>> >> Hi team,
> >>>>>>>>>>>> >>>> >>
> >>>>>>>>>>>> >>>> >> Firstly I would like to introduce Gabor or G [1] for
> >>>>>>>>>>>> short to the
> >>>>>>>>>>>> >>>> >> community, he is a Spark committer who has recently
> >>>>>>>>>>>> transitioned to
> >>>>>>>>>>>> >>>> the
> >>>>>>>>>>>> >>>> >> Flink Engineering team at Cloudera and is looking
> >>>>>>>>>>>> forward to
> >>>>>>>>>>>> >>>> contributing
> >>>>>>>>>>>> >>>> >> to Apache Flink. Previously G primarily focused on
> >>>>>>>>>>>> Spark Streaming
> >>>>>>>>>>>> >>>> and
> >>>>>>>>>>>> >>>> >> security.
> >>>>>>>>>>>> >>>> >>
> >>>>>>>>>>>> >>>> >> Based on requests from our customers G has implemented
> >>>>>>>>>>>> >>>> >> Kerberos and HTTP Basic Authentication for the Flink Dashboard
> >>>>>>>>>>>> >>>> >> and HistoryServer, which previously lacked an authentication
> >>>>>>>>>>>> >>>> >> story.
> >>>>>>>>>>>> >>>> >>
> >>>>>>>>>>>> >>>> >> We are looking to contribute this functionality back
> to
> >>>>>>>>>>>> the
> >>>>>>>>>>>> >>>> community, we
> >>>>>>>>>>>> >>>> >> believe that given Flink's maturity there should be a
> >>>>>>>>>>>> common code
> >>>>>>>>>>>> >>>> solution
> >>>>>>>>>>>> >>>> >> for this general pattern.
> >>>>>>>>>>>> >>>> >>
> >>>>>>>>>>>> >>>> >> We are looking forward to your feedback on G's design.
> >>>>>>>>>>>> [2]
> >>>>>>>>>>>> >>>> >>
> >>>>>>>>>>>> >>>> >> [1] http://gaborsomogyi.com/
> >>>>>>>>>>>> >>>> >> [2]
> >>>>>>>>>>>> >>>> >>
> >>>>>>>>>>>> >>>> >>
> >>>>>>>>>>>> >>>>
> >>>>>>>>>>>>
> https://docs.google.com/document/d/1NMPeJ9H0G49TGy3AzTVVJVKmYC0okwOtqLTSPnGqzHw/edit
> >>>>>>>>>>>> >>>> >>
> >>>>>>>>>>>> >>>>
> >>>>>>>>>>>> >>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
>


--

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk

Re: [DISCUSS] Dashboard/HistoryServer authentication

Gabor Somogyi
Hi Konstantin,

Thanks for the response. Regarding the introduction of new features: in the
case of Basic auth I tend to agree, and something else can be chosen instead.

However, presenting Kerberos as a completely new feature is not accurate,
because it is already in: Flink authenticates at least with HDFS and HBase
through Kerberos.
The main problem with the current Kerberos implementation is that it
contains several bugs and is only partially implemented. Following your
suggestion, can we agree that we skip the Basic auth implementation and
finish the already started Kerberos story by adding History Server and Job
Dashboard authentication?

Adding OIDC or OAuth2 raises exactly the same concerns that you have just
raised. Why exactly these? If you think this would be beneficial we can
discuss it in detail, but independently of that it would be good to finish
the half-done Kerberos story.

Regarding authorization, you mentioned that it can become complicated over
time. Can you show us an example? We have experience with a couple of open
source components, and authorization was never a horribly complex story. I
personally have the most experience with Spark, where I think it is quite
simple and stable: users can be viewers/admins and jobs started by others
can't be modified. If you can share an example of over-complication, we can
discuss it based on facts.
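
To make the viewer/admin model described above concrete, here is a minimal sketch. It is neither Spark's nor Flink's actual API: the names `Role`, `User`, `Job`, `can_view` and `can_modify` are hypothetical, and the rule that admins may modify any job is an assumption added for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    VIEWER = "viewer"
    ADMIN = "admin"


@dataclass(frozen=True)
class User:
    name: str
    role: Role


@dataclass(frozen=True)
class Job:
    job_id: str
    owner: str  # name of the user who started the job


def can_view(user: User, job: Job) -> bool:
    # Every authenticated user, viewer or admin, may look at any job.
    return True


def can_modify(user: User, job: Job) -> bool:
    # Jobs started by others can't be modified (e.g. cancelled).
    # Assumption for this sketch: admins are exempt from that rule.
    return user.role is Role.ADMIN or user.name == job.owner
```

With this model, a viewer `bob` could inspect a job started by `alice` but not cancel it, while `alice` herself and any admin could.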

Thank you in advance!

BR,
G


On Wed, Jun 16, 2021 at 5:42 PM Konstantin Knauf <[hidden email]> wrote:

> Hi everyone,
>
> sorry for joining late and thanks for the insightful discussion.
>
> In general, I'd personally prefer not to increase the surface area of
> Apache Flink unless there is a good reason. It seems we all agree that
> authx is not part of the core value proposition of Apache Flink, so if we
> can delegate this problem to a more specialized tool, I am in favor of
> that. Apache Flink is already huge and a lot of work goes into maintenance,
> so I personally have become more sensitive to this aspect over time.
>
> If we add support for Basic Auth and Kerberos now, users will sooner or
> later ask for OIDC, LDAP, SAML,... I acknowledge that Kerberos is widely
> used in the corporate, on-premises context, but isn't the focus moving more
> towards more web-friendly standards like OIDC/OAuth 2.0? If we only want to
> support a single protocol, there is an argument to be made that it should
> be OIDC and Dex [1,2] as a bridge to everything else. Have OIDC or OAuth2
> been considered instead of Kerberos? How do you see the market moving? But
> as I said before, in my opinion we can generate more value by investing
> into other areas of Apache Flink.
>
> Authorization also has the potential to become more fine-grained and
> complex over time: you already mentioned restricting the actions that a
> specific user can do in a cluster.
>
> Cheers,
>
> Konstantin
>
> [1] https://github.com/dexidp/dex
> [2] https://github.com/dexidp/dex/issues/1903
>
>
> On Wed, Jun 16, 2021 at 11:44 AM Gabor Somogyi <[hidden email]>
> wrote:
>
>> Hi Till,
>>
>> Did you have the chance to take a look at the doc? I have not seen any
>> update yet.
>>
>> BR,
>> G
>>
>>
>> On Wed, Jun 9, 2021 at 1:43 PM Till Rohrmann <[hidden email]>
>> wrote:
>>
>> > Thanks for the update Gabor. I'll take a look and respond in the
>> document.
>> >
>> > Cheers,
>> > Till
>> >
>> > On Wed, Jun 9, 2021 at 12:59 PM Gabor Somogyi <
>> [hidden email]>
>> > wrote:
>> >
>> >> Hi Till,
>> >>
>> >> Your proxy suggestion has been considered in depth and the FLIP has been
>> >> updated accordingly.
>> >> We've considered two proxy implementations (Nginx and Squid), but according
>> >> to our analysis and testing they are not suitable for the mentioned
>> >> use cases.
>> >> Please take a look at the rejected alternatives for a detailed
>> >> explanation.
>> >>
>> >> Thanks for your time in advance!
>> >>
>> >> BR,
>> >> G
>> >>
>> >>
>> >> On Fri, Jun 4, 2021 at 3:31 PM Till Rohrmann <[hidden email]>
>> >> wrote:
>> >>
>> >>> As I've said I am not a security expert and that's why I have to ask
>> for
>> >>> clarification, Gabor. You are saying that if we configure a
>> truststore for
>> >>> the REST endpoint with a single trusted certificate which has been
>> >>> generated by the operator of the Flink cluster, then the attacker can
>> >>> generate a new certificate, sign it and then talk to the Flink
>> cluster if
>> >>> he has access to the node on which the REST endpoint runs? My
>> understanding
>> >>> was that you need the corresponding private key which in my proposed
>> setup
>> >>> would be under the control of the operator as well (e.g. stored in a
>> >>> keystore on the same machine but guarded by some secret). That way
>> (if I am
>> >>> not mistaken), only the entity which has access to the keystore is
>> able to
>> >>> talk to the Flink cluster.
>> >>>
>> >>> Maybe we are also getting our wires crossed here and are talking about
>> >>> different things.
>> >>>
>> >>> Thanks for listing the pros and cons of Kerberos. Concerning what
>> other
>> >>> authentication mechanisms are used in the industry, I am not 100%
>> sure.
>> >>>
>> >>> Cheers,
>> >>> Till
>> >>>
>> >>> On Fri, Jun 4, 2021 at 11:09 AM Gabor Somogyi <
>> [hidden email]>
>> >>> wrote:
>> >>>
>> >>>> > I did not mean for the user to sign its own certificates but for the
>> >>>> > operator of the cluster. Once the user request hits the proxy, it should no
>> >>>> > longer be under his control. I think I do not fully understand yet why this
>> >>>> > would not work.
>> >>>> I said that it does not solve the authentication problem, with any proxy.
>> >>>> Even if the operator signs the certificate, someone can gain access to an
>> >>>> internal node.
>> >>>> In such a case anybody can craft certificates which are accepted by the
>> >>>> server. Once they are accepted, an attacker can cancel jobs, with huge
>> >>>> impact.
>> >>>>
>> >>>> > Also, I am missing a bit the comparison of Kerberos to other
>> >>>> > authentication mechanisms and why they were rejected in favour of
>> >>>> > Kerberos.
>> >>>> PROS:
>> >>>> * It does not depend on the cloud provider or on the deployment type
>> >>>> (k8s, bare metal, etc.); this is the biggest plus
>> >>>> * Centralized, with existing tooling, so there is no need to write lots
>> >>>> of tools around it
>> >>>> * There are clients/tools for almost all OSes and several languages
>> >>>> * Very large users have been using it in production for years without
>> >>>> major issues
>> >>>> * Provides cross-realm trust, among other features
>> >>>> * Several open source components use it, which could increase
>> >>>> compatibility
>> >>>>
>> >>>> CONS:
>> >>>> * Not everybody uses Kerberos
>> >>>> * It would increase the code footprint, but this is true for many
>> >>>> features (as a side note, I'm here to maintain it)
>> >>>>
>> >>>> Feel free to add your points, because this represents only a single
>> >>>> viewpoint.
>> >>>> Also, if you have a better option for strong authentication, please
>> >>>> share it and we can consider the pros/cons here.
>> >>>>
>> >>>> BR,
>> >>>> G
>> >>>>
>> >>>>
>> >>>> On Fri, Jun 4, 2021 at 10:32 AM Till Rohrmann <[hidden email]>
>> >>>> wrote:
>> >>>>
>> >>>>> I did not mean for the user to sign its own certificates but for the
>> >>>>> operator of the cluster. Once the user request hits the proxy, it
>> should no
>> >>>>> longer be under his control. I think I do not fully understand yet
>> why this
>> >>>>> would not work.
>> >>>>>
>> >>>>> What I would like to avoid is to add more complexity into Flink if
>> >>>>> there is an easy solution which fulfills the requirements. That's
>> why I
>> >>>>> would like to exercise thoroughly through the different
>> alternatives. Also,
>> >>>>> I am missing a bit the comparison of Kerberos to other
>> authentication
>> >>>>> mechanisms and why they were rejected in favour of Kerberos.
>> >>>>>
>> >>>>> Cheers,
>> >>>>> Till
>> >>>>>
>> >>>>> On Fri, Jun 4, 2021 at 10:26 AM Gyula Fóra <[hidden email]>
>> wrote:
>> >>>>>
>> >>>>>> Hi!
>> >>>>>>
>> >>>>>> I think there might be possible alternatives but it seems Kerberos
>> on
>> >>>>>> the rest endpoint ticks all the right boxes and provides a super
>> clean and
>> >>>>>> simple solution for strong authentication.
>> >>>>>>
>> >>>>>> I wouldn’t even consider sidecar proxies etc if we can solve it in
>> >>>>>> such a simple way as proposed by G.
>> >>>>>>
>> >>>>>> Cheers
>> >>>>>> Gyula
>> >>>>>>
>> >>>>>> On Fri, 4 Jun 2021 at 10:03, Till Rohrmann <[hidden email]>
>> >>>>>> wrote:
>> >>>>>>
>> >>>>>>> I am not saying that we shouldn't add a strong authentication
>> >>>>>>> mechanism if there are good reasons for it. I primarily would
>> like to
>> >>>>>>> understand the context a bit better in order to give qualified
>> feedback and
>> >>>>>>> come to a good decision. In order to do this, I have the feeling
>> that we
>> >>>>>>> haven't fully considered all available options which are on the
>> table, tbh.
>> >>>>>>>
>> >>>>>>> Does the problem of certificate expiry also apply for self-signed
>> >>>>>>> certificates? If yes, then this should then also be a problem for
>> the
>> >>>>>>> internal encryption of Flink's communication. If not, then one
>> could use
>> >>>>>>> self-signed certificates with a longer validity to solve the
>> mentioned
>> >>>>>>> issue.
>> >>>>>>>
>> >>>>>>> I think you can set up Flink in such a way that you don't have to
>> >>>>>>> handle all the different certificates. For example, you could
>> deploy Flink
>> >>>>>>> with a "sidecar proxy" which is responsible for the
>> authentication using an
>> >>>>>>> arbitrary method (e.g. Kerberos) and then bind the REST endpoint
>> to a local
>> >>>>>>> network interface. That way, the REST endpoint would only be
>> available
>> >>>>>>> through the sidecar proxy. Additionally, one could enable SSL for
>> this
>> >>>>>>> communication. Would this be a solution for the problem?
>> >>>>>>>
>> >>>>>>> Cheers,
>> >>>>>>> Till
>> >>>>>>>
>> >>>>>>> On Thu, Jun 3, 2021 at 10:46 PM Márton Balassi <
>> >>>>>>> [hidden email]> wrote:
>> >>>>>>>
>> >>>>>>>> That is an interesting idea, Till.
>> >>>>>>>>
>> >>>>>>>> The main issue with it is that TLS certificates have an
>> expiration
>> >>>>>>>> time, usually they get approved for a couple years. Forcing our
>> users to
>> >>>>>>>> restart jobs to reprovision TLS certificates would be weird when
>> we could
>> >>>>>>>> just implement a single proper strong authentication mechanism
>> instead in a
>> >>>>>>>> couple hundred lines of code. :-)
>> >>>>>>>>
>> >>>>>>>> In many cases it is also impractical to go the TLS mutual route,
>> >>>>>>>> because the Flink Dashboard can end up on any node in the
>> k8s/Yarn cluster
>> >>>>>>>> which means that we need a certificate per node (due to the
>> mutual auth),
>> >>>>>>>> but if we also want to protect the private key of these from
>> users
>> >>>>>>>> accidentally or intentionally leaking them then we need this per
>> user. As
>> >>>>>>>> in we end up managing user*machine number certificates and
>> having to renew
>> >>>>>>>> them periodically, which albeit automatable is unfortunately not
>> yet
>> >>>>>>>> automated in all large organizations.
>> >>>>>>>>
>> >>>>>>>> I fully agree that TLS certificate mutual authentication has its
>> >>>>>>>> nice properties, especially at very large (multiple thousand
>> node) clusters
>> >>>>>>>> - but it has its own challenges too. Thanks for bringing it up.
>> >>>>>>>>
>> >>>>>>>> Happy to have this added to the rejected alternative list so that
>> >>>>>>>> we have the full picture documented.
>> >>>>>>>>
>> >>>>>>>> On Thu, Jun 3, 2021 at 5:52 PM Till Rohrmann <
>> [hidden email]>
>> >>>>>>>> wrote:
>> >>>>>>>>
>> >>>>>>>>> I guess the idea would then be to let the proxy do the
>> >>>>>>>>> authentication job and only forward the request via an SSL
>> mutually
>> >>>>>>>>> encrypted connection to the Flink cluster. Would this be
>> possible? The
>> >>>>>>>>> beauty of this setup is in my opinion that this setup should
>> work with all
>> >>>>>>>>> kinds of authentication mechanisms.
>> >>>>>>>>>
>> >>>>>>>>> Cheers,
>> >>>>>>>>> Till
>> >>>>>>>>>
>> >>>>>>>>> On Thu, Jun 3, 2021 at 3:12 PM Gabor Somogyi <
>> >>>>>>>>> [hidden email]> wrote:
>> >>>>>>>>>
>> >>>>>>>>>> Thanks for giving options to fulfil the need.
>> >>>>>>>>>>
>> >>>>>>>>>> Users are looking for a solution where users can be identified
>> on
>> >>>>>>>>>> the whole cluster and restrict access to resources/actions.
>> >>>>>>>>>> A good example for such an action is cancelling other users
>> >>>>>>>>>> running jobs.
>> >>>>>>>>>>
>> >>>>>>>>>> * SSL does provide mutual authentication but when
>> authentication
>> >>>>>>>>>> passed there is no user based on restrictions can be made.
>> >>>>>>>>>> * The less problematic part is that generating/maintaining
>> short
>> >>>>>>>>>> time valid certificates would be a hard (that's the reason KDC
>> like servers
>> >>>>>>>>>> exist).
>> >>>>>>>>>> Having long time valid certificates would widen the attack
>> >>>>>>>>>> surface but since the first concern is there this is just a
>> cosmetic issue.
>> >>>>>>>>>>
>> >>>>>>>>>> All in all using TLS certificates is not sufficient in these
>> >>>>>>>>>> environments unfortunately.
>> >>>>>>>>>>
>> >>>>>>>>>> BR,
>> >>>>>>>>>> G
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>> On Thu, Jun 3, 2021 at 12:49 PM Till Rohrmann <
>> >>>>>>>>>> [hidden email]> wrote:
>> >>>>>>>>>>
>> >>>>>>>>>>> Thanks for the information Gabor. If it is about securing the
>> >>>>>>>>>>> communication between the REST client and the REST server,
>> then Flink
>> >>>>>>>>>>> already supports enabling mutual SSL authentication [1].
>> Would this be
>> >>>>>>>>>>> enough to secure the communication and to pass an audit?
>> >>>>>>>>>>>
>> >>>>>>>>>>> [1]
>> >>>>>>>>>>>
>> https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/security/security-ssl/#external--rest-connectivity
>> >>>>>>>>>>>
>> >>>>>>>>>>> Cheers,
>> >>>>>>>>>>> Till
>> >>>>>>>>>>>
>> >>>>>>>>>>> On Thu, Jun 3, 2021 at 10:33 AM Gabor Somogyi <
>> >>>>>>>>>>> [hidden email]> wrote:
>> >>>>>>>>>>>
>> >>>>>>>>>>>> Hi Till,
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Since I'm working in security area 10+ years let me share my
>> >>>>>>>>>>>> thought.
>> >>>>>>>>>>>> I would like to emphasise there are experts better than me
>> but
>> >>>>>>>>>>>> I have some
>> >>>>>>>>>>>> basics.
>> >>>>>>>>>>>> The discussion is open and not trying to tell alone things...
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> > I mean if an attacker can get access to one of the
>> machines,
>> >>>>>>>>>>>> then it
>> >>>>>>>>>>>> should also be possible to obtain the right Kerberos token.
>> >>>>>>>>>>>> Not necessarily. For example if one gets access to a specific
>> >>>>>>>>>>>> user's
>> >>>>>>>>>>>> credentials then it's not possible to compromise other user's
>> >>>>>>>>>>>> jobs, data,
>> >>>>>>>>>>>> etc...
>> >>>>>>>>>>>> Security is like an onion, the more layers has been added the
>> >>>>>>>>>>>> more time an
>> >>>>>>>>>>>> attacker needs to proceed.
>> >>>>>>>>>>>> At the end of the day if one is in, then most probably can
>> find
>> >>>>>>>>>>>> the way but
>> >>>>>>>>>>>> this time is normally enough to sysadmins or security
>> experts to
>> >>>>>>>>>>>> close down the system and minimize the damage.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> The other thing is that all tokens has a timeout and if the
>> >>>>>>>>>>>> token is
>> >>>>>>>>>>>> invalid then the attacker can't proceed further.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> > Is Kerberos also the standard authentication protocol for
>> >>>>>>>>>>>> Kubernetes
>> >>>>>>>>>>>> deployments?
>> >>>>>>>>>>>> Kerberos is an industry standard which is cloud/deployment
>> >>>>>>>>>>>> agnostic and it
>> >>>>>>>>>>>> can be used in any deployments including k8s.
>> >>>>>>>>>>>> The main intention is to use kerberos in k8s deployments too
>> >>>>>>>>>>>> since we're
>> >>>>>>>>>>>> going this direction as well.
>> >>>>>>>>>>>> Please see how Spark does this:
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> https://spark.apache.org/docs/latest/security.html#secure-interaction-with-kubernetes
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Last but not least the most important reason to add at least
>> >>>>>>>>>>>> one strong
>> >>>>>>>>>>>> authentication is that we have users who has
>> >>>>>>>>>>>> hard requirements on this. They're doing security audits and
>> if
>> >>>>>>>>>>>> they fail
>> >>>>>>>>>>>> then it's deal breaking.
>> >>>>>>>>>>>> That is why we have added kerberos at the first place.
>> >>>>>>>>>>>> Unfortunately we
>> >>>>>>>>>>>> can't name them in this public list, however
>> >>>>>>>>>>>> the customers who specifically asked for this were mainly in
>> >>>>>>>>>>>> the banking
>> >>>>>>>>>>>> and telco sector.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> BR,
>> >>>>>>>>>>>> G
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> On Thu, Jun 3, 2021 at 9:20 AM Till Rohrmann <
>> >>>>>>>>>>>> [hidden email]> wrote:
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> > Thanks for updating the document Márton. Why is it that
>> banks
>> >>>>>>>>>>>> will
>> >>>>>>>>>>>> > consider it more secure if Flink comes with Kerberos
>> >>>>>>>>>>>> authentication
>> >>>>>>>>>>>> > (assuming a properly secured setup)? I mean if an attacker
>> >>>>>>>>>>>> can get access
>> >>>>>>>>>>>> > to one of the machines, then it should also be possible to
>> >>>>>>>>>>>> obtain the right
>> >>>>>>>>>>>> > Kerberos token.
>> >>>>>>>>>>>> >
>> >>>>>>>>>>>> > I am not an authentication expert and that's why I wanted
>> to
>> >>>>>>>>>>>> ask what are
>> >>>>>>>>>>>> > other authentication protocols other than Kerberos? Why did
>> >>>>>>>>>>>> we select
>> >>>>>>>>>>>> > Kerberos and not any other authentication protocol? Maybe
>> you
>> >>>>>>>>>>>> can list the
>> >>>>>>>>>>>> > pros and cons for the different protocols. Is Kerberos also
>> >>>>>>>>>>>> the standard
>> >>>>>>>>>>>> > authentication protocol for Kubernetes deployments? If not,
>> >>>>>>>>>>>> what would be
>> >>>>>>>>>>>> > the answer when deploying on K8s?
>> >>>>>>>>>>>> >
>> >>>>>>>>>>>> > Cheers,
>> >>>>>>>>>>>> > Till
>> >>>>>>>>>>>> >
>> >>>>>>>>>>>> > On Wed, Jun 2, 2021 at 12:07 PM Gabor Somogyi <
>> >>>>>>>>>>>> [hidden email]>
>> >>>>>>>>>>>> > wrote:
>> >>>>>>>>>>>> >
>> >>>>>>>>>>>> >> Hi team,
>> >>>>>>>>>>>> >>
>> >>>>>>>>>>>> >> Happy to be here and hope I can provide quality additions
>> in
>> >>>>>>>>>>>> the future.
>> >>>>>>>>>>>> >>
>> >>>>>>>>>>>> >> Thank you all for helpful the suggestions!
>> >>>>>>>>>>>> >> Considering them the FLIP has been modified and the work
>> >>>>>>>>>>>> continues on the
>> >>>>>>>>>>>> >> already existing Jira.
>> >>>>>>>>>>>> >>
>> >>>>>>>>>>>> >> BR,
>> >>>>>>>>>>>> >> G
>> >>>>>>>>>>>> >>
>> >>>>>>>>>>>> >>
>> >>>>>>>>>>>> >> On Wed, Jun 2, 2021 at 11:23 AM Márton Balassi <
>> >>>>>>>>>>>> [hidden email]>
>> >>>>>>>>>>>> >> wrote:
>> >>>>>>>>>>>> >>
>> >>>>>>>>>>>> >>> Thanks, Chesney - I totally missed that. Answered on the
>> >>>>>>>>>>>> ticket too, let
>> >>>>>>>>>>>> >>> us continue there then.
>> >>>>>>>>>>>> >>>
>> >>>>>>>>>>>> >>> Till, I agree that we should keep this codepath as slim
>> as
>> >>>>>>>>>>>> possible. It
>> >>>>>>>>>>>> >>> is an important design decision that we aim to keep the
>> >>>>>>>>>>>> list of
>> >>>>>>>>>>>> >>> authentication protocols to a minimum. We believe that
>> this
>> >>>>>>>>>>>> should not be a
>> >>>>>>>>>>>> >>> primary concern of Flink and a trusted proxy service (for
>> >>>>>>>>>>>> example Apache
>> >>>>>>>>>>>> >>> Knox) should be used to enable a multitude of enduser
>> >>>>>>>>>>>> authentication
>> >>>>>>>>>>>> >>> mechanisms. The bare minimum of authentication mechanisms
>> >>>>>>>>>>>> to support
>> >>>>>>>>>>>> >>> consequently consist of a single strong authentication
>> >>>>>>>>>>>> protocol for which
>> >>>>>>>>>>>> >>> Kerberos is the enterprise solution and HTTP Basic
>> primary
>> >>>>>>>>>>>> for development
>> >>>>>>>>>>>> >>> and light-weight scenarios.
>> >>>>>>>>>>>> >>>
>> >>>>>>>>>>>> >>> Added the above wording to G's doc.
>> >>>>>>>>>>>> >>>
>> >>>>>>>>>>>> >>>
>> >>>>>>>>>>>>
>> https://docs.google.com/document/d/1NMPeJ9H0G49TGy3AzTVVJVKmYC0okwOtqLTSPnGqzHw/edit
>> >>>>>>>>>>>> >>>
>> >>>>>>>>>>>> >>>
>> >>>>>>>>>>>> >>>
>> >>>>>>>>>>>> >>> On Tue, Jun 1, 2021 at 11:47 AM Chesnay Schepler <
>> >>>>>>>>>>>> [hidden email]>
>> >>>>>>>>>>>> >>> wrote:
>> >>>>>>>>>>>> >>>
>> >>>>>>>>>>>> >>>> There's a related effort:
>> >>>>>>>>>>>> >>>> https://issues.apache.org/jira/browse/FLINK-21108
>> >>>>>>>>>>>> >>>>
>> >>>>>>>>>>>> >>>> On 6/1/2021 10:14 AM, Till Rohrmann wrote:
>> >>>>>>>>>>>> >>>> > Hi Gabor, welcome to the Flink community!
>> >>>>>>>>>>>> >>>> >
>> >>>>>>>>>>>> >>>> > Thanks for sharing this proposal with the community
>> >>>>>>>>>>>> Márton. In
>> >>>>>>>>>>>> >>>> general, I
>> >>>>>>>>>>>> >>>> > agree that authentication is missing and that this is
>> >>>>>>>>>>>> required for
>> >>>>>>>>>>>> >>>> using
>> >>>>>>>>>>>> >>>> > Flink within an enterprise. The thing I am wondering
>> is
>> >>>>>>>>>>>> whether this
>> >>>>>>>>>>>> >>>> > feature strictly needs to be implemented inside of
>> Flink
>> >>>>>>>>>>>> or whether a
>> >>>>>>>>>>>> >>>> proxy
>> >>>>>>>>>>>> >>>> > setup could do the job? Have you considered this
>> option?
>> >>>>>>>>>>>> If yes, then
>> >>>>>>>>>>>> >>>> it
>> >>>>>>>>>>>> >>>> > would be good to list it under the point of rejected
>> >>>>>>>>>>>> alternatives.
>> >>>>>>>>>>>> >>>> >
>> >>>>>>>>>>>> >>>> > I do see the benefit of implementing this feature
>> inside
>> >>>>>>>>>>>> of Flink if
>> >>>>>>>>>>>> >>>> many
>> >>>>>>>>>>>> >>>> > users need it. If not, then it might be easier for the
>> >>>>>>>>>>>> project to not
>> >>>>>>>>>>>> >>>> > increase the surface area since it makes the overall
>> >>>>>>>>>>>> maintenance
>> >>>>>>>>>>>> >>>> harder.
>> >>>>>>>>>>>> >>>> >
>> >>>>>>>>>>>> >>>> > Cheers,
>> >>>>>>>>>>>> >>>> > Till
>> >>>>>>>>>>>> >>>> >
>> >>>>>>>>>>>> >>>> > On Mon, May 31, 2021 at 4:57 PM Márton Balassi <
>> >>>>>>>>>>>> [hidden email]>
>> >>>>>>>>>>>> >>>> wrote:
>> >>>>>>>>>>>> >>>> >
>> >>>>>>>>>>>> >>>> >> Hi team,
>> >>>>>>>>>>>> >>>> >>
>> >>>>>>>>>>>> >>>> >> Firstly I would like to introduce Gabor or G [1] for
>> >>>>>>>>>>>> short to the
>> >>>>>>>>>>>> >>>> >> community, he is a Spark committer who has recently
>> >>>>>>>>>>>> transitioned to
>> >>>>>>>>>>>> >>>> the
>> >>>>>>>>>>>> >>>> >> Flink Engineering team at Cloudera and is looking
>> >>>>>>>>>>>> forward to
>> >>>>>>>>>>>> >>>> contributing
>> >>>>>>>>>>>> >>>> >> to Apache Flink. Previously G primarily focused on
>> >>>>>>>>>>>> Spark Streaming
>> >>>>>>>>>>>> >>>> and
>> >>>>>>>>>>>> >>>> >> security.
>> >>>>>>>>>>>> >>>> >>
>> >>>>>>>>>>>> >>>> >> Based on requests from our customers G has
>> implemented
>> >>>>>>>>>>>> Kerberos and
>> >>>>>>>>>>>> >>>> HTTP
>> >>>>>>>>>>>> >>>> >> Basic Authentication for the Flink Dashboard and
>> >>>>>>>>>>>> HistoryServer.
>> >>>>>>>>>>>> >>>> Previously
>> >>>>>>>>>>>> >>>> >> lacked an authentication story.
>> >>>>>>>>>>>> >>>> >>
>> >>>>>>>>>>>> >>>> >> We are looking to contribute this functionality back
>> to
>> >>>>>>>>>>>> the
>> >>>>>>>>>>>> >>>> community, we
>> >>>>>>>>>>>> >>>> >> believe that given Flink's maturity there should be a
>> >>>>>>>>>>>> common code
>> >>>>>>>>>>>> >>>> solution
>> >>>>>>>>>>>> >>>> >> for this general pattern.
>> >>>>>>>>>>>> >>>> >>
>> >>>>>>>>>>>> >>>> >> We are looking forward to your feedback on G's
>> design.
>> >>>>>>>>>>>> [2]
>> >>>>>>>>>>>> >>>> >>
>> >>>>>>>>>>>> >>>> >> [1] http://gaborsomogyi.com/
>> >>>>>>>>>>>> >>>> >> [2]
>> >>>>>>>>>>>> >>>> >>
>> >>>>>>>>>>>> >>>> >>
>> >>>>>>>>>>>> >>>>
>> >>>>>>>>>>>>
>> https://docs.google.com/document/d/1NMPeJ9H0G49TGy3AzTVVJVKmYC0okwOtqLTSPnGqzHw/edit
>> >>>>>>>>>>>> >>>> >>
>> >>>>>>>>>>>> >>>>
>> >>>>>>>>>>>> >>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>
>>
>
>
> --
>
> Konstantin Knauf
>
> https://twitter.com/snntrable
>
> https://github.com/knaufk
>

Re: [DISCUSS] Dashboard/HistoryServer authentication

Konstantin Knauf-4
Hi Gabor,

> However representing Kerberos as completely new feature is not true because
> it's already in since Flink makes authentication at least with HDFS and
> Hbase through Kerberos.

True, that is one way to look at it, but there are differences, too:
Control Plane vs Data Plane, Core vs Connectors.

> Adding OIDC or OAuth2 has the exact same concerns what you've guys just
> raised. Why exactly these? If you think this would be beneficial we can
> discuss it in detail

That's exactly my point. Once we start adding authx support, we will sooner
or later discuss other options besides Kerberos, too. A user who would like
to use OAuth cannot easily use Kerberos, right?
That is one of the reasons I am skeptical about adding initial authx
support.

> Related authorization you've mentioned it can be complicated over time. Can
> you show us an example? We've knowledge with couple of open source
> components
> but authorization was never a horror complex story. I personally have the
> most experience with Spark which I think is quite simple and stable. Users
> can be viewers/admins
> and jobs started by others can't be modified. If you can share an example
> over-complication we can discuss on facts.

Authorization is a new aspect that needs to be considered for every
addition to the REST API. In the future users might ask for additional
roles (e.g. an editor), user-defined roles and you've already mentioned
job-level permissions yourself. And keep in mind that there might also be
larger additions in the future like the flink-sql-gateway. Contributions
like this become more expensive the more aspects we need to consider.
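
To illustrate the point, here is a minimal sketch in Python of the
viewer/admin model with a job-level ownership rule. All names (Role, Action,
PERMISSIONS, is_allowed) are hypothetical and are not part of Flink's or
Spark's API. The sketch only assumes the rules mentioned in this thread:
viewers can look at jobs, admins can do everything, and jobs started by
others cannot be modified.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    VIEWER = "viewer"
    ADMIN = "admin"


class Action(Enum):
    VIEW_JOB = "view_job"
    CANCEL_JOB = "cancel_job"


# Hypothetical role-to-action table. Every new REST action needs an entry
# for every role, which is where the per-addition cost comes from.
PERMISSIONS = {
    Role.VIEWER: {Action.VIEW_JOB},
    Role.ADMIN: {Action.VIEW_JOB, Action.CANCEL_JOB},
}


@dataclass(frozen=True)
class User:
    name: str
    role: Role


@dataclass(frozen=True)
class Job:
    job_id: str
    owner: str


def is_allowed(user: User, action: Action, job: Job) -> bool:
    """Return True if `user` may perform `action` on `job`."""
    # Role-level rule: the role grants the action on any job.
    if action in PERMISSIONS[user.role]:
        return True
    # Job-level rule: owners may cancel their own jobs regardless of role.
    return action is Action.CANCEL_JOB and user.name == job.owner
```

Adding an editor role, user-defined roles, or a new REST action means
extending the table and revisiting the job-level rule for each combination.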

In general, I believe, it is important that the community focuses its
efforts where we can generate the most value for the user and - personally -
I don't think there is much to gain by extending Flink's scope in that
direction. Of course, this is not black and white and there are other valid
opinions.

Thanks,

Konstantin

On Wed, Jun 16, 2021 at 7:38 PM Gabor Somogyi <[hidden email]>
wrote:

> Hi Konstantin,
>
> Thanks for the response. Related new feature introduction in case of Basic
> auth I tend to agree, anything else can be chosen.
>
> However representing Kerberos as completely new feature is not true because
> it's already in since Flink makes authentication at least with HDFS and
> Hbase through Kerberos.
> The main problem with the actual Kerberos implementation is that it
> contains several bugs and only partially implemented. Following your
> suggestion can we agree that we
> skip the Basic auth implementation and finish an already started Kerberos
> story by adding History Server and Job Dashboard authentication?
>
> Adding OIDC or OAuth2 has the exact same concerns what you've guys just
> raised. Why exactly these? If you think this would be beneficial we can
> discuss it in detail
> but as a side story it would be good to finish a halfway done Kerberos
> story.
>
> Related authorization you've mentioned it can be complicated over time. Can
> you show us an example? We've knowledge with couple of open source
> components
> but authorization was never a horror complex story. I personally have the
> most experience with Spark which I think is quite simple and stable. Users
> can be viewers/admins
> and jobs started by others can't be modified. If you can share an example
> over-complication we can discuss on facts.
>
> Thank you in advance!
>
> BR,
> G
>
>
> On Wed, Jun 16, 2021 at 5:42 PM Konstantin Knauf <[hidden email]>
> wrote:
>
> > Hi everyone,
> >
> > sorry for joining late and thanks for the insightful discussion.
> >
> > In general, I'd personally prefer not to increase the surface area of
> > Apache Flink unless there is a good reason. It seems we all agree that
> > authx is not part of the core value proposition of Apache Flink, so if we
> > can delegate this problem to a more specialized tool, I am in favor of
> > that. Apache Flink is already huge and a lot of work goes into maintenance,
> > so I personally have become more sensitive to this aspect over time.
> >
> > If we add support for Basic Auth and Kerberos now, users will sooner or
> > later ask for OIDC, LDAP, SAML,... I acknowledge that Kerberos is widely
> > used in the corporate, on-premises context, but isn't the focus moving more
> > towards more web-friendly standards like OIDC/OAuth 2.0? If we only want to
> > support a single protocol, there is an argument to be made that it should
> > be OIDC and Dex [1,2] as a bridge to everything else. Have OIDC or OAuth2
> > been considered instead of Kerberos? How do you see the market moving? But
> > as I said before, in my opinion we can generate more value by investing
> > into other areas of Apache Flink.
> >
> > Authorization also has the potential to become more fine-grained and
> > complex over time: you already mentioned restricting the actions that a
> > specific user can do in a cluster.
> >
> > Cheers,
> >
> > Konstantin
> >
> > [1] https://github.com/dexidp/dex
> > [2] https://github.com/dexidp/dex/issues/1903
> >
> >
> > On Wed, Jun 16, 2021 at 11:44 AM Gabor Somogyi <[hidden email]>
> > wrote:
> >
> >> Hi Till,
> >>
> >> Did you have the chance to take a look at the doc? Not yet seen any
> >> update.
> >>
> >> BR,
> >> G
> >>
> >>
> >> On Wed, Jun 9, 2021 at 1:43 PM Till Rohrmann <[hidden email]>
> >> wrote:
> >>
> >> > Thanks for the update Gabor. I'll take a look and respond in the
> >> > document.
> >> >
> >> > Cheers,
> >> > Till
> >> >
> >> > On Wed, Jun 9, 2021 at 12:59 PM Gabor Somogyi <[hidden email]>
> >> > wrote:
> >> >
> >> >> Hi Till,
> >> >>
> >> >> Your proxy suggestion has been considered in-depth and updated the FLIP
> >> >> accordingly.
> >> >> We've considered 2 proxy implementation (Nginx and Squid) but according
> >> >> to our analysis and testing it's not suitable for the mentioned use-cases.
> >> >> Please take a look at the rejected alternatives for detailed explanation.
> >> >>
> >> >> Thanks for your time in advance!
> >> >>
> >> >> BR,
> >> >> G
> >> >>
> >> >>
> >> >> On Fri, Jun 4, 2021 at 3:31 PM Till Rohrmann <[hidden email]>
> >> >> wrote:
> >> >>
> >> >>> As I've said I am not a security expert and that's why I have to ask for
> >> >>> clarification, Gabor. You are saying that if we configure a truststore for
> >> >>> the REST endpoint with a single trusted certificate which has been
> >> >>> generated by the operator of the Flink cluster, then the attacker can
> >> >>> generate a new certificate, sign it and then talk to the Flink cluster if
> >> >>> he has access to the node on which the REST endpoint runs? My understanding
> >> >>> was that you need the corresponding private key which in my proposed setup
> >> >>> would be under the control of the operator as well (e.g. stored in a
> >> >>> keystore on the same machine but guarded by some secret). That way (if I am
> >> >>> not mistaken), only the entity which has access to the keystore is able to
> >> >>> talk to the Flink cluster.
> >> >>>
> >> >>> Maybe we are also getting our wires crossed here and are talking about
> >> >>> different things.
> >> >>>
> >> >>> Thanks for listing the pros and cons of Kerberos. Concerning what other
> >> >>> authentication mechanisms are used in the industry, I am not 100% sure.
> >> >>>
> >> >>> Cheers,
> >> >>> Till
> >> >>>
> >> >>> On Fri, Jun 4, 2021 at 11:09 AM Gabor Somogyi <[hidden email]>
> >> >>> wrote:
> >> >>>
> >> >>>> > I did not mean for the user to sign its own certificates but for the
> >> >>>> > operator of the cluster. Once the user request hits the proxy, it should no
> >> >>>> > longer be under his control. I think I do not fully understand yet why this
> >> >>>> > would not work.
> >> >>>> I said it's not solving the authentication problem over any proxy. Even
> >> >>>> if the operator is signing the certificate one can have access to an
> >> >>>> internal node.
> >> >>>> Such case anybody can craft certificates which is accepted by the
> >> >>>> server. When it's accepted a bad guy can cancel jobs causing huge impacts.
> >> >>>>
> >> >>>> > Also, I am missing a bit the comparison of Kerberos to other
> >> >>>> > authentication mechanisms and why they were rejected in favour of Kerberos.
> >> >>>> PROS:
> >> >>>> * Since it's not depending on cloud provider and/or k8s or bare-metal
> >> >>>> etc. deployment it's the biggest plus
> >> >>>> * Centralized with tools and no need to write tons of tools around
> >> >>>> * There are clients/tools on almost all OS-es and several languages
> >> >>>> * Super huge users are using it for years in production w/o huge issues
> >> >>>> * Provides cross-realm trust possibility amongst other features
> >> >>>> * Several open source components using it which could increase
> >> >>>> compatibility
> >> >>>>
> >> >>>> CONS:
> >> >>>> * Not everybody using kerberos
> >> >>>> * It would increase the code footprint but this is true for many
> >> >>>> features (as a side note I'm here to maintain it)
> >> >>>>
> >> >>>> Feel free to add your points because it only represents a single
> >> >>>> viewpoint.
> >> >>>> Also if you have any better option for strong authentication please
> >> >>>> share it and we can consider the pros/cons here.
> >> >>>>
> >> >>>> BR,
> >> >>>> G
> >> >>>>
> >> >>>>
> >> >>>> On Fri, Jun 4, 2021 at 10:32 AM Till Rohrmann <[hidden email]>
> >> >>>> wrote:
> >> >>>>
> >> >>>>> I did not mean for the user to sign its own certificates but for the
> >> >>>>> operator of the cluster. Once the user request hits the proxy, it should no
> >> >>>>> longer be under his control. I think I do not fully understand yet why this
> >> >>>>> would not work.
> >> >>>>>
> >> >>>>> What I would like to avoid is to add more complexity into Flink if
> >> >>>>> there is an easy solution which fulfills the requirements. That's why I
> >> >>>>> would like to exercise thoroughly through the different alternatives. Also,
> >> >>>>> I am missing a bit the comparison of Kerberos to other authentication
> >> >>>>> mechanisms and why they were rejected in favour of Kerberos.
> >> >>>>>
> >> >>>>> Cheers,
> >> >>>>> Till
> >> >>>>>
> >> >>>>> On Fri, Jun 4, 2021 at 10:26 AM Gyula Fóra <[hidden email]>
> >> wrote:
> >> >>>>>
> >> >>>>>> Hi!
> >> >>>>>>
> >> >>>>>> I think there might be possible alternatives but it seems
> Kerberos
> >> on
> >> >>>>>> the rest endpoint ticks all the right boxes and provides a super
> >> clean and
> >> >>>>>> simple solution for strong authentication.
> >> >>>>>>
> >> >>>>>> I wouldn’t even consider sidecar proxies etc if we can solve it
> in
> >> >>>>>> such a simple way as proposed by G.
> >> >>>>>>
> >> >>>>>> Cheers
> >> >>>>>> Gyula
> >> >>>>>>
> >> >>>>>> On Fri, 4 Jun 2021 at 10:03, Till Rohrmann <[hidden email]
> >
> >> >>>>>> wrote:
> >> >>>>>>
> >> >>>>>>> I am not saying that we shouldn't add a strong authentication
> >> >>>>>>> mechanism if there are good reasons for it. I primarily would
> >> like to
> >> >>>>>>> understand the context a bit better in order to give qualified
> >> feedback and
> >> >>>>>>> come to a good decision. In order to do this, I have the feeling
> >> that we
> >> >>>>>>> haven't fully considered all available options which are on the
> >> table, tbh.
> >> >>>>>>>
> >> >>>>>>> Does the problem of certificate expiry also apply for
> self-signed
> >> >>>>>>> certificates? If yes, then this should then also be a problem
> for
> >> the
> >> >>>>>>> internal encryption of Flink's communication. If not, then one
> >> could use
> >> >>>>>>> self-signed certificates with a longer validity to solve the
> >> mentioned
> >> >>>>>>> issue.
> >> >>>>>>>
> >> >>>>>>> I think you can set up Flink in such a way that you don't have
> to
> >> >>>>>>> handle all the different certificates. For example, you could
> >> deploy Flink
> >> >>>>>>> with a "sidecar proxy" which is responsible for the
> >> authentication using an
> >> >>>>>>> arbitrary method (e.g. Kerberos) and then bind the REST endpoint
> >> to a local
> >> >>>>>>> network interface. That way, the REST endpoint would only be
> >> available
> >> >>>>>>> through the sidecar proxy. Additionally, one could enable SSL
> for
> >> this
> >> >>>>>>> communication. Would this be a solution for the problem?
> >> >>>>>>>
> >> >>>>>>> Cheers,
> >> >>>>>>> Till
> >> >>>>>>>
> >> >>>>>>> On Thu, Jun 3, 2021 at 10:46 PM Márton Balassi <
> >> >>>>>>> [hidden email]> wrote:
> >> >>>>>>>
> >> >>>>>>>> That is an interesting idea, Till.
> >> >>>>>>>>
> >> >>>>>>>> The main issue with it is that TLS certificates have an
> >> expiration
> >> >>>>>>>> time, usually they get approved for a couple years. Forcing our
> >> users to
> >> >>>>>>>> restart jobs to reprovision TLS certificates would be weird
> when
> >> we could
> >> >>>>>>>> just implement a single proper strong authentication mechanism
> >> instead in a
> >> >>>>>>>> couple hundred lines of code. :-)
> >> >>>>>>>>
> >> >>>>>>>> In many cases it is also impractical to go the TLS mutual
> route,
> >> >>>>>>>> because the Flink Dashboard can end up on any node in the
> >> k8s/Yarn cluster
> >> >>>>>>>> which means that we need a certificate per node (due to the
> >> mutual auth),
> >> >>>>>>>> but if we also want to protect the private key of these from
> >> users
> >> >>>>>>>> accidentally or intentionally leaking them then we need this
> per
> >> user. As
> >> >>>>>>>> in we end up managing user*machine number certificates and
> >> having to renew
> >> >>>>>>>> them periodically, which albeit automatable is unfortunately
> not
> >> yet
> >> >>>>>>>> automated in all large organizations.
> >> >>>>>>>>
> >> >>>>>>>> I fully agree that TLS certificate mutual authentication has
> its
> >> >>>>>>>> nice properties, especially at very large (multiple thousand
> >> node) clusters
> >> >>>>>>>> - but it has its own challenges too. Thanks for bringing it up.
> >> >>>>>>>>
> >> >>>>>>>> Happy to have this added to the rejected alternative list so
> that
> >> >>>>>>>> we have the full picture documented.
> >> >>>>>>>>
> >> >>>>>>>> On Thu, Jun 3, 2021 at 5:52 PM Till Rohrmann <
> >> [hidden email]>
> >> >>>>>>>> wrote:
> >> >>>>>>>>
> >> >>>>>>>>> I guess the idea would then be to let the proxy do the
> >> >>>>>>>>> authentication job and only forward the request via an SSL
> >> mutually
> >> >>>>>>>>> encrypted connection to the Flink cluster. Would this be
> >> possible? The
> >> >>>>>>>>> beauty of this setup is in my opinion that this setup should
> >> work with all
> >> >>>>>>>>> kinds of authentication mechanisms.
> >> >>>>>>>>>
> >> >>>>>>>>> Cheers,
> >> >>>>>>>>> Till
> >> >>>>>>>>>
> >> >>>>>>>>> On Thu, Jun 3, 2021 at 3:12 PM Gabor Somogyi <
> >> >>>>>>>>> [hidden email]> wrote:
> >> >>>>>>>>>
> >> >>>>>>>>>> Thanks for giving options to fulfil the need.
> >> >>>>>>>>>>
> >> >>>>>>>>>> Users are looking for a solution where users can be
> identified
> >> on
> >> >>>>>>>>>> the whole cluster and restrict access to resources/actions.
> >> >>>>>>>>>> A good example for such an action is cancelling other users
> >> >>>>>>>>>> running jobs.
> >> >>>>>>>>>>
> >> >>>>>>>>>> * SSL does provide mutual authentication but when
> >> authentication
> >> >>>>>>>>>> passed there is no user based on restrictions can be made.
> >> >>>>>>>>>> * The less problematic part is that generating/maintaining
> >> short
> >> >>>>>>>>>> time valid certificates would be a hard (that's the reason
> KDC
> >> like servers
> >> >>>>>>>>>> exist).
> >> >>>>>>>>>> Having long time valid certificates would widen the attack
> >> >>>>>>>>>> surface but since the first concern is there this is just a
> >> cosmetic issue.
> >> >>>>>>>>>>
> >> >>>>>>>>>> All in all using TLS certificates is not sufficient in these
> >> >>>>>>>>>> environments unfortunately.
> >> >>>>>>>>>>
> >> >>>>>>>>>> BR,
> >> >>>>>>>>>> G
> >> >>>>>>>>>>
> >> >>>>>>>>>>
> >> >>>>>>>>>> On Thu, Jun 3, 2021 at 12:49 PM Till Rohrmann <
> >> >>>>>>>>>> [hidden email]> wrote:
> >> >>>>>>>>>>
> >> >>>>>>>>>>> Thanks for the information Gabor. If it is about securing
> the
> >> >>>>>>>>>>> communication between the REST client and the REST server,
> >> then Flink
> >> >>>>>>>>>>> already supports enabling mutual SSL authentication [1].
> >> Would this be
> >> >>>>>>>>>>> enough to secure the communication and to pass an audit?
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> [1]
> >> >>>>>>>>>>>
> >>
> https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/security/security-ssl/#external--rest-connectivity
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> Cheers,
> >> >>>>>>>>>>> Till
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> On Thu, Jun 3, 2021 at 10:33 AM Gabor Somogyi <
> >> >>>>>>>>>>> [hidden email]> wrote:
> >> >>>>>>>>>>>
> >> >>>>>>>>>>>> Hi Till,
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> Since I've been working in the security area for 10+ years,
> >> >>>>>>>>>>>> let me share my thoughts.
> >> >>>>>>>>>>>> I would like to emphasise there are experts better than me
> >> >>>>>>>>>>>> but I have some basics.
> >> >>>>>>>>>>>> The discussion is open and not trying to tell alone things...
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> > I mean if an attacker can get access to one of the
> >> >>>>>>>>>>>> > machines, then it should also be possible to obtain the
> >> >>>>>>>>>>>> > right Kerberos token.
> >> >>>>>>>>>>>> Not necessarily. For example, if one gets access to a
> >> >>>>>>>>>>>> specific user's credentials then it's not possible to
> >> >>>>>>>>>>>> compromise other users' jobs, data, etc...
> >> >>>>>>>>>>>> Security is like an onion: the more layers have been added,
> >> >>>>>>>>>>>> the more time an attacker needs to proceed.
> >> >>>>>>>>>>>> At the end of the day, if one is in, then one can most
> >> >>>>>>>>>>>> probably find a way, but this time is normally enough for
> >> >>>>>>>>>>>> sysadmins or security experts to close down the system and
> >> >>>>>>>>>>>> minimize the damage.
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> The other thing is that all tokens have a timeout, and if
> >> >>>>>>>>>>>> the token is invalid then the attacker can't proceed further.
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> > Is Kerberos also the standard authentication protocol for
> >> >>>>>>>>>>>> > Kubernetes deployments?
> >> >>>>>>>>>>>> Kerberos is an industry standard which is cloud/deployment
> >> >>>>>>>>>>>> agnostic and it can be used in any deployment, including k8s.
> >> >>>>>>>>>>>> The main intention is to use Kerberos in k8s deployments
> >> >>>>>>>>>>>> too, since we're going in this direction as well.
> >> >>>>>>>>>>>> Please see how Spark does this:
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>>
> >>
> https://spark.apache.org/docs/latest/security.html#secure-interaction-with-kubernetes
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> Last but not least, the most important reason to add at
> >> >>>>>>>>>>>> least one strong authentication is that we have users who
> >> >>>>>>>>>>>> have hard requirements on this. They're doing security
> >> >>>>>>>>>>>> audits and if they fail then it's a deal breaker.
> >> >>>>>>>>>>>> That is why we added Kerberos in the first place.
> >> >>>>>>>>>>>> Unfortunately we can't name them on this public list;
> >> >>>>>>>>>>>> however, the customers who specifically asked for this were
> >> >>>>>>>>>>>> mainly in the banking and telco sector.
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> BR,
> >> >>>>>>>>>>>> G
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> On Thu, Jun 3, 2021 at 9:20 AM Till Rohrmann <
> >> >>>>>>>>>>>> [hidden email]> wrote:
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> > Thanks for updating the document Márton. Why is it that
> >> banks
> >> >>>>>>>>>>>> will
> >> >>>>>>>>>>>> > consider it more secure if Flink comes with Kerberos
> >> >>>>>>>>>>>> authentication
> >> >>>>>>>>>>>> > (assuming a properly secured setup)? I mean if an
> attacker
> >> >>>>>>>>>>>> can get access
> >> >>>>>>>>>>>> > to one of the machines, then it should also be possible
> to
> >> >>>>>>>>>>>> obtain the right
> >> >>>>>>>>>>>> > Kerberos token.
> >> >>>>>>>>>>>> >
> >> >>>>>>>>>>>> > I am not an authentication expert and that's why I wanted
> >> to
> >> >>>>>>>>>>>> ask what are
> >> >>>>>>>>>>>> > other authentication protocols other than Kerberos? Why
> did
> >> >>>>>>>>>>>> we select
> >> >>>>>>>>>>>> > Kerberos and not any other authentication protocol? Maybe
> >> you
> >> >>>>>>>>>>>> can list the
> >> >>>>>>>>>>>> > pros and cons for the different protocols. Is Kerberos
> also
> >> >>>>>>>>>>>> the standard
> >> >>>>>>>>>>>> > authentication protocol for Kubernetes deployments? If
> not,
> >> >>>>>>>>>>>> what would be
> >> >>>>>>>>>>>> > the answer when deploying on K8s?
> >> >>>>>>>>>>>> >
> >> >>>>>>>>>>>> > Cheers,
> >> >>>>>>>>>>>> > Till
> >> >>>>>>>>>>>> >
> >> >>>>>>>>>>>> > On Wed, Jun 2, 2021 at 12:07 PM Gabor Somogyi <
> >> >>>>>>>>>>>> [hidden email]>
> >> >>>>>>>>>>>> > wrote:
> >> >>>>>>>>>>>> >
> >> >>>>>>>>>>>> >> Hi team,
> >> >>>>>>>>>>>> >>
> >> >>>>>>>>>>>> >> Happy to be here and hope I can provide quality
> additions
> >> in
> >> >>>>>>>>>>>> the future.
> >> >>>>>>>>>>>> >>
> >> >>>>>>>>>>>> >> Thank you all for the helpful suggestions!
> >> >>>>>>>>>>>> >> Taking them into account, the FLIP has been modified and the
> >> >>>>>>>>>>>> >> work continues on the already existing Jira.
> >> >>>>>>>>>>>> >>
> >> >>>>>>>>>>>> >> BR,
> >> >>>>>>>>>>>> >> G
> >> >>>>>>>>>>>> >>
> >> >>>>>>>>>>>> >>
> >> >>>>>>>>>>>> >> On Wed, Jun 2, 2021 at 11:23 AM Márton Balassi <
> >> >>>>>>>>>>>> [hidden email]>
> >> >>>>>>>>>>>> >> wrote:
> >> >>>>>>>>>>>> >>
> >> >>>>>>>>>>>> >>> Thanks, Chesnay - I totally missed that. Answered on the
> >> >>>>>>>>>>>> >>> ticket too, let us continue there then.
> >> >>>>>>>>>>>> >>>
> >> >>>>>>>>>>>> >>> Till, I agree that we should keep this codepath as slim
> >> as
> >> >>>>>>>>>>>> possible. It
> >> >>>>>>>>>>>> >>> is an important design decision that we aim to keep the
> >> >>>>>>>>>>>> list of
> >> >>>>>>>>>>>> >>> authentication protocols to a minimum. We believe that
> >> this
> >> >>>>>>>>>>>> should not be a
> >> >>>>>>>>>>>> >>> primary concern of Flink and a trusted proxy service
> (for
> >> >>>>>>>>>>>> example Apache
> >> >>>>>>>>>>>> >>> Knox) should be used to enable a multitude of enduser
> >> >>>>>>>>>>>> authentication
> >> >>>>>>>>>>>> >>> mechanisms. The bare minimum of authentication
> mechanisms
> >> >>>>>>>>>>>> to support
> >> >>>>>>>>>>>> >>> consequently consist of a single strong authentication
> >> >>>>>>>>>>>> protocol for which
> >> >>>>>>>>>>>> >>> Kerberos is the enterprise solution and HTTP Basic
> >> primary
> >> >>>>>>>>>>>> for development
> >> >>>>>>>>>>>> >>> and light-weight scenarios.
> >> >>>>>>>>>>>> >>>
> >> >>>>>>>>>>>> >>> Added the above wording to G's doc.
> >> >>>>>>>>>>>> >>>
> >> >>>>>>>>>>>> >>>
> >> >>>>>>>>>>>>
> >>
> https://docs.google.com/document/d/1NMPeJ9H0G49TGy3AzTVVJVKmYC0okwOtqLTSPnGqzHw/edit
> >> >>>>>>>>>>>> >>>
> >> >>>>>>>>>>>> >>>
> >> >>>>>>>>>>>> >>>
> >> >>>>>>>>>>>> >>> On Tue, Jun 1, 2021 at 11:47 AM Chesnay Schepler <
> >> >>>>>>>>>>>> [hidden email]>
> >> >>>>>>>>>>>> >>> wrote:
> >> >>>>>>>>>>>> >>>
> >> >>>>>>>>>>>> >>>> There's a related effort:
> >> >>>>>>>>>>>> >>>> https://issues.apache.org/jira/browse/FLINK-21108
> >> >>>>>>>>>>>> >>>>
> >> >>>>>>>>>>>> >>>> On 6/1/2021 10:14 AM, Till Rohrmann wrote:
> >> >>>>>>>>>>>> >>>> > Hi Gabor, welcome to the Flink community!
> >> >>>>>>>>>>>> >>>> >
> >> >>>>>>>>>>>> >>>> > Thanks for sharing this proposal with the community
> >> >>>>>>>>>>>> Márton. In
> >> >>>>>>>>>>>> >>>> general, I
> >> >>>>>>>>>>>> >>>> > agree that authentication is missing and that this
> is
> >> >>>>>>>>>>>> required for
> >> >>>>>>>>>>>> >>>> using
> >> >>>>>>>>>>>> >>>> > Flink within an enterprise. The thing I am wondering
> >> is
> >> >>>>>>>>>>>> whether this
> >> >>>>>>>>>>>> >>>> > feature strictly needs to be implemented inside of
> >> Flink
> >> >>>>>>>>>>>> or whether a
> >> >>>>>>>>>>>> >>>> proxy
> >> >>>>>>>>>>>> >>>> > setup could do the job? Have you considered this
> >> option?
> >> >>>>>>>>>>>> If yes, then
> >> >>>>>>>>>>>> >>>> it
> >> >>>>>>>>>>>> >>>> > would be good to list it under the point of rejected
> >> >>>>>>>>>>>> alternatives.
> >> >>>>>>>>>>>> >>>> >
> >> >>>>>>>>>>>> >>>> > I do see the benefit of implementing this feature
> >> inside
> >> >>>>>>>>>>>> of Flink if
> >> >>>>>>>>>>>> >>>> many
> >> >>>>>>>>>>>> >>>> > users need it. If not, then it might be easier for
> the
> >> >>>>>>>>>>>> project to not
> >> >>>>>>>>>>>> >>>> > increase the surface area since it makes the overall
> >> >>>>>>>>>>>> maintenance
> >> >>>>>>>>>>>> >>>> harder.
> >> >>>>>>>>>>>> >>>> >
> >> >>>>>>>>>>>> >>>> > Cheers,
> >> >>>>>>>>>>>> >>>> > Till
> >> >>>>>>>>>>>> >>>> >
> >> >>>>>>>>>>>> >>>> > On Mon, May 31, 2021 at 4:57 PM Márton Balassi <
> >> >>>>>>>>>>>> [hidden email]>
> >> >>>>>>>>>>>> >>>> wrote:
> >> >>>>>>>>>>>> >>>> >
> >> >>>>>>>>>>>> >>>> >> Hi team,
> >> >>>>>>>>>>>> >>>> >>
> >> >>>>>>>>>>>> >>>> >> Firstly I would like to introduce Gabor or G [1]
> for
> >> >>>>>>>>>>>> short to the
> >> >>>>>>>>>>>> >>>> >> community, he is a Spark committer who has recently
> >> >>>>>>>>>>>> transitioned to
> >> >>>>>>>>>>>> >>>> the
> >> >>>>>>>>>>>> >>>> >> Flink Engineering team at Cloudera and is looking
> >> >>>>>>>>>>>> forward to
> >> >>>>>>>>>>>> >>>> contributing
> >> >>>>>>>>>>>> >>>> >> to Apache Flink. Previously G primarily focused on
> >> >>>>>>>>>>>> Spark Streaming
> >> >>>>>>>>>>>> >>>> and
> >> >>>>>>>>>>>> >>>> >> security.
> >> >>>>>>>>>>>> >>>> >>
> >> >>>>>>>>>>>> >>>> >> Based on requests from our customers, G has implemented
> >> >>>>>>>>>>>> >>>> >> Kerberos and HTTP Basic Authentication for the Flink
> >> >>>>>>>>>>>> >>>> >> Dashboard and HistoryServer, which previously lacked an
> >> >>>>>>>>>>>> >>>> >> authentication story.
> >> >>>>>>>>>>>> >>>> >>
> >> >>>>>>>>>>>> >>>> >> We are looking to contribute this functionality
> back
> >> to
> >> >>>>>>>>>>>> the
> >> >>>>>>>>>>>> >>>> community, we
> >> >>>>>>>>>>>> >>>> >> believe that given Flink's maturity there should
> be a
> >> >>>>>>>>>>>> common code
> >> >>>>>>>>>>>> >>>> solution
> >> >>>>>>>>>>>> >>>> >> for this general pattern.
> >> >>>>>>>>>>>> >>>> >>
> >> >>>>>>>>>>>> >>>> >> We are looking forward to your feedback on G's
> >> design.
> >> >>>>>>>>>>>> [2]
> >> >>>>>>>>>>>> >>>> >>
> >> >>>>>>>>>>>> >>>> >> [1] http://gaborsomogyi.com/
> >> >>>>>>>>>>>> >>>> >> [2]
> >> >>>>>>>>>>>> >>>> >>
> >> >>>>>>>>>>>> >>>> >>
> >> >>>>>>>>>>>> >>>>
> >> >>>>>>>>>>>>
> >>
> https://docs.google.com/document/d/1NMPeJ9H0G49TGy3AzTVVJVKmYC0okwOtqLTSPnGqzHw/edit
> >> >>>>>>>>>>>> >>>> >>
> >> >>>>>>>>>>>> >>>>
> >> >>>>>>>>>>>> >>>>
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>
> >>
> >
> >
> > --
> >
> > Konstantin Knauf
> >
> > https://twitter.com/snntrable
> >
> > https://github.com/knaufk
> >
>


--

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk

Re: [DISCUSS] Dashboard/HistoryServer authentication

Till Rohrmann
Hi Gabor,

I haven't found time to look into the updated FLIP yet. I'll try to do it
asap.

Cheers,
Till

On Wed, Jun 16, 2021 at 9:35 PM Konstantin Knauf <[hidden email]> wrote:

> Hi Gabor,
>
> > However, presenting Kerberos as a completely new feature is not
> > accurate, because it's already in: Flink authenticates at least with
> > HDFS and HBase through Kerberos.
>
> True, that is one way to look at it, but there are differences, too:
> Control Plane vs Data Plane, Core vs Connectors.
>
> > Adding OIDC or OAuth2 raises the exact same concerns that you guys have
> > just raised. Why exactly these? If you think this would be beneficial
> > we can discuss it in detail
>
> That's exactly my point. Once we start adding authx support, we will
> sooner or later discuss other options besides Kerberos, too. A user who
> would like to use OAuth can not easily use Kerberos, right?
> That is one of the reasons I am skeptical about adding initial authx
> support.
>
> > Regarding authorization, you've mentioned it can become complicated
> > over time. Can you show us an example? We have experience with a couple
> > of open source components, but authorization was never a horribly
> > complex story. I personally have the most experience with Spark, which
> > I think is quite simple and stable. Users can be viewers/admins
> > and jobs started by others can't be modified. If you can share an
> > example of over-complication we can discuss based on facts.
>
> Authorization is a new aspect that needs to be considered for every
> addition to the REST API. In the future users might ask for additional
> roles (e.g. an editor), user-defined roles and you've already mentioned
> job-level permissions yourself. And keep in mind that there might also be
> larger additions in the future like the flink-sql-gateway. Contributions
> like this become more expensive the more aspects we need to consider.
>
> In general, I believe, it is important that the community focuses its
> efforts where we can generate the most value to the user and - personally -
> I don't think there is much to gain by extending Flink's scope in that
> direction. Of course, this is not black and white and there are other valid
> opinions.
>
> Thanks,
>
> Konstantin
>
> On Wed, Jun 16, 2021 at 7:38 PM Gabor Somogyi <[hidden email]>
> wrote:
>
>> Hi Konstantin,
>>
>> Thanks for the response. Regarding the introduction of new features: in
>> the case of Basic auth I tend to agree, anything else can be chosen.
>>
>> However, presenting Kerberos as a completely new feature is not accurate,
>> because it's already in: Flink authenticates at least with HDFS and
>> HBase through Kerberos.
>> The main problem with the current Kerberos implementation is that it
>> contains several bugs and is only partially implemented. Following your
>> suggestion, can we agree that we skip the Basic auth implementation and
>> finish the already started Kerberos story by adding History Server and
>> Job Dashboard authentication?
>>
>> Adding OIDC or OAuth2 raises the exact same concerns that you guys have
>> just raised. Why exactly these? If you think this would be beneficial we
>> can discuss it in detail,
>> but as a side story it would be good to finish a halfway done Kerberos
>> story.
>>
>> Regarding authorization, you've mentioned it can become complicated over
>> time. Can you show us an example? We have experience with a couple of
>> open source components, but authorization was never a horribly complex
>> story. I personally have the most experience with Spark, which I think
>> is quite simple and stable. Users can be viewers/admins
>> and jobs started by others can't be modified. If you can share an
>> example of over-complication we can discuss based on facts.
>>
>> Thank you in advance!
>>
>> BR,
>> G
>>
>>
>> On Wed, Jun 16, 2021 at 5:42 PM Konstantin Knauf <[hidden email]>
>> wrote:
>>
>> > Hi everyone,
>> >
>> > sorry for joining late and thanks for the insightful discussion.
>> >
>> > In general, I'd personally prefer not to increase the surface area of
>> > Apache Flink unless there is a good reason. It seems we all agree that
>> > authx is not part of the core value proposition of Apache Flink, so if
>> we
>> > can delegate this problem to a more specialized tool, I am in favor of
>> > that. Apache Flink is already huge and a lot of work goes into
>> maintenance,
>> > so I personally have become more sensitive to this aspect over time.
>> >
>> > If we add support for Basic Auth and Kerberos now, users will sooner or
>> > later ask for OIDC, LDAP, SAML,... I acknowledge that Kerberos is widely
>> > used in the corporate, on-premises context, but isn't the focus moving
>> more
>> > towards more web-friendly standards like OIDC/OAuth 2.0? If we only
>> want to
>> > support a single protocol, there is an argument to be made that it
>> should
>> > be OIDC and Dex [1,2] as a bridge to everything else. Have OIDC or
>> OAuth2
>> > been considered instead of Kerberos? How do you see the market moving?
>> But
>> > as I said before, in my opinion we can generate more value by investing
>> > into other areas of Apache Flink.
>> >
>> > Authorization also has the potential to become more fine-grained and
>> > complex over time: you already mentioned restricting the actions that a
>> > specific user can do in a cluster.
>> >
>> > Cheers,
>> >
>> > Konstantin
>> >
>> > [1] https://github.com/dexidp/dex
>> > [2] https://github.com/dexidp/dex/issues/1903
>> >
>> >
>> > On Wed, Jun 16, 2021 at 11:44 AM Gabor Somogyi <
>> [hidden email]>
>> > wrote:
>> >
>> >> Hi Till,
>> >>
>> >> Did you have the chance to take a look at the doc? Not yet seen any
>> >> update.
>> >>
>> >> BR,
>> >> G
>> >>
>> >>
>> >> On Wed, Jun 9, 2021 at 1:43 PM Till Rohrmann <[hidden email]>
>> >> wrote:
>> >>
>> >> > Thanks for the update Gabor. I'll take a look and respond in the
>> >> document.
>> >> >
>> >> > Cheers,
>> >> > Till
>> >> >
>> >> > On Wed, Jun 9, 2021 at 12:59 PM Gabor Somogyi <
>> >> [hidden email]>
>> >> > wrote:
>> >> >
>> >> >> Hi Till,
>> >> >>
>> >> >> Your proxy suggestion has been considered in depth and the FLIP has
>> >> >> been updated accordingly.
>> >> >> We've considered 2 proxy implementations (Nginx and Squid), but
>> >> >> according to our analysis and testing they are not suitable for the
>> >> >> mentioned use-cases.
>> >> >> Please take a look at the rejected alternatives for a detailed
>> >> >> explanation.
>> >> >>
>> >> >> Thanks for your time in advance!
>> >> >>
>> >> >> BR,
>> >> >> G
>> >> >>
>> >> >>
>> >> >> On Fri, Jun 4, 2021 at 3:31 PM Till Rohrmann <[hidden email]>
>> >> >> wrote:
>> >> >>
>> >> >>> As I've said I am not a security expert and that's why I have to
>> ask
>> >> for
>> >> >>> clarification, Gabor. You are saying that if we configure a
>> >> truststore for
>> >> >>> the REST endpoint with a single trusted certificate which has been
>> >> >>> generated by the operator of the Flink cluster, then the attacker
>> can
>> >> >>> generate a new certificate, sign it and then talk to the Flink
>> >> cluster if
>> >> >>> he has access to the node on which the REST endpoint runs? My
>> >> understanding
>> >> >>> was that you need the corresponding private key which in my
>> proposed
>> >> setup
>> >> >>> would be under the control of the operator as well (e.g. stored in
>> a
>> >> >>> keystore on the same machine but guarded by some secret). That way
>> >> (if I am
>> >> >>> not mistaken), only the entity which has access to the keystore is
>> >> able to
>> >> >>> talk to the Flink cluster.
>> >> >>>
>> >> >>> Maybe we are also getting our wires crossed here and are talking
>> about
>> >> >>> different things.
>> >> >>>
>> >> >>> Thanks for listing the pros and cons of Kerberos. Concerning what
>> >> other
>> >> >>> authentication mechanisms are used in the industry, I am not 100%
>> >> sure.
>> >> >>>
>> >> >>> Cheers,
>> >> >>> Till
>> >> >>>
>> >> >>> On Fri, Jun 4, 2021 at 11:09 AM Gabor Somogyi <
>> >> [hidden email]>
>> >> >>> wrote:
>> >> >>>
>> >> >>>> > I did not mean for the user to sign its own certificates but for
>> >> >>>> > the operator of the cluster. Once the user request hits the
>> >> >>>> > proxy, it should no longer be under his control. I think I do
>> >> >>>> > not fully understand yet why this would not work.
>> >> >>>> I said it's not solving the authentication problem over any proxy.
>> >> >>>> Even if the operator signs the certificate, someone can gain
>> >> >>>> access to an internal node.
>> >> >>>> In such a case anybody can craft certificates which are accepted
>> >> >>>> by the server. Once they are accepted, a bad guy can cancel jobs,
>> >> >>>> causing huge impact.
>> >> >>>>
>> >> >>>> > Also, I am missing a bit the comparison of Kerberos to other
>> >> >>>> > authentication mechanisms and why they were rejected in favour
>> >> >>>> > of Kerberos.
>> >> >>>> PROS:
>> >> >>>> * It does not depend on the cloud provider or on the deployment
>> >> >>>> type (k8s, bare-metal, etc.), which is the biggest plus
>> >> >>>> * Centralized, with existing tooling, so no need to write tons of
>> >> >>>> tools around it
>> >> >>>> * There are clients/tools for almost all OSes and several
>> >> >>>> languages
>> >> >>>> * Very large users have been running it in production for years
>> >> >>>> w/o huge issues
>> >> >>>> * Provides cross-realm trust, amongst other features
>> >> >>>> * Several open source components use it, which could increase
>> >> >>>> compatibility
>> >> >>>>
>> >> >>>> CONS:
>> >> >>>> * Not everybody uses Kerberos
>> >> >>>> * It would increase the code footprint, but this is true for many
>> >> >>>> features (as a side note, I'm here to maintain it)
>> >> >>>>
>> >> >>>> Feel free to add your points because this only represents a single
>> >> >>>> viewpoint.
>> >> >>>> Also, if you have any better option for strong authentication,
>> >> >>>> please share it and we can consider the pros/cons here.
>> >> >>>>
>> >> >>>> BR,
>> >> >>>> G
>> >> >>>>
>> >> >>>>
>> >> >>>> On Fri, Jun 4, 2021 at 10:32 AM Till Rohrmann <
>> [hidden email]>
>> >> >>>> wrote:
>> >> >>>>
>> >> >>>>> I did not mean for the user to sign its own certificates but for
>> the
>> >> >>>>> operator of the cluster. Once the user request hits the proxy, it
>> >> should no
>> >> >>>>> longer be under his control. I think I do not fully understand
>> yet
>> >> why this
>> >> >>>>> would not work.
>> >> >>>>>
>> >> >>>>> What I would like to avoid is to add more complexity into Flink
>> if
>> >> >>>>> there is an easy solution which fulfills the requirements. That's
>> >> why I
>> >> >>>>> would like to exercise thoroughly through the different
>> >> alternatives. Also,
>> >> >>>>> I am missing a bit the comparison of Kerberos to other
>> >> authentication
>> >> >>>>> mechanisms and why they were rejected in favour of Kerberos.
>> >> >>>>>
>> >> >>>>> Cheers,
>> >> >>>>> Till
>> >> >>>>>
>> >> >>>>> On Fri, Jun 4, 2021 at 10:26 AM Gyula Fóra <[hidden email]>
>> >> wrote:
>> >> >>>>>
>> >> >>>>>> Hi!
>> >> >>>>>>
>> >> >>>>>> I think there might be possible alternatives but it seems
>> Kerberos
>> >> on
>> >> >>>>>> the rest endpoint ticks all the right boxes and provides a super
>> >> clean and
>> >> >>>>>> simple solution for strong authentication.
>> >> >>>>>>
>> >> >>>>>> I wouldn’t even consider sidecar proxies etc if we can solve it
>> in
>> >> >>>>>> such a simple way as proposed by G.
>> >> >>>>>>
>> >> >>>>>> Cheers
>> >> >>>>>> Gyula
>> >> >>>>>>
>> >> >>>>>> On Fri, 4 Jun 2021 at 10:03, Till Rohrmann <
>> [hidden email]>
>> >> >>>>>> wrote:
>> >> >>>>>>
>> >> >>>>>>> I am not saying that we shouldn't add a strong authentication
>> >> >>>>>>> mechanism if there are good reasons for it. I primarily would
>> >> like to
>> >> >>>>>>> understand the context a bit better in order to give qualified
>> >> feedback and
>> >> >>>>>>> come to a good decision. In order to do this, I have the
>> feeling
>> >> that we
>> >> >>>>>>> haven't fully considered all available options which are on the
>> >> table, tbh.
>> >> >>>>>>>
>> >> >>>>>>> Does the problem of certificate expiry also apply for
>> self-signed
>> >> >>>>>>> certificates? If yes, then this should then also be a problem
>> for
>> >> the
>> >> >>>>>>> internal encryption of Flink's communication. If not, then one
>> >> could use
>> >> >>>>>>> self-signed certificates with a longer validity to solve the
>> >> mentioned
>> >> >>>>>>> issue.
>> >> >>>>>>>
>> >> >>>>>>> I think you can set up Flink in such a way that you don't have
>> >> >>>>>>> to handle all the different certificates. For example, you
>> >> >>>>>>> could deploy Flink with a "sidecar proxy" which is responsible
>> >> >>>>>>> for the authentication using an arbitrary method (e.g.
>> >> >>>>>>> Kerberos) and then bind the REST endpoint to a local network
>> >> >>>>>>> interface. That way, the REST endpoint would only be available
>> >> >>>>>>> through the sidecar proxy. Additionally, one could enable SSL
>> >> >>>>>>> for this communication. Would this be a solution for the
>> >> >>>>>>> problem?
>> >> >>>>>>>
>> >> >>>>>>> Cheers,
>> >> >>>>>>> Till
>> >> >>>>>>>
>> >> >>>>>>> On Thu, Jun 3, 2021 at 10:46 PM Márton Balassi <
>> >> >>>>>>> [hidden email]> wrote:
>> >> >>>>>>>
>> >> >>>>>>>> That is an interesting idea, Till.
>> >> >>>>>>>>
>> >> >>>>>>>> The main issue with it is that TLS certificates have an
>> >> expiration
>> >> >>>>>>>> time, usually they get approved for a couple years. Forcing
>> our
>> >> users to
>> >> >>>>>>>> restart jobs to reprovision TLS certificates would be weird
>> when
>> >> we could
>> >> >>>>>>>> just implement a single proper strong authentication mechanism
>> >> instead in a
>> >> >>>>>>>> couple hundred lines of code. :-)
>> >> >>>>>>>>
>> >> >>>>>>>> In many cases it is also impractical to go the TLS mutual
>> >> >>>>>>>> route, because the Flink Dashboard can end up on any node in
>> >> >>>>>>>> the k8s/Yarn cluster which means that we need a certificate
>> >> >>>>>>>> per node (due to the mutual auth), but if we also want to
>> >> >>>>>>>> protect the private key of these from users accidentally or
>> >> >>>>>>>> intentionally leaking them then we need this per user. As in
>> >> >>>>>>>> we end up managing user*machine number certificates and
>> >> >>>>>>>> having to renew them periodically, which albeit automatable
>> >> >>>>>>>> is unfortunately not yet automated in all large
>> >> >>>>>>>> organizations.
>> >> >>>>>>>>
>> >> >>>>>>>> I fully agree that TLS certificate mutual authentication has
>> its
>> >> >>>>>>>> nice properties, especially at very large (multiple thousand
>> >> node) clusters
>> >> >>>>>>>> - but it has its own challenges too. Thanks for bringing it
>> up.
>> >> >>>>>>>>
>> >> >>>>>>>> Happy to have this added to the rejected alternative list so
>> that
>> >> >>>>>>>> we have the full picture documented.
>> >> >>>>>>>>
>> >> >>>>>>>> On Thu, Jun 3, 2021 at 5:52 PM Till Rohrmann <
>> >> [hidden email]>
>> >> >>>>>>>> wrote:
>> >> >>>>>>>>
>> >> >>>>>>>>> I guess the idea would then be to let the proxy do the
>> >> >>>>>>>>> authentication job and only forward the request via an SSL
>> >> mutually
>> >> >>>>>>>>> encrypted connection to the Flink cluster. Would this be
>> >> possible? The
>> >> >>>>>>>>> beauty of this setup is in my opinion that this setup should
>> >> work with all
>> >> >>>>>>>>> kinds of authentication mechanisms.
>> >> >>>>>>>>>
>> >> >>>>>>>>> Cheers,
>> >> >>>>>>>>> Till
>> >> >>>>>>>>>
>> >> >>>>>>>>> On Thu, Jun 3, 2021 at 3:12 PM Gabor Somogyi <
>> >> >>>>>>>>> [hidden email]> wrote:
>> >> >>>>>>>>>
>> >> >>>>>>>>>> Thanks for giving options to fulfil the need.
>> >> >>>>>>>>>>
>> >> >>>>>>>>>> Users are looking for a solution where users can be
>> >> >>>>>>>>>> identified across the whole cluster and access to
>> >> >>>>>>>>>> resources/actions can be restricted.
>> >> >>>>>>>>>> A good example of such an action is cancelling other users'
>> >> >>>>>>>>>> running jobs.
>> >> >>>>>>>>>>
>> >> >>>>>>>>>> * SSL does provide mutual authentication, but once
>> >> >>>>>>>>>> authentication has passed, no user-based restrictions can
>> >> >>>>>>>>>> be made.
>> >> >>>>>>>>>> * The less problematic part is that generating/maintaining
>> >> >>>>>>>>>> short-lived certificates would be hard (that's the reason
>> >> >>>>>>>>>> KDC-like servers exist).
>> >> >>>>>>>>>> Having long-lived certificates would widen the attack
>> >> >>>>>>>>>> surface, but since the first concern is there this is just
>> >> >>>>>>>>>> a cosmetic issue.
>> >> >>>>>>>>>>
>> >> >>>>>>>>>> All in all using TLS certificates is not sufficient in these
>> >> >>>>>>>>>> environments unfortunately.
>> >> >>>>>>>>>>
>> >> >>>>>>>>>> BR,
>> >> >>>>>>>>>> G
>> >> >>>>>>>>>>
>> >> >>>>>>>>>>
>> >> >>>>>>>>>> On Thu, Jun 3, 2021 at 12:49 PM Till Rohrmann <
>> >> >>>>>>>>>> [hidden email]> wrote:
>> >> >>>>>>>>>>
>> >> >>>>>>>>>>> Thanks for the information Gabor. If it is about securing
>> >> >>>>>>>>>>> the communication between the REST client and the REST
>> >> >>>>>>>>>>> server, then Flink already supports enabling mutual SSL
>> >> >>>>>>>>>>> authentication [1]. Would this be enough to secure the
>> >> >>>>>>>>>>> communication and to pass an audit?
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> [1]
>> >> >>>>>>>>>>>
>> >>
>> https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/security/security-ssl/#external--rest-connectivity
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> Cheers,
>> >> >>>>>>>>>>> Till
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> On Thu, Jun 3, 2021 at 10:33 AM Gabor Somogyi <
>> >> >>>>>>>>>>> [hidden email]> wrote:
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>>> Hi Till,
>> >> >>>>>>>>>>>>
>> >> >>>>>>>>>>>> Since I've been working in the security area for 10+ years,
>> >> >>>>>>>>>>>> let me share my thoughts.
>> >> >>>>>>>>>>>> I would like to emphasise there are experts better than me
>> >> >>>>>>>>>>>> but I have some basics.
>> >> >>>>>>>>>>>> The discussion is open and not trying to tell alone things...
>> >> >>>>>>>>>>>>
>> >> >>>>>>>>>>>> > I mean if an attacker can get access to one of the
>> >> >>>>>>>>>>>> > machines, then it should also be possible to obtain the
>> >> >>>>>>>>>>>> > right Kerberos token.
>> >> >>>>>>>>>>>> Not necessarily. For example, if one gets access to a
>> >> >>>>>>>>>>>> specific user's credentials then it's not possible to
>> >> >>>>>>>>>>>> compromise other users' jobs, data, etc...
>> >> >>>>>>>>>>>> Security is like an onion: the more layers have been added,
>> >> >>>>>>>>>>>> the more time an attacker needs to proceed.
>> >> >>>>>>>>>>>> At the end of the day, if one is in, then one can most
>> >> >>>>>>>>>>>> probably find a way, but this time is normally enough for
>> >> >>>>>>>>>>>> sysadmins or security experts to close down the system and
>> >> >>>>>>>>>>>> minimize the damage.
>> >> >>>>>>>>>>>>
>> >> >>>>>>>>>>>> The other thing is that all tokens have a timeout, and if
>> >> >>>>>>>>>>>> the token is invalid then the attacker can't proceed further.
>> >> >>>>>>>>>>>>
>> >> >>>>>>>>>>>> > Is Kerberos also the standard authentication protocol for
>> >> >>>>>>>>>>>> > Kubernetes deployments?
>> >> >>>>>>>>>>>> Kerberos is an industry standard which is cloud/deployment
>> >> >>>>>>>>>>>> agnostic and it can be used in any deployment, including k8s.
>> >> >>>>>>>>>>>> The main intention is to use Kerberos in k8s deployments
>> >> >>>>>>>>>>>> too, since we're going in this direction as well.
>> >> >>>>>>>>>>>> Please see how Spark does this:
>> >> >>>>>>>>>>>>
>> >> >>>>>>>>>>>>
>> >>
>> https://spark.apache.org/docs/latest/security.html#secure-interaction-with-kubernetes
>> >> >>>>>>>>>>>>
>> >> >>>>>>>>>>>> Last but not least, the most important reason to add at
>> >> >>>>>>>>>>>> least one strong authentication is that we have users who
>> >> >>>>>>>>>>>> have hard requirements on this. They're doing security
>> >> >>>>>>>>>>>> audits and if they fail then it's a deal breaker.
>> >> >>>>>>>>>>>> That is why we added Kerberos in the first place.
>> >> >>>>>>>>>>>> Unfortunately we can't name them on this public list;
>> >> >>>>>>>>>>>> however, the customers who specifically asked for this were
>> >> >>>>>>>>>>>> mainly in the banking and telco sector.
>> >> >>>>>>>>>>>>
>> >> >>>>>>>>>>>> BR,
>> >> >>>>>>>>>>>> G
>> >> >>>>>>>>>>>>
>> >> >>>>>>>>>>>>
>> >> >>>>>>>>>>>> On Thu, Jun 3, 2021 at 9:20 AM Till Rohrmann <
>> >> >>>>>>>>>>>> [hidden email]> wrote:
>> >> >>>>>>>>>>>>
>> >> >>>>>>>>>>>> > Thanks for updating the document Márton. Why is it that
>> >> banks
>> >> >>>>>>>>>>>> will
>> >> >>>>>>>>>>>> > consider it more secure if Flink comes with Kerberos
>> >> >>>>>>>>>>>> authentication
>> >> >>>>>>>>>>>> > (assuming a properly secured setup)? I mean if an
>> attacker
>> >> >>>>>>>>>>>> can get access
>> >> >>>>>>>>>>>> > to one of the machines, then it should also be possible
>> to
>> >> >>>>>>>>>>>> obtain the right
>> >> >>>>>>>>>>>> > Kerberos token.
>> >> >>>>>>>>>>>> >
>> >> >>>>>>>>>>>> > I am not an authentication expert and that's why I
>> wanted
>> >> to
>> >> >>>>>>>>>>>> ask what are
>> >> >>>>>>>>>>>> > other authentication protocols other than Kerberos? Why
>> did
>> >> >>>>>>>>>>>> we select
>> >> >>>>>>>>>>>> > Kerberos and not any other authentication protocol?
>> Maybe
>> >> you
>> >> >>>>>>>>>>>> can list the
>> >> >>>>>>>>>>>> > pros and cons for the different protocols. Is Kerberos
>> also
>> >> >>>>>>>>>>>> the standard
>> >> >>>>>>>>>>>> > authentication protocol for Kubernetes deployments? If
>> not,
>> >> >>>>>>>>>>>> what would be
>> >> >>>>>>>>>>>> > the answer when deploying on K8s?
>> >> >>>>>>>>>>>> >
>> >> >>>>>>>>>>>> > Cheers,
>> >> >>>>>>>>>>>> > Till
>> >> >>>>>>>>>>>> >
>> >> >>>>>>>>>>>> > On Wed, Jun 2, 2021 at 12:07 PM Gabor Somogyi <
>> >> >>>>>>>>>>>> [hidden email]>
>> >> >>>>>>>>>>>> > wrote:
>> >> >>>>>>>>>>>> >
>> >> >>>>>>>>>>>> >> Hi team,
>> >> >>>>>>>>>>>> >>
>> >> >>>>>>>>>>>> >> Happy to be here and hope I can provide quality
>> additions
>> >> in
>> >> >>>>>>>>>>>> the future.
>> >> >>>>>>>>>>>> >>
>> >> >>>>>>>>>>>> >> Thank you all for the helpful suggestions!
>> >> >>>>>>>>>>>> >> Considering them the FLIP has been modified and the
>> work
>> >> >>>>>>>>>>>> continues on the
>> >> >>>>>>>>>>>> >> already existing Jira.
>> >> >>>>>>>>>>>> >>
>> >> >>>>>>>>>>>> >> BR,
>> >> >>>>>>>>>>>> >> G
>> >> >>>>>>>>>>>> >>
>> >> >>>>>>>>>>>> >>
>> >> >>>>>>>>>>>> >> On Wed, Jun 2, 2021 at 11:23 AM Márton Balassi <
>> >> >>>>>>>>>>>> [hidden email]>
>> >> >>>>>>>>>>>> >> wrote:
>> >> >>>>>>>>>>>> >>
>> >> >>>>>>>>>>>> >>> Thanks, Chesnay - I totally missed that. Answered on the
>> >> >>>>>>>>>>>> >>> ticket too, let us continue there then.
>> >> >>>>>>>>>>>> >>>
>> >> >>>>>>>>>>>> >>> Till, I agree that we should keep this codepath as slim as
>> >> >>>>>>>>>>>> >>> possible. It is an important design decision that we aim to
>> >> >>>>>>>>>>>> >>> keep the list of authentication protocols to a minimum. We
>> >> >>>>>>>>>>>> >>> believe that this should not be a primary concern of Flink
>> >> >>>>>>>>>>>> >>> and that a trusted proxy service (for example Apache Knox)
>> >> >>>>>>>>>>>> >>> should be used to enable a multitude of end-user
>> >> >>>>>>>>>>>> >>> authentication mechanisms. The bare minimum of authentication
>> >> >>>>>>>>>>>> >>> mechanisms to support consequently consists of a single strong
>> >> >>>>>>>>>>>> >>> authentication protocol, for which Kerberos is the enterprise
>> >> >>>>>>>>>>>> >>> solution, and HTTP Basic, primarily for development and
>> >> >>>>>>>>>>>> >>> light-weight scenarios.
>> >> >>>>>>>>>>>> >>>
>> >> >>>>>>>>>>>> >>> Added the above wording to G's doc.
>> >> >>>>>>>>>>>> >>>
>> >> >>>>>>>>>>>> >>>
>> >> >>>>>>>>>>>>
>> >>
>> https://docs.google.com/document/d/1NMPeJ9H0G49TGy3AzTVVJVKmYC0okwOtqLTSPnGqzHw/edit
>> >> >>>>>>>>>>>> >>>
>> >> >>>>>>>>>>>> >>>
>> >> >>>>>>>>>>>> >>>
>> >> >>>>>>>>>>>> >>> On Tue, Jun 1, 2021 at 11:47 AM Chesnay Schepler <
>> >> >>>>>>>>>>>> [hidden email]>
>> >> >>>>>>>>>>>> >>> wrote:
>> >> >>>>>>>>>>>> >>>
>> >> >>>>>>>>>>>> >>>> There's a related effort:
>> >> >>>>>>>>>>>> >>>> https://issues.apache.org/jira/browse/FLINK-21108
>> >> >>>>>>>>>>>> >>>>
>> >> >>>>>>>>>>>> >>>> On 6/1/2021 10:14 AM, Till Rohrmann wrote:
>> >> >>>>>>>>>>>> >>>> > Hi Gabor, welcome to the Flink community!
>> >> >>>>>>>>>>>> >>>> >
>> >> >>>>>>>>>>>> >>>> > Thanks for sharing this proposal with the community
>> >> >>>>>>>>>>>> Márton. In
>> >> >>>>>>>>>>>> >>>> general, I
>> >> >>>>>>>>>>>> >>>> > agree that authentication is missing and that this
>> is
>> >> >>>>>>>>>>>> required for
>> >> >>>>>>>>>>>> >>>> using
>> >> >>>>>>>>>>>> >>>> > Flink within an enterprise. The thing I am
>> wondering
>> >> is
>> >> >>>>>>>>>>>> whether this
>> >> >>>>>>>>>>>> >>>> > feature strictly needs to be implemented inside of
>> >> Flink
>> >> >>>>>>>>>>>> or whether a
>> >> >>>>>>>>>>>> >>>> proxy
>> >> >>>>>>>>>>>> >>>> > setup could do the job? Have you considered this
>> >> option?
>> >> >>>>>>>>>>>> If yes, then
>> >> >>>>>>>>>>>> >>>> it
>> >> >>>>>>>>>>>> >>>> > would be good to list it under the point of
>> rejected
>> >> >>>>>>>>>>>> alternatives.
>> >> >>>>>>>>>>>> >>>> >
>> >> >>>>>>>>>>>> >>>> > I do see the benefit of implementing this feature
>> >> inside
>> >> >>>>>>>>>>>> of Flink if
>> >> >>>>>>>>>>>> >>>> many
>> >> >>>>>>>>>>>> >>>> > users need it. If not, then it might be easier for
>> the
>> >> >>>>>>>>>>>> project to not
>> >> >>>>>>>>>>>> >>>> > increase the surface area since it makes the
>> overall
>> >> >>>>>>>>>>>> maintenance
>> >> >>>>>>>>>>>> >>>> harder.
>> >> >>>>>>>>>>>> >>>> >
>> >> >>>>>>>>>>>> >>>> > Cheers,
>> >> >>>>>>>>>>>> >>>> > Till
>> >> >>>>>>>>>>>> >>>> >
>> >> >>>>>>>>>>>> >>>> > On Mon, May 31, 2021 at 4:57 PM Márton Balassi <
>> >> >>>>>>>>>>>> [hidden email]>
>> >> >>>>>>>>>>>> >>>> wrote:
>> >> >>>>>>>>>>>> >>>> >
>> >> >>>>>>>>>>>> >>>> >> Hi team,
>> >> >>>>>>>>>>>> >>>> >>
>> >> >>>>>>>>>>>> >>>> >> Firstly I would like to introduce Gabor or G [1]
>> for
>> >> >>>>>>>>>>>> short to the
>> >> >>>>>>>>>>>> >>>> >> community, he is a Spark committer who has
>> recently
>> >> >>>>>>>>>>>> transitioned to
>> >> >>>>>>>>>>>> >>>> the
>> >> >>>>>>>>>>>> >>>> >> Flink Engineering team at Cloudera and is looking
>> >> >>>>>>>>>>>> forward to
>> >> >>>>>>>>>>>> >>>> contributing
>> >> >>>>>>>>>>>> >>>> >> to Apache Flink. Previously G primarily focused on
>> >> >>>>>>>>>>>> Spark Streaming
>> >> >>>>>>>>>>>> >>>> and
>> >> >>>>>>>>>>>> >>>> >> security.
>> >> >>>>>>>>>>>> >>>> >>
>> >> >>>>>>>>>>>> >>>> >> Based on requests from our customers, G has implemented
>> >> >>>>>>>>>>>> >>>> >> Kerberos and HTTP Basic Authentication for the Flink
>> >> >>>>>>>>>>>> >>>> >> Dashboard and HistoryServer, which previously lacked an
>> >> >>>>>>>>>>>> >>>> >> authentication story.
>> >> >>>>>>>>>>>> >>>> >>
>> >> >>>>>>>>>>>> >>>> >> We are looking to contribute this functionality
>> back
>> >> to
>> >> >>>>>>>>>>>> the
>> >> >>>>>>>>>>>> >>>> community, we
>> >> >>>>>>>>>>>> >>>> >> believe that given Flink's maturity there should
>> be a
>> >> >>>>>>>>>>>> common code
>> >> >>>>>>>>>>>> >>>> solution
>> >> >>>>>>>>>>>> >>>> >> for this general pattern.
>> >> >>>>>>>>>>>> >>>> >>
>> >> >>>>>>>>>>>> >>>> >> We are looking forward to your feedback on G's
>> >> design.
>> >> >>>>>>>>>>>> [2]
>> >> >>>>>>>>>>>> >>>> >>
>> >> >>>>>>>>>>>> >>>> >> [1] http://gaborsomogyi.com/
>> >> >>>>>>>>>>>> >>>> >> [2]
>> >> >>>>>>>>>>>> >>>> >>
>> >> >>>>>>>>>>>> >>>> >>
>> >> >>>>>>>>>>>> >>>>
>> >> >>>>>>>>>>>>
>> >>
>> https://docs.google.com/document/d/1NMPeJ9H0G49TGy3AzTVVJVKmYC0okwOtqLTSPnGqzHw/edit
>> >> >>>>>>>>>>>> >>>> >>
>> >> >>>>>>>>>>>> >>>>
>> >> >>>>>>>>>>>> >>>>
>> >> >>>>>>>>>>>>
>> >> >>>>>>>>>>>
>> >>
>> >
>> >
>> > --
>> >
>> > Konstantin Knauf
>> >
>> > https://twitter.com/snntrable
>> >
>> > https://github.com/knaufk
>> >
>>

Re: [DISCUSS] Dashboard/HistoryServer authentication

Till Rohrmann
I left some comments in the Google document. It would be great if
someone from the community with security experience could also take a look
at it. Eron, maybe you have an opinion on the topic?

Cheers,
Till

On Thu, Jun 17, 2021 at 6:57 PM Till Rohrmann <[hidden email]> wrote:

> Hi Gabor,
>
> I haven't found time to look into the updated FLIP yet. I'll try to do it
> asap.
>
> Cheers,
> Till
>
> On Wed, Jun 16, 2021 at 9:35 PM Konstantin Knauf <[hidden email]>
> wrote:
>
>> Hi Gabor,
>>
>> > However, presenting Kerberos as a completely new feature is not accurate,
>> because it is already in: Flink authenticates at least with HDFS and
>> HBase through Kerberos.
>>
>> True, that is one way to look at it, but there are differences, too:
>> Control Plane vs Data Plane, Core vs Connectors.
>>
>> > Adding OIDC or OAuth2 raises the exact same concerns that you have just
>> raised. Why exactly these? If you think this would be beneficial, we can
>> discuss it in detail
>>
>> That's exactly my point. Once we start adding authx support, we will
>> sooner or later discuss other options besides Kerberos, too. A user who
>> would like to use OAuth cannot easily use Kerberos, right?
>> That is one of the reasons I am skeptical about adding initial authx
>> support.
>>
>> > Regarding authorization, you mentioned that it can become complicated
>> over time. Can you show us an example? We have experience with a couple of
>> open source components, but authorization was never a horribly complex
>> story. I personally have the most experience with Spark, which I think is
>> quite simple and stable: users can be viewers/admins,
>> and jobs started by others can't be modified. If you can share an example
>> of over-complication, we can discuss based on facts.
>>
>> Authorization is a new aspect that needs to be considered for every
>> addition to the REST API. In the future users might ask for additional
>> roles (e.g. an editor), user-defined roles and you've already mentioned
>> job-level permissions yourself. And keep in mind that there might also be
>> larger additions in the future like the flink-sql-gateway. Contributions
>> like this become more expensive the more aspects we need to consider.
>>
>> In general, I believe, it is important that the community focuses its
>> efforts where we can generate the most value to the user and - personally -
>> I don't think there is much to gain by extending Flink's scope in that
>> direction. Of course, this is not black and white and there are other valid
>> opinions.
>>
>> Thanks,
>>
>> Konstantin
>>
>> On Wed, Jun 16, 2021 at 7:38 PM Gabor Somogyi <[hidden email]>
>> wrote:
>>
>>> Hi Konstantin,
>>>
>>> Thanks for the response. Regarding the introduction of new features, in
>>> the case of Basic auth I tend to agree; anything else can be chosen.
>>>
>>> However, presenting Kerberos as a completely new feature is not accurate,
>>> because it is already in: Flink authenticates at least with HDFS and
>>> HBase through Kerberos.
>>> The main problem with the current Kerberos implementation is that it
>>> contains several bugs and is only partially implemented. Following your
>>> suggestion, can we agree that we
>>> skip the Basic auth implementation and finish the already started Kerberos
>>> story by adding History Server and Job Dashboard authentication?
>>>
>>> Adding OIDC or OAuth2 raises the exact same concerns that you have just
>>> raised. Why exactly these? If you think this would be beneficial, we can
>>> discuss it in detail,
>>> but as a side note it would be good to finish the half-done Kerberos
>>> story.
>>>
>>> Regarding authorization, you mentioned that it can become complicated
>>> over time. Can you show us an example? We have experience with a couple of
>>> open source components, but authorization was never a horribly complex
>>> story. I personally have the most experience with Spark, which I think is
>>> quite simple and stable: users can be viewers/admins,
>>> and jobs started by others can't be modified. If you can share an example
>>> of over-complication, we can discuss based on facts.
>>>
>>> Thank you in advance!
>>>
>>> BR,
>>> G
>>>
>>>
>>> On Wed, Jun 16, 2021 at 5:42 PM Konstantin Knauf <[hidden email]>
>>> wrote:
>>>
>>> > Hi everyone,
>>> >
>>> > sorry for joining late and thanks for the insightful discussion.
>>> >
>>> > In general, I'd personally prefer not to increase the surface area of
>>> > Apache Flink unless there is a good reason. It seems we all agree that
>>> > authx is not part of the core value proposition of Apache Flink, so if
>>> we
>>> > can delegate this problem to a more specialized tool, I am in favor of
>>> > that. Apache Flink is already huge and a lot of work goes into
>>> maintenance,
>>> > so I personally have become more sensitive to this aspect over time.
>>> >
>>> > If we add support for Basic Auth and Kerberos now, users will sooner or
>>> > later ask for OIDC, LDAP, SAML,... I acknowledge that Kerberos is
>>> widely
>>> > used in the corporate, on-premises context, but isn't the focus moving
>>> more
>>> > towards more web-friendly standards like OIDC/OAuth 2.0? If we only
>>> want to
>>> > support a single protocol, there is an argument to be made that it
>>> should
>>> > be OIDC and Dex [1,2] as a bridge to everything else. Have OIDC or
>>> OAuth2
>>> > been considered instead of Kerberos? How do you see the market moving?
>>> But
>>> > as I said before, in my opinion we can generate more value by investing
>>> > into other areas of Apache Flink.
>>> >
>>> > Authorization also has the potential to become more fine-grained and
>>> > complex over time: you already mentioned restricting the actions that a
>>> > specific user can do in a cluster.
>>> >
>>> > Cheers,
>>> >
>>> > Konstantin
>>> >
>>> > [1] https://github.com/dexidp/dex
>>> > [2] https://github.com/dexidp/dex/issues/1903
>>> >
>>> >
>>> > On Wed, Jun 16, 2021 at 11:44 AM Gabor Somogyi <
>>> [hidden email]>
>>> > wrote:
>>> >
>>> >> Hi Till,
>>> >>
>>> >> Did you have the chance to take a look at the doc? Not yet seen any
>>> >> update.
>>> >>
>>> >> BR,
>>> >> G
>>> >>
>>> >>
>>> >> On Wed, Jun 9, 2021 at 1:43 PM Till Rohrmann <[hidden email]>
>>> >> wrote:
>>> >>
>>> >> > Thanks for the update Gabor. I'll take a look and respond in the
>>> >> document.
>>> >> >
>>> >> > Cheers,
>>> >> > Till
>>> >> >
>>> >> > On Wed, Jun 9, 2021 at 12:59 PM Gabor Somogyi <
>>> >> [hidden email]>
>>> >> > wrote:
>>> >> >
>>> >> >> Hi Till,
>>> >> >>
>>> >> >> Your proxy suggestion has been considered in depth and the FLIP has
>>> >> >> been updated accordingly.
>>> >> >> We've considered two proxy implementations (Nginx and Squid), but
>>> >> >> according to our analysis and testing they are not suitable for the
>>> >> >> mentioned use cases.
>>> >> >> Please take a look at the rejected alternatives for a detailed
>>> >> >> explanation.
>>> >> >>
>>> >> >> Thanks for your time in advance!
>>> >> >>
>>> >> >> BR,
>>> >> >> G
>>> >> >>
>>> >> >>
>>> >> >> On Fri, Jun 4, 2021 at 3:31 PM Till Rohrmann <[hidden email]
>>> >
>>> >> >> wrote:
>>> >> >>
>>> >> >>> As I've said I am not a security expert and that's why I have to
>>> ask
>>> >> for
>>> >> >>> clarification, Gabor. You are saying that if we configure a
>>> >> truststore for
>>> >> >>> the REST endpoint with a single trusted certificate which has been
>>> >> >>> generated by the operator of the Flink cluster, then the attacker
>>> can
>>> >> >>> generate a new certificate, sign it and then talk to the Flink
>>> >> cluster if
>>> >> >>> he has access to the node on which the REST endpoint runs? My
>>> >> understanding
>>> >> >>> was that you need the corresponding private key which in my
>>> proposed
>>> >> setup
>>> >> >>> would be under the control of the operator as well (e.g. stored
>>> in a
>>> >> >>> keystore on the same machine but guarded by some secret). That way
>>> >> (if I am
>>> >> >>> not mistaken), only the entity which has access to the keystore is
>>> >> able to
>>> >> >>> talk to the Flink cluster.
>>> >> >>>
>>> >> >>> Maybe we are also getting our wires crossed here and are talking
>>> about
>>> >> >>> different things.
>>> >> >>>
>>> >> >>> Thanks for listing the pros and cons of Kerberos. Concerning what
>>> >> other
>>> >> >>> authentication mechanisms are used in the industry, I am not 100%
>>> >> sure.
>>> >> >>>
>>> >> >>> Cheers,
>>> >> >>> Till
>>> >> >>>
>>> >> >>> On Fri, Jun 4, 2021 at 11:09 AM Gabor Somogyi <
>>> >> [hidden email]>
>>> >> >>> wrote:
>>> >> >>>
>>> >> >>>> > I did not mean for the user to sign its own certificates but
>>> for
>>> >> the
>>> >> >>>> operator of the cluster. Once the user request hits the proxy, it
>>> >> should no
>>> >> >>>> longer be under his control. I think I do not fully understand
>>> yet
>>> >> why this
>>> >> >>>> would not work.
>>> >> >>>> I said it's not solving the authentication problem over any proxy.
>>> >> >>>> Even if the operator signs the certificate, someone can gain access
>>> >> >>>> to an internal node.
>>> >> >>>> In such a case anybody can craft certificates which are accepted by
>>> >> >>>> the server. Once they are accepted, an attacker can cancel jobs,
>>> >> >>>> causing huge impact.
>>> >> >>>>
>>> >> >>>> > Also, I am missing a bit the comparison of Kerberos to other
>>> >> >>>> authentication mechanisms and why they were rejected in favour of
>>> >> Kerberos.
>>> >> >>>> PROS:
>>> >> >>>> * It does not depend on the cloud provider or on the deployment
>>> >> >>>> type (k8s, bare-metal, etc.), which is the biggest plus
>>> >> >>>> * Centralized, with existing tooling, so there is no need to write
>>> >> >>>> tons of tools around it
>>> >> >>>> * There are clients/tools for almost all OSes and several languages
>>> >> >>>> * Very large users have been running it in production for years
>>> >> >>>> without major issues
>>> >> >>>> * Provides cross-realm trust, amongst other features
>>> >> >>>> * Several open source components use it, which could increase
>>> >> >>>> compatibility
>>> >> >>>>
>>> >> >>>> CONS:
>>> >> >>>> * Not everybody uses Kerberos
>>> >> >>>> * It would increase the code footprint, but this is true for many
>>> >> >>>> features (as a side note, I'm here to maintain it)
>>> >> >>>>
>>> >> >>>> Feel free to add your points because it only represents a single
>>> >> >>>> viewpoint.
>>> >> >>>> Also if you have any better option for strong authentication
>>> please
>>> >> >>>> share it and we can consider the pros/cons here.
>>> >> >>>>
>>> >> >>>> BR,
>>> >> >>>> G
>>> >> >>>>
>>> >> >>>>
>>> >> >>>> On Fri, Jun 4, 2021 at 10:32 AM Till Rohrmann <
>>> [hidden email]>
>>> >> >>>> wrote:
>>> >> >>>>
>>> >> >>>>> I did not mean for the user to sign its own certificates but
>>> for the
>>> >> >>>>> operator of the cluster. Once the user request hits the proxy,
>>> it
>>> >> should no
>>> >> >>>>> longer be under his control. I think I do not fully understand
>>> yet
>>> >> why this
>>> >> >>>>> would not work.
>>> >> >>>>>
>>> >> >>>>> What I would like to avoid is to add more complexity into Flink
>>> if
>>> >> >>>>> there is an easy solution which fulfills the requirements.
>>> That's
>>> >> why I
>>> >> >>>>> would like to exercise thoroughly through the different
>>> >> alternatives. Also,
>>> >> >>>>> I am missing a bit the comparison of Kerberos to other
>>> >> authentication
>>> >> >>>>> mechanisms and why they were rejected in favour of Kerberos.
>>> >> >>>>>
>>> >> >>>>> Cheers,
>>> >> >>>>> Till
>>> >> >>>>>
>>> >> >>>>> On Fri, Jun 4, 2021 at 10:26 AM Gyula Fóra <[hidden email]>
>>> >> wrote:
>>> >> >>>>>
>>> >> >>>>>> Hi!
>>> >> >>>>>>
>>> >> >>>>>> I think there might be possible alternatives but it seems
>>> Kerberos
>>> >> on
>>> >> >>>>>> the rest endpoint ticks all the right boxes and provides a
>>> super
>>> >> clean and
>>> >> >>>>>> simple solution for strong authentication.
>>> >> >>>>>>
>>> >> >>>>>> I wouldn’t even consider sidecar proxies etc if we can solve
>>> it in
>>> >> >>>>>> such a simple way as proposed by G.
>>> >> >>>>>>
>>> >> >>>>>> Cheers
>>> >> >>>>>> Gyula
>>> >> >>>>>>
>>> >> >>>>>> On Fri, 4 Jun 2021 at 10:03, Till Rohrmann <
>>> [hidden email]>
>>> >> >>>>>> wrote:
>>> >> >>>>>>
>>> >> >>>>>>> I am not saying that we shouldn't add a strong authentication
>>> >> >>>>>>> mechanism if there are good reasons for it. I primarily would
>>> >> like to
>>> >> >>>>>>> understand the context a bit better in order to give qualified
>>> >> feedback and
>>> >> >>>>>>> come to a good decision. In order to do this, I have the
>>> feeling
>>> >> that we
>>> >> >>>>>>> haven't fully considered all available options which are on
>>> the
>>> >> table, tbh.
>>> >> >>>>>>>
>>> >> >>>>>>> Does the problem of certificate expiry also apply for
>>> self-signed
>>> >> >>>>>>> certificates? If yes, then this should then also be a problem
>>> for
>>> >> the
>>> >> >>>>>>> internal encryption of Flink's communication. If not, then one
>>> >> could use
>>> >> >>>>>>> self-signed certificates with a longer validity to solve the
>>> >> mentioned
>>> >> >>>>>>> issue.
>>> >> >>>>>>>
>>> >> >>>>>>> I think you can set up Flink in such a way that you don't
>>> have to
>>> >> >>>>>>> handle all the different certificates. For example, you could
>>> >> deploy Flink
>>> >> >>>>>>> with a "sidecar proxy" which is responsible for the
>>> >> authentication using an
>>> >> >>>>>>> arbitrary method (e.g. Kerberos) and then bind the REST
>>> endpoint
>>> >> to a local
>>> >> >>>>>>> network interface. That way, the REST endpoint would only be
>>> >> available
>>> >> >>>>>>> through the sidecar proxy. Additionally, one could enable SSL
>>> for
>>> >> this
>>> >> >>>>>>> communication. Would this be a solution for the problem?
>>> >> >>>>>>>
>>> >> >>>>>>> Cheers,
>>> >> >>>>>>> Till
>>> >> >>>>>>>
>>> >> >>>>>>> On Thu, Jun 3, 2021 at 10:46 PM Márton Balassi <
>>> >> >>>>>>> [hidden email]> wrote:
>>> >> >>>>>>>
>>> >> >>>>>>>> That is an interesting idea, Till.
>>> >> >>>>>>>>
>>> >> >>>>>>>> The main issue with it is that TLS certificates have an
>>> >> expiration
>>> >> >>>>>>>> time, usually they get approved for a couple years. Forcing
>>> our
>>> >> users to
>>> >> >>>>>>>> restart jobs to reprovision TLS certificates would be weird
>>> when
>>> >> we could
>>> >> >>>>>>>> just implement a single proper strong authentication
>>> mechanism
>>> >> instead in a
>>> >> >>>>>>>> couple hundred lines of code. :-)
>>> >> >>>>>>>>
>>> >> >>>>>>>> In many cases it is also impractical to go the TLS mutual
>>> route,
>>> >> >>>>>>>> because the Flink Dashboard can end up on any node in the
>>> >> k8s/Yarn cluster
>>> >> >>>>>>>> which means that we need a certificate per node (due to the
>>> >> mutual auth),
>>> >> >>>>>>>> but if we also want to protect the private key of these from
>>> >> users
>>> >> >>>>>>>> accidentally or intentionally leaking them then we need this
>>> per
>>> >> user. As
>>> >> >>>>>>>> in we end up managing user*machine number certificates and
>>> >> having to renew
>>> >> >>>>>>>> them periodically, which albeit automatable is unfortunately
>>> not
>>> >> yet
>>> >> >>>>>>>> automated in all large organizations.
>>> >> >>>>>>>>
>>> >> >>>>>>>> I fully agree that TLS certificate mutual authentication has
>>> its
>>> >> >>>>>>>> nice properties, especially at very large (multiple thousand
>>> >> node) clusters
>>> >> >>>>>>>> - but it has its own challenges too. Thanks for bringing it
>>> up.
>>> >> >>>>>>>>
>>> >> >>>>>>>> Happy to have this added to the rejected alternative list so
>>> that
>>> >> >>>>>>>> we have the full picture documented.
>>> >> >>>>>>>>
>>> >> >>>>>>>> On Thu, Jun 3, 2021 at 5:52 PM Till Rohrmann <
>>> >> [hidden email]>
>>> >> >>>>>>>> wrote:
>>> >> >>>>>>>>
>>> >> >>>>>>>>> I guess the idea would then be to let the proxy do the
>>> >> >>>>>>>>> authentication job and only forward the request via an SSL
>>> >> mutually
>>> >> >>>>>>>>> encrypted connection to the Flink cluster. Would this be
>>> >> possible? The
>>> >> >>>>>>>>> beauty of this setup is in my opinion that this setup should
>>> >> work with all
>>> >> >>>>>>>>> kinds of authentication mechanisms.
>>> >> >>>>>>>>>
>>> >> >>>>>>>>> Cheers,
>>> >> >>>>>>>>> Till
>>> >> >>>>>>>>>
>>> >> >>>>>>>>> On Thu, Jun 3, 2021 at 3:12 PM Gabor Somogyi <
>>> >> >>>>>>>>> [hidden email]> wrote:
>>> >> >>>>>>>>>
>>> >> >>>>>>>>>> Thanks for giving options to fulfil the need.
>>> >> >>>>>>>>>>
>>> >> >>>>>>>>>> Users are looking for a solution where users can be identified
>>> >> >>>>>>>>>> on the whole cluster and access to resources/actions can be
>>> >> >>>>>>>>>> restricted. A good example of such an action is cancelling
>>> >> >>>>>>>>>> other users' running jobs.
>>> >> >>>>>>>>>>
>>> >> >>>>>>>>>> * SSL does provide mutual authentication, but once
>>> >> >>>>>>>>>> authentication has passed, no user-based restrictions can be made.
>>> >> >>>>>>>>>> * The less problematic part is that generating/maintaining
>>> >> >>>>>>>>>> short-lived certificates would be hard (that's the reason
>>> >> >>>>>>>>>> KDC-like servers exist).
>>> >> >>>>>>>>>> Having long-lived certificates would widen the attack
>>> >> >>>>>>>>>> surface, but since the first concern is there, this is just a
>>> >> >>>>>>>>>> cosmetic issue.
>>> >> >>>>>>>>>>
>>> >> >>>>>>>>>> All in all, using TLS certificates is unfortunately not
>>> >> >>>>>>>>>> sufficient in these environments.
>>> >> >>>>>>>>>>
>>> >> >>>>>>>>>> BR,
>>> >> >>>>>>>>>> G
>>> >> >>>>>>>>>>
>>> >> >>>>>>>>>>
>>> >> >>>>>>>>>> On Thu, Jun 3, 2021 at 12:49 PM Till Rohrmann <
>>> >> >>>>>>>>>> [hidden email]> wrote:
>>> >> >>>>>>>>>>
>>> >> >>>>>>>>>>> Thanks for the information Gabor. If it is about securing
>>> the
>>> >> >>>>>>>>>>> communication between the REST client and the REST server,
>>> >> then Flink
>>> >> >>>>>>>>>>> already supports enabling mutual SSL authentication [1].
>>> >> Would this be
>>> >> >>>>>>>>>>> enough to secure the communication and to pass an audit?
>>> >> >>>>>>>>>>>
>>> >> >>>>>>>>>>> [1]
>>> >> >>>>>>>>>>>
>>> >>
>>> https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/security/security-ssl/#external--rest-connectivity
>>> >> >>>>>>>>>>>
>>> >> >>>>>>>>>>> Cheers,
>>> >> >>>>>>>>>>> Till
>>> >> >>>>>>>>>>>
>>> >> >>>>>>>>>>> On Thu, Jun 3, 2021 at 10:33 AM Gabor Somogyi <
>>> >> >>>>>>>>>>> [hidden email]> wrote:
>>> >> >>>>>>>>>>>
>>> >> >>>>>>>>>>>> Hi Till,
>>> >> >>>>>>>>>>>>
>>> >> >>>>>>>>>>>> Since I have been working in the security area for 10+ years,
>>> >> >>>>>>>>>>>> let me share my thoughts.
>>> >> >>>>>>>>>>>> I would like to emphasise that there are experts better than
>>> >> >>>>>>>>>>>> me, but I have some basics.
>>> >> >>>>>>>>>>>> The discussion is open and not trying to tell alone
>>> things...
>>> >> >>>>>>>>>>>>
>>> >> >>>>>>>>>>>> > I mean if an attacker can get access to one of the
>>> >> machines,
>>> >> >>>>>>>>>>>> then it
>>> >> >>>>>>>>>>>> should also be possible to obtain the right Kerberos
>>> token.
>>> >> >>>>>>>>>>>> Not necessarily. For example, if one gets access to a specific
>>> >> >>>>>>>>>>>> user's credentials, it is still not possible to compromise
>>> >> >>>>>>>>>>>> other users' jobs, data, etc.
>>> >> >>>>>>>>>>>> Security is like an onion: the more layers have been added,
>>> >> >>>>>>>>>>>> the more time an attacker needs to proceed.
>>> >> >>>>>>>>>>>> At the end of the day, if one is in, then they can most
>>> >> >>>>>>>>>>>> probably find a way, but this time is normally enough for
>>> >> >>>>>>>>>>>> sysadmins or security experts to close down the system and
>>> >> >>>>>>>>>>>> minimize the damage.
>>> >> >>>>>>>>>>>>
>>> >> >>>>>>>>>>>> The other thing is that all tokens have a timeout, and if the
>>> >> >>>>>>>>>>>> token is invalid then the attacker can't proceed further.
>>> >> >>>>>>>>>>>>
>>> >> >>>>>>>>>>>> > Is Kerberos also the standard authentication protocol
>>> for
>>> >> >>>>>>>>>>>> Kubernetes
>>> >> >>>>>>>>>>>> deployments?
>>> >> >>>>>>>>>>>> Kerberos is an industry standard which is
>>> cloud/deployment
>>> >> >>>>>>>>>>>> agnostic and it
>>> >> >>>>>>>>>>>> can be used in any deployments including k8s.
>>> >> >>>>>>>>>>>> The main intention is to use kerberos in k8s deployments
>>> too
>>> >> >>>>>>>>>>>> since we're
>>> >> >>>>>>>>>>>> going this direction as well.
>>> >> >>>>>>>>>>>> Please see how Spark does this:
>>> >> >>>>>>>>>>>>
>>> >> >>>>>>>>>>>>
>>> >>
>>> https://spark.apache.org/docs/latest/security.html#secure-interaction-with-kubernetes
>>> >> >>>>>>>>>>>>
>>> >> >>>>>>>>>>>> Last but not least, the most important reason to add at least
>>> >> >>>>>>>>>>>> one strong authentication mechanism is that we have users who
>>> >> >>>>>>>>>>>> have hard requirements on this. They're doing security audits,
>>> >> >>>>>>>>>>>> and if they fail then it's a deal breaker.
>>> >> >>>>>>>>>>>> That is why we added Kerberos in the first place.
>>> >> >>>>>>>>>>>> Unfortunately we can't name them on this public list; however,
>>> >> >>>>>>>>>>>> the customers who specifically asked for this were mainly in
>>> >> >>>>>>>>>>>> the banking and telco sectors.
>>> >> >>>>>>>>>>>>
>>> >> >>>>>>>>>>>> BR,
>>> >> >>>>>>>>>>>> G
>>> >> >>>>>>>>>>>>
>>> >> >>>>>>>>>>>>
>>> >> >>>>>>>>>>>> On Thu, Jun 3, 2021 at 9:20 AM Till Rohrmann <
>>> >> >>>>>>>>>>>> [hidden email]> wrote:
>>> >> >>>>>>>>>>>>
>>> >> >>>>>>>>>>>> > Thanks for updating the document Márton. Why is it that
>>> >> banks
>>> >> >>>>>>>>>>>> will
>>> >> >>>>>>>>>>>> > consider it more secure if Flink comes with Kerberos
>>> >> >>>>>>>>>>>> authentication
>>> >> >>>>>>>>>>>> > (assuming a properly secured setup)? I mean if an
>>> attacker
>>> >> >>>>>>>>>>>> can get access
>>> >> >>>>>>>>>>>> > to one of the machines, then it should also be
>>> possible to
>>> >> >>>>>>>>>>>> obtain the right
>>> >> >>>>>>>>>>>> > Kerberos token.
>>> >> >>>>>>>>>>>> >
>>> >> >>>>>>>>>>>> > I am not an authentication expert and that's why I
>>> wanted
>>> >> to
>>> >> >>>>>>>>>>>> ask what are
>>> >> >>>>>>>>>>>> > other authentication protocols other than Kerberos?
>>> Why did
>>> >> >>>>>>>>>>>> we select
>>> >> >>>>>>>>>>>> > Kerberos and not any other authentication protocol?
>>> Maybe
>>> >> you
>>> >> >>>>>>>>>>>> can list the
>>> >> >>>>>>>>>>>> > pros and cons for the different protocols. Is Kerberos
>>> also
>>> >> >>>>>>>>>>>> the standard
>>> >> >>>>>>>>>>>> > authentication protocol for Kubernetes deployments? If
>>> not,
>>> >> >>>>>>>>>>>> what would be
>>> >> >>>>>>>>>>>> > the answer when deploying on K8s?
>>> >> >>>>>>>>>>>> >
>>> >> >>>>>>>>>>>> > Cheers,
>>> >> >>>>>>>>>>>> > Till
>>> >> >>>>>>>>>>>> >
>>> >> >>>>>>>>>>>> > On Wed, Jun 2, 2021 at 12:07 PM Gabor Somogyi <
>>> >> >>>>>>>>>>>> [hidden email]>
>>> >> >>>>>>>>>>>> > wrote:
>>> >> >>>>>>>>>>>> >
>>> >> >>>>>>>>>>>> >> Hi team,
>>> >> >>>>>>>>>>>> >>
>>> >> >>>>>>>>>>>> >> Happy to be here and hope I can provide quality
>>> additions
>>> >> in
>>> >> >>>>>>>>>>>> the future.
>>> >> >>>>>>>>>>>> >>
>>> >> >>>>>>>>>>>> >> Thank you all for the helpful suggestions!
>>> >> >>>>>>>>>>>> >> Considering them the FLIP has been modified and the
>>> work
>>> >> >>>>>>>>>>>> continues on the
>>> >> >>>>>>>>>>>> >> already existing Jira.
>>> >> >>>>>>>>>>>> >>
>>> >> >>>>>>>>>>>> >> BR,
>>> >> >>>>>>>>>>>> >> G
>>> >> >>>>>>>>>>>> >>
>>> >> >>>>>>>>>>>> >>
>>> >> >>>>>>>>>>>> >> On Wed, Jun 2, 2021 at 11:23 AM Márton Balassi <
>>> >> >>>>>>>>>>>> [hidden email]>
>>> >> >>>>>>>>>>>> >> wrote:
>>> >> >>>>>>>>>>>> >>
>>> >> >>>>>>>>>>>> >>> Thanks, Chesnay - I totally missed that. Answered on the
>>> >> >>>>>>>>>>>> >>> ticket too, let us continue there then.
>>> >> >>>>>>>>>>>> >>>
>>> >> >>>>>>>>>>>> >>> Till, I agree that we should keep this codepath as slim as
>>> >> >>>>>>>>>>>> >>> possible. It is an important design decision that we aim to
>>> >> >>>>>>>>>>>> >>> keep the list of authentication protocols to a minimum. We
>>> >> >>>>>>>>>>>> >>> believe that this should not be a primary concern of Flink
>>> >> >>>>>>>>>>>> >>> and that a trusted proxy service (for example Apache Knox)
>>> >> >>>>>>>>>>>> >>> should be used to enable a multitude of end-user
>>> >> >>>>>>>>>>>> >>> authentication mechanisms. The bare minimum of authentication
>>> >> >>>>>>>>>>>> >>> mechanisms to support consequently consists of a single strong
>>> >> >>>>>>>>>>>> >>> authentication protocol, for which Kerberos is the enterprise
>>> >> >>>>>>>>>>>> >>> solution, and HTTP Basic, primarily for development and
>>> >> >>>>>>>>>>>> >>> light-weight scenarios.
>>> >> >>>>>>>>>>>> >>>
>>> >> >>>>>>>>>>>> >>> Added the above wording to G's doc.
>>> >> >>>>>>>>>>>> >>>
>>> >> >>>>>>>>>>>> >>>
>>> >> >>>>>>>>>>>>
>>> >>
>>> https://docs.google.com/document/d/1NMPeJ9H0G49TGy3AzTVVJVKmYC0okwOtqLTSPnGqzHw/edit
>>> >> >>>>>>>>>>>> >>>
>>> >> >>>>>>>>>>>> >>>
>>> >> >>>>>>>>>>>> >>>
>>> >> >>>>>>>>>>>> >>> On Tue, Jun 1, 2021 at 11:47 AM Chesnay Schepler <
>>> >> >>>>>>>>>>>> [hidden email]>
>>> >> >>>>>>>>>>>> >>> wrote:
>>> >> >>>>>>>>>>>> >>>
>>> >> >>>>>>>>>>>> >>>> There's a related effort:
>>> >> >>>>>>>>>>>> >>>> https://issues.apache.org/jira/browse/FLINK-21108
>>> >> >>>>>>>>>>>> >>>>
>>> >> >>>>>>>>>>>> >>>> On 6/1/2021 10:14 AM, Till Rohrmann wrote:
>>> >> >>>>>>>>>>>> >>>> > Hi Gabor, welcome to the Flink community!
>>> >> >>>>>>>>>>>> >>>> >
>>> >> >>>>>>>>>>>> >>>> > Thanks for sharing this proposal with the
>>> community
>>> >> >>>>>>>>>>>> Márton. In
>>> >> >>>>>>>>>>>> >>>> general, I
>>> >> >>>>>>>>>>>> >>>> > agree that authentication is missing and that
>>> this is
>>> >> >>>>>>>>>>>> required for
>>> >> >>>>>>>>>>>> >>>> using
>>> >> >>>>>>>>>>>> >>>> > Flink within an enterprise. The thing I am
>>> wondering
>>> >> is
>>> >> >>>>>>>>>>>> whether this
>>> >> >>>>>>>>>>>> >>>> > feature strictly needs to be implemented inside of
>>> >> Flink
>>> >> >>>>>>>>>>>> or whether a
>>> >> >>>>>>>>>>>> >>>> proxy
>>> >> >>>>>>>>>>>> >>>> > setup could do the job? Have you considered this
>>> >> option?
>>> >> >>>>>>>>>>>> If yes, then
>>> >> >>>>>>>>>>>> >>>> it
>>> >> >>>>>>>>>>>> >>>> > would be good to list it under the point of
>>> rejected
>>> >> >>>>>>>>>>>> alternatives.
>>> >> >>>>>>>>>>>> >>>> >
>>> >> >>>>>>>>>>>> >>>> > I do see the benefit of implementing this feature
>>> >> inside
>>> >> >>>>>>>>>>>> of Flink if
>>> >> >>>>>>>>>>>> >>>> many
>>> >> >>>>>>>>>>>> >>>> > users need it. If not, then it might be easier
>>> for the
>>> >> >>>>>>>>>>>> project to not
>>> >> >>>>>>>>>>>> >>>> > increase the surface area since it makes the
>>> overall
>>> >> >>>>>>>>>>>> maintenance
>>> >> >>>>>>>>>>>> >>>> harder.
>>> >> >>>>>>>>>>>> >>>> >
>>> >> >>>>>>>>>>>> >>>> > Cheers,
>>> >> >>>>>>>>>>>> >>>> > Till
>>> >> >>>>>>>>>>>> >>>> >
>>> >> >>>>>>>>>>>> >>>> > On Mon, May 31, 2021 at 4:57 PM Márton Balassi <
>>> >> >>>>>>>>>>>> [hidden email]>
>>> >> >>>>>>>>>>>> >>>> wrote:
>>> >> >>>>>>>>>>>> >>>> >
>>> >> >>>>>>>>>>>> >>>> >> Hi team,
>>> >> >>>>>>>>>>>> >>>> >>
>>> >> >>>>>>>>>>>> >>>> >> Firstly I would like to introduce Gabor or G [1]
>>> for
>>> >> >>>>>>>>>>>> short to the
>>> >> >>>>>>>>>>>> >>>> >> community, he is a Spark committer who has
>>> recently
>>> >> >>>>>>>>>>>> transitioned to
>>> >> >>>>>>>>>>>> >>>> the
>>> >> >>>>>>>>>>>> >>>> >> Flink Engineering team at Cloudera and is looking
>>> >> >>>>>>>>>>>> forward to
>>> >> >>>>>>>>>>>> >>>> contributing
>>> >> >>>>>>>>>>>> >>>> >> to Apache Flink. Previously G primarily focused
>>> on
>>> >> >>>>>>>>>>>> Spark Streaming
>>> >> >>>>>>>>>>>> >>>> and
>>> >> >>>>>>>>>>>> >>>> >> security.
>>> >> >>>>>>>>>>>> >>>> >>
>>> >> >>>>>>>>>>>> >>>> >> Based on requests from our customers G has
>>> >> implemented
>>> >> >>>>>>>>>>>> Kerberos and
>>> >> >>>>>>>>>>>> >>>> HTTP
>>> >> >>>>>>>>>>>> >>>> >> Basic Authentication for the Flink Dashboard and
>>> >> >>>>>>>>>>>> HistoryServer.
>>> >> >>>>>>>>>>>> >>>> Previously
>>> >> >>>>>>>>>>>> >>>> >> lacked an authentication story.
>>> >> >>>>>>>>>>>> >>>> >>
>>> >> >>>>>>>>>>>> >>>> >> We are looking to contribute this functionality
>>> back
>>> >> to
>>> >> >>>>>>>>>>>> the
>>> >> >>>>>>>>>>>> >>>> community, we
>>> >> >>>>>>>>>>>> >>>> >> believe that given Flink's maturity there should
>>> be a
>>> >> >>>>>>>>>>>> common code
>>> >> >>>>>>>>>>>> >>>> solution
>>> >> >>>>>>>>>>>> >>>> >> for this general pattern.
>>> >> >>>>>>>>>>>> >>>> >>
>>> >> >>>>>>>>>>>> >>>> >> We are looking forward to your feedback on G's
>>> >> design.
>>> >> >>>>>>>>>>>> [2]
>>> >> >>>>>>>>>>>> >>>> >>
>>> >> >>>>>>>>>>>> >>>> >> [1] http://gaborsomogyi.com/
>>> >> >>>>>>>>>>>> >>>> >> [2]
>>> >> >>>>>>>>>>>> >>>> >>
>>> >> >>>>>>>>>>>> >>>> >>
>>> >> >>>>>>>>>>>> >>>>
>>> >> >>>>>>>>>>>>
>>> >>
>>> https://docs.google.com/document/d/1NMPeJ9H0G49TGy3AzTVVJVKmYC0okwOtqLTSPnGqzHw/edit
>>> >> >>>>>>>>>>>> >>>> >>
>>> >> >>>>>>>>>>>> >>>>
>>> >> >>>>>>>>>>>> >>>>
>>> >> >>>>>>>>>>>>
>>> >> >>>>>>>>>>>
>>> >>
>>> >
>>> >
>>> > --
>>> >
>>> > Konstantin Knauf
>>> >
>>> > https://twitter.com/snntrable
>>> >
>>> > https://github.com/knaufk
>>> >
>>>
>>
>>
>> --
>>
>> Konstantin Knauf
>>
>> https://twitter.com/snntrable
>>
>> https://github.com/knaufk
>>
>

Re: [DISCUSS] Dashboard/HistoryServer authentication

Austin Cawley-Edwards-2
Hi all,

Sorry to be joining the conversation late. I'm also on the side of
Konstantin, generally, in that this seems to not be a core goal of Flink as
a project and adds a maintenance burden.

Would another con of Kerberos be that it is likely a fading project in terms
of network security? (Serious question; please correct me if there is
reason to believe it is gaining adoption.)

The point about Kerberos being independent of infrastructure is a good one
but is something that is also solved by modern sidecar proxies + service
meshes that can run across Kubernetes and bare-metal. These solutions also
handle certificate provisioning, rotation, etc. in addition to higher-level
authorization policies. Some examples of projects with this "universal
infrastructure support" are Kuma[1] (CNCF Sandbox, I'm a maintainer) and
Istio[2] (Google).

Wondering out loud: has anyone tried to run Flink on top of Cilium[3],
which also provides zero-trust networking at the kernel level without
needing to instrument applications? It currently only runs on Kubernetes
on Linux, which is a major limitation, but it solves many of the
request-forgery concerns at all levels.

Thanks,
Austin

[1]: https://kuma.io/docs/1.1.6/quickstart/universal/
[2]: https://istio.io/latest/docs/setup/install/virtual-machine/
[3]: https://cilium.io/

On Thu, Jun 17, 2021 at 1:50 PM Till Rohrmann <[hidden email]> wrote:

> I left some comments in the Google document. It would be great if
> someone from the community with security experience could also take a look
> at it. Maybe Eron, you have an opinion on the topic?
>
> Cheers,
> Till
>
> On Thu, Jun 17, 2021 at 6:57 PM Till Rohrmann <[hidden email]>
> wrote:
>
> > Hi Gabor,
> >
> > I haven't found time to look into the updated FLIP yet. I'll try to do it
> > asap.
> >
> > Cheers,
> > Till
> >
> > On Wed, Jun 16, 2021 at 9:35 PM Konstantin Knauf <[hidden email]>
> > wrote:
> >
> >> Hi Gabor,
> >>
> >> > However representing Kerberos as completely new feature is not true
> >> > because it's already in since Flink makes authentication at least with
> >> > HDFS and Hbase through Kerberos.
> >>
> >> True, that is one way to look at it, but there are differences, too:
> >> Control Plane vs Data Plane, Core vs Connectors.
> >>
> >> > Adding OIDC or OAuth2 has the exact same concerns what you've guys just
> >> > raised. Why exactly these? If you think this would be beneficial we can
> >> > discuss it in detail
> >>
> >> That's exactly my point. Once we start adding authx support, we will
> >> sooner or later discuss other options besides Kerberos, too. A user who
> >> would like to use OAuth can not easily use Kerberos, right?
> >> That is one of the reasons I am skeptical about adding initial authx
> >> support.
> >>
> >> > Related authorization you've mentioned it can be complicated over time.
> >> > Can you show us an example? We've knowledge with couple of open source
> >> > components but authorization was never a horror complex story. I
> >> > personally have the most experience with Spark which I think is quite
> >> > simple and stable. Users can be viewers/admins and jobs started by others
> >> > can't be modified. If you can share an example over-complication we can
> >> > discuss on facts.
> >>
> >> Authorization is a new aspect that needs to be considered for every
> >> addition to the REST API. In the future users might ask for additional
> >> roles (e.g. an editor), user-defined roles and you've already mentioned
> >> job-level permissions yourself. And keep in mind that there might also be
> >> larger additions in the future like the flink-sql-gateway. Contributions
> >> like this become more expensive the more aspects we need to consider.
> >>
> >> In general, I believe, it is important that the community focuses its
> >> efforts where we can generate the most value to the user and - personally -
> >> I don't think there is much to gain by extending Flink's scope in that
> >> direction. Of course, this is not black and white and there are other valid
> >> opinions.
> >>
> >> Thanks,
> >>
> >> Konstantin
> >>
> >> On Wed, Jun 16, 2021 at 7:38 PM Gabor Somogyi <[hidden email]> wrote:
> >>
> >>> Hi Konstantin,
> >>>
> >>> Thanks for the response. Related new feature introduction in case of
> >>> Basic auth I tend to agree, anything else can be chosen.
> >>>
> >>> However representing Kerberos as completely new feature is not true
> >>> because it's already in since Flink makes authentication at least with
> >>> HDFS and Hbase through Kerberos.
> >>> The main problem with the actual Kerberos implementation is that it
> >>> contains several bugs and only partially implemented. Following your
> >>> suggestion can we agree that we skip the Basic auth implementation and
> >>> finish an already started Kerberos story by adding History Server and
> >>> Job Dashboard authentication?
> >>>
> >>> Adding OIDC or OAuth2 has the exact same concerns what you've guys just
> >>> raised. Why exactly these? If you think this would be beneficial we can
> >>> discuss it in detail
> >>> but as a side story it would be good to finish a halfway done Kerberos
> >>> story.
> >>>
> >>> Related authorization you've mentioned it can be complicated over time.
> >>> Can you show us an example? We've knowledge with couple of open source
> >>> components but authorization was never a horror complex story. I
> >>> personally have the most experience with Spark which I think is quite
> >>> simple and stable. Users can be viewers/admins and jobs started by others
> >>> can't be modified. If you can share an example over-complication we can
> >>> discuss on facts.
> >>>
> >>> Thank you in advance!
> >>>
> >>> BR,
> >>> G
> >>>
> >>>
> >>> On Wed, Jun 16, 2021 at 5:42 PM Konstantin Knauf <[hidden email]>
> >>> wrote:
> >>>
> >>> > Hi everyone,
> >>> >
> >>> > sorry for joining late and thanks for the insightful discussion.
> >>> >
> >>> > In general, I'd personally prefer not to increase the surface area of
> >>> > Apache Flink unless there is a good reason. It seems we all agree
> that
> >>> > authx is not part of the core value proposition of Apache Flink, so
> if
> >>> we
> >>> > can delegate this problem to a more specialized tool, I am in favor
> of
> >>> > that. Apache Flink is already huge and a lot of work goes into
> >>> maintenance,
> >>> > so I personally have become more sensitive to this aspect over time.
> >>> >
> >>> > If we add support for Basic Auth and Kerberos now, users will sooner
> or
> >>> > later ask for OIDC, LDAP, SAML,... I acknowledge that Kerberos is
> >>> widely
> >>> > used in the corporate, on-premises context, but isn't the focus
> moving
> >>> more
> >>> > towards more web-friendly standards like OIDC/OAuth 2.0? If we only
> >>> want to
> >>> > support a single protocol, there is an argument to be made that it
> >>> should
> >>> > be OIDC and Dex [1,2] as a bridge to everything else. Have OIDC or
> >>> OAuth2
> >>> > been considered instead of Kerberos? How do you see the market
> moving?
> >>> But
> >>> > as I said before, in my opinion we can generate more value by
> investing
> >>> > into other areas of Apache Flink.
> >>> >
> >>> > Authorization also has the potential to become more fine-grained and
> >>> > complex over time: you already mentioned restricting the actions
> that a
> >>> > specific user can do in a cluster.
> >>> >
> >>> > Cheers,
> >>> >
> >>> > Konstantin
> >>> >
> >>> > [1] https://github.com/dexidp/dex
> >>> > [2] https://github.com/dexidp/dex/issues/1903
> >>> >
> >>> >
> >>> > On Wed, Jun 16, 2021 at 11:44 AM Gabor Somogyi <
> >>> [hidden email]>
> >>> > wrote:
> >>> >
> >>> >> Hi Till,
> >>> >>
> >>> >> Did you have the chance to take a look at the doc? Not yet seen any
> >>> >> update.
> >>> >>
> >>> >> BR,
> >>> >> G
> >>> >>
> >>> >>
> >>> >> On Wed, Jun 9, 2021 at 1:43 PM Till Rohrmann <[hidden email]>
> >>> >> wrote:
> >>> >>
> >>> >> > Thanks for the update Gabor. I'll take a look and respond in the
> >>> >> document.
> >>> >> >
> >>> >> > Cheers,
> >>> >> > Till
> >>> >> >
> >>> >> > On Wed, Jun 9, 2021 at 12:59 PM Gabor Somogyi <
> >>> >> [hidden email]>
> >>> >> > wrote:
> >>> >> >
> >>> >> >> Hi Till,
> >>> >> >>
> >>> >> >> Your proxy suggestion has been considered in-depth and updated
> the
> >>> FLIP
> >>> >> >> accordingly.
> >>> >> >> We've considered 2 proxy implementation (Nginx and Squid) but
> >>> according
> >>> >> >> to our analysis and testing it's not suitable for the mentioned
> >>> >> use-cases.
> >>> >> >> Please take a look at the rejected alternatives for detailed
> >>> >> explanation.
> >>> >> >>
> >>> >> >> Thanks for your time in advance!
> >>> >> >>
> >>> >> >> BR,
> >>> >> >> G
> >>> >> >>
> >>> >> >>
> >>> >> >> On Fri, Jun 4, 2021 at 3:31 PM Till Rohrmann <
> [hidden email]
> >>> >
> >>> >> >> wrote:
> >>> >> >>
> >>> >> >>> As I've said I am not a security expert and that's why I have to
> >>> ask
> >>> >> for
> >>> >> >>> clarification, Gabor. You are saying that if we configure a
> >>> >> truststore for
> >>> >> >>> the REST endpoint with a single trusted certificate which has
> been
> >>> >> >>> generated by the operator of the Flink cluster, then the
> attacker
> >>> can
> >>> >> >>> generate a new certificate, sign it and then talk to the Flink
> >>> >> cluster if
> >>> >> >>> he has access to the node on which the REST endpoint runs? My
> >>> >> understanding
> >>> >> >>> was that you need the corresponding private key which in my
> >>> proposed
> >>> >> setup
> >>> >> >>> would be under the control of the operator as well (e.g. stored
> >>> in a
> >>> >> >>> keystore on the same machine but guarded by some secret). That
> way
> >>> >> (if I am
> >>> >> >>> not mistaken), only the entity which has access to the keystore
> is
> >>> >> able to
> >>> >> >>> talk to the Flink cluster.
> >>> >> >>>
> >>> >> >>> Maybe we are also getting our wires crossed here and are talking
> >>> about
> >>> >> >>> different things.
> >>> >> >>>
> >>> >> >>> Thanks for listing the pros and cons of Kerberos. Concerning
> what
> >>> >> other
> >>> >> >>> authentication mechanisms are used in the industry, I am not
> 100%
> >>> >> sure.
> >>> >> >>>
> >>> >> >>> Cheers,
> >>> >> >>> Till
> >>> >> >>>
> >>> >> >>> On Fri, Jun 4, 2021 at 11:09 AM Gabor Somogyi <
> >>> >> [hidden email]>
> >>> >> >>> wrote:
> >>> >> >>>
> >>> >> >>>> > I did not mean for the user to sign its own certificates but
> >>> for
> >>> >> the
> >>> >> >>>> operator of the cluster. Once the user request hits the proxy,
> it
> >>> >> should no
> >>> >> >>>> longer be under his control. I think I do not fully understand
> >>> yet
> >>> >> why this
> >>> >> >>>> would not work.
> >>> >> >>>> I said it's not solving the authentication problem over any
> >>> proxy.
> >>> >> Even
> >>> >> >>>> if the operator is signing the certificate one can have access
> >>> to an
> >>> >> >>>> internal node.
> >>> >> >>>> Such case anybody can craft certificates which is accepted by
> the
> >>> >> >>>> server. When it's accepted a bad guy can cancel jobs causing
> huge
> >>> >> impacts.
> >>> >> >>>>
> >>> >> >>>> > Also, I am missing a bit the comparison of Kerberos to other
> >>> >> >>>> authentication mechanisms and why they were rejected in favour
> of
> >>> >> Kerberos.
> >>> >> >>>> PROS:
> >>> >> >>>> * Since it's not depending on cloud provider and/or k8s or
> >>> bare-metal
> >>> >> >>>> etc. deployment it's the biggest plus
> >>> >> >>>> * Centralized with tools and no need to write tons of tools
> >>> around
> >>> >> >>>> * There are clients/tools on almost all OS-es and several
> >>> languages
> >>> >> >>>> * Super huge users are using it for years in production w/o
> huge
> >>> >> issues
> >>> >> >>>> * Provides cross-realm trust possibility amongst other features
> >>> >> >>>> * Several open source components using it which could increase
> >>> >> >>>> compatibility
> >>> >> >>>>
> >>> >> >>>> CONS:
> >>> >> >>>> * Not everybody using kerberos
> >>> >> >>>> * It would increase the code footprint but this is true for
> many
> >>> >> >>>> features (as a side note I'm here to maintain it)
> >>> >> >>>>
> >>> >> >>>> Feel free to add your points because it only represents a
> single
> >>> >> >>>> viewpoint.
> >>> >> >>>> Also if you have any better option for strong authentication
> >>> please
> >>> >> >>>> share it and we can consider the pros/cons here.
> >>> >> >>>>
> >>> >> >>>> BR,
> >>> >> >>>> G
> >>> >> >>>>
> >>> >> >>>>
> >>> >> >>>> On Fri, Jun 4, 2021 at 10:32 AM Till Rohrmann <
> >>> [hidden email]>
> >>> >> >>>> wrote:
> >>> >> >>>>
> >>> >> >>>>> I did not mean for the user to sign its own certificates but
> >>> for the
> >>> >> >>>>> operator of the cluster. Once the user request hits the proxy,
> >>> it
> >>> >> should no
> >>> >> >>>>> longer be under his control. I think I do not fully understand
> >>> yet
> >>> >> why this
> >>> >> >>>>> would not work.
> >>> >> >>>>>
> >>> >> >>>>> What I would like to avoid is to add more complexity into
> Flink
> >>> if
> >>> >> >>>>> there is an easy solution which fulfills the requirements.
> >>> That's
> >>> >> why I
> >>> >> >>>>> would like to exercise thoroughly through the different
> >>> >> alternatives. Also,
> >>> >> >>>>> I am missing a bit the comparison of Kerberos to other
> >>> >> authentication
> >>> >> >>>>> mechanisms and why they were rejected in favour of Kerberos.
> >>> >> >>>>>
> >>> >> >>>>> Cheers,
> >>> >> >>>>> Till
> >>> >> >>>>>
> >>> >> >>>>> On Fri, Jun 4, 2021 at 10:26 AM Gyula Fóra <[hidden email]
> >
> >>> >> wrote:
> >>> >> >>>>>
> >>> >> >>>>>> Hi!
> >>> >> >>>>>>
> >>> >> >>>>>> I think there might be possible alternatives but it seems
> >>> Kerberos
> >>> >> on
> >>> >> >>>>>> the rest endpoint ticks all the right boxes and provides a
> >>> super
> >>> >> clean and
> >>> >> >>>>>> simple solution for strong authentication.
> >>> >> >>>>>>
> >>> >> >>>>>> I wouldn’t even consider sidecar proxies etc if we can solve
> >>> it in
> >>> >> >>>>>> such a simple way as proposed by G.
> >>> >> >>>>>>
> >>> >> >>>>>> Cheers
> >>> >> >>>>>> Gyula
> >>> >> >>>>>>
> >>> >> >>>>>> On Fri, 4 Jun 2021 at 10:03, Till Rohrmann <
> >>> [hidden email]>
> >>> >> >>>>>> wrote:
> >>> >> >>>>>>
> >>> >> >>>>>>> I am not saying that we shouldn't add a strong
> authentication
> >>> >> >>>>>>> mechanism if there are good reasons for it. I primarily
> would
> >>> >> like to
> >>> >> >>>>>>> understand the context a bit better in order to give
> qualified
> >>> >> feedback and
> >>> >> >>>>>>> come to a good decision. In order to do this, I have the
> >>> feeling
> >>> >> that we
> >>> >> >>>>>>> haven't fully considered all available options which are on
> >>> the
> >>> >> table, tbh.
> >>> >> >>>>>>>
> >>> >> >>>>>>> Does the problem of certificate expiry also apply for
> >>> self-signed
> >>> >> >>>>>>> certificates? If yes, then this should then also be a
> problem
> >>> for
> >>> >> the
> >>> >> >>>>>>> internal encryption of Flink's communication. If not, then
> one
> >>> >> could use
> >>> >> >>>>>>> self-signed certificates with a longer validity to solve the
> >>> >> mentioned
> >>> >> >>>>>>> issue.
> >>> >> >>>>>>>
> >>> >> >>>>>>> I think you can set up Flink in such a way that you don't
> >>> have to
> >>> >> >>>>>>> handle all the different certificates. For example, you
> could
> >>> >> deploy Flink
> >>> >> >>>>>>> with a "sidecar proxy" which is responsible for the
> >>> >> authentication using an
> >>> >> >>>>>>> arbitrary method (e.g. Kerberos) and then bind the REST
> >>> endpoint
> >>> >> to a local
> >>> >> >>>>>>> network interface. That way, the REST endpoint would only be
> >>> >> available
> >>> >> >>>>>>> through the sidecar proxy. Additionally, one could enable
> SSL
> >>> for
> >>> >> this
> >>> >> >>>>>>> communication. Would this be a solution for the problem?
> >>> >> >>>>>>>
> >>> >> >>>>>>> Cheers,
> >>> >> >>>>>>> Till
> >>> >> >>>>>>>
> >>> >> >>>>>>> On Thu, Jun 3, 2021 at 10:46 PM Márton Balassi <
> >>> >> >>>>>>> [hidden email]> wrote:
> >>> >> >>>>>>>
> >>> >> >>>>>>>> That is an interesting idea, Till.
> >>> >> >>>>>>>>
> >>> >> >>>>>>>> The main issue with it is that TLS certificates have an
> >>> >> expiration
> >>> >> >>>>>>>> time, usually they get approved for a couple years. Forcing
> >>> our
> >>> >> users to
> >>> >> >>>>>>>> restart jobs to reprovision TLS certificates would be weird
> >>> when
> >>> >> we could
> >>> >> >>>>>>>> just implement a single proper strong authentication
> >>> mechanism
> >>> >> instead in a
> >>> >> >>>>>>>> couple hundred lines of code. :-)
> >>> >> >>>>>>>>
> >>> >> >>>>>>>> In many cases it is also impractical to go the TLS mutual
> >>> route,
> >>> >> >>>>>>>> because the Flink Dashboard can end up on any node in the
> >>> >> k8s/Yarn cluster
> >>> >> >>>>>>>> which means that we need a certificate per node (due to the
> >>> >> mutual auth),
> >>> >> >>>>>>>> but if we also want to protect the private key of these
> from
> >>> >> users
> >>> >> >>>>>>>> accidentally or intentionally leaking them then we need
> this
> >>> per
> >>> >> user. As
> >>> >> >>>>>>>> in we end up managing user*machine number certificates and
> >>> >> having to renew
> >>> >> >>>>>>>> them periodically, which albeit automatable is
> unfortunately
> >>> not
> >>> >> yet
> >>> >> >>>>>>>> automated in all large organizations.
> >>> >> >>>>>>>>
> >>> >> >>>>>>>> I fully agree that TLS certificate mutual authentication
> has
> >>> its
> >>> >> >>>>>>>> nice properties, especially at very large (multiple
> thousand
> >>> >> node) clusters
> >>> >> >>>>>>>> - but it has its own challenges too. Thanks for bringing it
> >>> up.
> >>> >> >>>>>>>>
> >>> >> >>>>>>>> Happy to have this added to the rejected alternative list
> so
> >>> that
> >>> >> >>>>>>>> we have the full picture documented.
> >>> >> >>>>>>>>
> >>> >> >>>>>>>> On Thu, Jun 3, 2021 at 5:52 PM Till Rohrmann <
> >>> >> [hidden email]>
> >>> >> >>>>>>>> wrote:
> >>> >> >>>>>>>>
> >>> >> >>>>>>>>> I guess the idea would then be to let the proxy do the
> >>> >> >>>>>>>>> authentication job and only forward the request via an SSL
> >>> >> mutually
> >>> >> >>>>>>>>> encrypted connection to the Flink cluster. Would this be
> >>> >> possible? The
> >>> >> >>>>>>>>> beauty of this setup is in my opinion that this setup
> should
> >>> >> work with all
> >>> >> >>>>>>>>> kinds of authentication mechanisms.
> >>> >> >>>>>>>>>
> >>> >> >>>>>>>>> Cheers,
> >>> >> >>>>>>>>> Till
> >>> >> >>>>>>>>>
> >>> >> >>>>>>>>> On Thu, Jun 3, 2021 at 3:12 PM Gabor Somogyi <
> >>> >> >>>>>>>>> [hidden email]> wrote:
> >>> >> >>>>>>>>>
> >>> >> >>>>>>>>>> Thanks for giving options to fulfil the need.
> >>> >> >>>>>>>>>>
> >>> >> >>>>>>>>>> Users are looking for a solution where users can be
> >>> identified
> >>> >> on
> >>> >> >>>>>>>>>> the whole cluster and restrict access to
> resources/actions.
> >>> >> >>>>>>>>>> A good example for such an action is cancelling other
> users
> >>> >> >>>>>>>>>> running jobs.
> >>> >> >>>>>>>>>>
> >>> >> >>>>>>>>>> * SSL does provide mutual authentication but when
> >>> >> authentication
> >>> >> >>>>>>>>>> passed there is no user based on restrictions can be
> made.
> >>> >> >>>>>>>>>> * The less problematic part is that
> generating/maintaining
> >>> >> short
> >>> >> >>>>>>>>>> time valid certificates would be a hard (that's the
> reason
> >>> KDC
> >>> >> like servers
> >>> >> >>>>>>>>>> exist).
> >>> >> >>>>>>>>>> Having long time valid certificates would widen the
> attack
> >>> >> >>>>>>>>>> surface but since the first concern is there this is
> just a
> >>> >> cosmetic issue.
> >>> >> >>>>>>>>>>
> >>> >> >>>>>>>>>> All in all using TLS certificates is not sufficient in
> >>> these
> >>> >> >>>>>>>>>> environments unfortunately.
> >>> >> >>>>>>>>>>
> >>> >> >>>>>>>>>> BR,
> >>> >> >>>>>>>>>> G
> >>> >> >>>>>>>>>>
> >>> >> >>>>>>>>>>
> >>> >> >>>>>>>>>> On Thu, Jun 3, 2021 at 12:49 PM Till Rohrmann <
> >>> >> >>>>>>>>>> [hidden email]> wrote:
> >>> >> >>>>>>>>>>
> >>> >> >>>>>>>>>>> Thanks for the information Gabor. If it is about
> securing
> >>> the
> >>> >> >>>>>>>>>>> communication between the REST client and the REST
> server,
> >>> >> then Flink
> >>> >> >>>>>>>>>>> already supports enabling mutual SSL authentication [1].
> >>> >> Would this be
> >>> >> >>>>>>>>>>> enough to secure the communication and to pass an audit?
> >>> >> >>>>>>>>>>>
> >>> >> >>>>>>>>>>> [1]
> >>> >> >>>>>>>>>>>
> >>> >>
> >>>
> https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/security/security-ssl/#external--rest-connectivity
> >>> >> >>>>>>>>>>>
> >>> >> >>>>>>>>>>> Cheers,
> >>> >> >>>>>>>>>>> Till
> >>> >> >>>>>>>>>>>
> >>> >> >>>>>>>>>>> On Thu, Jun 3, 2021 at 10:33 AM Gabor Somogyi <
> >>> >> >>>>>>>>>>> [hidden email]> wrote:
> >>> >> >>>>>>>>>>>
> >>> >> >>>>>>>>>>>> Hi Till,
> >>> >> >>>>>>>>>>>>
> >>> >> >>>>>>>>>>>> Since I'm working in security area 10+ years let me
> >>> share my
> >>> >> >>>>>>>>>>>> thought.
> >>> >> >>>>>>>>>>>> I would like to emphasise there are experts better than
> >>> me
> >>> >> but
> >>> >> >>>>>>>>>>>> I have some
> >>> >> >>>>>>>>>>>> basics.
> >>> >> >>>>>>>>>>>> The discussion is open and not trying to tell alone
> >>> things...
> >>> >> >>>>>>>>>>>>
> >>> >> >>>>>>>>>>>> > I mean if an attacker can get access to one of the
> >>> >> machines,
> >>> >> >>>>>>>>>>>> then it
> >>> >> >>>>>>>>>>>> should also be possible to obtain the right Kerberos
> >>> token.
> >>> >> >>>>>>>>>>>> Not necessarily. For example if one gets access to a
> >>> specific
> >>> >> >>>>>>>>>>>> user's
> >>> >> >>>>>>>>>>>> credentials then it's not possible to compromise other
> >>> user's
> >>> >> >>>>>>>>>>>> jobs, data,
> >>> >> >>>>>>>>>>>> etc...
> >>> >> >>>>>>>>>>>> Security is like an onion, the more layers has been
> >>> added the
> >>> >> >>>>>>>>>>>> more time an
> >>> >> >>>>>>>>>>>> attacker needs to proceed.
> >>> >> >>>>>>>>>>>> At the end of the day if one is in, then most probably
> >>> can
> >>> >> find
> >>> >> >>>>>>>>>>>> the way but
> >>> >> >>>>>>>>>>>> this time is normally enough to sysadmins or security
> >>> >> experts to
> >>> >> >>>>>>>>>>>> close down the system and minimize the damage.
> >>> >> >>>>>>>>>>>>
> >>> >> >>>>>>>>>>>> The other thing is that all tokens has a timeout and if
> >>> the
> >>> >> >>>>>>>>>>>> token is
> >>> >> >>>>>>>>>>>> invalid then the attacker can't proceed further.
> >>> >> >>>>>>>>>>>>
> >>> >> >>>>>>>>>>>> > Is Kerberos also the standard authentication protocol
> >>> for
> >>> >> >>>>>>>>>>>> Kubernetes
> >>> >> >>>>>>>>>>>> deployments?
> >>> >> >>>>>>>>>>>> Kerberos is an industry standard which is
> >>> cloud/deployment
> >>> >> >>>>>>>>>>>> agnostic and it
> >>> >> >>>>>>>>>>>> can be used in any deployments including k8s.
> >>> >> >>>>>>>>>>>> The main intention is to use kerberos in k8s
> deployments
> >>> too
> >>> >> >>>>>>>>>>>> since we're
> >>> >> >>>>>>>>>>>> going this direction as well.
> >>> >> >>>>>>>>>>>> Please see how Spark does this:
> >>> >> >>>>>>>>>>>>
> >>> >> >>>>>>>>>>>>
> >>> >>
> >>>
> https://spark.apache.org/docs/latest/security.html#secure-interaction-with-kubernetes
> >>> >> >>>>>>>>>>>>
> >>> >> >>>>>>>>>>>> Last but not least the most important reason to add at
> >>> least
> >>> >> >>>>>>>>>>>> one strong
> >>> >> >>>>>>>>>>>> authentication is that we have users who has
> >>> >> >>>>>>>>>>>> hard requirements on this. They're doing security
> audits
> >>> and
> >>> >> if
> >>> >> >>>>>>>>>>>> they fail
> >>> >> >>>>>>>>>>>> then it's deal breaking.
> >>> >> >>>>>>>>>>>> That is why we have added kerberos at the first place.
> >>> >> >>>>>>>>>>>> Unfortunately we
> >>> >> >>>>>>>>>>>> can't name them in this public list, however
> >>> >> >>>>>>>>>>>> the customers who specifically asked for this were
> >>> mainly in
> >>> >> >>>>>>>>>>>> the banking
> >>> >> >>>>>>>>>>>> and telco sector.
> >>> >> >>>>>>>>>>>>
> >>> >> >>>>>>>>>>>> BR,
> >>> >> >>>>>>>>>>>> G
> >>> >> >>>>>>>>>>>>
> >>> >> >>>>>>>>>>>>
> >>> >> >>>>>>>>>>>> On Thu, Jun 3, 2021 at 9:20 AM Till Rohrmann <
> >>> >> >>>>>>>>>>>> [hidden email]> wrote:
> >>> >> >>>>>>>>>>>>
> >>> >> >>>>>>>>>>>> > Thanks for updating the document Márton. Why is it
> that
> >>> >> banks
> >>> >> >>>>>>>>>>>> will
> >>> >> >>>>>>>>>>>> > consider it more secure if Flink comes with Kerberos
> >>> >> >>>>>>>>>>>> authentication
> >>> >> >>>>>>>>>>>> > (assuming a properly secured setup)? I mean if an
> >>> attacker
> >>> >> >>>>>>>>>>>> can get access
> >>> >> >>>>>>>>>>>> > to one of the machines, then it should also be
> >>> possible to
> >>> >> >>>>>>>>>>>> obtain the right
> >>> >> >>>>>>>>>>>> > Kerberos token.
> >>> >> >>>>>>>>>>>> >
> >>> >> >>>>>>>>>>>> > I am not an authentication expert and that's why I
> >>> wanted
> >>> >> to
> >>> >> >>>>>>>>>>>> ask what are
> >>> >> >>>>>>>>>>>> > other authentication protocols other than Kerberos?
> >>> Why did
> >>> >> >>>>>>>>>>>> we select
> >>> >> >>>>>>>>>>>> > Kerberos and not any other authentication protocol?
> >>> Maybe
> >>> >> you
> >>> >> >>>>>>>>>>>> can list the
> >>> >> >>>>>>>>>>>> > pros and cons for the different protocols. Is
> Kerberos
> >>> also
> >>> >> >>>>>>>>>>>> the standard
> >>> >> >>>>>>>>>>>> > authentication protocol for Kubernetes deployments?
> If
> >>> not,
> >>> >> >>>>>>>>>>>> what would be
> >>> >> >>>>>>>>>>>> > the answer when deploying on K8s?
> >>> >> >>>>>>>>>>>> >
> >>> >> >>>>>>>>>>>> > Cheers,
> >>> >> >>>>>>>>>>>> > Till
> >>> >> >>>>>>>>>>>> >
> >>> >> >>>>>>>>>>>> > On Wed, Jun 2, 2021 at 12:07 PM Gabor Somogyi <
> >>> >> >>>>>>>>>>>> [hidden email]>
> >>> >> >>>>>>>>>>>> > wrote:
> >>> >> >>>>>>>>>>>> >
> >>> >> >>>>>>>>>>>> >> Hi team,
> >>> >> >>>>>>>>>>>> >>
> >>> >> >>>>>>>>>>>> >> Happy to be here and hope I can provide quality
> >>> additions
> >>> >> in
> >>> >> >>>>>>>>>>>> the future.
> >>> >> >>>>>>>>>>>> >>
> >>> >> >>>>>>>>>>>> >> Thank you all for helpful the suggestions!
> >>> >> >>>>>>>>>>>> >> Considering them the FLIP has been modified and the
> >>> work
> >>> >> >>>>>>>>>>>> continues on the
> >>> >> >>>>>>>>>>>> >> already existing Jira.
> >>> >> >>>>>>>>>>>> >>
> >>> >> >>>>>>>>>>>> >> BR,
> >>> >> >>>>>>>>>>>> >> G
> >>> >> >>>>>>>>>>>> >>
> >>> >> >>>>>>>>>>>> >>
> >>> >> >>>>>>>>>>>> >> On Wed, Jun 2, 2021 at 11:23 AM Márton Balassi <
> >>> >> >>>>>>>>>>>> [hidden email]>
> >>> >> >>>>>>>>>>>> >> wrote:
> >>> >> >>>>>>>>>>>> >>
> >>> >> >>>>>>>>>>>> >>> Thanks, Chesnay - I totally missed that. Answered
> on
> >>> the
> >>> >> >>>>>>>>>>>> ticket too, let
> >>> >> >>>>>>>>>>>> >>> us continue there then.
> >>> >> >>>>>>>>>>>> >>>
> >>> >> >>>>>>>>>>>> >>> Till, I agree that we should keep this codepath as
> >>> slim
> >>> >> as
> >>> >> >>>>>>>>>>>> possible. It
> >>> >> >>>>>>>>>>>> >>> is an important design decision that we aim to keep
> >>> the
> >>> >> >>>>>>>>>>>> list of
> >>> >> >>>>>>>>>>>> >>> authentication protocols to a minimum. We believe
> >>> that
> >>> >> this
> >>> >> >>>>>>>>>>>> should not be a
> >>> >> >>>>>>>>>>>> >>> primary concern of Flink and a trusted proxy
> service
> >>> (for
> >>> >> >>>>>>>>>>>> example Apache
> >>> >> >>>>>>>>>>>> >>> Knox) should be used to enable a multitude of
> enduser
> >>> >> >>>>>>>>>>>> authentication
> >>> >> >>>>>>>>>>>> >>> mechanisms. The bare minimum of authentication
> >>> mechanisms
> >>> >> >>>>>>>>>>>> to support
> >>> >> >>>>>>>>>>>> >>> consequently consist of a single strong
> >>> authentication
> >>> >> >>>>>>>>>>>> protocol for which
> >>> >> >>>>>>>>>>>> >>> Kerberos is the enterprise solution and HTTP Basic
> >>> >> primary
> >>> >> >>>>>>>>>>>> for development
> >>> >> >>>>>>>>>>>> >>> and light-weight scenarios.
> >>> >> >>>>>>>>>>>> >>>
> >>> >> >>>>>>>>>>>> >>> Added the above wording to G's doc.
> >>> >> >>>>>>>>>>>> >>>
> >>> >> >>>>>>>>>>>> >>>
> >>> >> >>>>>>>>>>>>
> >>> >>
> >>>
> https://docs.google.com/document/d/1NMPeJ9H0G49TGy3AzTVVJVKmYC0okwOtqLTSPnGqzHw/edit
> >>> >> >>>>>>>>>>>> >>>
> >>> >> >>>>>>>>>>>> >>>
> >>> >> >>>>>>>>>>>> >>>
> >>> >> >>>>>>>>>>>> >>> On Tue, Jun 1, 2021 at 11:47 AM Chesnay Schepler <
> >>> >> >>>>>>>>>>>> [hidden email]>
> >>> >> >>>>>>>>>>>> >>> wrote:
> >>> >> >>>>>>>>>>>> >>>
> >>> >> >>>>>>>>>>>> >>>> There's a related effort:
> >>> >> >>>>>>>>>>>> >>>> https://issues.apache.org/jira/browse/FLINK-21108
> >>> >> >>>>>>>>>>>> >>>>
> >>> >> >>>>>>>>>>>> >>>> On 6/1/2021 10:14 AM, Till Rohrmann wrote:
> >>> >> >>>>>>>>>>>> >>>> > Hi Gabor, welcome to the Flink community!
> >>> >> >>>>>>>>>>>> >>>> >
> >>> >> >>>>>>>>>>>> >>>> > Thanks for sharing this proposal with the
> >>> community
> >>> >> >>>>>>>>>>>> Márton. In
> >>> >> >>>>>>>>>>>> >>>> general, I
> >>> >> >>>>>>>>>>>> >>>> > agree that authentication is missing and that
> >>> this is
> >>> >> >>>>>>>>>>>> required for
> >>> >> >>>>>>>>>>>> >>>> using
> >>> >> >>>>>>>>>>>> >>>> > Flink within an enterprise. The thing I am
> >>> wondering
> >>> >> is
> >>> >> >>>>>>>>>>>> whether this
> >>> >> >>>>>>>>>>>> >>>> > feature strictly needs to be implemented inside
> of
> >>> >> Flink
> >>> >> >>>>>>>>>>>> or whether a
> >>> >> >>>>>>>>>>>> >>>> proxy
> >>> >> >>>>>>>>>>>> >>>> > setup could do the job? Have you considered this
> >>> >> option?
> >>> >> >>>>>>>>>>>> If yes, then
> >>> >> >>>>>>>>>>>> >>>> it
> >>> >> >>>>>>>>>>>> >>>> > would be good to list it under the point of
> >>> rejected
> >>> >> >>>>>>>>>>>> alternatives.
> >>> >> >>>>>>>>>>>> >>>> >
> >>> >> >>>>>>>>>>>> >>>> > I do see the benefit of implementing this
> feature
> >>> >> inside
> >>> >> >>>>>>>>>>>> of Flink if
> >>> >> >>>>>>>>>>>> >>>> many
> >>> >> >>>>>>>>>>>> >>>> > users need it. If not, then it might be easier
> >>> for the
> >>> >> >>>>>>>>>>>> project to not
> >>> >> >>>>>>>>>>>> >>>> > increase the surface area since it makes the
> >>> overall
> >>> >> >>>>>>>>>>>> maintenance
> >>> >> >>>>>>>>>>>> >>>> harder.
> >>> >> >>>>>>>>>>>> >>>> >
> >>> >> >>>>>>>>>>>> >>>> > Cheers,
> >>> >> >>>>>>>>>>>> >>>> > Till
> >>> >> >>>>>>>>>>>> >>>> >
> >>> >> >>>>>>>>>>>> >>>> > On Mon, May 31, 2021 at 4:57 PM Márton Balassi <
> >>> >> >>>>>>>>>>>> [hidden email]>
> >>> >> >>>>>>>>>>>> >>>> wrote:
> >>> >> >>>>>>>>>>>> >>>> >
> >>> >> >>>>>>>>>>>> >>>> >> Hi team,
> >>> >> >>>>>>>>>>>> >>>> >>
> >>> >> >>>>>>>>>>>> >>>> >> Firstly I would like to introduce Gabor or G
> [1]
> >>> for
> >>> >> >>>>>>>>>>>> short to the
> >>> >> >>>>>>>>>>>> >>>> >> community, he is a Spark committer who has
> >>> recently
> >>> >> >>>>>>>>>>>> transitioned to
> >>> >> >>>>>>>>>>>> >>>> the
> >>> >> >>>>>>>>>>>> >>>> >> Flink Engineering team at Cloudera and is
> looking
> >>> >> >>>>>>>>>>>> forward to
> >>> >> >>>>>>>>>>>> >>>> contributing
> >>> >> >>>>>>>>>>>> >>>> >> to Apache Flink. Previously G primarily focused
> >>> on
> >>> >> >>>>>>>>>>>> Spark Streaming
> >>> >> >>>>>>>>>>>> >>>> and
> >>> >> >>>>>>>>>>>> >>>> >> security.
> >>> >> >>>>>>>>>>>> >>>> >>
> >>> >> >>>>>>>>>>>> >>>> >> Based on requests from our customers G has
> >>> >> implemented
> >>> >> >>>>>>>>>>>> Kerberos and
> >>> >> >>>>>>>>>>>> >>>> HTTP
> >>> >> >>>>>>>>>>>> >>>> >> Basic Authentication for the Flink Dashboard
> and
> >>> >> >>>>>>>>>>>> HistoryServer.
> >>> >> >>>>>>>>>>>> >>>> These previously
> >>> >> >>>>>>>>>>>> >>>> >> lacked an authentication story.
> >>> >> >>>>>>>>>>>> >>>> >>
> >>> >> >>>>>>>>>>>> >>>> >> We are looking to contribute this functionality
> >>> back
> >>> >> to
> >>> >> >>>>>>>>>>>> the
> >>> >> >>>>>>>>>>>> >>>> community, we
> >>> >> >>>>>>>>>>>> >>>> >> believe that given Flink's maturity there
> should
> >>> be a
> >>> >> >>>>>>>>>>>> common code
> >>> >> >>>>>>>>>>>> >>>> solution
> >>> >> >>>>>>>>>>>> >>>> >> for this general pattern.
> >>> >> >>>>>>>>>>>> >>>> >>
> >>> >> >>>>>>>>>>>> >>>> >> We are looking forward to your feedback on G's
> >>> >> design.
> >>> >> >>>>>>>>>>>> [2]
> >>> >> >>>>>>>>>>>> >>>> >>
> >>> >> >>>>>>>>>>>> >>>> >> [1] http://gaborsomogyi.com/
> >>> >> >>>>>>>>>>>> >>>> >> [2]
> >>> >> >>>>>>>>>>>> >>>> >>
> >>> >> >>>>>>>>>>>> >>>> >>
> >>> >> >>>>>>>>>>>> >>>>
> >>> >> >>>>>>>>>>>>
> >>> >>
> >>>
> https://docs.google.com/document/d/1NMPeJ9H0G49TGy3AzTVVJVKmYC0okwOtqLTSPnGqzHw/edit
> >>> >> >>>>>>>>>>>> >>>> >>
> >>> >> >>>>>>>>>>>> >>>>
> >>> >> >>>>>>>>>>>> >>>>
> >>> >> >>>>>>>>>>>>
> >>> >> >>>>>>>>>>>
> >>> >>
> >>> >
> >>> >
> >>> > --
> >>> >
> >>> > Konstantin Knauf
> >>> >
> >>> > https://twitter.com/snntrable
> >>> >
> >>> > https://github.com/knaufk
> >>> >
> >>>
> >>
> >>
> >> --
> >>
> >> Konstantin Knauf
> >>
> >> https://twitter.com/snntrable
> >>
> >> https://github.com/knaufk
> >>
> >
>