http://deprecated-apache-flink-mailing-list-archive.368.s1.nabble.com/DISCUSS-Dashboard-HistoryServer-authentication-tp50993p51128.html
clarification, Gabor. You are saying that if we configure a truststore for
would be under the control of the operator as well (e.g. stored in a
keystore on the same machine but guarded by some secret). That way (if I am
talk to the Flink cluster.
different things.
Thanks for listing the pros and cons of Kerberos. Concerning what other
authentication mechanisms are used in the industry, I am not 100% sure.
> > I did not mean for the user to sign its own certificates but for the
> > operator of the cluster. Once the user request hits the proxy, it should no
> > longer be under his control. I think I do not fully understand yet why this
> > would not work.
> I said it's not solving the authentication problem over any proxy. Even if
> the operator is signing the certificate, one can have access to an internal
> node.
> In such a case anybody can craft certificates which are accepted by the server.
> Once accepted, a bad guy can cancel jobs, causing huge impact.
>
> > Also, I am missing a bit the comparison of Kerberos to other
> > authentication mechanisms and why they were rejected in favour of Kerberos.
> PROS:
> * It does not depend on the cloud provider or on a k8s, bare-metal, etc.
> deployment, which is the biggest plus
> * Centralized, with existing tools, so no need to write tons of tooling around it
> * There are clients/tools on almost all OSes and in several languages
> * Very large users have been using it for years in production w/o huge issues
> * Provides cross-realm trust, amongst other features
> * Several open source components use it, which could increase
> compatibility
>
> CONS:
> * Not everybody is using Kerberos
> * It would increase the code footprint, but this is true for many features
> (as a side note, I'm here to maintain it)
>
> Feel free to add your points because it only represents a single viewpoint.
> Also if you have any better option for strong authentication please share
> it and we can consider the pros/cons here.
>
> BR,
> G
>
>
> On Fri, Jun 4, 2021 at 10:32 AM Till Rohrmann <[hidden email]> wrote:
>
>> I did not mean for the user to sign its own certificates but for the
>> operator of the cluster. Once the user request hits the proxy, it should no
>> longer be under his control. I think I do not fully understand yet why this
>> would not work.
>>
>> What I would like to avoid is to add more complexity into Flink if there
>> is an easy solution which fulfills the requirements. That's why I would
>> like to exercise thoroughly through the different alternatives. Also, I am
>> missing a bit the comparison of Kerberos to other authentication mechanisms
>> and why they were rejected in favour of Kerberos.
>>
>> Cheers,
>> Till
>>
>> On Fri, Jun 4, 2021 at 10:26 AM Gyula Fóra <[hidden email]> wrote:
>>
>>> Hi!
>>>
>>> I think there might be possible alternatives, but it seems Kerberos on
>>> the REST endpoint ticks all the right boxes and provides a super clean and
>>> simple solution for strong authentication.
>>>
>>> I wouldn’t even consider sidecar proxies etc if we can solve it in such
>>> a simple way as proposed by G.
>>>
>>> Cheers
>>> Gyula
>>>
>>> On Fri, 4 Jun 2021 at 10:03, Till Rohrmann <[hidden email]> wrote:
>>>
>>>> I am not saying that we shouldn't add a strong authentication mechanism
>>>> if there are good reasons for it. I primarily would like to understand the
>>>> context a bit better in order to give qualified feedback and come to a good
>>>> decision. In order to do this, I have the feeling that we haven't fully
>>>> considered all available options which are on the table, tbh.
>>>>
>>>> Does the problem of certificate expiry also apply for self-signed
>>>> certificates? If yes, then this should then also be a problem for the
>>>> internal encryption of Flink's communication. If not, then one could use
>>>> self-signed certificates with a longer validity to solve the mentioned
>>>> issue.
>>>>
>>>> I think you can set up Flink in such a way that you don't have to
>>>> handle all the different certificates. For example, you could deploy Flink
>>>> with a "sidecar proxy" which is responsible for the authentication using an
>>>> arbitrary method (e.g. Kerberos) and then bind the REST endpoint to a local
>>>> network interface. That way, the REST endpoint would only be available
>>>> through the sidecar proxy. Additionally, one could enable SSL for this
>>>> communication. Would this be a solution for the problem?
>>>>
>>>> Cheers,
>>>> Till
>>>>
>>>> On Thu, Jun 3, 2021 at 10:46 PM Márton Balassi <[hidden email]> wrote:
>>>>
>>>>> That is an interesting idea, Till.
>>>>>
>>>>> The main issue with it is that TLS certificates have an expiration
>>>>> time, usually they get approved for a couple years. Forcing our users to
>>>>> restart jobs to reprovision TLS certificates would be weird when we could
>>>>> just implement a single proper strong authentication mechanism instead in a
>>>>> couple hundred lines of code. :-)
>>>>>
>>>>> In many cases it is also impractical to go the mutual TLS route,
>>>>> because the Flink Dashboard can end up on any node in the k8s/Yarn cluster,
>>>>> which means that we need a certificate per node (due to the mutual auth).
>>>>> If we also want to protect the private keys of these from users
>>>>> accidentally or intentionally leaking them, then we need this per user. That
>>>>> is, we end up managing (number of users × number of machines) certificates
>>>>> and having to renew them periodically, which, albeit automatable, is
>>>>> unfortunately not yet automated in all large organizations.
>>>>>
>>>>> I fully agree that TLS certificate mutual authentication has its nice
>>>>> properties, especially at very large (multiple thousand node) clusters -
>>>>> but it has its own challenges too. Thanks for bringing it up.
>>>>>
>>>>> Happy to have this added to the rejected alternative list so that we
>>>>> have the full picture documented.
>>>>>
>>>>> On Thu, Jun 3, 2021 at 5:52 PM Till Rohrmann <[hidden email]> wrote:
>>>>>
>>>>>> I guess the idea would then be to let the proxy do the authentication
>>>>>> job and only forward the request via a mutually authenticated SSL connection
>>>>>> to the Flink cluster. Would this be possible? The beauty of this setup is,
>>>>>> in my opinion, that it should work with all kinds of authentication
>>>>>> mechanisms.
>>>>>>
>>>>>> Cheers,
>>>>>> Till
>>>>>>
>>>>>> On Thu, Jun 3, 2021 at 3:12 PM Gabor Somogyi <[hidden email]> wrote:
>>>>>>
>>>>>>> Thanks for giving options to fulfil the need.
>>>>>>>
>>>>>>> Users are looking for a solution where users can be identified on
>>>>>>> the whole cluster and restrict access to resources/actions.
>>>>>>> A good example of such an action is cancelling other users' running
>>>>>>> jobs.
>>>>>>>
>>>>>>> * SSL does provide mutual authentication, but once authentication has
>>>>>>> passed, no user-based restrictions can be made.
>>>>>>> * The less problematic part is that generating/maintaining short-lived
>>>>>>> certificates would be hard (that's the reason KDC-like servers
>>>>>>> exist).
>>>>>>> Having long-lived certificates would widen the attack surface,
>>>>>>> but since the first concern is there, this is just a cosmetic issue.
>>>>>>>
>>>>>>> All in all using TLS certificates is not sufficient in these
>>>>>>> environments unfortunately.
>>>>>>>
>>>>>>> BR,
>>>>>>> G
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Jun 3, 2021 at 12:49 PM Till Rohrmann <[hidden email]> wrote:
>>>>>>>
>>>>>>>> Thanks for the information Gabor. If it is about securing the
>>>>>>>> communication between the REST client and the REST server, then Flink
>>>>>>>> already supports enabling mutual SSL authentication [1]. Would this be
>>>>>>>> enough to secure the communication and to pass an audit?
>>>>>>>>
>>>>>>>> [1] https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/security/security-ssl/#external--rest-connectivity
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Till
>>>>>>>>
>>>>>>>> On Thu, Jun 3, 2021 at 10:33 AM Gabor Somogyi <[hidden email]> wrote:
>>>>>>>>
>>>>>>>>> Hi Till,
>>>>>>>>>
>>>>>>>>> Since I have been working in the security area for 10+ years, let me
>>>>>>>>> share my thoughts. I would like to emphasise that there are experts
>>>>>>>>> better than me, but I have some basics.
>>>>>>>>> The discussion is open and I am not trying to decide things alone...
>>>>>>>>>
>>>>>>>>> > I mean if an attacker can get access to one of the machines, then it
>>>>>>>>> > should also be possible to obtain the right Kerberos token.
>>>>>>>>> Not necessarily. For example, if one gets access to a specific user's
>>>>>>>>> credentials, then it's not possible to compromise other users' jobs,
>>>>>>>>> data, etc.
>>>>>>>>> Security is like an onion: the more layers have been added, the more
>>>>>>>>> time an attacker needs to proceed.
>>>>>>>>> At the end of the day, if one is in, then one can most probably find a
>>>>>>>>> way, but this time is normally enough for sysadmins or security experts
>>>>>>>>> to close down the system and minimize the damage.
>>>>>>>>>
>>>>>>>>> The other thing is that all tokens have a timeout, and if the token is
>>>>>>>>> invalid then the attacker can't proceed further.
>>>>>>>>>
>>>>>>>>> > Is Kerberos also the standard authentication protocol for Kubernetes
>>>>>>>>> > deployments?
>>>>>>>>> Kerberos is an industry standard which is cloud/deployment agnostic, and
>>>>>>>>> it can be used in any deployment, including k8s.
>>>>>>>>> The main intention is to use Kerberos in k8s deployments too, since we're
>>>>>>>>> going in this direction as well.
>>>>>>>>> Please see how Spark does this:
>>>>>>>>>
>>>>>>>>> https://spark.apache.org/docs/latest/security.html#secure-interaction-with-kubernetes
>>>>>>>>>
>>>>>>>>> Last but not least, the most important reason to add at least one strong
>>>>>>>>> authentication mechanism is that we have users who have hard requirements
>>>>>>>>> on this. They're doing security audits, and if they fail then it's a
>>>>>>>>> deal breaker.
>>>>>>>>> That is why we added Kerberos in the first place. Unfortunately we can't
>>>>>>>>> name them on this public list; however, the customers who specifically
>>>>>>>>> asked for this were mainly in the banking and telco sectors.
>>>>>>>>>
>>>>>>>>> BR,
>>>>>>>>> G
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Jun 3, 2021 at 9:20 AM Till Rohrmann <[hidden email]> wrote:
>>>>>>>>>
>>>>>>>>> > Thanks for updating the document Márton. Why is it that banks will
>>>>>>>>> > consider it more secure if Flink comes with Kerberos authentication
>>>>>>>>> > (assuming a properly secured setup)? I mean if an attacker can get access
>>>>>>>>> > to one of the machines, then it should also be possible to obtain the right
>>>>>>>>> > Kerberos token.
>>>>>>>>> >
>>>>>>>>> > I am not an authentication expert and that's why I wanted to ask what are
>>>>>>>>> > other authentication protocols other than Kerberos? Why did we select
>>>>>>>>> > Kerberos and not any other authentication protocol? Maybe you can list the
>>>>>>>>> > pros and cons for the different protocols. Is Kerberos also the standard
>>>>>>>>> > authentication protocol for Kubernetes deployments? If not, what would be
>>>>>>>>> > the answer when deploying on K8s?
>>>>>>>>> >
>>>>>>>>> > Cheers,
>>>>>>>>> > Till
>>>>>>>>> >
>>>>>>>>> > On Wed, Jun 2, 2021 at 12:07 PM Gabor Somogyi <[hidden email]> wrote:
>>>>>>>>> >
>>>>>>>>> >> Hi team,
>>>>>>>>> >>
>>>>>>>>> >> Happy to be here and hope I can provide quality additions in the future.
>>>>>>>>> >>
>>>>>>>>> >> Thank you all for the helpful suggestions!
>>>>>>>>> >> Considering them, the FLIP has been modified and the work continues on the
>>>>>>>>> >> already existing Jira.
>>>>>>>>> >>
>>>>>>>>> >> BR,
>>>>>>>>> >> G
>>>>>>>>> >>
>>>>>>>>> >>
>>>>>>>>> >> On Wed, Jun 2, 2021 at 11:23 AM Márton Balassi <[hidden email]> wrote:
>>>>>>>>> >>
>>>>>>>>> >>> Thanks, Chesnay - I totally missed that. Answered on the ticket too, let
>>>>>>>>> >>> us continue there then.
>>>>>>>>> >>>
>>>>>>>>> >>> Till, I agree that we should keep this codepath as slim as possible. It
>>>>>>>>> >>> is an important design decision that we aim to keep the list of
>>>>>>>>> >>> authentication protocols to a minimum. We believe that this should not be a
>>>>>>>>> >>> primary concern of Flink and a trusted proxy service (for example Apache
>>>>>>>>> >>> Knox) should be used to enable a multitude of end-user authentication
>>>>>>>>> >>> mechanisms. The bare minimum of authentication mechanisms to support
>>>>>>>>> >>> consequently consists of a single strong authentication protocol, for which
>>>>>>>>> >>> Kerberos is the enterprise solution, and HTTP Basic, primarily for development
>>>>>>>>> >>> and light-weight scenarios.
>>>>>>>>> >>>
>>>>>>>>> >>> Added the above wording to G's doc.
>>>>>>>>> >>>
>>>>>>>>> >>>
>>>>>>>>> >>> https://docs.google.com/document/d/1NMPeJ9H0G49TGy3AzTVVJVKmYC0okwOtqLTSPnGqzHw/edit
>>>>>>>>> >>>
>>>>>>>>> >>>
>>>>>>>>> >>>
>>>>>>>>> >>> On Tue, Jun 1, 2021 at 11:47 AM Chesnay Schepler <[hidden email]> wrote:
>>>>>>>>> >>>
>>>>>>>>> >>>> There's a related effort:
>>>>>>>>> >>>> https://issues.apache.org/jira/browse/FLINK-21108
>>>>>>>>> >>>>
>>>>>>>>> >>>> On 6/1/2021 10:14 AM, Till Rohrmann wrote:
>>>>>>>>> >>>> > Hi Gabor, welcome to the Flink community!
>>>>>>>>> >>>> >
>>>>>>>>> >>>> > Thanks for sharing this proposal with the community Márton. In general, I
>>>>>>>>> >>>> > agree that authentication is missing and that this is required for using
>>>>>>>>> >>>> > Flink within an enterprise. The thing I am wondering is whether this
>>>>>>>>> >>>> > feature strictly needs to be implemented inside of Flink or whether a proxy
>>>>>>>>> >>>> > setup could do the job? Have you considered this option? If yes, then it
>>>>>>>>> >>>> > would be good to list it under the point of rejected alternatives.
>>>>>>>>> >>>> >
>>>>>>>>> >>>> > I do see the benefit of implementing this feature inside of Flink if many
>>>>>>>>> >>>> > users need it. If not, then it might be easier for the project to not
>>>>>>>>> >>>> > increase the surface area since it makes the overall maintenance harder.
>>>>>>>>> >>>> >
>>>>>>>>> >>>> > Cheers,
>>>>>>>>> >>>> > Till
>>>>>>>>> >>>> >
>>>>>>>>> >>>> > On Mon, May 31, 2021 at 4:57 PM Márton Balassi <[hidden email]> wrote:
>>>>>>>>> >>>> >
>>>>>>>>> >>>> >> Hi team,
>>>>>>>>> >>>> >>
>>>>>>>>> >>>> >> Firstly I would like to introduce Gabor, or G [1] for short, to the
>>>>>>>>> >>>> >> community. He is a Spark committer who has recently transitioned to the
>>>>>>>>> >>>> >> Flink Engineering team at Cloudera and is looking forward to contributing
>>>>>>>>> >>>> >> to Apache Flink. Previously G primarily focused on Spark Streaming and
>>>>>>>>> >>>> >> security.
>>>>>>>>> >>>> >>
>>>>>>>>> >>>> >> Based on requests from our customers, G has implemented Kerberos and HTTP
>>>>>>>>> >>>> >> Basic Authentication for the Flink Dashboard and HistoryServer, which
>>>>>>>>> >>>> >> previously lacked an authentication story.
>>>>>>>>> >>>> >>
>>>>>>>>> >>>> >> We are looking to contribute this functionality back to the community; we
>>>>>>>>> >>>> >> believe that given Flink's maturity there should be a common code solution
>>>>>>>>> >>>> >> for this general pattern.
>>>>>>>>> >>>> >>
>>>>>>>>> >>>> >> We are looking forward to your feedback on G's design. [2]
>>>>>>>>> >>>> >>
>>>>>>>>> >>>> >> [1] http://gaborsomogyi.com/
>>>>>>>>> >>>> >> [2] https://docs.google.com/document/d/1NMPeJ9H0G49TGy3AzTVVJVKmYC0okwOtqLTSPnGqzHw/edit