http://deprecated-apache-flink-mailing-list-archive.368.s1.nabble.com/DISCUSS-Dashboard-HistoryServer-authentication-tp50993p51116.html
> I did not mean for the user to sign its own certificates but for the
> operator of the cluster. Once the user request hits the proxy, it should no
> longer be under his control. I think I do not fully understand yet why this
> would not work.
I said it's not solving the authentication problem over any proxy. Even if
node.
In such a case anybody can craft certificates which are accepted by the server.
Once they are accepted, a bad guy can cancel jobs, causing huge impact.
authentication mechanisms and why they were rejected in favour of Kerberos.
* Since it's not depending on cloud provider and/or k8s or bare-metal etc.
Feel free to add your points because it only represents a single viewpoint.
it and we can consider the pros/cons here.
> I did not mean for the user to sign its own certificates but for the
> operator of the cluster. Once the user request hits the proxy, it should no
> longer be under his control. I think I do not fully understand yet why this
> would not work.
>
> What I would like to avoid is to add more complexity into Flink if there
> is an easy solution which fulfills the requirements. That's why I would
> like to exercise thoroughly through the different alternatives. Also, I am
> missing a bit the comparison of Kerberos to other authentication mechanisms
> and why they were rejected in favour of Kerberos.
>
> Cheers,
> Till
>
> On Fri, Jun 4, 2021 at 10:26 AM Gyula Fóra <[hidden email]> wrote:
>
>> Hi!
>>
>> I think there might be possible alternatives but it seems Kerberos on the
>> rest endpoint ticks all the right boxes and provides a super clean and
>> simple solution for strong authentication.
>>
>> I wouldn’t even consider sidecar proxies etc if we can solve it in such a
>> simple way as proposed by G.
>>
>> Cheers
>> Gyula
>>
>> On Fri, 4 Jun 2021 at 10:03, Till Rohrmann <[hidden email]> wrote:
>>
>>> I am not saying that we shouldn't add a strong authentication mechanism
>>> if there are good reasons for it. I primarily would like to understand the
>>> context a bit better in order to give qualified feedback and come to a good
>>> decision. In order to do this, I have the feeling that we haven't fully
>>> considered all available options which are on the table, tbh.
>>>
>>> Does the problem of certificate expiry also apply for self-signed
>>> certificates? If yes, then this should also be a problem for the
>>> internal encryption of Flink's communication. If not, then one could use
>>> self-signed certificates with a longer validity to solve the mentioned
>>> issue.
>>>
>>> I think you can set up Flink in such a way that you don't have to handle
>>> all the different certificates. For example, you could deploy Flink with a
>>> "sidecar proxy" which is responsible for the authentication using an
>>> arbitrary method (e.g. Kerberos) and then bind the REST endpoint to a local
>>> network interface. That way, the REST endpoint would only be available
>>> through the sidecar proxy. Additionally, one could enable SSL for this
>>> communication. Would this be a solution for the problem?
>>>
>>> Cheers,
>>> Till
>>>
>>> On Thu, Jun 3, 2021 at 10:46 PM Márton Balassi <[hidden email]> wrote:
>>>
>>>> That is an interesting idea, Till.
>>>>
>>>> The main issue with it is that TLS certificates have an expiration
>>>> time, usually they get approved for a couple years. Forcing our users to
>>>> restart jobs to reprovision TLS certificates would be weird when we could
>>>> just implement a single proper strong authentication mechanism instead in a
>>>> couple hundred lines of code. :-)
>>>>
>>>> In many cases it is also impractical to go the TLS mutual route,
>>>> because the Flink Dashboard can end up on any node in the k8s/Yarn cluster
>>>> which means that we need a certificate per node (due to the mutual auth),
>>>> but if we also want to protect the private key of these from users
>>>> accidentally or intentionally leaking them then we need this per user. As
>>>> in we end up managing user*machine number certificates and having to renew
>>>> them periodically, which albeit automatable is unfortunately not yet
>>>> automated in all large organizations.
>>>>
>>>> I fully agree that TLS certificate mutual authentication has its nice
>>>> properties, especially at very large (multiple thousand node) clusters -
>>>> but it has its own challenges too. Thanks for bringing it up.
>>>>
>>>> Happy to have this added to the rejected alternative list so that we
>>>> have the full picture documented.
>>>>
>>>> On Thu, Jun 3, 2021 at 5:52 PM Till Rohrmann <[hidden email]> wrote:
>>>>
>>>>> I guess the idea would then be to let the proxy do the authentication
>>>>> job and only forward the request via an SSL mutually encrypted connection
>>>>> to the Flink cluster. Would this be possible? The beauty of this setup is
>>>>> in my opinion that this setup should work with all kinds of authentication
>>>>> mechanisms.
>>>>>
>>>>> Cheers,
>>>>> Till
>>>>>
>>>>> On Thu, Jun 3, 2021 at 3:12 PM Gabor Somogyi <[hidden email]> wrote:
>>>>>
>>>>>> Thanks for giving options to fulfil the need.
>>>>>>
>>>>>> Users are looking for a solution where users can be identified on the
>>>>>> whole cluster and access to resources/actions can be restricted.
>>>>>> A good example of such an action is cancelling other users' running
>>>>>> jobs.
>>>>>>
>>>>>> * SSL does provide mutual authentication, but once authentication has
>>>>>> passed there is no user identity on which restrictions can be based.
>>>>>> * The less problematic part is that generating/maintaining short-lived
>>>>>> certificates would be hard (that's the reason KDC-like servers
>>>>>> exist).
>>>>>> Having long-lived certificates would widen the attack surface,
>>>>>> but since the first concern is there this is just a cosmetic issue.
>>>>>>
>>>>>> All in all using TLS certificates is not sufficient in these
>>>>>> environments unfortunately.
>>>>>>
>>>>>> BR,
>>>>>> G
>>>>>>
>>>>>>
>>>>>> On Thu, Jun 3, 2021 at 12:49 PM Till Rohrmann <[hidden email]> wrote:
>>>>>>
>>>>>>> Thanks for the information Gabor. If it is about securing the
>>>>>>> communication between the REST client and the REST server, then Flink
>>>>>>> already supports enabling mutual SSL authentication [1]. Would this be
>>>>>>> enough to secure the communication and to pass an audit?
>>>>>>>
>>>>>>> [1] https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/security/security-ssl/#external--rest-connectivity
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Till
>>>>>>>
>>>>>>> On Thu, Jun 3, 2021 at 10:33 AM Gabor Somogyi <[hidden email]> wrote:
>>>>>>>
>>>>>>>> Hi Till,
>>>>>>>>
>>>>>>>> Since I've been working in the security area for 10+ years, let me share
>>>>>>>> my thoughts.
>>>>>>>> I would like to emphasise there are experts better than me but I have
>>>>>>>> some basics.
>>>>>>>> The discussion is open and I'm not trying to dictate things on my own...
>>>>>>>>
>>>>>>>> > I mean if an attacker can get access to one of the machines, then it
>>>>>>>> > should also be possible to obtain the right Kerberos token.
>>>>>>>> Not necessarily. For example if one gets access to a specific user's
>>>>>>>> credentials then it's not possible to compromise other users' jobs,
>>>>>>>> data, etc...
>>>>>>>> Security is like an onion: the more layers have been added, the more
>>>>>>>> time an attacker needs to proceed.
>>>>>>>> At the end of the day if one is in, then they can most probably find a
>>>>>>>> way, but this time is normally enough for sysadmins or security experts
>>>>>>>> to close down the system and minimize the damage.
>>>>>>>>
>>>>>>>> The other thing is that all tokens have a timeout and if the token is
>>>>>>>> invalid then the attacker can't proceed further.
>>>>>>>>
>>>>>>>> > Is Kerberos also the standard authentication protocol for Kubernetes
>>>>>>>> > deployments?
>>>>>>>> Kerberos is an industry standard which is cloud/deployment agnostic and
>>>>>>>> it can be used in any deployment including k8s.
>>>>>>>> The main intention is to use Kerberos in k8s deployments too since we're
>>>>>>>> going in this direction as well.
>>>>>>>> Please see how Spark does this:
>>>>>>>> https://spark.apache.org/docs/latest/security.html#secure-interaction-with-kubernetes
>>>>>>>>
>>>>>>>> Last but not least the most important reason to add at least one strong
>>>>>>>> authentication is that we have users who have hard requirements on this.
>>>>>>>> They're doing security audits and if they fail then it's deal breaking.
>>>>>>>> That is why we have added Kerberos in the first place. Unfortunately we
>>>>>>>> can't name them in this public list, however the customers who
>>>>>>>> specifically asked for this were mainly in the banking and telco sector.
>>>>>>>>
>>>>>>>> BR,
>>>>>>>> G
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Jun 3, 2021 at 9:20 AM Till Rohrmann <[hidden email]> wrote:
>>>>>>>>
>>>>>>>> > Thanks for updating the document Márton. Why is it that banks will
>>>>>>>> > consider it more secure if Flink comes with Kerberos authentication
>>>>>>>> > (assuming a properly secured setup)? I mean if an attacker can get
>>>>>>>> > access to one of the machines, then it should also be possible to
>>>>>>>> > obtain the right Kerberos token.
>>>>>>>> >
>>>>>>>> > I am not an authentication expert and that's why I wanted to ask what
>>>>>>>> > are other authentication protocols other than Kerberos? Why did we
>>>>>>>> > select Kerberos and not any other authentication protocol? Maybe you
>>>>>>>> > can list the pros and cons for the different protocols. Is Kerberos
>>>>>>>> > also the standard authentication protocol for Kubernetes deployments?
>>>>>>>> > If not, what would be the answer when deploying on K8s?
>>>>>>>> >
>>>>>>>> > Cheers,
>>>>>>>> > Till
>>>>>>>> >
>>>>>>>> > On Wed, Jun 2, 2021 at 12:07 PM Gabor Somogyi <[hidden email]> wrote:
>>>>>>>> >
>>>>>>>> >> Hi team,
>>>>>>>> >>
>>>>>>>> >> Happy to be here and hope I can provide quality additions in the
>>>>>>>> >> future.
>>>>>>>> >>
>>>>>>>> >> Thank you all for the helpful suggestions!
>>>>>>>> >> Considering them the FLIP has been modified and the work continues on
>>>>>>>> >> the already existing Jira.
>>>>>>>> >>
>>>>>>>> >> BR,
>>>>>>>> >> G
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >> On Wed, Jun 2, 2021 at 11:23 AM Márton Balassi <[hidden email]> wrote:
>>>>>>>> >>
>>>>>>>> >>> Thanks, Chesnay - I totally missed that. Answered on the ticket too,
>>>>>>>> >>> let us continue there then.
>>>>>>>> >>>
>>>>>>>> >>> Till, I agree that we should keep this codepath as slim as possible.
>>>>>>>> >>> It is an important design decision that we aim to keep the list of
>>>>>>>> >>> authentication protocols to a minimum. We believe that this should
>>>>>>>> >>> not be a primary concern of Flink and a trusted proxy service (for
>>>>>>>> >>> example Apache Knox) should be used to enable a multitude of end-user
>>>>>>>> >>> authentication mechanisms. The bare minimum of authentication
>>>>>>>> >>> mechanisms to support consequently consists of a single strong
>>>>>>>> >>> authentication protocol, for which Kerberos is the enterprise
>>>>>>>> >>> solution, and HTTP Basic, primarily for development and light-weight
>>>>>>>> >>> scenarios.
>>>>>>>> >>>
>>>>>>>> >>> Added the above wording to G's doc.
>>>>>>>> >>>
>>>>>>>> >>>
>>>>>>>> >>> https://docs.google.com/document/d/1NMPeJ9H0G49TGy3AzTVVJVKmYC0okwOtqLTSPnGqzHw/edit
>>>>>>>> >>>
>>>>>>>> >>>
>>>>>>>> >>>
>>>>>>>> >>> On Tue, Jun 1, 2021 at 11:47 AM Chesnay Schepler <[hidden email]> wrote:
>>>>>>>> >>>
>>>>>>>> >>>> There's a related effort:
>>>>>>>> >>>> https://issues.apache.org/jira/browse/FLINK-21108
>>>>>>>> >>>>
>>>>>>>> >>>> On 6/1/2021 10:14 AM, Till Rohrmann wrote:
>>>>>>>> >>>> > Hi Gabor, welcome to the Flink community!
>>>>>>>> >>>> >
>>>>>>>> >>>> > Thanks for sharing this proposal with the community Márton. In
>>>>>>>> >>>> > general, I agree that authentication is missing and that this is
>>>>>>>> >>>> > required for using Flink within an enterprise. The thing I am
>>>>>>>> >>>> > wondering is whether this feature strictly needs to be implemented
>>>>>>>> >>>> > inside of Flink or whether a proxy setup could do the job? Have you
>>>>>>>> >>>> > considered this option? If yes, then it would be good to list it
>>>>>>>> >>>> > under the point of rejected alternatives.
>>>>>>>> >>>> >
>>>>>>>> >>>> > I do see the benefit of implementing this feature inside of Flink if
>>>>>>>> >>>> > many users need it. If not, then it might be easier for the project
>>>>>>>> >>>> > to not increase the surface area since it makes the overall
>>>>>>>> >>>> > maintenance harder.
>>>>>>>> >>>> >
>>>>>>>> >>>> > Cheers,
>>>>>>>> >>>> > Till
>>>>>>>> >>>> >
>>>>>>>> >>>> > On Mon, May 31, 2021 at 4:57 PM Márton Balassi <[hidden email]> wrote:
>>>>>>>> >>>> >
>>>>>>>> >>>> >> Hi team,
>>>>>>>> >>>> >>
>>>>>>>> >>>> >> Firstly I would like to introduce Gabor or G [1] for short to the
>>>>>>>> >>>> >> community, he is a Spark committer who has recently transitioned to
>>>>>>>> >>>> >> the Flink Engineering team at Cloudera and is looking forward to
>>>>>>>> >>>> >> contributing to Apache Flink. Previously G primarily focused on
>>>>>>>> >>>> >> Spark Streaming and security.
>>>>>>>> >>>> >>
>>>>>>>> >>>> >> Based on requests from our customers G has implemented Kerberos and
>>>>>>>> >>>> >> HTTP Basic Authentication for the Flink Dashboard and
>>>>>>>> >>>> >> HistoryServer. Previously these lacked an authentication story.
>>>>>>>> >>>> >>
>>>>>>>> >>>> >> We are looking to contribute this functionality back to the
>>>>>>>> >>>> >> community, we believe that given Flink's maturity there should be a
>>>>>>>> >>>> >> common code solution for this general pattern.
>>>>>>>> >>>> >>
>>>>>>>> >>>> >> We are looking forward to your feedback on G's design. [2]
>>>>>>>> >>>> >>
>>>>>>>> >>>> >> [1] http://gaborsomogyi.com/
>>>>>>>> >>>> >> [2] https://docs.google.com/document/d/1NMPeJ9H0G49TGy3AzTVVJVKmYC0okwOtqLTSPnGqzHw/edit
>>>>>>>> >>>> >>
>>>>>>>> >>>>
>>>>>>>> >>>>
>>>>>>>>
>>>>>>>