Hi community,
We are working on secure Flink on YARN. The current Flink-YARN-Kerberos integration requires each container of a job to log in to Kerberos via keytab every, say, 24 hours, and does not use any Hadoop delegation token (DT) mechanism except when localizing the container. While fixing the current Flink-YARN-Kerberos integration (FLINK-8275) and trying to add more features (FLINK-7860), I developed some concerns about the current implementation. It can pose a scalability issue for the KDC, e.g., if the YARN cluster is restarted and tens of thousands of containers suddenly DDoS the KDC.

I would like to propose improving the current Flink-YARN-Kerberos integration along the following lines:
1) The AppMaster (JobManager) periodically authenticates with the KDC and obtains all the DTs required by the job.
2) All other TaskManager (TM) or TaskExecutor (TE) containers periodically retrieve the new DTs from the AppMaster (either through a secure HDFS folder or a secure Akka channel).

Also, we want to extend Flink to support pluggable AuthN mechanisms, because we have our own internal AuthN mechanism. We would like to add support in Flink for authenticating periodically to our internal AuthN service as well, e.g., through dynamic class loading, and to use a similar mechanism to distribute the credentials from the AppMaster to the containers.

I would like to get comments and feedback. I can also write a design doc or create a FLIP if needed. Thanks a lot.

Shuyi

--
"So you have to trust that the dots will somehow connect in your future."
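[Editorial sketch] To make steps 1) and 2) above concrete, here is a minimal sketch using Hadoop's UserGroupInformation and Credentials APIs, with the secure-HDFS-folder variant for distribution. This is not existing Flink code; the principal, keytab path, renewer, and token file location are placeholders.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.UserGroupInformation;

// Sketch only: all names and paths below are illustrative placeholders.
public class DelegationTokenRefresher {

    private static final Path TOKEN_FILE =
            new Path("hdfs:///flink/secrets/job-credentials.dt"); // placeholder location

    // Runs periodically in the AppMaster, well before the current DTs expire.
    public static void refreshAndPublish(Configuration hadoopConf) throws Exception {
        // Re-authenticate against the KDC from the localized keytab.
        UserGroupInformation.loginUserFromKeytab(
                "flink/appmaster@EXAMPLE.COM",          // placeholder principal
                "/etc/security/keytabs/flink.keytab");  // placeholder keytab path

        // Collect fresh delegation tokens for the file systems the job needs.
        Credentials credentials = new Credentials();
        FileSystem fs = FileSystem.get(hadoopConf);
        fs.addDelegationTokens("yarn", credentials);    // "yarn" = renewer, placeholder

        // Publish the tokens to a folder readable only by the job's containers.
        credentials.writeTokenStorageFile(TOKEN_FILE, hadoopConf);
    }

    // Runs periodically in each TM/TE container to pick up the new tokens.
    public static void loadPublishedTokens(Configuration hadoopConf) throws Exception {
        Credentials credentials = Credentials.readTokenStorageFile(TOKEN_FILE, hadoopConf);
        UserGroupInformation.getCurrentUser().addCredentials(credentials);
    }
}

A secure Akka channel could replace the HDFS folder as the transport; only the publish/load steps would change.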
I agree that it is reasonable to use Hadoop DTs as you describe. That
approach is even recommended in YARN's documentation (see "Securing Long-lived YARN Services" on the YARN Application Security page). But one of the goals of the Kerberos integration is to support Kerberized data access for connectors other than HDFS, such as Kafka, Cassandra, and Elasticsearch. So your second point makes sense too, suggesting a general architecture for managing secrets (DTs, keytabs, certificates, OAuth tokens, etc.) within the cluster.

There are quite a few aspects to Flink security, including:
1. data access (e.g. how a connector authenticates to a data source)
2. service authorization and network security (e.g. how a Flink cluster protects itself from unauthorized access)
3. multi-user support (e.g. multi-user Flink clusters, RBAC)

I mention these aspects to clarify your point about AuthN, which I took to be related to (1). Do tell if I misunderstood.

Eron
Thanks a lot for the clarification, Eron. That's very helpful. Currently,
we are more concerned about 1) data access, but will get to 2) and 3) eventually.

I was thinking of doing the following:
1) Extend the current HadoopModule to obtain and refresh DTs as suggested in the YARN Application Security docs.
2) The current SecurityModule interface might be enough for supporting other security mechanisms. However, the loading of security modules is hard-coded rather than configuration-based. I think we can extend SecurityUtils to load modules from the configuration, so we can implement our own security mechanism in our internal repo and have Flink jobs load it at runtime.

Please let me know your comments. Thanks a lot.
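[Editorial sketch] For point 2), a rough sketch of what configuration-based loading via dynamic class loading could look like. The PluggableSecurityModule interface, the config value format, and the class names are hypothetical; Flink's actual SecurityModule/SecurityUtils APIs may differ.

import java.util.ArrayList;
import java.util.List;

// Sketch only: the interface and config format below are hypothetical.
public final class ConfigurableSecurityModuleLoader {

    /** Hypothetical minimal contract, analogous in spirit to Flink's SecurityModule. */
    public interface PluggableSecurityModule {
        void install() throws Exception;
        void uninstall() throws Exception;
    }

    /**
     * Instantiates every module class listed in a config value such as
     * "security.modules: org.example.HadoopDtModule,com.mycompany.InternalAuthNModule".
     */
    public static List<PluggableSecurityModule> loadModules(String commaSeparatedClassNames)
            throws Exception {
        List<PluggableSecurityModule> modules = new ArrayList<>();
        for (String className : commaSeparatedClassNames.split(",")) {
            // Dynamic class loading: the module implementation can live in a
            // user jar (e.g. an internal repo) rather than in Flink itself.
            Class<?> clazz = Class.forName(className.trim());
            modules.add((PluggableSecurityModule) clazz.getDeclaredConstructor().newInstance());
        }
        return modules;
    }

    public static void installAll(List<PluggableSecurityModule> modules) throws Exception {
        for (PluggableSecurityModule module : modules) {
            module.install();
        }
    }
}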
Ping, any comments? Thanks a lot.
Shuyi
I would suggest that you draft a proposal that lays out your goals and the
technical challenges that you perceive. Then the community can provide some feedback on potential solutions to those challenges, culminating in a concrete improvement proposal.

Thanks
Thanks a lot, Eron. I'll draft a proposal and share it with the community.