Hello,
Per the documentation (https://ci.apache.org/projects/flink/flink-docs-master/setup/aws.html), it looks like the S3/S3A FS implementation is supported through the standard Hadoop S3 FS client APIs. In the absence of a standard HCFS, going with S3/S3A:
1) Are there any known limitations/issues?
2) Do checkpoints/savepoints work properly?
Regards,
Vijay
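(For context, wiring Flink up to S3A per the AWS setup page linked above comes down to pointing Flink at a Hadoop configuration directory via `fs.hdfs.hadoopconf` in flink-conf.yaml, and declaring the S3A implementation there. A minimal sketch of the Hadoop side; the paths and credential values are placeholders, not from this thread:)

```xml
<!-- core-site.xml, in the directory referenced by fs.hdfs.hadoopconf -->
<configuration>
  <!-- Map the s3a:// scheme to Hadoop's S3A client -->
  <property>
    <name>fs.s3a.impl</name>
    <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
  </property>
  <!-- Placeholder credentials -->
  <property>
    <name>fs.s3a.access.key</name>
    <value>YOUR_ACCESS_KEY</value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>YOUR_SECRET_KEY</value>
  </property>
</configuration>
```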
Hi!
In 1.2-SNAPSHOT, we recently fixed issues caused by the "eventual consistency" nature of S3. The fix is not in v1.1; that is the only known issue I can think of. It results in occasional (seldom) periods of heavy restart retries until all files become visible to all participants. If you run into that issue, it may be worthwhile to look at Flink 1.2-SNAPSHOT.
Best,
Stephan
Thanks Stephan. My understanding is that checkpointing uses the truncate API, but S3A does not support it. Will this have any impact?
Some of the known S3A client limitations are captured on the Hortonworks site, https://hortonworks.github.io/hdp-aws/s3-s3aclient/index.html, and I am wondering whether they have any impact on a Flink deployment using S3?
Regards,
Vijay
Hi!
The "truncate()" functionality is only needed for the rolling/bucketing sink. The core checkpoint functionality does not need any truncate() behavior.
Best,
Stephan
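(To make the distinction concrete: the core checkpoint path only ever writes new objects under the checkpoint directory, so no truncate() is involved. A minimal sketch against the Flink 1.1-era API, not compilable here without Flink on the classpath; the bucket path is a placeholder:)

```java
import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class S3CheckpointSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
            StreamExecutionEnvironment.getExecutionEnvironment();
        // Checkpoint state is written as fresh objects under this path;
        // no truncate() is needed, so plain S3/S3A suffices for this part.
        env.setStateBackend(new FsStateBackend("s3a://my-bucket/flink/checkpoints"));
        env.enableCheckpointing(10000); // checkpoint every 10 seconds
        // ... build the job graph, then env.execute();
    }
}
```

The RollingSink, by contrast, needs to truncate a partially written in-progress file back to a valid length on restore, which is where the missing S3A truncate support bites.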
Regarding S3 and the Rolling/BucketingSink, we've seen data loss when
resuming from checkpoints: S3 FileSystem implementations flush to temporary files, while the RollingSink expects a direct flush to in-progress files. Because there is no such thing as "flush and resume writing" on S3, I don't know whether the RollingSink can be made to work in a pure S3 environment. We worked around it by using HDFS in a transient way.
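(The workaround described above amounts to pointing the sink at HDFS rather than S3 and moving data onward from there. A sketch, assuming the flink-connector-filesystem RollingSink from Flink 1.1; the namenode address and output path are placeholders:)

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.connectors.fs.RollingSink;

// Write to transient HDFS, where flush/truncate semantics are well defined,
// instead of directly to S3.
DataStream<String> stream = /* ... */ null;
RollingSink<String> sink = new RollingSink<>("hdfs://namenode:8020/flink/out");
stream.addSink(sink);
```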