S3/S3A support


S3/S3A support

Vijay Srinivasaraghavan
Hello,
Per the documentation (https://ci.apache.org/projects/flink/flink-docs-master/setup/aws.html), it looks like the S3/S3A FS implementation is supported using the standard Hadoop S3 FS client APIs.
If we skip a standard HCFS and go with S3/S3A instead:
1) Are there any known limitations/issues?
2) Do checkpoints/savepoints work properly?
Regards
Vijay
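
For readers following along, a minimal sketch of the setup being asked about, assuming the bucket name and paths are placeholders and that the S3 credentials (fs.s3a.* settings) live in Hadoop's core-site.xml as the linked docs describe:

```yaml
# flink-conf.yaml -- sketch only; bucket and paths are placeholders
state.backend: filesystem
state.backend.fs.checkpointdir: s3a://my-bucket/flink/checkpoints

# Point Flink at the Hadoop config directory that defines the fs.s3a.*
# settings (filesystem implementation class, access/secret keys, ...)
fs.hdfs.hadoopconf: /etc/hadoop/conf
```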

Re: S3/S3A support

Stephan Ewen
Hi!

In 1.2-SNAPSHOT, we recently fixed issues due to the "eventual consistency"
nature of S3. The fix is not in v1.1 - that is the only known issue I can
think of.

It results in occasional (though rare) periods of heavy restart retries until
all files are visible to all participants.

If you run into that issue, it may be worthwhile to look at Flink 1.2-SNAPSHOT.

Best,
Stephan
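
Until that fix, how aggressively a job retries during such a visibility window is governed by its restart strategy. A fixed-delay sketch (the key names come from the Flink configuration reference; the values here are purely illustrative):

```yaml
# flink-conf.yaml -- illustrative values only
restart-strategy: fixed-delay
restart-strategy.fixed-delay.attempts: 10
restart-strategy.fixed-delay.delay: 30 s
```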



Re: S3/S3A support

Vijay Srinivasaraghavan
Thanks Stephan. My understanding is that checkpointing uses the truncate API, but S3A does not support it. Will this have any impact?
Some of the known S3A client limitations are captured on the Hortonworks site (https://hortonworks.github.io/hdp-aws/s3-s3aclient/index.html); I am wondering whether any of those affect a Flink deployment using S3.
Regards,
Vijay


Re: S3/S3A support

Stephan Ewen
Hi!

The "truncate()" functionality is only needed for the rolling/bucketing
sink. The core checkpoint functionality does not need any truncate()
behavior...

Best,
Stephan
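
To make the distinction concrete, here is a plain-Java sketch of the write-once pattern that checkpointing relies on. This illustrates the pattern only; it is not Flink's actual checkpoint code, and the file names are made up:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class WriteOnceCheckpoint {
    // Each checkpoint writes a fresh file and publishes it by rename;
    // existing files are never modified afterwards, so no truncate()
    // behavior is ever required from the filesystem.
    static Path writeCheckpoint(Path dir, long checkpointId, byte[] state) throws IOException {
        Path inProgress = dir.resolve("chk-" + checkpointId + ".inprogress");
        Files.write(inProgress, state);                      // written exactly once
        Path published = dir.resolve("chk-" + checkpointId); // immutable from here on
        return Files.move(inProgress, published, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("checkpoints");
        Path chk = writeCheckpoint(dir, 1, "operator-state".getBytes());
        System.out.println(chk.getFileName()); // prints "chk-1"
    }
}
```

The rolling/bucketing sink, by contrast, must cut an in-progress file back to a known valid length on recovery, which is where truncate() enters the picture.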


Re: S3/S3A support

Cliff Resnick
Regarding S3 and the Rolling/BucketingSink, we've seen data loss when
resuming from checkpoints: S3 FileSystem implementations flush to
temporary files, while the RollingSink expects a direct flush to in-progress
files. Because there is no such thing as "flush and resume writing" on S3,
I don't know if RollingSink can be workable in a pure S3 environment. We
worked around it by using HDFS in a transient way.
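
A plain-Java illustration of that staging idea. Two local temp directories stand in for HDFS and S3, and the class and method names are invented for the sketch, not taken from Flink:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Sketch of the "transient HDFS" workaround: part files are written and
// flushed on a staging filesystem with real flush semantics, and only
// completed (closed) parts are copied to the object store.
public class TransientStagingSink {
    static Path publishPart(Path staging, Path objectStore, String partName) throws IOException {
        Path part = staging.resolve(partName); // a closed, complete part file
        // whole-object copy: safe for S3, where an object becomes visible atomically
        return Files.copy(part, objectStore.resolve(partName), StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path staging = Files.createTempDirectory("staging"); // stand-in for HDFS
        Path store = Files.createTempDirectory("store");     // stand-in for S3
        Files.write(staging.resolve("part-0"), "records".getBytes());
        Path uploaded = publishPart(staging, store, "part-0");
        System.out.println(new String(Files.readAllBytes(uploaded))); // prints "records"
    }
}
```

The design point: the object store only ever receives whole, finished files, so the "flush and resume writing" semantics that S3 lacks are never needed on the S3 side.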
