S3/S3A support


S3/S3A support

Vijay Srinivasaraghavan
Hello,
Per the documentation (https://ci.apache.org/projects/flink/flink-docs-master/setup/aws.html), it looks like the S3/S3A FS implementation is supported using the standard Hadoop S3 FS client APIs.
If we skip a standard HCFS and go with S3/S3A instead:
1) Are there any known limitations/issues?
2) Do checkpoints/savepoints work properly?
Regards
Vijay
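
For readers following along, a minimal sketch of the setup being asked about, assuming the bucket name and paths are placeholders and that the S3 credentials (fs.s3a.* settings) live in Hadoop's core-site.xml as the linked docs describe:

```yaml
# flink-conf.yaml -- sketch only; bucket and paths are placeholders
state.backend: filesystem
state.backend.fs.checkpointdir: s3a://my-bucket/flink/checkpoints

# Point Flink at the Hadoop config directory that defines the fs.s3a.*
# settings (filesystem implementation class, access/secret keys, ...)
fs.hdfs.hadoopconf: /etc/hadoop/conf
```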

Re: S3/S3A support

Stephan Ewen
Hi!

In 1.2-SNAPSHOT, we recently fixed issues due to the "eventual consistency"
nature of S3. The fix is not in v1.1 - that is the only known issue I can
think of.

It results in occasional (though rare) periods of heavy restart retries until
all files are visible to all participants.

If you run into that issue, it may be worthwhile to look at Flink 1.2-SNAPSHOT.

Best,
Stephan
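
Until that fix, how aggressively a job retries during such a visibility window is governed by its restart strategy. A fixed-delay sketch (the key names come from the Flink configuration reference; the values here are purely illustrative):

```yaml
# flink-conf.yaml -- illustrative values only
restart-strategy: fixed-delay
restart-strategy.fixed-delay.attempts: 10
restart-strategy.fixed-delay.delay: 30 s
```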



Re: S3/S3A support

Vijay Srinivasaraghavan
Thanks Stephan. My understanding is that checkpointing uses the truncate API, but S3A does not support it. Will this have any impact?
Some of the known S3A client limitations are captured on the Hortonworks site (https://hortonworks.github.io/hdp-aws/s3-s3aclient/index.html); I am wondering whether any of those affect a Flink deployment using S3.
Regards,
Vijay


Re: S3/S3A support

Stephan Ewen
Hi!

The "truncate()" functionality is only needed for the rolling/bucketing
sink. The core checkpoint functionality does not need any truncate()
behavior...

Best,
Stephan
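
To make the distinction concrete, here is a plain-Java sketch of the write-once pattern that checkpointing relies on. This illustrates the pattern only; it is not Flink's actual checkpoint code, and the file names are made up:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class WriteOnceCheckpoint {
    // Each checkpoint writes a fresh file and publishes it by rename;
    // existing files are never modified afterwards, so no truncate()
    // behavior is ever required from the filesystem.
    static Path writeCheckpoint(Path dir, long checkpointId, byte[] state) throws IOException {
        Path inProgress = dir.resolve("chk-" + checkpointId + ".inprogress");
        Files.write(inProgress, state);                      // written exactly once
        Path published = dir.resolve("chk-" + checkpointId); // immutable from here on
        return Files.move(inProgress, published, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("checkpoints");
        Path chk = writeCheckpoint(dir, 1, "operator-state".getBytes());
        System.out.println(chk.getFileName()); // prints "chk-1"
    }
}
```

The rolling/bucketing sink, by contrast, must cut an in-progress file back to a known valid length on recovery, which is where truncate() enters the picture.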


Re: S3/S3A support

Cliff Resnick
Regarding S3 and the Rolling/BucketingSink, we've seen data loss when
resuming from checkpoints: S3 FileSystem implementations flush to
temporary files, while the RollingSink expects a direct flush to in-progress
files. Because there is no such thing as "flush and resume writing" on S3,
I don't know if RollingSink can be workable in a pure S3 environment. We
worked around it by using HDFS in a transient way.
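
A plain-Java illustration of that staging idea. Two local temp directories stand in for HDFS and S3, and the class and method names are invented for the sketch, not taken from Flink:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Sketch of the "transient HDFS" workaround: part files are written and
// flushed on a staging filesystem with real flush semantics, and only
// completed (closed) parts are copied to the object store.
public class TransientStagingSink {
    static Path publishPart(Path staging, Path objectStore, String partName) throws IOException {
        Path part = staging.resolve(partName); // a closed, complete part file
        // whole-object copy: safe for S3, where an object becomes visible atomically
        return Files.copy(part, objectStore.resolve(partName), StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path staging = Files.createTempDirectory("staging"); // stand-in for HDFS
        Path store = Files.createTempDirectory("store");     // stand-in for S3
        Files.write(staging.resolve("part-0"), "records".getBytes());
        Path uploaded = publishPart(staging, store, "part-0");
        System.out.println(new String(Files.readAllBytes(uploaded))); // prints "records"
    }
}
```

The design point: the object store only ever receives whole, finished files, so the "flush and resume writing" semantics that S3 lacks are never needed on the S3 side.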
