Checkpointing to S3

Checkpointing to S3

Gyula Fóra-2
Hey,

I am trying to checkpoint my streaming job to S3, but it seems that the
checkpoints never complete, and I don't get any errors in the logs.

The state backend apparently connects to S3 correctly, as it creates the
following file in the given S3 directory:

95560b1acf5307bc3096020071c83230_$folder$    (this is a file, not a folder)

The job id is 95560b1acf5307bc3096020071c83230, but that filename is odd
and might be causing the problem. It seems that the backend doesn't
properly create a folder for the job checkpoints under the job id.

Does anyone have any idea what might cause this problem?

Thanks,
Gyula
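For context, the kind of setup being described might be sketched as follows. This is a minimal illustration, assuming the FsStateBackend API of Flink from this era; the bucket name, interval, and job name are placeholders, not taken from the thread:

```java
import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class S3CheckpointJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Trigger a checkpoint every 5 seconds.
        env.enableCheckpointing(5000);

        // Store checkpoint data under the given S3 directory; the S3
        // filesystem itself must be configured separately (e.g. through
        // the Hadoop S3 filesystem and its credentials).
        env.setStateBackend(new FsStateBackend("s3://my-bucket/flink-checkpoints"));

        // ... the sources and operators of the actual job go here ...
        env.execute("s3-checkpointing-job");
    }
}
```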

Re: Checkpointing to S3

Gyula Fóra-2
Ok, I figured out the problem; it was my fault :). The issue was that I
was running a short test job, and the sources finished before the
checkpoint was triggered. So the folder was created for the job in S3,
but since nothing was written to it, it shows up as a file in S3.

Maybe it would be good to give the user some info when the sources have
already finished by the time the checkpoint is triggered.

On the bright side, it seems to work well, also with the savepoints :)

Cheers
Gyula
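For readers puzzled by the `_$folder$` name: the Hadoop S3 filesystem commonly represents an otherwise empty "directory" as a zero-byte marker key named `<path>_$folder$`, which S3 browsers then display as a plain file. A small illustrative helper (hypothetical, not part of Flink or Hadoop) that recognizes such a marker key and recovers the directory name from it:

```java
public class S3FolderMarker {
    // Suffix Hadoop's S3 filesystem appends to a zero-byte key that
    // stands in for an empty directory.
    static final String MARKER_SUFFIX = "_$folder$";

    // Returns the directory name if the key is a folder marker, else null.
    static String folderNameOf(String key) {
        if (key.endsWith(MARKER_SUFFIX)) {
            return key.substring(0, key.length() - MARKER_SUFFIX.length());
        }
        return null;
    }

    public static void main(String[] args) {
        // The marker from the thread maps back to the job id.
        System.out.println(folderNameOf("95560b1acf5307bc3096020071c83230_$folder$"));
        // prints 95560b1acf5307bc3096020071c83230
    }
}
```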

Re: Checkpointing to S3

Stephan Ewen
Hey!

Nice to hear that it works.

A bit of info is now visible in the web dashboard, as of this PR:
https://github.com/apache/flink/pull/1453

Is that what you had in mind?

Greetings,
Stephan


Re: Checkpointing to S3

Gyula Fóra
Yes, this gives much more information :)

Cheers,
Gyula
