Proposal to add Google Cloud Storage FileSystem with RecoverableWriter

Proposal to add Google Cloud Storage FileSystem with RecoverableWriter

Galen Warren
Hi -- I'm wondering if you would be interested in a contribution to add a
HadoopFileSystem implementation, with an associated RecoverableWriter, for
Google Cloud Storage. The implementation would be similar to what's already
in place for S3, and it would allow writing to GCS using a StreamingFileSink.

I see there's been some work on this before (FLINK-11838, "Add GCS
RecoverableWriter" by Fokko, Pull Request #7915 in apache/flink:
<https://github.com/apache/flink/pull/7915>), but the original people
working on it have put it on hold, and the last activity was over six
months ago.

I need this for my own purposes and have an implementation that I'm
working on locally. I'd be glad to contribute it if there's interest.
Let me know, and I'll create a Jira ticket.

Thanks,
Galen Warren

Re: Proposal to add Google Cloud Storage FileSystem with RecoverableWriter

Till Rohrmann
Hi Galen,

Adding support for writing to GCS using the StreamingFileSink sounds like
a very good idea to me. Looking at FLINK-11838, I believe that effort
has been abandoned, so you could take the ticket over if you want.
Maybe you could update the ticket with your solution proposal.

I will check whether I can find a committer who could help you with this
effort.

Cheers,
Till

On Sat, Jan 30, 2021 at 7:43 PM Galen Warren <[hidden email]>
wrote:

> Hi -- I'm wondering if you would be interested in a contribution to add a
> HadoopFileSystem implementation, with associated RecoverableWriter, for
> Google Cloud Storage. This would be similar to what's already in place for
> S3, and it would allow writing to GCS using a StreamingFileSink. The
> implementation would be similar to what's already in place for S3.
>
> I see there's been some work on this before (FLINK-11838, "Add GCS
> RecoverableWriter" by Fokko, Pull Request #7915 in apache/flink:
> <https://github.com/apache/flink/pull/7915>), but the original people
> working on it have put it on hold, and the last activity was over six
> months ago.
>
> I need this for my own purposes and I have an implementation that I'm
> working on locally. I'd be interested to contribute this if you'd be
> interested. Let me know if so and I'll create a Jira ticket.
>
> Thanks,
> Galen Warren
>

Re: Proposal to add Google Cloud Storage FileSystem with RecoverableWriter

Xintong Song
Hi Galen,

Thanks for offering the contribution.

As Till has already suggested, please post your solution proposal as a
comment on FLINK-11838.
Once we reach consensus on the proposal, I'll assign you to the ticket.

Thank you~

Xintong Song



On Tue, Feb 2, 2021 at 5:19 PM Till Rohrmann <[hidden email]> wrote:

> Hi Galen,
>
> I think that adding support for GCS using the StreamingFileSink sounds like
> a very good idea to me. Looking at FLINK-11838 I believe that this effort
> has been abandoned. I think that you could take this ticket over if you
> want. Maybe you could update this ticket with your solution proposal.
>
> I will check whether I can find a committer who could help you with this
> effort.
>
> Cheers,
> Till

Re: Proposal to add Google Cloud Storage FileSystem with RecoverableWriter

Till Rohrmann
@Galen, I've just seen that you posted your ideas on the old GitHub PR. I
think it would be better to post them on the Jira ticket [1].

[1] https://issues.apache.org/jira/browse/FLINK-11838

Cheers,
Till

On Tue, Feb 2, 2021 at 12:02 PM Xintong Song <[hidden email]> wrote:

> Hi Galen,
>
> Thanks for offering the contribution.
>
> As Till has already suggested, please comment on FLINK-11838 your solution
> proposal.
> Once we reach consensus on the proposal, I'll assign you to the ticket.
>
> Thank you~
>
> Xintong Song

Re: Proposal to add Google Cloud Storage FileSystem with RecoverableWriter

Xintong Song
@Till,
Did I overlook anything? I can't find Galen's post on the old PR.

Thank you~

Xintong Song



On Wed, Feb 3, 2021 at 6:10 PM Till Rohrmann <[hidden email]> wrote:

> @Galen, I've just seen that you posted your ideas on the old Github PR. I
> think it would be better to post it on the JIRA ticket [1].
>
> [1] https://issues.apache.org/jira/browse/FLINK-11838
>
> Cheers,
> Till

Re: Proposal to add Google Cloud Storage FileSystem with RecoverableWriter

Till Rohrmann
Hmm, maybe it was deleted again. I received an email notification yesterday.

On Wed, Feb 3, 2021 at 11:57 AM Xintong Song <[hidden email]> wrote:

> @Till,
> Did I overlook anything? I don't find Galen's post on the old PR.
>
> Thank you~
>
> Xintong Song

Re: Proposal to add Google Cloud Storage FileSystem with RecoverableWriter

Galen Warren
Hi Till -- this is in response to your earlier message about my proposal to
add GCS FileSystem/RecoverableWriter support. I was previously subscribed
only to the dev digest, so I didn't actually get an email I could reply to;
sorry for the one-off email. I'm properly subscribed to the dev list now.

I've added this PR <https://github.com/apache/flink/pull/14875> related to
this effort, and I referenced it from the existing Jira ticket
<https://issues.apache.org/jira/browse/FLINK-11838#>.

Please let me know what the next steps are.

Thanks,

Galen