Does `flink-connector-filesystem` work with Hadoop-Free Flink?

Jamie Grier-2
Is the `flink-connector-filesystem` connector supposed to work with the
latest hadoop-free Flink releases, say along with the `flink-s3-fs-presto`
filesystem implementation?

-Jamie

Re: Does `flink-connector-filesystem` work with Hadoop-Free Flink?

Aljoscha Krettek-2
Hi,

I'm afraid not, since the BucketingSink uses the Hadoop FileSystem directly and not the Flink FileSystem abstraction. The flink-s3-fs-* modules only provide Flink FileSystems.

One of the goals for 1.6 is to provide a BucketingSink that uses the Flink FileSystem and also works well with eventually consistent file systems.

--
Aljoscha

> On 23. Feb 2018, at 06:31, Jamie Grier <[hidden email]> wrote:
>
> Is the `flink-connector-filesystem` connector supposed to work with the
> latest hadoop-free Flink releases, say along with the `flink-s3-fs-presto`
> filesystem implementation?
>
> -Jamie

Re: Does `flink-connector-filesystem` work with Hadoop-Free Flink?

Jamie Grier-2
Thanks, Aljoscha :)

So is it possible to continue to use the new "native" filesystems along
with the BucketingSink by including the Hadoop dependencies only in the
user's uber jar? Or is that asking for trouble? Has anyone tried that
successfully?

-Jamie


On Fri, Feb 23, 2018 at 12:39 AM, Aljoscha Krettek <[hidden email]>
wrote:

> Hi,
>
> I'm afraid not, since the BucketingSink uses the Hadoop FileSystem
> directly and not the Flink FileSystem abstraction. The flink-s3-fs-*
> modules only provide Flink FileSystems.
>
> One of the goals for 1.6 is to provide a BucketingSink that uses the Flink
> FileSystem and also works well with eventually consistent file systems.
>
> --
> Aljoscha

Re: Does `flink-connector-filesystem` work with Hadoop-Free Flink?

Aljoscha Krettek-2
Do you mean putting the Flink-native S3 filesystem in the user jar, or Hadoop in the user jar? The former wouldn't work, I think, because the FileSystems are initialised before the user jar is loaded. The latter might work, but only if you don't have Hadoop on the classpath, i.e. not on YARN and only on a Hadoop-free cluster. Maybe...
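
One way to picture the ordering problem described above is a toy registry that is filled once when the process starts and ignores anything offered later. This is only an illustrative sketch, not Flink code: the class `FsRegistrySketch`, its methods, and the factory names are hypothetical, and it simply assumes (as the message does, with an "I think") that file systems are set up before the user jar is loaded.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Toy model of a file system registry that is fixed at startup. Not Flink code. */
public class FsRegistrySketch {
    private final Map<String, String> factories = new LinkedHashMap<>();
    private boolean initialised = false;

    /** Models cluster startup: only factories visible at this point are recorded. */
    public void initialise(Map<String, String> factoriesOnClusterClasspath) {
        factories.putAll(factoriesOnClusterClasspath);
        initialised = true;
    }

    /**
     * Models a factory that only becomes visible once the user jar is loaded.
     * Returns false (and changes nothing) if the registry was already initialised.
     */
    public boolean registerFromUserJar(String scheme, String factoryName) {
        if (initialised) {
            return false; // too late: the registry was populated at startup
        }
        factories.put(scheme, factoryName);
        return true;
    }

    /** Looks up the factory registered for a URI scheme, if any. */
    public Optional<String> lookup(String scheme) {
        return Optional.ofNullable(factories.get(scheme));
    }

    public static void main(String[] args) {
        FsRegistrySketch registry = new FsRegistrySketch();

        // Startup: only the (hypothetical) local factory is on the cluster classpath.
        registry.initialise(Collections.singletonMap("file", "LocalFactory"));

        // Later, the user jar arrives carrying a (hypothetical) S3 factory.
        boolean accepted = registry.registerFromUserJar("s3", "UserJarS3Factory");

        System.out.println("s3 factory accepted: " + accepted);
        System.out.println("s3 lookup present: " + registry.lookup("s3").isPresent());
        System.out.println("file lookup present: " + registry.lookup("file").isPresent());
    }
}
```

In this model the factory shipped in the user jar is never seen, which mirrors why the first option is expected not to work. The second option (Hadoop in the user jar) is a different situation and is not covered by the sketch.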

> On 23. Feb 2018, at 13:32, Jamie Grier <[hidden email]> wrote:
>
> Thanks, Aljoscha :)
>
> So is it possible to continue to use the new "native" filesystems along
> with the BucketingSink by including the Hadoop dependencies only in the
> user's uber jar? Or is that asking for trouble? Has anyone tried that
> successfully?
>
> -Jamie

Re: Does `flink-connector-filesystem` work with Hadoop-Free Flink?

Jamie Grier-2
Yeah, I meant the latter, but it sounds like it could be just asking for
trouble. I just like the idea of keeping the set of un-shaded JARs in the
flink/lib directory to a minimum.

Thanks.

On Fri, Feb 23, 2018 at 5:29 AM, Aljoscha Krettek <[hidden email]>
wrote:

> Do you mean putting the Flink-native S3 filesystem in the user jar, or
> Hadoop in the user jar? The former wouldn't work, I think, because the
> FileSystems are initialised before the user jar is loaded. The latter might
> work, but only if you don't have Hadoop on the classpath, i.e. not on YARN
> and only on a Hadoop-free cluster. Maybe...

Re: Does `flink-connector-filesystem` work with Hadoop-Free Flink?

Stephan Ewen
Hi!

You could try to have Hadoop in your application JAR file, but I expect
trouble with s3a because of the specific way it does connection pooling.

Making the BucketingSink work with Flink's file systems (and thus with
Hadoop-free Flink) is super high up on the list as soon as the release is
out.

Stephan


On Fri, Feb 23, 2018 at 2:32 PM, Jamie Grier <[hidden email]> wrote:

> Yeah, I meant the latter, but it sounds like it could be just asking for
> trouble. I just like the idea of keeping the set of un-shaded JARs in the
> flink/lib directory to a minimum.
>
> Thanks.