Hi, I'm trying to understand the difference between the flink-filesystem and flink-connector-filesystem. How is each intended to be used?
If adding support for a different storage provider that supports HDFS, should additions be made to one or the other, or both? Thanks. |
Hi,
the flink-connector-filesystem module contains the BucketingSink, a connector with which you can write your data to a file system. It provides exactly-once processing guarantees and allows you to write data to different buckets [1]. The flink-filesystem module contains the different file system implementations (e.g. MapR FS, HDFS, S3). If you want to use, for example, an S3 file system, there are the flink-s3-fs-hadoop and flink-s3-fs-presto modules. So if you want to write your data to S3 using the BucketingSink, you have to add flink-connector-filesystem for the BucketingSink as well as an S3 file system implementation (e.g. flink-s3-fs-hadoop or flink-s3-fs-presto). Usually, there should be no need to change Flink's file system implementations. If you want to add a new connector, it would go into flink-connectors or into Apache Bahir [2]. [1] https://ci.apache.org/projects/flink/flink-docs-master/dev/connectors/filesystem_sink.html [2] https://ci.apache.org/projects/flink/flink-docs-master/dev/connectors/index.html#connectors-in-apache-bahir Cheers, Till |
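To make the module split concrete, a job writing to S3 with the BucketingSink pulls in both pieces. A sketch of the Maven dependencies (the version and Scala suffix are illustrative and must match your Flink release; note that for Flink 1.4 the S3 file system jar is often placed into the Flink distribution's lib/ directory instead of being bundled with the job):

```xml
<!-- Connector providing the BucketingSink (version/suffix illustrative) -->
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-connector-filesystem_2.11</artifactId>
  <version>1.4.0</version>
</dependency>
<!-- File system implementation for the s3:// scheme;
     flink-s3-fs-presto is an alternative -->
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-s3-fs-hadoop</artifactId>
  <version>1.4.0</version>
</dependency>
```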
Thanks, is there a sample project that outputs a Flink example app (i.e. WordCount) to S3? |
In reply to this post by Till Rohrmann
Hi, question on this page:
"You need to point Flink to a valid Hadoop configuration..."
https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/deployment/aws.html#s3-simple-storage-service
How do you point Flink to the Hadoop config? |
Hi, I'm adding support for more cloud storage providers such as Google (gcs://) and Oracle (oci://).
I have an oci:// test working based on the s3a:// test, but when I try it in an actual Flink job like WordCount, I get this message:
"org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Could not find a file system implementation for scheme 'oci'. The scheme is not directly supported by Flink and no Hadoop file system to support this scheme could be loaded."
How do I register new schemes into the file system factory? Thanks. |
Hi,
please have a look at this doc page [1]. It describes how to add new file system implementations and also how to configure them. Best, Fabian [1] https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/filesystems.html#adding-new-file-system-implementations |
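The registration step that page describes boils down to a java.util.ServiceLoader entry. As a sketch (the package and class name OciFileSystemFactory are hypothetical), a jar providing an oci:// file system would contain a provider-configuration file named META-INF/services/org.apache.flink.core.fs.FileSystemFactory listing the fully qualified factory class:

```
# File: META-INF/services/org.apache.flink.core.fs.FileSystemFactory
org.example.fs.oci.OciFileSystemFactory
```

The ServiceLoader format allows `#` comments; everything else must be one fully qualified class name per line.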
Thanks. I'm looking at the S3 example and I can only find the S3FileSystemFactory but not the FileSystem implementation (a subclass of org.apache.flink.core.fs.FileSystem).
Is that requirement still needed? |
In fact, there are two S3FileSystemFactory classes, one for Hadoop and
another one for Presto. In both cases an external file system class is wrapped in Flink's HadoopFileSystem class [1] [2]. Best, Fabian [1] https://github.com/apache/flink/blob/master/flink-filesystems/flink-s3-fs-hadoop/src/main/java/org/apache/flink/fs/s3hadoop/S3FileSystemFactory.java#L132 [2] https://github.com/apache/flink/blob/master/flink-filesystems/flink-s3-fs-presto/src/main/java/org/apache/flink/fs/s3presto/S3FileSystemFactory.java#L131 |
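Following that pattern, a factory for a new Hadoop-compatible scheme can wrap the provider's Hadoop FileSystem in Flink's adapter. The following is a rough sketch, not a definitive implementation: every oci-related name is hypothetical, and the exact FileSystemFactory interface (e.g. an additional configure() method) differs between Flink versions.

```java
package org.example.fs.oci;

import java.io.IOException;
import java.net.URI;

import org.apache.flink.core.fs.FileSystem;
import org.apache.flink.core.fs.FileSystemFactory;
import org.apache.flink.runtime.fs.hdfs.HadoopFileSystem;

// Sketch: OciHadoopFileSystem stands in for the storage provider's
// org.apache.hadoop.fs.FileSystem implementation for the oci:// scheme.
public class OciFileSystemFactory implements FileSystemFactory {

    @Override
    public String getScheme() {
        return "oci";
    }

    @Override
    public FileSystem create(URI fsUri) throws IOException {
        // Initialize the Hadoop-compatible file system and wrap it in
        // Flink's HadoopFileSystem adapter, as the S3 factories do.
        org.apache.hadoop.fs.FileSystem ociFs = new OciHadoopFileSystem();
        ociFs.initialize(fsUri, new org.apache.hadoop.conf.Configuration());
        return new HadoopFileSystem(ociFs);
    }
}
```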
Thanks. I now have the 3 requirements fulfilled but the scheme isn't being loaded; I get this error:
"Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Could not find a file system implementation for scheme 'oci'. The scheme is not directly supported by Flink and no Hadoop file system to support this scheme could be loaded."
What's the best way to debug the loading of the schemes/filesystems by the ServiceLoader? |
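One way to see what the ServiceLoader is (or isn't) picking up is to query it directly. The snippet below is a self-contained illustration of the failure mode, not Flink code: it declares a local stand-in for Flink's FileSystemFactory interface and shows that, without a matching META-INF/services entry on the classpath, ServiceLoader silently finds nothing, which is exactly what surfaces as the UnsupportedFileSystemSchemeException.

```java
import java.util.ServiceLoader;

public class ServiceLoaderDebug {

    // Local stand-in for org.apache.flink.core.fs.FileSystemFactory.
    public interface FileSystemFactory {
        String getScheme();
    }

    public static void main(String[] args) {
        // ServiceLoader scans the classpath for files named
        // META-INF/services/<fully qualified interface name>. No such
        // file exists for this local interface, so the iteration is empty.
        int found = 0;
        for (FileSystemFactory factory : ServiceLoader.load(FileSystemFactory.class)) {
            System.out.println("Discovered factory for scheme: " + factory.getScheme());
            found++;
        }
        System.out.println("Factories discovered: " + found);
    }
}
```

Running the same loop inside the job (against the real org.apache.flink.core.fs.FileSystemFactory) shows which schemes are actually visible on the classpath in use.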
Hi, just a bit more info, I have a test function working using oci://, based on the S3 test:
https://github.com/apache/flink/blob/master/flink-filesystems/flink-s3-fs-hadoop/src/test/java/org/apache/flink/fs/s3hadoop/HadoopS3FileSystemITCase.java#L169
However, when I try to get the WordCount example's writeAsText to write to my new file system:
https://github.com/apache/flink/blob/master/flink-examples/flink-examples-streaming/src/main/java/org/apache/flink/streaming/examples/wordcount/WordCount.java#L82
that's where I get the "Could not find a file system implementation" error mentioned earlier. |
Ok, I have the factory working in the WordCount example. I had to move the factory code and META-INF into the WordCount project.
For general Flink jobs, I'm assuming the goal is to be able to import the factory from the job itself instead of needing to copy the factory .java file into each project? If so, any guidelines on how to do that? |
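For a reusable setup, the usual route (assuming Flink 1.4-style classpath loading; the jar name below is hypothetical) is to package the factory, its META-INF/services entry, and the wrapped Hadoop file system into one jar and install that into the Flink distribution's lib/ directory, so every job submitted to the cluster can resolve the scheme without bundling the factory source:

```shell
# Package the custom file system (factory + META-INF/services entry) once,
# then install it into the Flink distribution rather than into each job:
cp flink-fs-oci-1.0.jar "$FLINK_HOME/lib/"

# Restart the cluster so the jar is on the classpath; any job can then
# write to the scheme, e.g.:
"$FLINK_HOME/bin/flink" run examples/streaming/WordCount.jar \
    --output oci://my-bucket/wordcount-result
```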
Hi,
to do that you can set the env variable HADOOP_CONF_DIR: https://ci.apache.org/projects/flink/flink-docs-master/ops/config.html#hdfs

Best,
Gary

On Wed, Jan 17, 2018 at 2:27 AM, cw7k <[hidden email]> wrote:
> Hi, question on this page: "You need to point Flink to a valid Hadoop configuration..." https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/deployment/aws.html#s3-simple-storage-service
> How do you point Flink to the Hadoop config?
Great! Thanks for reporting back.
2018-01-19 1:43 GMT+01:00 cw7k <[hidden email]>:
> Ok, I have the factory working in the WordCount example. I had to move the factory code and META-INF into the WordCount project.
> For general Flink jobs, I'm assuming that the goal would be to be able to import the factory from the job itself instead of needing to copy the factory .java file into each project? If so, any guidelines on how to do that?
>
> On Thursday, January 18, 2018, 10:53:32 AM PST, cw7k <[hidden email]> wrote:
>
> Hi, just a bit more info, I have a test function working using oci://, based on the S3 test:
> https://github.com/apache/flink/blob/master/flink-filesystems/flink-s3-fs-hadoop/src/test/java/org/apache/flink/fs/s3hadoop/HadoopS3FileSystemITCase.java#L169
> However, when I try to get the WordCount example's writeAsText to write to my new filesystem:
> https://github.com/apache/flink/blob/master/flink-examples/flink-examples-streaming/src/main/java/org/apache/flink/streaming/examples/wordcount/WordCount.java#L82
> that's where I got the "Could not find a file system implementation" error mentioned earlier.
>
> On Thursday, January 18, 2018, 10:22:57 AM PST, cw7k <[hidden email]> wrote:
>
> Thanks. I now have the 3 requirements fulfilled but the scheme isn't being loaded; I get this error:
> "Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Could not find a file system implementation for scheme 'oci'. The scheme is not directly supported by Flink and no Hadoop file system to support this scheme could be loaded."
> What's the best way to debug the loading of the schemes/filesystems by the ServiceLoader?
>
> On Thursday, January 18, 2018, 5:09:10 AM PST, Fabian Hueske <[hidden email]> wrote:
>
> In fact, there are two S3FileSystemFactory classes, one for Hadoop and another one for Presto. In both cases an external file system class is wrapped in Flink's HadoopFileSystem class [1] [2].
>
> Best, Fabian
>
> [1] https://github.com/apache/flink/blob/master/flink-filesystems/flink-s3-fs-hadoop/src/main/java/org/apache/flink/fs/s3hadoop/S3FileSystemFactory.java#L132
> [2] https://github.com/apache/flink/blob/master/flink-filesystems/flink-s3-fs-presto/src/main/java/org/apache/flink/fs/s3presto/S3FileSystemFactory.java#L131
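The reason moving META-INF into the WordCount project made it work is that factories are discovered through Java's standard ServiceLoader mechanism, so the provider-configuration file has to be visible on the job's classpath. The sketch below shows the mechanism itself with no Flink dependency; the interface and factory names are made-up stand-ins, not Flink's real FileSystemFactory. The entire registration is a file under META-INF/services named after the service interface, containing the implementing class's name:

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ServiceLoader;

public class ServiceLoaderDemo {
    // Stand-in for a service interface such as Flink's FileSystemFactory.
    public interface FsFactorySketch {
        String getScheme();
    }

    // Stand-in for a custom factory, e.g. one handling an oci:// scheme.
    public static class OciFactorySketch implements FsFactorySketch {
        public String getScheme() { return "oci"; }
    }

    public static void main(String[] args) throws IOException {
        // Simulate a jar's META-INF/services directory in a temp folder.
        Path root = Files.createTempDirectory("svc-demo");
        Path services = root.resolve("META-INF/services");
        Files.createDirectories(services);
        // File name = binary name of the interface; content = implementing class.
        Files.write(services.resolve(FsFactorySketch.class.getName()),
                    OciFactorySketch.class.getName().getBytes());

        // A class loader that sees the temp dir, like having the jar on the classpath.
        try (URLClassLoader cl = new URLClassLoader(
                new URL[]{root.toUri().toURL()},
                ServiceLoaderDemo.class.getClassLoader())) {
            for (FsFactorySketch f : ServiceLoader.load(FsFactorySketch.class, cl)) {
                System.out.println("discovered scheme: " + f.getScheme());
            }
        }
    }
}
```

If the services file is absent from the classpath the loop finds nothing, which matches the "no file system implementation for scheme" symptom: the class exists but is never registered. A quick way to debug is exactly such a ServiceLoader.load(...) loop, printing what the job's classpath actually exposes.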
Hi, I ran the WordCount batch program and noticed the output was split into 5 files. Is there documentation on how the splitting is done and how to tweak it?

On Friday, January 19, 2018, 12:06:45 AM PST, Fabian Hueske <[hidden email]> wrote:
> Great! Thanks for reporting back.
In DataSet (batch) programs, FileOutputFormats write one output file for each parallel operator instance. If your operator runs with a parallelism of 8, the output is split across 8 files.

2018-01-22 23:42 GMT+01:00 cw7k <[hidden email]>:
> Hi, I ran the WordCount batch program and noticed the output was split into 5 files. Is there documentation on how the splitting is done and how to tweak it?
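The one-file-per-parallel-instance behaviour can be sketched without Flink: route each record to one of `parallelism` writers, each appending to its own numbered file, which is why a run with parallelism 5 leaves 5 output files. This is plain JDK code with illustrative names, not the Flink API; in a real job, calling setParallelism(1) on the sink is the usual way to get a single output file instead.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;
import java.util.stream.Stream;

public class ParallelOutputDemo {
    public static void main(String[] args) throws IOException {
        int parallelism = 5;  // e.g. the job's default parallelism
        List<String> records = List.of("to", "be", "or", "not", "to", "be");

        Path outDir = Files.createTempDirectory("wordcount-out");
        // Record i goes to "parallel instance" i % parallelism; each instance
        // appends to its own file, analogous to each parallel sink instance.
        for (int i = 0; i < records.size(); i++) {
            int instance = i % parallelism;
            Path file = outDir.resolve(String.valueOf(instance + 1));
            Files.write(file, (records.get(i) + System.lineSeparator()).getBytes(),
                        StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        }
        try (Stream<Path> files = Files.list(outDir)) {
            System.out.println("output files: " + files.count());
        }
    }
}
```

With 6 records and parallelism 5, all 5 instances receive at least one record, so the directory ends up with 5 files, mirroring the 5-file WordCount output observed above.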