Hello,

I'm working on an implementation of ORC BulkWriter [1]. As of now, I have the entire implementation in a separate module called "flink-orc-compress" under "flink-formats", since I'm not entirely sure whether it should go into the existing ORC modules, i.e. flink-orc & flink-orc-nohive.

So my questions are:
1. What's the difference between these two ORC modules?
2. Should the ORC BulkWriter implementation go into one of these existing modules? If yes, which one? Or can we keep it in a separate module to avoid duplicating or causing any conflicts?

Note: My current implementation of ORC BulkWriter uses orc-core with nohive classifier as the dependency.

[1] https://issues.apache.org/jira/browse/FLINK-10114

Hi,

I'd suggest using flink-orc, with plain orc-core rather than orc-core with the nohive classifier. We can provide a nohive version in the future.

Because ORC and Hive are so closely related, ORC currently still relies on some Hive classes. The nohive classifier of Apache ORC exists to create a variant of the core and mapreduce jars that doesn't conflict with Hive 1.x [1].

So orc and orc-nohive have the same class names, but orc-nohive shades/relocates many classes, such as "ColumnVector" and "VectorizedRowBatch".

flink-orc-nohive currently depends on flink-orc, and they share a lot of code. They cannot be unified into a single module, because that would cause a lot of conflicts.

[1] https://issues.apache.org/jira/browse/ORC-174

Best,
Jingsong Lee

On Tue, Apr 14, 2020 at 3:36 PM Sivaprasanna wrote:
> Hello,
>
> I'm working on an implementation of ORC BulkWriter[1]. [...]

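
To make the classifier distinction discussed above concrete, here is a rough sketch of how the two variants would be declared in a Maven pom. The coordinates are the published Apache ORC ones and the version is the 1.4.3 mentioned in this thread; a module would normally declare only one of the two, and the surrounding pom is assumed. As far as I know, the nohive jar relocates the Hive storage classes into an ORC-owned package, which is why code written against one variant does not compile against the other without changing imports.

```xml
<!-- Plain orc-core: vector classes such as VectorizedRowBatch come from
     Hive's storage-api under their original Hive package names. -->
<dependency>
  <groupId>org.apache.orc</groupId>
  <artifactId>orc-core</artifactId>
  <version>1.4.3</version>
</dependency>

<!-- nohive variant: same artifact, selected through the classifier.
     The same class names exist, but in relocated (shaded) packages,
     so it does not clash with a Hive 1.x installation on the classpath. -->
<dependency>
  <groupId>org.apache.orc</groupId>
  <artifactId>orc-core</artifactId>
  <version>1.4.3</version>
  <classifier>nohive</classifier>
</dependency>
```
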
On a similar note, I just checked and Flink currently uses ORC 1.4.3 in its dependencies. IMO, it is a little outdated. Can we bump the ORC version to a slightly newer one, maybe 1.5.x or even 1.6.0?

-
Sivaprasanna

On Tue, Apr 14, 2020 at 1:42 PM Jingsong Li wrote:
> Hi,
>
> Maybe you should use flink-orc. And use orc-core instead of orc-core with
> nohive classifier. [...]

Hi, yes, we can bump the orc-core version to a newer one.

Best,
Jingsong Lee

On Tue, Apr 14, 2020 at 8:16 PM Sivaprasanna wrote:
> On a similar note, I just checked that the Flink currently uses orc 1.4.3
> in the dependencies. [...]

I have created a ticket to update the ORC version:
https://issues.apache.org/jira/browse/FLINK-17142

On Tue, Apr 14, 2020 at 8:18 PM Jingsong Li wrote:
> Hi, yes, we can bump orc-core version to a newer.