Hi guys,
Apache Arrow provides a cross-language, standardized, columnar, memory format for data. So it is highly desirable to import Arrow to Flink, and make use of its memory layout and memory management facilities. More background on this can be found in https://issues.apache.org/jira/browse/FLINK-10929 The problem is that, incorporating Arrow to Flink is a big move, and may affect many modules of Flink. In our efforts to vectorize Blink batch jobs, we have tried to incorporate Arrow to Flink. Those were incremental changes, which can be easily turned off with a single flag, and the changes are transparent to other parts of the code base. This is the draft of our design document: https://docs.google.com/document/d/1LstaiGzlzTdGUmyG_-9GSWleKpLRid8SJWXQCTqcQB4/edit?usp=sharing For the first step, I suggest providing a flag which is disabled by default, and let the MemoryManager depend on the Arrow Buffer Allocator. With this change, all the MemorySegment will be based on Arrow buffers, but this is transparent to other components, and should never break them. After this step, we can apply the changes described in the design documents, incrementally. This is our initial thoughts. Would you please give your valuable comments? Best, Liya Fan |
Very BIG +1 for adoption of Apache Arrow. This would simplify a lot the
integration with other tools On Thu, Apr 11, 2019 at 2:21 PM Run <[hidden email]> wrote: > Hi guys, > > > Apache Arrow provides a cross-language, standardized, columnar, memory > format for data. > So it is highly desirable to import Arrow to Flink, and make use of its > memory layout and memory management facilities. > More background on this can be found in > https://issues.apache.org/jira/browse/FLINK-10929 > > > The problem is that, incorporating Arrow to Flink is a big move, and may > affect many modules of Flink. > In our efforts to vectorize Blink batch jobs, we have tried to incorporate > Arrow to Flink. Those were incremental changes, which can be easily turned > off with a single flag, and the changes are transparent to other parts of > the code base. > > > This is the draft of our design document: > > https://docs.google.com/document/d/1LstaiGzlzTdGUmyG_-9GSWleKpLRid8SJWXQCTqcQB4/edit?usp=sharing > > > For the first step, I suggest providing a flag which is disabled by > default, and let the MemoryManager depend on the Arrow Buffer Allocator. > With this change, all the MemorySegment will be based on Arrow buffers, but > this is transparent to other components, and should never break them. > > > After this step, we can apply the changes described in the design > documents, incrementally. > This is our initial thoughts. Would you please give your valuable comments? > > > Best, > Liya Fan |
Free forum by Nabble | Edit this page |