Hi,
I am trying to design a small prototype with Flink and Apache Edgent (http://edgent.apache.org/) and would like some advice on the direction to take. I am running Flink on my laptop and Edgent on my Raspberry Pi with a simple filter for a proximity sensor (https://github.com/felipegutierrez/explore-rpi/blob/master/src/main/java/org/sense/edgent/app/UltrasonicEdgentApp.java).

My idea is to push the filter operator down from Flink to the Raspberry Pi, which is running Apache Edgent. With this in mind, where do you advise me to start?

I have some ideas to study:
1 - Try to get the list of operators that Flink is about to execute on the JobManager. Source: https://ci.apache.org/projects/flink/flink-docs-stable/internals/job_scheduling.html
2 - Implement a connector to Apache Edgent in order to exchange messages between Flink and Edgent.

Can you think of any other source that would be relevant to my prototype?

Thanks,
Felipe
--
-- Felipe Gutierrez
-- skype: felipe.o.gutierrez
-- https://felipeogutierrez.blogspot.com
Hi Felipe,
This seems related to your previous question about a custom scheduler that knows which task to run on which machine. As Chesnay said, this is a rather involved and laborious task if you want to do it as a general framework.

But if you know which operation to push down, why not decouple the two: implement the filtering as a separate job running on your Raspberry Pi, and write a second job that consumes the output of the first and does the analytics?

Cheers,
Kostas

On Thu, Nov 29, 2018 at 10:23 AM Felipe Gutierrez <[hidden email]> wrote:
> My idea is to push down the filter operator from Flink to the Raspberry Pi
> which is running Apache Edgent. With this in mind, where do you guys advise
> me to start?
Hi again,
I forgot to say that, unfortunately, I am not familiar with Apache Edgent. But if you can write your filter in Edgent's programming model, then you can push your data from Edgent to a third-party storage system (e.g. Kafka, HDFS, etc.) and use Flink's existing connectors instead of having to implement a custom source.

Cheers,
Kostas

On Thu, Nov 29, 2018 at 11:08 AM Kostas Kloudas <[hidden email]> wrote:
> But if you know what operation to push down, then why not decoupling the
> two and implementing the filtering as a separate job running on your
> Raspberry and a new job which consumes the output of the first and does
> the analytics?
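To make the decoupling suggested above concrete, here is a minimal plain-Java sketch, not a definitive implementation. It uses no Edgent, Kafka, or Flink API: the `BlockingQueue` is only a stand-in for the third-party system (e.g. a Kafka topic) sitting between the two jobs, and the class name, method names, sample readings, and the 50 cm threshold are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Predicate;

// Plain-Java stand-in for two decoupled jobs; every name here is hypothetical.
class DecoupledPipelineSketch {

    // "Edge job" (the part that would run on the Raspberry Pi):
    // apply the filter locally and publish only the readings that pass.
    static void edgeJob(List<Double> rawReadingsCm,
                        Predicate<Double> filter,
                        BlockingQueue<Double> channel) {
        for (Double reading : rawReadingsCm) {
            if (filter.test(reading)) {
                channel.add(reading);
            }
        }
    }

    // "Analytics job" (the part that would run in Flink):
    // consume whatever the edge job published and average it.
    static double analyticsJob(BlockingQueue<Double> channel) {
        List<Double> received = new ArrayList<>();
        channel.drainTo(received);
        return received.stream()
                .mapToDouble(Double::doubleValue)
                .average()
                .orElse(Double.NaN);
    }

    public static void main(String[] args) {
        // The queue plays the role of the intermediate storage system.
        BlockingQueue<Double> channel = new LinkedBlockingQueue<>();
        // Assumed filter: keep only objects closer than 50 cm.
        edgeJob(Arrays.asList(120.0, 35.0, 48.0, 300.0), d -> d < 50.0, channel);
        System.out.println("average of filtered readings: " + analyticsJob(channel));
    }
}
```

The point of the design is that the analytics side never sees the unfiltered readings, and neither side needs to know how the other is scheduled.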
Thanks, Kostas, for the quick reply.

Yes, it is related to my previous question.

When you said "But if you know what operation to push down": this is exactly what I am trying to find in the Flink code. I want to discover the operation on the fly, that is, find the component in Flink that can tell me there is a filter in the query specified by the user. I want to get this metadata and send a message to my RPi through a Flink connector (I guess this is the way to do it), so that the data stream arrives at Flink already filtered.

I intend to start with a simple, naive example. Do you know which Flink component I can use to get, on the fly, the operations that are running inside a query?

Thanks,
-- Felipe Gutierrez

On Thu, Nov 29, 2018 at 11:18 AM Kostas Kloudas <[hidden email]> wrote:
> I forgot to say that, unfortunately, I am not familiar with Apache Edgent,
> but if you can write your filter in Edgent's programming model,
> Then you can push your data from Edgent to a third party storage system
> (e.g. Kafka, HDFS, etc) and use Flink's connectors, instead of
> having to implement a custom source.
I guess this message from 2016 is closely related to what I am looking for (http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-Execution-Plan-td4290.html). I am posting it here for future reference.

I am going to implement a toy example to visualize this. Do you think this description still matches the latest Flink source code?

-- Felipe Gutierrez

On Thu, Nov 29, 2018 at 12:01 PM Felipe Gutierrez <[hidden email]> wrote:
> I intend to start with a simple and naive example. Do you know which
> component on Flink I can get the operations on the fly that are
> running inside a query?
Hi Felipe,
You can define TableSources (for SQL and the Table API) that support filter push-down. The optimizer will detect this opportunity and hand the filters to a custom TableSource.

I should add that, AFAIK, this feature is not used very often (expect some rough edges) and that the API is likely to change in the future. But it might be enough for a simple POC.

Best, Fabian

On Fri, Nov 30, 2018 at 10:13 AM Felipe Gutierrez <[hidden email]> wrote:
> I am going to implement a toy example to visualize this. Do you guys see
> this description as actual of latest Flink source code?
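The hand-off described above, where an optimizer offers filters to a source and the source keeps the ones it can evaluate itself, can be sketched in plain Java as follows. This is only an illustration of the idea under stated assumptions: it does not use Flink's actual TableSource interfaces, and `RangeFilter`, `SensorSource`, `applyFilters`, and the field names are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Concept sketch of filter push-down; none of these types are Flink API.
class FilterPushDownSketch {

    // Hypothetical description of a predicate "field < maxExclusive".
    static final class RangeFilter {
        final String field;
        final double maxExclusive;

        RangeFilter(String field, double maxExclusive) {
            this.field = field;
            this.maxExclusive = maxExclusive;
        }
    }

    // Hypothetical source that can evaluate filters on "distance" by itself.
    static final class SensorSource {
        private final List<Double> distancesCm;
        private double pushedMax = Double.POSITIVE_INFINITY;

        SensorSource(List<Double> distancesCm) {
            this.distancesCm = distancesCm;
        }

        // The "optimizer" offers all filters; the source keeps the ones it
        // understands and returns the rest for the engine to evaluate later.
        List<RangeFilter> applyFilters(List<RangeFilter> offered) {
            List<RangeFilter> remaining = new ArrayList<>();
            for (RangeFilter f : offered) {
                if ("distance".equals(f.field)) {
                    pushedMax = Math.min(pushedMax, f.maxExclusive);
                } else {
                    remaining.add(f);
                }
            }
            return remaining;
        }

        // Emits only readings that satisfy the pushed-down filters.
        List<Double> read() {
            List<Double> out = new ArrayList<>();
            for (Double d : distancesCm) {
                if (d < pushedMax) {
                    out.add(d);
                }
            }
            return out;
        }
    }

    public static void main(String[] args) {
        SensorSource source = new SensorSource(Arrays.asList(120.0, 35.0, 48.0, 300.0));
        List<RangeFilter> remaining = source.applyFilters(Arrays.asList(
                new RangeFilter("distance", 50.0),
                new RangeFilter("temperature", 30.0)));
        System.out.println("filters left for the engine: " + remaining.size());
        System.out.println("rows emitted by the source: " + source.read());
    }
}
```

In the prototype discussed in this thread, the body of `applyFilters` is where a message carrying the predicate would be sent to the Raspberry Pi, so that the stream reaches Flink already filtered.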
Cool, thanks!
I am able to inspect the execution plan of the query in this example: https://github.com/felipegutierrez/flink-first/blob/master/src/main/java/flink/example/streaming/SocketWindowWordCountFilterJava.java

I am also going to build a small POC as you suggested.

Thanks,
Felipe

On Fri, Nov 30, 2018 at 11:33 AM Fabian Hueske <[hidden email]> wrote:
> You can define TableSources (for SQL, Table API) that support filter
> push-down.
> The optimizer will figure out this opportunity and hand filters to a
> custom TableSource.
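One naive way to look for a filter in an execution plan is to scan the plan's JSON text for operator names. The sketch below does that with a regular expression over a hand-written sample string. The sample is an assumption: it is only shaped like the JSON that `StreamExecutionEnvironment.getExecutionPlan()` returns, and the `"type"` field name and operator labels may differ between Flink versions. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Naive scan of a plan-like JSON string; not a real JSON parser.
class PlanOperatorScan {

    // Matches "type":"<operator name>" pairs (the field name is an assumption).
    private static final Pattern TYPE_FIELD =
            Pattern.compile("\"type\"\\s*:\\s*\"([^\"]*)\"");

    // Collects every operator name found in the plan text, in order.
    static List<String> operatorTypes(String planJson) {
        List<String> types = new ArrayList<>();
        Matcher m = TYPE_FIELD.matcher(planJson);
        while (m.find()) {
            types.add(m.group(1));
        }
        return types;
    }

    // True if any operator name mentions "Filter".
    static boolean hasFilter(String planJson) {
        for (String type : operatorTypes(planJson)) {
            if (type.contains("Filter")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Hand-written sample, not captured from a real Flink job.
        String samplePlan = "{\"nodes\":["
                + "{\"id\":1,\"type\":\"Source: Socket Stream\"},"
                + "{\"id\":2,\"type\":\"Flat Map\"},"
                + "{\"id\":3,\"type\":\"Filter\"}]}";
        System.out.println("operators: " + operatorTypes(samplePlan));
        System.out.println("contains a filter: " + hasFilter(samplePlan));
    }
}
```

This only reveals that a filter operator exists, not what its predicate is, so it would not be enough on its own to decide what to push down to the Raspberry Pi.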