Hi,
we have a data analytics server that has analytics data tables. I need to write a custom *Java* implementation to read data from that data source and do *batch* processing using Apache Flink. Basically it's like a new client connector for Flink. It would be great if you could provide some guidance for my requirement. Thanks, Pawan
Hi Pawan,
this sounds like you need to implement a custom InputFormat [1]. An InputFormat is basically executed in two phases. In the first phase it generates InputSplits. An InputSplit references a chunk of data that needs to be read; hence, InputSplits define how the input data is split up to be read in parallel. In the second phase, multiple InputFormat instances are started and request InputSplits from an InputSplitProvider. Each instance of the InputFormat processes one InputSplit at a time.

It is hard to give general advice on implementing InputFormats because this very much depends on the data source and data format to read from. I'd suggest having a look at other InputFormats; a rough skeleton is also sketched below.

Best, Fabian

[1] https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/api/common/io/InputFormat.java
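To make the two phases concrete, here is a minimal sketch of a custom InputFormat for a table-like source. AnalyticsClient and AnalyticsRecord are hypothetical placeholders for whatever API your analytics server exposes; treat this as an illustration of the structure, not a working connector.

```java
import java.io.IOException;
import java.util.Iterator;

import org.apache.flink.api.common.io.DefaultInputSplitAssigner;
import org.apache.flink.api.common.io.RichInputFormat;
import org.apache.flink.api.common.io.statistics.BaseStatistics;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.core.io.GenericInputSplit;
import org.apache.flink.core.io.InputSplitAssigner;

public class AnalyticsTableInputFormat extends RichInputFormat<AnalyticsRecord, GenericInputSplit> {

    private final String tableName;
    private transient AnalyticsClient client;          // hypothetical client for the analytics server
    private transient Iterator<AnalyticsRecord> rows;  // records of the split currently being read

    public AnalyticsTableInputFormat(String tableName) {
        this.tableName = tableName;
    }

    @Override
    public void configure(Configuration parameters) {
        // read connection settings from the Flink configuration if needed
    }

    @Override
    public BaseStatistics getStatistics(BaseStatistics cachedStatistics) {
        return cachedStatistics; // no statistics available
    }

    // Phase 1: describe how the table is divided into independently readable chunks.
    @Override
    public GenericInputSplit[] createInputSplits(int minNumSplits) {
        GenericInputSplit[] splits = new GenericInputSplit[minNumSplits];
        for (int i = 0; i < minNumSplits; i++) {
            splits[i] = new GenericInputSplit(i, minNumSplits);
        }
        return splits;
    }

    @Override
    public InputSplitAssigner getInputSplitAssigner(GenericInputSplit[] splits) {
        return new DefaultInputSplitAssigner(splits);
    }

    // Phase 2: each parallel instance opens one split at a time and reads its records.
    @Override
    public void open(GenericInputSplit split) throws IOException {
        client = new AnalyticsClient();                                  // hypothetical API
        rows = client.readPartition(tableName, split.getSplitNumber());  // hypothetical API
    }

    @Override
    public boolean reachedEnd() {
        return !rows.hasNext();
    }

    @Override
    public AnalyticsRecord nextRecord(AnalyticsRecord reuse) {
        return rows.next();
    }

    @Override
    public void close() throws IOException {
        if (client != null) {
            client.close(); // hypothetical API
        }
    }
}
```

The format can then be handed to the DataSet API with ExecutionEnvironment#createInput, which runs phase 1 on the master and phase 2 in the parallel reader tasks.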
Hi Fabian,
Thanks for providing that information.
Hi,
When we are implementing the InputFormat interface, if our data analytics server APIs already handle the input-split part, can we go directly to the second phase that you described earlier?

Since our data source has a database-table architecture, I am thinking of following the 'JDBCInputFormat' in Flink. Can you provide some information about how the JDBCInputFormat execution happens?

Thanks,
Pawan
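For reference, this is roughly how the JDBCInputFormat from the flink-jdbc module (DataSet API) is configured through its builder. The driver class, connection URL, query, and row schema below are placeholders, so treat it as a sketch rather than a drop-in example.

```java
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.io.jdbc.JDBCInputFormat;
import org.apache.flink.api.java.typeutils.RowTypeInfo;
import org.apache.flink.types.Row;

public class JdbcReadExample {

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Field types of the rows returned by the query (placeholder schema).
        RowTypeInfo rowTypeInfo = new RowTypeInfo(
                BasicTypeInfo.INT_TYPE_INFO,
                BasicTypeInfo.STRING_TYPE_INFO);

        JDBCInputFormat inputFormat = JDBCInputFormat.buildJDBCInputFormat()
                .setDrivername("org.h2.Driver")                   // placeholder JDBC driver
                .setDBUrl("jdbc:h2:mem:analytics")                // placeholder connection URL
                .setQuery("SELECT id, name FROM analytics_table") // placeholder query
                .setRowTypeInfo(rowTypeInfo)
                .finish();

        // Flink requests splits from the format and runs the query in the parallel readers.
        DataSet<Row> rows = env.createInput(inputFormat);
        rows.print();
    }
}
```

As far as I know, a plain query like the one above is read by a single task; for parallel reads the builder can additionally be given a parameters provider together with a parameterized query (e.g. a BETWEEN ? AND ? range), so that each InputSplit executes the query with a different parameter range.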
Hi,
Just in case it could be useful, we are working on a Flink-Kudu integration [1]. It is still a work in progress, but we had to implement an InputFormat to read from Kudu tables, so maybe the code is useful for you [2].

Best

[1] https://github.com/rubencasado/Flink-Kudu
[2] https://github.com/rubencasado/Flink-Kudu/blob/master/src/main/java/es/accenture/flink/Sources/KuduInputFormat.java
Thanks @Ruben for providing that information. It will be helpful for me.
It seems you have written your own InputSplit (KuduInputSplit) for this task.

--
*Pawan Gunaratne*
*Mob: +94 770373556*
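In case it helps to see what such a split can carry: an InputSplit is just a serializable description of one chunk of work (Flink's InputSplit interface only requires getSplitNumber()). Below is a minimal sketch with illustrative fields; it is not the actual KuduInputSplit.

```java
import org.apache.flink.core.io.InputSplit;

// Sketch of a custom split; the table name and key range are illustrative fields,
// standing in for whatever a reader needs to fetch its chunk of data.
public class AnalyticsTableSplit implements InputSplit {

    private static final long serialVersionUID = 1L;

    private final int splitNumber;
    private final String tableName; // hypothetical: which table this split reads
    private final long startKey;    // hypothetical: lower bound of the key range
    private final long endKey;      // hypothetical: upper bound of the key range

    public AnalyticsTableSplit(int splitNumber, String tableName, long startKey, long endKey) {
        this.splitNumber = splitNumber;
        this.tableName = tableName;
        this.startKey = startKey;
        this.endKey = endKey;
    }

    @Override
    public int getSplitNumber() {
        return splitNumber;
    }

    public String getTableName() {
        return tableName;
    }

    public long getStartKey() {
        return startKey;
    }

    public long getEndKey() {
        return endKey;
    }
}
```

A matching InputFormat would then be parameterized with this split type instead of GenericInputSplit, create the splits from whatever metadata the analytics server exposes, and use the split's fields in open() to decide what to read.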