Dear all,
I'm working on a large project, and one of the challenges is to read Kafka topics and copy them, via Hive commands, into Hive managed tables in order to enable Hive's ACID properties.

I tried this, but I have an issue with back pressure:
- The first window read 20,000 events and wrote them to the Hive tables.
- The second, third, ... windows send only 100 events each, because writing to Hive takes more time than reading from the Kafka topic. Yet writing 100 events or 50,000 events takes roughly the same time for Hive.

Has anyone already implemented this source and sink? Could you help with this, or do you have any tips?
It seems that defining a window size by number of events instead of by time is not possible. Is that true?

Thank you for your help,
Youssef
Hi Youssef,
You need to provide more background context:

- Which Hive sink are you using? We are working on an official Hive sink for the community, which will be released in 1.9. So did you develop yours in house?
- What do you mean by 1st, 2nd, 3rd window? Do you mean the parallel instances of the same operator, or do you have 3 windowing operations chained?
- What does your Hive table look like? E.g. is it partitioned or non-partitioned? If partitioned, how many partitions do you have? Is it written in static partition or dynamic partition mode? What format? How large?
- What does your sink do: is each parallel instance writing to multiple partitions or to a single partition/table? Is it only appending data, or upserting?

On Wed, Jul 3, 2019 at 1:38 AM Youssef Achbany <[hidden email]> wrote:
> Dear all,
>
> I'm working for a big project and one of the challenge is to read Kafka
> topics and copy them via Hive command into Hive managed tables in order to
> enable ACID HIVE properties.
>
> I try it but I have a issue with back pressure:
> - The first window read 20.000 events and wrote them in Hive tables
> - The second, third, ... send only 100 events because the write in Hive
> take more time than the read of a Kafka topic. But writing 100 events or
> 50.000 events takes +/- the same time for Hive.
>
> Someone have already do this source and sink? Could you help on this?
> Or have you some tips?
> It seems that defining a size window on number of event instead time is not
> possible. Is it true?
>
> Thank you for your help
>
> Youssef
BTW, I'm adding the user@ mailing list, since this is a user question and should be asked there. The dev@ mailing list is only for discussions of Flink development. Please see https://flink.apache.org/community.html#mailing-lists

On Wed, Jul 3, 2019 at 12:34 PM Bowen Li <[hidden email]> wrote:
> Hi Youssef,
>
> You need to provide more background context:
>
> - Which Hive sink are you using? We are working on the official Hive sink
> for community and will be released in 1.9. So did you develop yours in
> house?
> - What do you mean by 1st, 2nd, 3rd window? You mean the parallel
> instances of the same operator, or do you have you have 3 windowing
> operations chained?
> - What does your Hive table look like? E.g. is it partitioned or
> non-partitioned? If partitioned, how many partitions do you have? is it
> writing in static partition or dynamic partition mode? what format? how
> large?
> - What does your sink do - is each parallelism writing to multiple
> partitions or a single partition/table? Is it only appending data or
> upserting?
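
On the count-versus-time window question in the original post: as far as I know, Flink's DataStream API does offer count-based windows (`countWindow(size)` on a keyed stream, `countWindowAll(size)` on a non-keyed one), so a window can be sized by number of events. The underlying idea, given that a Hive write costs roughly the same for 100 or 50,000 events, is to hand the sink fewer, larger batches. Below is a minimal plain-Java sketch of that idea, independent of Flink. `CountBatcher` and its `sink` callback are hypothetical names for illustration, and the callback merely stands in for whatever actually performs the Hive write.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Buffers events and passes them to a sink in batches of a fixed size. */
final class CountBatcher<T> {
    private final int batchSize;
    // Hypothetical stand-in for the expensive write (e.g. one Hive statement per batch).
    private final Consumer<List<T>> sink;
    private List<T> buffer = new ArrayList<>();

    CountBatcher(int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    /** Adds one event; triggers a write once batchSize events are buffered. */
    void add(T event) {
        buffer.add(event);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Writes whatever is buffered, if anything (call this at shutdown too). */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        List<T> batch = buffer;
        buffer = new ArrayList<>(); // start a fresh buffer before handing the batch off
        sink.accept(batch);
    }

    public static void main(String[] args) {
        CountBatcher<Integer> batcher = new CountBatcher<>(20_000,
                batch -> System.out.println("writing " + batch.size() + " events"));
        for (int i = 0; i < 50_000; i++) {
            batcher.add(i);
        }
        batcher.flush(); // write the final partial batch
    }
}
```

With a batch size of 20,000 and 50,000 input events, the demo performs three writes (20,000, 20,000 and 10,000 events) rather than one write per small window. A purely count-based trigger can hold events indefinitely when traffic is slow, so in practice it is usually combined with a timeout; that part is left out of this sketch.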