Elasticsearch Sink

classic Classic list List threaded Threaded
11 messages Options
Reply | Threaded
Open this post in threaded view
|

Elasticsearch Sink

Christophe Jolif
Hi all,

There is quite some time Flink Elasticsearch sink is broken for
Elastisearch 5.x  (nearly a year):

https://issues.apache.org/jira/browse/FLINK-7386

And there is no support for Elasticsearch 6.x:

https://issues.apache.org/jira/browse/FLINK-8101

However several PRs were issued:

https://github.com/apache/flink/pull/4675
https://github.com/apache/flink/pull/5374

I also raised the issue on the mailing list in the 1.5 timeframe:

http://apache-flink-mailing-list-archive.1008284.n3.
nabble.com/DISCUSS-Releasing-Flink-1-5-0-td20867.html#a20905

But things are still not really moving. However this seems something people
are looking for, so I would really like the community to consider that for
1.6.

The problems I see from comments on the PRs:

- getting something that is following the Flink APIs initially created is a
nightmare because Elastic is pretty good at breaking compatibility the hard
way (see in particular in the last PR the cast that have to be made to get
an API that works in all cases)
- Elasticsearch is moving away from their "native" API Flink is using to
the REST APIs so there is little  common ground between pre 6 and post 6
even if Elasticsearch tried to get some level of compatibility in the APIs.

My fear is that by trying to kill two birds with one stone, we actually get
nothing done.

In the hope of moving that forward I would like to propose for 1.6 a new
Elasticsearch 6.x+ sink that would follow the design of the previous ones
BUT only leverage the new REST API and not inherit from existing classes.
It would really be close to what is in my previous PR:
https://github.com/apache/flink/pull/5374 but just focus on E6+/REST and so
avoid any "strange" cast.

This would not fill the gap of the 5.2+ not working but at least we would
be back on track with something for the future as REST API is where Elastic
is going.

If people feel there is actual interest and chances this can be merged I'll
be working on issuing a new PR around that.

Alternative is to get back working on the existing PR but it seems to be a
dead-end at the moment and not necessarily the best option long term as
anyway Elasticsearch is looking into promoting the REST API.

Please let me know what you think?

--
Christophe
Reply | Threaded
Open this post in threaded view
|

Re: Elasticsearch Sink

Flavio Pompermaier
+1. Torally agree

On Sat, 12 May 2018, 18:14 Christophe Jolif, <[hidden email]> wrote:

> Hi all,
>
> There is quite some time Flink Elasticsearch sink is broken for
> Elastisearch 5.x  (nearly a year):
>
> https://issues.apache.org/jira/browse/FLINK-7386
>
> And there is no support for Elasticsearch 6.x:
>
> https://issues.apache.org/jira/browse/FLINK-8101
>
> However several PRs were issued:
>
> https://github.com/apache/flink/pull/4675
> https://github.com/apache/flink/pull/5374
>
> I also raised the issue on the mailing list in the 1.5 timeframe:
>
> http://apache-flink-mailing-list-archive.1008284.n3.
> nabble.com/DISCUSS-Releasing-Flink-1-5-0-td20867.html#a20905
>
> But things are still not really moving. However this seems something people
> are looking for, so I would really like the community to consider that for
> 1.6.
>
> The problems I see from comments on the PRs:
>
> - getting something that is following the Flink APIs initially created is a
> nightmare because Elastic is pretty good at breaking compatibility the hard
> way (see in particular in the last PR the cast that have to be made to get
> an API that works in all cases)
> - Elasticsearch is moving away from their "native" API Flink is using to
> the REST APIs so there is little  common ground between pre 6 and post 6
> even if Elasticsearch tried to get some level of compatibility in the APIs.
>
> My fear is that by trying to kill two birds with one stone, we actually get
> nothing done.
>
> In the hope of moving that forward I would like to propose for 1.6 a new
> Elasticsearch 6.x+ sink that would follow the design of the previous ones
> BUT only leverage the new REST API and not inherit from existing classes.
> It would really be close to what is in my previous PR:
> https://github.com/apache/flink/pull/5374 but just focus on E6+/REST and
> so
> avoid any "strange" cast.
>
> This would not fill the gap of the 5.2+ not working but at least we would
> be back on track with something for the future as REST API is where Elastic
> is going.
>
> If people feel there is actual interest and chances this can be merged I'll
> be working on issuing a new PR around that.
>
> Alternative is to get back working on the existing PR but it seems to be a
> dead-end at the moment and not necessarily the best option long term as
> anyway Elasticsearch is looking into promoting the REST API.
>
> Please let me know what you think?
>
> --
> Christophe
>
Reply | Threaded
Open this post in threaded view
|

Re: Elasticsearch Sink

Tzu-Li (Gordon) Tai
Hi Christophe,

Thanks for bringing this up.

Yes, the main issue with the existing PRs and preventing it from moving
forward is how it currently breaks initial assumptions of APIs in the
`elasticsearch-base` module.
Working around that would require introducing a new base module
specifically for 5.3+ and 6.x+, which we've also agreed on the PR isn't a
nice way to go.

I had a quick stab at the REST API, and it seems to be promising,
especially given that you mentioned that starting from next versions, the
current API we use will be fully removed.
I'm definitely a +1 to try to move this forward with a proper fix.

Some other remarks / questions I have:
- Maybe we can consider removing support for ES 1.x and 2.x starting from
1.6. Those are very old ES versions (considering that ES 6.x has already
been out for a while). Do you think this would simply how our base module
APIs are designed?
- Wouldn't it be possible to have a REST implementation of the
`ElasticsearchSinkCallBridge` for 5.x (covering both 5.2 and 5.3+)? If so,
once we remove ES 1.x and 2.x, it might actually be possible to completely
replace the current `elasticsearch-base` module.

Cheers,
Gordon


On Sun, May 13, 2018 at 12:36 AM, Flavio Pompermaier <[hidden email]>
wrote:

> +1. Torally agree
>
> On Sat, 12 May 2018, 18:14 Christophe Jolif, <[hidden email]> wrote:
>
> > Hi all,
> >
> > There is quite some time Flink Elasticsearch sink is broken for
> > Elastisearch 5.x  (nearly a year):
> >
> > https://issues.apache.org/jira/browse/FLINK-7386
> >
> > And there is no support for Elasticsearch 6.x:
> >
> > https://issues.apache.org/jira/browse/FLINK-8101
> >
> > However several PRs were issued:
> >
> > https://github.com/apache/flink/pull/4675
> > https://github.com/apache/flink/pull/5374
> >
> > I also raised the issue on the mailing list in the 1.5 timeframe:
> >
> > http://apache-flink-mailing-list-archive.1008284.n3.
> > nabble.com/DISCUSS-Releasing-Flink-1-5-0-td20867.html#a20905
> >
> > But things are still not really moving. However this seems something
> people
> > are looking for, so I would really like the community to consider that
> for
> > 1.6.
> >
> > The problems I see from comments on the PRs:
> >
> > - getting something that is following the Flink APIs initially created
> is a
> > nightmare because Elastic is pretty good at breaking compatibility the
> hard
> > way (see in particular in the last PR the cast that have to be made to
> get
> > an API that works in all cases)
> > - Elasticsearch is moving away from their "native" API Flink is using to
> > the REST APIs so there is little  common ground between pre 6 and post 6
> > even if Elasticsearch tried to get some level of compatibility in the
> APIs.
> >
> > My fear is that by trying to kill two birds with one stone, we actually
> get
> > nothing done.
> >
> > In the hope of moving that forward I would like to propose for 1.6 a new
> > Elasticsearch 6.x+ sink that would follow the design of the previous ones
> > BUT only leverage the new REST API and not inherit from existing classes.
> > It would really be close to what is in my previous PR:
> > https://github.com/apache/flink/pull/5374 but just focus on E6+/REST and
> > so
> > avoid any "strange" cast.
> >
> > This would not fill the gap of the 5.2+ not working but at least we would
> > be back on track with something for the future as REST API is where
> Elastic
> > is going.
> >
> > If people feel there is actual interest and chances this can be merged
> I'll
> > be working on issuing a new PR around that.
> >
> > Alternative is to get back working on the existing PR but it seems to be
> a
> > dead-end at the moment and not necessarily the best option long term as
> > anyway Elasticsearch is looking into promoting the REST API.
> >
> > Please let me know what you think?
> >
> > --
> > Christophe
> >
>
Reply | Threaded
Open this post in threaded view
|

Re: Elasticsearch Sink

Christophe Jolif
Hi Gordon,

Thanks for your feedback (and Flavio for your support!)

About your remarks/questions:

> - Maybe we can consider removing support for ES 1.x and 2.x starting from
1.6. Those are very old ES versions (considering that ES 6.x has already
been out for a while). Do you think this would simply how our base module
APIs are designed?

I would tend to say it should not change drastically the picture but would
have to look into it.

> - Wouldn't it be possible to have a REST implementation of the
`ElasticsearchSinkCallBridge` for 5.x (covering both 5.2 and 5.3+)? If so,
once we remove ES 1.x and 2.x, it might actually be possible to completely
replace the current `elasticsearch-base` module.

The High level REST API was introduced in Elasticsearch 5.6 so it is not
possible to cover 5.5 and below with it.

If all the necessary APIs are already here (to be double checked) it should
be able cover 5.6. What I noticed when working on the PRs is that 6.2 REST
Level High Level client API was improved to be closer to original APIs, if
we want to support 5.6 with it we might have to rely on APIs they already
improved since then. Not dramatic. But does it worth it knowing this would
just be giving us 5.6 not 5.2,3,4 and 5?

Now on moving forward I read:

> I'm definitely a +1 to try to move this forward with a proper fix.

and

> Working around that would require introducing a new base module
specifically for 5.3+ and 6.x+, which we've also agreed on the PR isn't a
nice way to go.

So if I read you correctly you are ok moving with a proper fix but it
should not introduce a new (REST based) base module? Then to be honest I'm
not sure how to proceed :) Any more specific feedback on the direction to
follow would be great!

Thanks,
--
Christophe

On Sun, May 13, 2018 at 5:39 AM, Tzu-Li (Gordon) Tai <[hidden email]>
wrote:

> Hi Christophe,
>
> Thanks for bringing this up.
>
> Yes, the main issue with the existing PRs and preventing it from moving
> forward is how it currently breaks initial assumptions of APIs in the
> `elasticsearch-base` module.
> Working around that would require introducing a new base module
> specifically for 5.3+ and 6.x+, which we've also agreed on the PR isn't a
> nice way to go.
>
> I had a quick stab at the REST API, and it seems to be promising,
> especially given that you mentioned that starting from next versions, the
> current API we use will be fully removed.
> I'm definitely a +1 to try to move this forward with a proper fix.
>
> Some other remarks / questions I have:
> - Maybe we can consider removing support for ES 1.x and 2.x starting from
> 1.6. Those are very old ES versions (considering that ES 6.x has already
> been out for a while). Do you think this would simply how our base module
> APIs are designed?
> - Wouldn't it be possible to have a REST implementation of the
> `ElasticsearchSinkCallBridge` for 5.x (covering both 5.2 and 5.3+)? If so,
> once we remove ES 1.x and 2.x, it might actually be possible to completely
> replace the current `elasticsearch-base` module.
>
> Cheers,
> Gordon
>
>
> On Sun, May 13, 2018 at 12:36 AM, Flavio Pompermaier <[hidden email]
> >
> wrote:
>
> > +1. Torally agree
> >
> > On Sat, 12 May 2018, 18:14 Christophe Jolif, <[hidden email]> wrote:
> >
> > > Hi all,
> > >
> > > There is quite some time Flink Elasticsearch sink is broken for
> > > Elastisearch 5.x  (nearly a year):
> > >
> > > https://issues.apache.org/jira/browse/FLINK-7386
> > >
> > > And there is no support for Elasticsearch 6.x:
> > >
> > > https://issues.apache.org/jira/browse/FLINK-8101
> > >
> > > However several PRs were issued:
> > >
> > > https://github.com/apache/flink/pull/4675
> > > https://github.com/apache/flink/pull/5374
> > >
> > > I also raised the issue on the mailing list in the 1.5 timeframe:
> > >
> > > http://apache-flink-mailing-list-archive.1008284.n3.
> > > nabble.com/DISCUSS-Releasing-Flink-1-5-0-td20867.html#a20905
> > >
> > > But things are still not really moving. However this seems something
> > people
> > > are looking for, so I would really like the community to consider that
> > for
> > > 1.6.
> > >
> > > The problems I see from comments on the PRs:
> > >
> > > - getting something that is following the Flink APIs initially created
> > is a
> > > nightmare because Elastic is pretty good at breaking compatibility the
> > hard
> > > way (see in particular in the last PR the cast that have to be made to
> > get
> > > an API that works in all cases)
> > > - Elasticsearch is moving away from their "native" API Flink is using
> to
> > > the REST APIs so there is little  common ground between pre 6 and post
> 6
> > > even if Elasticsearch tried to get some level of compatibility in the
> > APIs.
> > >
> > > My fear is that by trying to kill two birds with one stone, we actually
> > get
> > > nothing done.
> > >
> > > In the hope of moving that forward I would like to propose for 1.6 a
> new
> > > Elasticsearch 6.x+ sink that would follow the design of the previous
> ones
> > > BUT only leverage the new REST API and not inherit from existing
> classes.
> > > It would really be close to what is in my previous PR:
> > > https://github.com/apache/flink/pull/5374 but just focus on E6+/REST
> and
> > > so
> > > avoid any "strange" cast.
> > >
> > > This would not fill the gap of the 5.2+ not working but at least we
> would
> > > be back on track with something for the future as REST API is where
> > Elastic
> > > is going.
> > >
> > > If people feel there is actual interest and chances this can be merged
> > I'll
> > > be working on issuing a new PR around that.
> > >
> > > Alternative is to get back working on the existing PR but it seems to
> be
> > a
> > > dead-end at the moment and not necessarily the best option long term as
> > > anyway Elasticsearch is looking into promoting the REST API.
> > >
> > > Please let me know what you think?
> > >
> > > --
> > > Christophe
> > >
> >
>
Reply | Threaded
Open this post in threaded view
|

Re: Elasticsearch Sink

sblackmon
It seems to me that if the transport client dependency is removed, the same
module could perform inserts, updates, and deletes via the http bulk API,
and whatever version differences exist with that API could be handled
inside the module without any difference to the classpath of the pipeline.

If that's true there's no need or benefit to deprecating support for
earlier elastic version so long as someone is willing to implement test and
maintain them.

Steve

On 5/13/18 at 2:00 PM, Christophe wrote:

Hi Gordon,

Thanks for your feedback (and Flavio for your support!)

About your remarks/questions:

- Maybe we can consider removing support for ES 1.x and 2.x starting from

1.6. Those are very old ES versions (considering that ES 6.x has already
been out for a while). Do you think this would simply how our base module
APIs are designed?

I would tend to say it should not change drastically the picture but would
have to look into it.

- Wouldn't it be possible to have a REST implementation of the

`ElasticsearchSinkCallBridge` for 5.x (covering both 5.2 and 5.3+)? If so,
once we remove ES 1.x and 2.x, it might actually be possible to completely
replace the current `elasticsearch-base` module.

The High level REST API was introduced in Elasticsearch 5.6 so it is not
possible to cover 5.5 and below with it.

If all the necessary APIs are already here (to be double checked) it should
be able cover 5.6. What I noticed when working on the PRs is that 6.2 REST
Level High Level client API was improved to be closer to original APIs, if
we want to support 5.6 with it we might have to rely on APIs they already
improved since then. Not dramatic. But does it worth it knowing this would
just be giving us 5.6 not 5.2,3,4 and 5?

Now on moving forward I read:

I'm definitely a +1 to try to move this forward with a proper fix.


and

Working around that would require introducing a new base module

specifically for 5.3+ and 6.x+, which we've also agreed on the PR isn't a
nice way to go.

So if I read you correctly you are ok moving with a proper fix but it
should not introduce a new (REST based) base module? Then to be honest I'm
not sure how to proceed :) Any more specific feedback on the direction to
follow would be great!

Thanks,
--
Christophe

On Sun, May 13, 2018 at 5:39 AM, Tzu-Li (Gordon) Tai <[hidden email]>
wrote:

Hi Christophe,

Thanks for bringing this up.

Yes, the main issue with the existing PRs and preventing it from moving
forward is how it currently breaks initial assumptions of APIs in the
`elasticsearch-base` module.
Working around that would require introducing a new base module
specifically for 5.3+ and 6.x+, which we've also agreed on the PR isn't a
nice way to go.

I had a quick stab at the REST API, and it seems to be promising,
especially given that you mentioned that starting from next versions, the
current API we use will be fully removed.
I'm definitely a +1 to try to move this forward with a proper fix.

Some other remarks / questions I have:
- Maybe we can consider removing support for ES 1.x and 2.x starting from
1.6. Those are very old ES versions (considering that ES 6.x has already
been out for a while). Do you think this would simply how our base module
APIs are designed?
- Wouldn't it be possible to have a REST implementation of the
`ElasticsearchSinkCallBridge` for 5.x (covering both 5.2 and 5.3+)? If so,
once we remove ES 1.x and 2.x, it might actually be possible to completely
replace the current `elasticsearch-base` module.

Cheers,
Gordon


On Sun, May 13, 2018 at 12:36 AM, Flavio Pompermaier <[hidden email]


wrote:

+1. Torally agree

On Sat, 12 May 2018, 18:14 Christophe Jolif, <[hidden email]> wrote:

Hi all,

There is quite some time Flink Elasticsearch sink is broken for
Elastisearch 5.x (nearly a year):

https://issues.apache.org/jira/browse/FLINK-7386

And there is no support for Elasticsearch 6.x:

https://issues.apache.org/jira/browse/FLINK-8101

However several PRs were issued:

https://github.com/apache/flink/pull/4675
https://github.com/apache/flink/pull/5374

I also raised the issue on the mailing list in the 1.5 timeframe:

http://apache-flink-mailing-list-archive.1008284.n3.
nabble.com/DISCUSS-Releasing-Flink-1-5-0-td20867.html#a20905

But things are still not really moving. However this seems something

people

are looking for, so I would really like the community to consider that

for

1.6.

The problems I see from comments on the PRs:

- getting something that is following the Flink APIs initially created

is a

nightmare because Elastic is pretty good at breaking compatibility the

hard

way (see in particular in the last PR the cast that have to be made to

get

an API that works in all cases)
- Elasticsearch is moving away from their "native" API Flink is using

to

the REST APIs so there is little common ground between pre 6 and post

6

even if Elasticsearch tried to get some level of compatibility in the

APIs.


My fear is that by trying to kill two birds with one stone, we actually

get

nothing done.

In the hope of moving that forward I would like to propose for 1.6 a

new

Elasticsearch 6.x+ sink that would follow the design of the previous

ones

BUT only leverage the new REST API and not inherit from existing

classes.

It would really be close to what is in my previous PR:
https://github.com/apache/flink/pull/5374 but just focus on E6+/REST

and

so
avoid any "strange" cast.

This would not fill the gap of the 5.2+ not working but at least we

would

be back on track with something for the future as REST API is where

Elastic

is going.

If people feel there is actual interest and chances this can be merged

I'll

be working on issuing a new PR around that.

Alternative is to get back working on the existing PR but it seems to

be

a

dead-end at the moment and not necessarily the best option long term as
anyway Elasticsearch is looking into promoting the REST API.

Please let me know what you think?

--
Christophe
Reply | Threaded
Open this post in threaded view
|

Re: Elasticsearch Sink

Tzu-Li (Gordon) Tai
Hi,

Let me first clarify a few things so that we are on the same page here:

1. The reason that we are looking into having a new base-module for versions 5.3+ is that
new Elasticsearch BulkProcessor APIs are breaking some of our original base API assumptions.
This is noticeable from the intended cast here [1].

2. Given that we are looking into a new base module, Christophe proposed that we focus on designing
the new base module around Elasticsearch’s new REST API, so that we are more-future proof.

3. I proposed to remove / deprecate support for earlier versions, because once we introduce the new
base module, we would be maintaining a lot of Elasticsearch connector modules (2 base modules, 4+ version specific).
Of course, whether or not this really is a problem depends on how much capacity the community has to maintain them, as Steve mentioned.


Now, moving on to another proposed solution that should work (at least for now, depends on how Elasticsearch changes their API in the future):
The main problem we’ve bumped into is that Elasticsearch changed their BulkProcessor API from accepting `add(ActionRequest)`s to `add(DocWriteRequest)`s.

The reason why our elasticsearch-base-module used the `ActionRequest` in the first place was so that sinks could have full functionality to perform both delete and write requests.
We can actually just consider separating `RequestIndexer#add(ActionRequest …)` to `RequestIndexer#add(IndexRequest)` and `RequestIndexer#add(DeleteRequest)`.
AFAIK, the IndexRequest class and DeleteRequest class has remained stable across all ES versions (1.x., 2.x, 5.x, 6.x), so we should be safe with that.

If I’m not mistaken, this should allow us to avoid introducing a new module (no need for a new base AND a 5.3+ module), still have equal functionality, and continue to use the existing base module across all versions.
The only downside would be that we’ll need to deprecate / break `RequestIndexer#add(ActionRequest …)` in favor of the new separated methods.

Whether or not we move on to use the HighLevelRestClient for 6.x, would then be a completely orthogonal consideration. We can simply use it in 6.x as an implementation of the API call bridge.

What do you think?

Best,
Gordon

[1] https://github.com/apache/flink/pull/4675/files#diff-5539d0d57fa1ac17a36f08d1bb9a90b5R54

On 15 May 2018 at 9:15:52 AM, Steve Blackmon ([hidden email]) wrote:

It seems to me that if the transport client dependency is removed, the same  
module could perform inserts, updates, and deletes via the http bulk API,  
and whatever version differences exist with that API could be handled  
inside the module without any difference to the classpath of the pipeline.  

If that's true there's no need or benefit to deprecating support for  
earlier elastic version so long as someone is willing to implement test and  
maintain them.  

Steve  

On 5/13/18 at 2:00 PM, Christophe wrote:  

Hi Gordon,  

Thanks for your feedback (and Flavio for your support!)  

About your remarks/questions:  

- Maybe we can consider removing support for ES 1.x and 2.x starting from  

1.6. Those are very old ES versions (considering that ES 6.x has already  
been out for a while). Do you think this would simply how our base module  
APIs are designed?  

I would tend to say it should not change drastically the picture but would  
have to look into it.  

- Wouldn't it be possible to have a REST implementation of the  

`ElasticsearchSinkCallBridge` for 5.x (covering both 5.2 and 5.3+)? If so,  
once we remove ES 1.x and 2.x, it might actually be possible to completely  
replace the current `elasticsearch-base` module.  

The High level REST API was introduced in Elasticsearch 5.6 so it is not  
possible to cover 5.5 and below with it.  

If all the necessary APIs are already here (to be double checked) it should  
be able cover 5.6. What I noticed when working on the PRs is that 6.2 REST  
Level High Level client API was improved to be closer to original APIs, if  
we want to support 5.6 with it we might have to rely on APIs they already  
improved since then. Not dramatic. But does it worth it knowing this would  
just be giving us 5.6 not 5.2,3,4 and 5?  

Now on moving forward I read:  

I'm definitely a +1 to try to move this forward with a proper fix.  


and  

Working around that would require introducing a new base module  

specifically for 5.3+ and 6.x+, which we've also agreed on the PR isn't a  
nice way to go.  

So if I read you correctly you are ok moving with a proper fix but it  
should not introduce a new (REST based) base module? Then to be honest I'm  
not sure how to proceed :) Any more specific feedback on the direction to  
follow would be great!  

Thanks,  
--  
Christophe  

On Sun, May 13, 2018 at 5:39 AM, Tzu-Li (Gordon) Tai <[hidden email]>  
wrote:  

Hi Christophe,  

Thanks for bringing this up.  

Yes, the main issue with the existing PRs and preventing it from moving  
forward is how it currently breaks initial assumptions of APIs in the  
`elasticsearch-base` module.  
Working around that would require introducing a new base module  
specifically for 5.3+ and 6.x+, which we've also agreed on the PR isn't a  
nice way to go.  

I had a quick stab at the REST API, and it seems to be promising,  
especially given that you mentioned that starting from next versions, the  
current API we use will be fully removed.  
I'm definitely a +1 to try to move this forward with a proper fix.  

Some other remarks / questions I have:  
- Maybe we can consider removing support for ES 1.x and 2.x starting from  
1.6. Those are very old ES versions (considering that ES 6.x has already  
been out for a while). Do you think this would simply how our base module  
APIs are designed?  
- Wouldn't it be possible to have a REST implementation of the  
`ElasticsearchSinkCallBridge` for 5.x (covering both 5.2 and 5.3+)? If so,  
once we remove ES 1.x and 2.x, it might actually be possible to completely  
replace the current `elasticsearch-base` module.  

Cheers,  
Gordon  


On Sun, May 13, 2018 at 12:36 AM, Flavio Pompermaier <[hidden email]  


wrote:  

+1. Torally agree  

On Sat, 12 May 2018, 18:14 Christophe Jolif, <[hidden email]> wrote:  

Hi all,  

There is quite some time Flink Elasticsearch sink is broken for  
Elastisearch 5.x (nearly a year):  

https://issues.apache.org/jira/browse/FLINK-7386 

And there is no support for Elasticsearch 6.x:  

https://issues.apache.org/jira/browse/FLINK-8101 

However several PRs were issued:  

https://github.com/apache/flink/pull/4675 
https://github.com/apache/flink/pull/5374 

I also raised the issue on the mailing list in the 1.5 timeframe:  

http://apache-flink-mailing-list-archive.1008284.n3.  
nabble.com/DISCUSS-Releasing-Flink-1-5-0-td20867.html#a20905  

But things are still not really moving. However this seems something  

people  

are looking for, so I would really like the community to consider that  

for  

1.6.  

The problems I see from comments on the PRs:  

- getting something that is following the Flink APIs initially created  

is a  

nightmare because Elastic is pretty good at breaking compatibility the  

hard  

way (see in particular in the last PR the cast that have to be made to  

get  

an API that works in all cases)  
- Elasticsearch is moving away from their "native" API Flink is using  

to  

the REST APIs so there is little common ground between pre 6 and post  

6  

even if Elasticsearch tried to get some level of compatibility in the  

APIs.  


My fear is that by trying to kill two birds with one stone, we actually  

get  

nothing done.  

In the hope of moving that forward I would like to propose for 1.6 a  

new  

Elasticsearch 6.x+ sink that would follow the design of the previous  

ones  

BUT only leverage the new REST API and not inherit from existing  

classes.  

It would really be close to what is in my previous PR:  
https://github.com/apache/flink/pull/5374 but just focus on E6+/REST  

and  

so  
avoid any "strange" cast.  

This would not fill the gap of the 5.2+ not working but at least we  

would  

be back on track with something for the future as REST API is where  

Elastic  

is going.  

If people feel there is actual interest and chances this can be merged  

I'll  

be working on issuing a new PR around that.  

Alternative is to get back working on the existing PR but it seems to  

be  

a  

dead-end at the moment and not necessarily the best option long term as  
anyway Elasticsearch is looking into promoting the REST API.  

Please let me know what you think?  

--  
Christophe  
Reply | Threaded
Open this post in threaded view
|

Re: Elasticsearch Sink

Christophe Jolif
Hi Gordon,

On Tue, May 15, 2018 at 6:16 AM, Tzu-Li (Gordon) Tai <[hidden email]>
wrote:

> Hi,
>
> Let me first clarify a few things so that we are on the same page here:
>
> 1. The reason that we are looking into having a new base-module for
> versions 5.3+ is that
> new Elasticsearch BulkProcessor APIs are breaking some of our original
> base API assumptions.
> This is noticeable from the intended cast here [1].
>
> 2. Given that we are looking into a new base module, Christophe proposed
> that we focus on designing
> the new base module around Elasticsearch’s new REST API, so that we are
> more-future proof.
>
> 3. I proposed to remove / deprecate support for earlier versions, because
> once we introduce the new
> base module, we would be maintaining a lot of Elasticsearch connector
> modules (2 base modules, 4+ version specific).
> Of course, whether or not this really is a problem depends on how much
> capacity the community has to maintain them, as Steve mentioned.
>
>
> Now, moving on to another proposed solution that should work (at least for
> now, depends on how Elasticsearch changes their API in the future):
> The main problem we’ve bumped into is that Elasticsearch changed their
> BulkProcessor API from accepting `add(ActionRequest)`s to
> `add(DocWriteRequest)`s.
>
> The reason why our elasticsearch-base-module used the `ActionRequest` in
> the first place was so that sinks could have full functionality to perform
> both delete and write requests.
> We can actually just consider separating `RequestIndexer#add(ActionRequest
> …)` to `RequestIndexer#add(IndexRequest)` and `RequestIndexer#add(
> DeleteRequest)`.
> AFAIK, the IndexRequest class and DeleteRequest class has remained stable
> across all ES versions (1.x., 2.x, 5.x, 6.x), so we should be safe with
> that.
>
> If I’m not mistaken, this should allow us to avoid introducing a new
> module (no need for a new base AND a 5.3+ module), still have equal
> functionality, and continue to use the existing base module across all
> versions.
> The only downside would be that we’ll need to deprecate / break
> `RequestIndexer#add(ActionRequest …)` in favor of the new separated
> methods.
>
> Whether or not we move on to use the HighLevelRestClient for 6.x, would
> then be a completely orthogonal consideration. We can simply use it in 6.x
> as an implementation of the API call bridge.
>
> What do you think?
>

What if the user in a ES5.3+ case is calling the deprecated method? You
agree it will fail? I'm not necessarily against that. I just want to make
it clear that we don't have a perfect solution here either? If we ignore
that it should be working fine if indeed we always considered that among
ActionRequest only delete and index were actually supported (which makes
sense to me).

For the REST part, there is another incompatibility at the "Client" API
level. Indeed the RestClient is not implementing the "Client" interface
that is exposed in the bridge. So "just" doing the change above would still
not allow to provide a REST based implementation?

Best,

> Gordon
>
> [1] https://github.com/apache/flink/pull/4675/files#diff-
> 5539d0d57fa1ac17a36f08d1bb9a90b5R54
>
> On 15 May 2018 at 9:15:52 AM, Steve Blackmon ([hidden email]) wrote:
>
> It seems to me that if the transport client dependency is removed, the
> same
> module could perform inserts, updates, and deletes via the http bulk API,
> and whatever version differences exist with that API could be handled
> inside the module without any difference to the classpath of the
> pipeline.
>
> If that's true there's no need or benefit to deprecating support for
> earlier elastic version so long as someone is willing to implement test
> and
> maintain them.
>
> Steve
>
> On 5/13/18 at 2:00 PM, Christophe wrote:
>
> Hi Gordon,
>
> Thanks for your feedback (and Flavio for your support!)
>
> About your remarks/questions:
>
> - Maybe we can consider removing support for ES 1.x and 2.x starting from
>
> 1.6. Those are very old ES versions (considering that ES 6.x has already
> been out for a while). Do you think this would simply how our base module
> APIs are designed?
>
> I would tend to say it should not change drastically the picture but
> would
> have to look into it.
>
> - Wouldn't it be possible to have a REST implementation of the
>
> `ElasticsearchSinkCallBridge` for 5.x (covering both 5.2 and 5.3+)? If
> so,
> once we remove ES 1.x and 2.x, it might actually be possible to
> completely
> replace the current `elasticsearch-base` module.
>
> The High level REST API was introduced in Elasticsearch 5.6 so it is not
> possible to cover 5.5 and below with it.
>
> If all the necessary APIs are already here (to be double checked) it
> should
> be able cover 5.6. What I noticed when working on the PRs is that 6.2
> REST
> Level High Level client API was improved to be closer to original APIs,
> if
> we want to support 5.6 with it we might have to rely on APIs they already
> improved since then. Not dramatic. But does it worth it knowing this
> would
> just be giving us 5.6 not 5.2,3,4 and 5?
>
> Now on moving forward I read:
>
> I'm definitely a +1 to try to move this forward with a proper fix.
>
>
> and
>
> Working around that would require introducing a new base module
>
> specifically for 5.3+ and 6.x+, which we've also agreed on the PR isn't a
> nice way to go.
>
> So if I read you correctly you are ok moving with a proper fix but it
> should not introduce a new (REST based) base module? Then to be honest
> I'm
> not sure how to proceed :) Any more specific feedback on the direction to
> follow would be great!
>
> Thanks,
> --
> Christophe
>
> On Sun, May 13, 2018 at 5:39 AM, Tzu-Li (Gordon) Tai <[hidden email]>
>
> wrote:
>
> Hi Christophe,
>
> Thanks for bringing this up.
>
> Yes, the main issue with the existing PRs and preventing it from moving
> forward is how it currently breaks initial assumptions of APIs in the
> `elasticsearch-base` module.
> Working around that would require introducing a new base module
> specifically for 5.3+ and 6.x+, which we've also agreed on the PR isn't a
> nice way to go.
>
> I had a quick stab at the REST API, and it seems to be promising,
> especially given that you mentioned that starting from next versions, the
> current API we use will be fully removed.
> I'm definitely a +1 to try to move this forward with a proper fix.
>
> Some other remarks / questions I have:
> - Maybe we can consider removing support for ES 1.x and 2.x starting from
> 1.6. Those are very old ES versions (considering that ES 6.x has already
> been out for a while). Do you think this would simply how our base module
> APIs are designed?
> - Wouldn't it be possible to have a REST implementation of the
> `ElasticsearchSinkCallBridge` for 5.x (covering both 5.2 and 5.3+)? If
> so,
> once we remove ES 1.x and 2.x, it might actually be possible to
> completely
> replace the current `elasticsearch-base` module.
>
> Cheers,
> Gordon
>
>
> On Sun, May 13, 2018 at 12:36 AM, Flavio Pompermaier <[hidden email]
>
>
>
> wrote:
>
> +1. Torally agree
>
> On Sat, 12 May 2018, 18:14 Christophe Jolif, <[hidden email]> wrote:
>
> Hi all,
>
> There is quite some time Flink Elasticsearch sink is broken for
> Elastisearch 5.x (nearly a year):
>
> https://issues.apache.org/jira/browse/FLINK-7386
>
> And there is no support for Elasticsearch 6.x:
>
> https://issues.apache.org/jira/browse/FLINK-8101
>
> However several PRs were issued:
>
> https://github.com/apache/flink/pull/4675
> https://github.com/apache/flink/pull/5374
>
> I also raised the issue on the mailing list in the 1.5 timeframe:
>
> http://apache-flink-mailing-list-archive.1008284.n3.
> nabble.com/DISCUSS-Releasing-Flink-1-5-0-td20867.html#a20905
>
> But things are still not really moving. However this seems something
>
> people
>
> are looking for, so I would really like the community to consider that
>
> for
>
> 1.6.
>
> The problems I see from comments on the PRs:
>
> - getting something that is following the Flink APIs initially created
>
> is a
>
> nightmare because Elastic is pretty good at breaking compatibility the
>
> hard
>
> way (see in particular in the last PR the cast that have to be made to
>
> get
>
> an API that works in all cases)
> - Elasticsearch is moving away from their "native" API Flink is using
>
> to
>
> the REST APIs so there is little common ground between pre 6 and post
>
> 6
>
> even if Elasticsearch tried to get some level of compatibility in the
>
> APIs.
>
>
> My fear is that by trying to kill two birds with one stone, we actually
>
> get
>
> nothing done.
>
> In the hope of moving that forward I would like to propose for 1.6 a
>
> new
>
> Elasticsearch 6.x+ sink that would follow the design of the previous
>
> ones
>
> BUT only leverage the new REST API and not inherit from existing
>
> classes.
>
> It would really be close to what is in my previous PR:
> https://github.com/apache/flink/pull/5374 but just focus on E6+/REST
>
> and
>
> so
> avoid any "strange" cast.
>
> This would not fill the gap of the 5.2+ not working but at least we
>
> would
>
> be back on track with something for the future as REST API is where
>
> Elastic
>
> is going.
>
> If people feel there is actual interest and chances this can be merged
>
> I'll
>
> be working on issuing a new PR around that.
>
> Alternative is to get back working on the existing PR but it seems to
>
> be
>
> a
>
> dead-end at the moment and not necessarily the best option long term as
> anyway Elasticsearch is looking into promoting the REST API.
>
> Please let me know what you think?
>
> --
> Christophe
>



--
Christophe
Reply | Threaded
Open this post in threaded view
|

Re: Elasticsearch Sink

Tzu-Li (Gordon) Tai
Hi,

What if the user in a ES5.3+ case is calling the deprecated method? You 
agree it will fail? I'm not necessarily against that. I just want to make 
it clear that we don't have a perfect solution here either?

I think what we could do here is at least in the deprecated `add(ActionRequest)` method, try casting to either `IndexRequest` or `DeleteRequest` and forward calls to the new methods.
As a matter of fact, this is exactly what the Elasticsearch BulkProcessor API is doing internally [1] [2] [3], so we should be safe here.
Looking at the code in [1] [2] [3], we should actually also consider it being a `UpdateRequest` (just to correct my previous statement that it can only be either index or delete).
But yes, I think there wouldn’t be a perfect solution here; something has to be broken / deprecated in order for our implementation to be more future proof.

indeed we always considered that among 
ActionRequest only delete and index were actually supported (which makes 
sense to me). 

Looking at the code in [1] [2] [3], we should actually also consider it being a `UpdateRequest` (just to correct my previous statement that it can only be either index or delete).
And yes, the Elasticsearch Javadocs of the BulkProcessor also clearly state this.

For the REST part, there is another incompatibility at the "Client" API 
level. Indeed the RestClient is not implementing the "Client" interface 
that is exposed in the bridge. So "just" doing the change above would still 
not allow to provide a REST based implementation? 

I agree. IIRC, the ES PRs that were opened also did this by changing the return type from Client to AutoClosable, as well as letting the call bridge also handle creation of BulkProcessors, correct?
I think this is definitely the way to go for us to be more future proof.
The general rule of thumb is that, for all APIs (either APIs of the base module that the ES connectors internally use, or user-facing APIs of the connector), we should move towards not leaking Elasticsearch classes as the API.
This goes for removing Client as the return type in the call bridge interface. When re-working the RequestIndexer interface, we can even consider not directly exposing the `IndexRequest`, `DeleteRequest`, `UpdateRequest` classes as the API.
Instead, we maintain our own request class to abstract those classes away, and only create the actual Elasticsearch requests internally.

Cheers,
Gordon

[1] https://github.com/elastic/elasticsearch/blob/v5.2.2/core/src/main/java/org/elasticsearch/action/bulk/BulkRequest.java#L110
[2] https://github.com/elastic/elasticsearch/blob/v6.2.4/server/src/main/java/org/elasticsearch/action/bulk/BulkRequest.java#L128
On 16 May 2018 at 4:12:15 AM, Christophe Jolif ([hidden email]) wrote:

Hi Gordon,  

On Tue, May 15, 2018 at 6:16 AM, Tzu-Li (Gordon) Tai <[hidden email]>  
wrote:  

> Hi,  
>  
> Let me first clarify a few things so that we are on the same page here:  
>  
> 1. The reason that we are looking into having a new base-module for  
> versions 5.3+ is that  
> new Elasticsearch BulkProcessor APIs are breaking some of our original  
> base API assumptions.  
> This is noticeable from the intended cast here [1].  
>  
> 2. Given that we are looking into a new base module, Christophe proposed  
> that we focus on designing  
> the new base module around Elasticsearch’s new REST API, so that we are  
> more-future proof.  
>  
> 3. I proposed to remove / deprecate support for earlier versions, because  
> once we introduce the new  
> base module, we would be maintaining a lot of Elasticsearch connector  
> modules (2 base modules, 4+ version specific).  
> Of course, whether or not this really is a problem depends on how much  
> capacity the community has to maintain them, as Steve mentioned.  
>  
>  
> Now, moving on to another proposed solution that should work (at least for  
> now, depends on how Elasticsearch changes their API in the future):  
> The main problem we’ve bumped into is that Elasticsearch changed their  
> BulkProcessor API from accepting `add(ActionRequest)`s to  
> `add(DocWriteRequest)`s.  
>  
> The reason why our elasticsearch-base-module used the `ActionRequest` in  
> the first place was so that sinks could have full functionality to perform  
> both delete and write requests.  
> We can actually just consider separating `RequestIndexer#add(ActionRequest  
> …)` to `RequestIndexer#add(IndexRequest)` and `RequestIndexer#add(  
> DeleteRequest)`.  
> AFAIK, the IndexRequest class and DeleteRequest class has remained stable  
> across all ES versions (1.x., 2.x, 5.x, 6.x), so we should be safe with  
> that.  
>  
> If I’m not mistaken, this should allow us to avoid introducing a new  
> module (no need for a new base AND a 5.3+ module), still have equal  
> functionality, and continue to use the existing base module across all  
> versions.  
> The only downside would be that we’ll need to deprecate / break  
> `RequestIndexer#add(ActionRequest …)` in favor of the new separated  
> methods.  
>  
> Whether or not we move on to use the HighLevelRestClient for 6.x, would  
> then be a completely orthogonal consideration. We can simply use it in 6.x  
> as an implementation of the API call bridge.  
>  
> What do you think?  
>  

What if the user in a ES5.3+ case is calling the deprecated method? You  
agree it will fail? I'm not necessarily against that. I just want to make  
it clear that we don't have a perfect solution here either? If we ignore  
that it should be working fine if indeed we always considered that among  
ActionRequest only delete and index were actually supported (which makes  
sense to me).  

For the REST part, there is another incompatibility at the "Client" API  
level. Indeed the RestClient is not implementing the "Client" interface  
that is exposed in the bridge. So "just" doing the change above would still  
not allow to provide a REST based implementation?  

Best,  

> Gordon  
>  
> [1] https://github.com/apache/flink/pull/4675/files#diff- 
> 5539d0d57fa1ac17a36f08d1bb9a90b5R54  
>  
> On 15 May 2018 at 9:15:52 AM, Steve Blackmon ([hidden email]) wrote:  
>  
> It seems to me that if the transport client dependency is removed, the  
> same  
> module could perform inserts, updates, and deletes via the http bulk API,  
> and whatever version differences exist with that API could be handled  
> inside the module without any difference to the classpath of the  
> pipeline.  
>  
> If that's true there's no need or benefit to deprecating support for  
> earlier elastic version so long as someone is willing to implement test  
> and  
> maintain them.  
>  
> Steve  
>  
> On 5/13/18 at 2:00 PM, Christophe wrote:  
>  
> Hi Gordon,  
>  
> Thanks for your feedback (and Flavio for your support!)  
>  
> About your remarks/questions:  
>  
> - Maybe we can consider removing support for ES 1.x and 2.x starting from  
>  
> 1.6. Those are very old ES versions (considering that ES 6.x has already  
> been out for a while). Do you think this would simply how our base module  
> APIs are designed?  
>  
> I would tend to say it should not change drastically the picture but  
> would  
> have to look into it.  
>  
> - Wouldn't it be possible to have a REST implementation of the  
>  
> `ElasticsearchSinkCallBridge` for 5.x (covering both 5.2 and 5.3+)? If  
> so,  
> once we remove ES 1.x and 2.x, it might actually be possible to  
> completely  
> replace the current `elasticsearch-base` module.  
>  
> The High level REST API was introduced in Elasticsearch 5.6 so it is not  
> possible to cover 5.5 and below with it.  
>  
> If all the necessary APIs are already here (to be double checked) it  
> should  
> be able cover 5.6. What I noticed when working on the PRs is that 6.2  
> REST  
> Level High Level client API was improved to be closer to original APIs,  
> if  
> we want to support 5.6 with it we might have to rely on APIs they already  
> improved since then. Not dramatic. But does it worth it knowing this  
> would  
> just be giving us 5.6 not 5.2,3,4 and 5?  
>  
> Now on moving forward I read:  
>  
> I'm definitely a +1 to try to move this forward with a proper fix.  
>  
>  
> and  
>  
> Working around that would require introducing a new base module  
>  
> specifically for 5.3+ and 6.x+, which we've also agreed on the PR isn't a  
> nice way to go.  
>  
> So if I read you correctly you are ok moving with a proper fix but it  
> should not introduce a new (REST based) base module? Then to be honest  
> I'm  
> not sure how to proceed :) Any more specific feedback on the direction to  
> follow would be great!  
>  
> Thanks,  
> --  
> Christophe  
>  
> On Sun, May 13, 2018 at 5:39 AM, Tzu-Li (Gordon) Tai <[hidden email]>  
>  
> wrote:  
>  
> Hi Christophe,  
>  
> Thanks for bringing this up.  
>  
> Yes, the main issue with the existing PRs and preventing it from moving  
> forward is how it currently breaks initial assumptions of APIs in the  
> `elasticsearch-base` module.  
> Working around that would require introducing a new base module  
> specifically for 5.3+ and 6.x+, which we've also agreed on the PR isn't a  
> nice way to go.  
>  
> I had a quick stab at the REST API, and it seems to be promising,  
> especially given that you mentioned that starting from next versions, the  
> current API we use will be fully removed.  
> I'm definitely a +1 to try to move this forward with a proper fix.  
>  
> Some other remarks / questions I have:  
> - Maybe we can consider removing support for ES 1.x and 2.x starting from  
> 1.6. Those are very old ES versions (considering that ES 6.x has already  
> been out for a while). Do you think this would simply how our base module  
> APIs are designed?  
> - Wouldn't it be possible to have a REST implementation of the  
> `ElasticsearchSinkCallBridge` for 5.x (covering both 5.2 and 5.3+)? If  
> so,  
> once we remove ES 1.x and 2.x, it might actually be possible to  
> completely  
> replace the current `elasticsearch-base` module.  
>  
> Cheers,  
> Gordon  
>  
>  
> On Sun, May 13, 2018 at 12:36 AM, Flavio Pompermaier <[hidden email]  
>  
>  
>  
> wrote:  
>  
> +1. Torally agree  
>  
> On Sat, 12 May 2018, 18:14 Christophe Jolif, <[hidden email]> wrote:  
>  
> Hi all,  
>  
> There is quite some time Flink Elasticsearch sink is broken for  
> Elastisearch 5.x (nearly a year):  
>  
> https://issues.apache.org/jira/browse/FLINK-7386 
>  
> And there is no support for Elasticsearch 6.x:  
>  
> https://issues.apache.org/jira/browse/FLINK-8101 
>  
> However several PRs were issued:  
>  
> https://github.com/apache/flink/pull/4675 
> https://github.com/apache/flink/pull/5374 
>  
> I also raised the issue on the mailing list in the 1.5 timeframe:  
>  
> http://apache-flink-mailing-list-archive.1008284.n3.  
> nabble.com/DISCUSS-Releasing-Flink-1-5-0-td20867.html#a20905  
>  
> But things are still not really moving. However this seems something  
>  
> people  
>  
> are looking for, so I would really like the community to consider that  
>  
> for  
>  
> 1.6.  
>  
> The problems I see from comments on the PRs:  
>  
> - getting something that is following the Flink APIs initially created  
>  
> is a  
>  
> nightmare because Elastic is pretty good at breaking compatibility the  
>  
> hard  
>  
> way (see in particular in the last PR the cast that have to be made to  
>  
> get  
>  
> an API that works in all cases)  
> - Elasticsearch is moving away from their "native" API Flink is using  
>  
> to  
>  
> the REST APIs so there is little common ground between pre 6 and post  
>  
> 6  
>  
> even if Elasticsearch tried to get some level of compatibility in the  
>  
> APIs.  
>  
>  
> My fear is that by trying to kill two birds with one stone, we actually  
>  
> get  
>  
> nothing done.  
>  
> In the hope of moving that forward I would like to propose for 1.6 a  
>  
> new  
>  
> Elasticsearch 6.x+ sink that would follow the design of the previous  
>  
> ones  
>  
> BUT only leverage the new REST API and not inherit from existing  
>  
> classes.  
>  
> It would really be close to what is in my previous PR:  
> https://github.com/apache/flink/pull/5374 but just focus on E6+/REST  
>  
> and  
>  
> so  
> avoid any "strange" cast.  
>  
> This would not fill the gap of the 5.2+ not working but at least we  
>  
> would  
>  
> be back on track with something for the future as REST API is where  
>  
> Elastic  
>  
> is going.  
>  
> If people feel there is actual interest and chances this can be merged  
>  
> I'll  
>  
> be working on issuing a new PR around that.  
>  
> Alternative is to get back working on the existing PR but it seems to  
>  
> be  
>  
> a  
>  
> dead-end at the moment and not necessarily the best option long term as  
> anyway Elasticsearch is looking into promoting the REST API.  
>  
> Please let me know what you think?  
>  
> --  
> Christophe  
>  



--  
Christophe  
Reply | Threaded
Open this post in threaded view
|

Re: Elasticsearch Sink

Christophe Jolif
Ok thanks for the feedback.

> I agree. IIRC, the ES PRs that were opened also did this by changing the
return type from Client to AutoClosable, as well as letting the call bridge
also handle creation of BulkProcessors, correct?

Correct.

> Instead, we maintain our own request class to abstract those classes
away, and only create the actual Elasticsearch requests internally.

I'm personally unsure I would like to have to use Flink specific request
classes instead of Elastic ones but apart from that I think we are pretty
much inline.

I'm not exactly sure when I'll get the cycles but I will try to issue yet
another PR along those lines so we can continue the discussion from there
and hopefully we would get something in time for 1.6!

Thanks again,
--
Christophe

On Wed, May 16, 2018 at 7:19 AM, Tzu-Li (Gordon) Tai <[hidden email]>
wrote:

> Hi,
>
> What if the user in a ES5.3+ case is calling the deprecated method? You
> agree it will fail? I'm not necessarily against that. I just want to make
> it clear that we don't have a perfect solution here either?
>
>
> I think what we could do here is at least in the deprecated
> `add(ActionRequest)` method, try casting to either `IndexRequest` or
> `DeleteRequest` and forward calls to the new methods.
> As a matter of fact, this is exactly what the Elasticsearch BulkProcessor
> API is doing internally [1] [2] [3], so we should be safe here.
> Looking at the code in [1] [2] [3], we should actually also consider it
> being a `UpdateRequest` (just to correct my previous statement that it can
> only be either index or delete).
> But yes, I think there wouldn’t be a perfect solution here; something has
> to be broken / deprecated in order for our implementation to be more future
> proof.
>
> indeed we always considered that among
> ActionRequest only delete and index were actually supported (which makes
> sense to me).
>
>
> Looking at the code in [1] [2] [3], we should actually also consider it
> being a `UpdateRequest` (just to correct my previous statement that it can
> only be either index or delete).
> And yes, the Elasticsearch Javadocs of the BulkProcessor also clearly
> state this.
>
> For the REST part, there is another incompatibility at the "Client" API
> level. Indeed the RestClient is not implementing the "Client" interface
> that is exposed in the bridge. So "just" doing the change above would
> still
> not allow to provide a REST based implementation?
>
>
> I agree. IIRC, the ES PRs that were opened also did this by changing the
> return type from Client to AutoClosable, as well as letting the call bridge
> also handle creation of BulkProcessors, correct?
> I think this is definitely the way to go for us to be more future proof.
> The general rule of thumb is that, for all APIs (either APIs of the base
> module that the ES connectors internally use, or user-facing APIs of the
> connector), we should move towards not leaking Elasticsearch classes as the
> API.
> This goes for removing Client as the return type in the call bridge
> interface. When re-working the RequestIndexer interface, we can even
> consider not directly exposing the `IndexRequest`, `DeleteRequest`,
> `UpdateRequest` classes as the API.
> Instead, we maintain our own request class to abstract those classes away,
> and only create the actual Elasticsearch requests internally.
>
> Cheers,
> Gordon
>
> [1] https://github.com/elastic/elasticsearch/blob/v5.
> 2.2/core/src/main/java/org/elasticsearch/action/bulk/BulkRequest.java#L110
> [2] https://github.com/elastic/elasticsearch/blob/v6.
> 2.4/server/src/main/java/org/elasticsearch/action/bulk/
> BulkRequest.java#L128
>
> On 16 May 2018 at 4:12:15 AM, Christophe Jolif ([hidden email]) wrote:
>
> Hi Gordon,
>
> On Tue, May 15, 2018 at 6:16 AM, Tzu-Li (Gordon) Tai <[hidden email]>
>
> wrote:
>
> > Hi,
> >
> > Let me first clarify a few things so that we are on the same page here:
> >
> > 1. The reason that we are looking into having a new base-module for
> > versions 5.3+ is that
> > new Elasticsearch BulkProcessor APIs are breaking some of our original
> > base API assumptions.
> > This is noticeable from the intended cast here [1].
> >
> > 2. Given that we are looking into a new base module, Christophe proposed
> > that we focus on designing
> > the new base module around Elasticsearch’s new REST API, so that we are
> > more-future proof.
> >
> > 3. I proposed to remove / deprecate support for earlier versions,
> because
> > once we introduce the new
> > base module, we would be maintaining a lot of Elasticsearch connector
> > modules (2 base modules, 4+ version specific).
> > Of course, whether or not this really is a problem depends on how much
> > capacity the community has to maintain them, as Steve mentioned.
> >
> >
> > Now, moving on to another proposed solution that should work (at least
> for
> > now, depends on how Elasticsearch changes their API in the future):
> > The main problem we’ve bumped into is that Elasticsearch changed their
> > BulkProcessor API from accepting `add(ActionRequest)`s to
> > `add(DocWriteRequest)`s.
> >
> > The reason why our elasticsearch-base-module used the `ActionRequest` in
> > the first place was so that sinks could have full functionality to
> perform
> > both delete and write requests.
> > We can actually just consider separating `RequestIndexer#add(ActionRequest
>
> > …)` to `RequestIndexer#add(IndexRequest)` and `RequestIndexer#add(
> > DeleteRequest)`.
> > AFAIK, the IndexRequest class and DeleteRequest class has remained
> stable
> > across all ES versions (1.x., 2.x, 5.x, 6.x), so we should be safe with
> > that.
> >
> > If I’m not mistaken, this should allow us to avoid introducing a new
> > module (no need for a new base AND a 5.3+ module), still have equal
> > functionality, and continue to use the existing base module across all
> > versions.
> > The only downside would be that we’ll need to deprecate / break
> > `RequestIndexer#add(ActionRequest …)` in favor of the new separated
> > methods.
> >
> > Whether or not we move on to use the HighLevelRestClient for 6.x, would
> > then be a completely orthogonal consideration. We can simply use it in
> 6.x
> > as an implementation of the API call bridge.
> >
> > What do you think?
> >
>
> What if the user in a ES5.3+ case is calling the deprecated method? You
> agree it will fail? I'm not necessarily against that. I just want to make
> it clear that we don't have a perfect solution here either? If we ignore
> that it should be working fine if indeed we always considered that among
> ActionRequest only delete and index were actually supported (which makes
> sense to me).
>
> For the REST part, there is another incompatibility at the "Client" API
> level. Indeed the RestClient is not implementing the "Client" interface
> that is exposed in the bridge. So "just" doing the change above would
> still
> not allow to provide a REST based implementation?
>
> Best,
> > Gordon
> >
> > [1] https://github.com/apache/flink/pull/4675/files#diff-
> > 5539d0d57fa1ac17a36f08d1bb9a90b5R54
> >
> > On 15 May 2018 at 9:15:52 AM, Steve Blackmon ([hidden email])
> wrote:
> >
> > It seems to me that if the transport client dependency is removed, the
> > same
> > module could perform inserts, updates, and deletes via the http bulk
> API,
> > and whatever version differences exist with that API could be handled
> > inside the module without any difference to the classpath of the
> > pipeline.
> >
> > If that's true there's no need or benefit to deprecating support for
> > earlier elastic version so long as someone is willing to implement test
> > and
> > maintain them.
> >
> > Steve
> >
> > On 5/13/18 at 2:00 PM, Christophe wrote:
> >
> > Hi Gordon,
> >
> > Thanks for your feedback (and Flavio for your support!)
> >
> > About your remarks/questions:
> >
> > - Maybe we can consider removing support for ES 1.x and 2.x starting
> from
> >
> > 1.6. Those are very old ES versions (considering that ES 6.x has already
> > been out for a while). Do you think this would simply how our base
> module
> > APIs are designed?
> >
> > I would tend to say it should not change drastically the picture but
> > would
> > have to look into it.
> >
> > - Wouldn't it be possible to have a REST implementation of the
> >
> > `ElasticsearchSinkCallBridge` for 5.x (covering both 5.2 and 5.3+)? If
> > so,
> > once we remove ES 1.x and 2.x, it might actually be possible to
> > completely
> > replace the current `elasticsearch-base` module.
> >
> > The High level REST API was introduced in Elasticsearch 5.6 so it is not
> > possible to cover 5.5 and below with it.
> >
> > If all the necessary APIs are already here (to be double checked) it
> > should
> > be able cover 5.6. What I noticed when working on the PRs is that 6.2
> > REST
> > Level High Level client API was improved to be closer to original APIs,
> > if
> > we want to support 5.6 with it we might have to rely on APIs they
> already
> > improved since then. Not dramatic. But does it worth it knowing this
> > would
> > just be giving us 5.6 not 5.2,3,4 and 5?
> >
> > Now on moving forward I read:
> >
> > I'm definitely a +1 to try to move this forward with a proper fix.
> >
> >
> > and
> >
> > Working around that would require introducing a new base module
> >
> > specifically for 5.3+ and 6.x+, which we've also agreed on the PR isn't
> a
> > nice way to go.
> >
> > So if I read you correctly you are ok moving with a proper fix but it
> > should not introduce a new (REST based) base module? Then to be honest
> > I'm
> > not sure how to proceed :) Any more specific feedback on the direction
> to
> > follow would be great!
> >
> > Thanks,
> > --
> > Christophe
> >
> > On Sun, May 13, 2018 at 5:39 AM, Tzu-Li (Gordon) Tai <
> [hidden email]>
> >
> > wrote:
> >
> > Hi Christophe,
> >
> > Thanks for bringing this up.
> >
> > Yes, the main issue with the existing PRs and preventing it from moving
> > forward is how it currently breaks initial assumptions of APIs in the
> > `elasticsearch-base` module.
> > Working around that would require introducing a new base module
> > specifically for 5.3+ and 6.x+, which we've also agreed on the PR isn't
> a
> > nice way to go.
> >
> > I had a quick stab at the REST API, and it seems to be promising,
> > especially given that you mentioned that starting from next versions,
> the
> > current API we use will be fully removed.
> > I'm definitely a +1 to try to move this forward with a proper fix.
> >
> > Some other remarks / questions I have:
> > - Maybe we can consider removing support for ES 1.x and 2.x starting
> from
> > 1.6. Those are very old ES versions (considering that ES 6.x has already
> > been out for a while). Do you think this would simply how our base
> module
> > APIs are designed?
> > - Wouldn't it be possible to have a REST implementation of the
> > `ElasticsearchSinkCallBridge` for 5.x (covering both 5.2 and 5.3+)? If
> > so,
> > once we remove ES 1.x and 2.x, it might actually be possible to
> > completely
> > replace the current `elasticsearch-base` module.
> >
> > Cheers,
> > Gordon
> >
> >
> > On Sun, May 13, 2018 at 12:36 AM, Flavio Pompermaier <
> [hidden email]
> >
> >
> >
> > wrote:
> >
> > +1. Torally agree
> >
> > On Sat, 12 May 2018, 18:14 Christophe Jolif, <[hidden email]> wrote:
> >
> > Hi all,
> >
> > There is quite some time Flink Elasticsearch sink is broken for
> > Elastisearch 5.x (nearly a year):
> >
> > https://issues.apache.org/jira/browse/FLINK-7386
> >
> > And there is no support for Elasticsearch 6.x:
> >
> > https://issues.apache.org/jira/browse/FLINK-8101
> >
> > However several PRs were issued:
> >
> > https://github.com/apache/flink/pull/4675
> > https://github.com/apache/flink/pull/5374
> >
> > I also raised the issue on the mailing list in the 1.5 timeframe:
> >
> > http://apache-flink-mailing-list-archive.1008284.n3.
> > nabble.com/DISCUSS-Releasing-Flink-1-5-0-td20867.html#a20905
> >
> > But things are still not really moving. However this seems something
> >
> > people
> >
> > are looking for, so I would really like the community to consider that
> >
> > for
> >
> > 1.6.
> >
> > The problems I see from comments on the PRs:
> >
> > - getting something that is following the Flink APIs initially created
> >
> > is a
> >
> > nightmare because Elastic is pretty good at breaking compatibility the
> >
> > hard
> >
> > way (see in particular in the last PR the cast that have to be made to
> >
> > get
> >
> > an API that works in all cases)
> > - Elasticsearch is moving away from their "native" API Flink is using
> >
> > to
> >
> > the REST APIs so there is little common ground between pre 6 and post
> >
> > 6
> >
> > even if Elasticsearch tried to get some level of compatibility in the
> >
> > APIs.
> >
> >
> > My fear is that by trying to kill two birds with one stone, we actually
> >
> > get
> >
> > nothing done.
> >
> > In the hope of moving that forward I would like to propose for 1.6 a
> >
> > new
> >
> > Elasticsearch 6.x+ sink that would follow the design of the previous
> >
> > ones
> >
> > BUT only leverage the new REST API and not inherit from existing
> >
> > classes.
> >
> > It would really be close to what is in my previous PR:
> > https://github.com/apache/flink/pull/5374 but just focus on E6+/REST
> >
> > and
> >
> > so
> > avoid any "strange" cast.
> >
> > This would not fill the gap of the 5.2+ not working but at least we
> >
> > would
> >
> > be back on track with something for the future as REST API is where
> >
> > Elastic
> >
> > is going.
> >
> > If people feel there is actual interest and chances this can be merged
> >
> > I'll
> >
> > be working on issuing a new PR around that.
> >
> > Alternative is to get back working on the existing PR but it seems to
> >
> > be
> >
> > a
> >
> > dead-end at the moment and not necessarily the best option long term as
> > anyway Elasticsearch is looking into promoting the REST API.
> >
> > Please let me know what you think?
> >
> > --
> > Christophe
> >
>
>
>
Reply | Threaded
Open this post in threaded view
|

Re: Elasticsearch Sink

Tzu-Li (Gordon) Tai
Good to know! Thanks a lot for pushing this Christophe.

Please ping me when the new PR is opened, and we can continue the discussion there.
Ideally we have this in early in the 1.6 release cycle, so that the Elasticsearch e2e tests (will be merging a PR for that soon) can catching anything unexpected.

Cheers,
Gordon

On 16 May 2018 at 4:26:36 PM, Christophe Jolif ([hidden email]) wrote:

Ok thanks for the feedback. 

> I agree. IIRC, the ES PRs that were opened also did this by changing the return type from Client to AutoClosable, as well as letting the call bridge also handle creation of BulkProcessors, correct?

Correct.

> Instead, we maintain our own request class to abstract those classes away, and only create the actual Elasticsearch requests internally.

I'm personally unsure I would like to have to use Flink specific request classes instead of Elastic ones but apart from that I think we are pretty much inline. 

I'm not exactly sure when I'll get the cycles but I will try to issue yet another PR along those lines so we can continue the discussion from there and hopefully we would get something in time for 1.6!

Thanks again,
--
Christophe

On Wed, May 16, 2018 at 7:19 AM, Tzu-Li (Gordon) Tai <[hidden email]> wrote:
Hi,

What if the user in a ES5.3+ case is calling the deprecated method? You 
agree it will fail? I'm not necessarily against that. I just want to make 
it clear that we don't have a perfect solution here either?

I think what we could do here is at least in the deprecated `add(ActionRequest)` method, try casting to either `IndexRequest` or `DeleteRequest` and forward calls to the new methods.
As a matter of fact, this is exactly what the Elasticsearch BulkProcessor API is doing internally [1] [2] [3], so we should be safe here.
Looking at the code in [1] [2] [3], we should actually also consider it being a `UpdateRequest` (just to correct my previous statement that it can only be either index or delete).
But yes, I think there wouldn’t be a perfect solution here; something has to be broken / deprecated in order for our implementation to be more future proof.

indeed we always considered that among 
ActionRequest only delete and index were actually supported (which makes 
sense to me). 

Looking at the code in [1] [2] [3], we should actually also consider it being a `UpdateRequest` (just to correct my previous statement that it can only be either index or delete).
And yes, the Elasticsearch Javadocs of the BulkProcessor also clearly state this.

For the REST part, there is another incompatibility at the "Client" API 
level. Indeed the RestClient is not implementing the "Client" interface 
that is exposed in the bridge. So "just" doing the change above would still 
not allow to provide a REST based implementation? 

I agree. IIRC, the ES PRs that were opened also did this by changing the return type from Client to AutoClosable, as well as letting the call bridge also handle creation of BulkProcessors, correct?
I think this is definitely the way to go for us to be more future proof.
The general rule of thumb is that, for all APIs (either APIs of the base module that the ES connectors internally use, or user-facing APIs of the connector), we should move towards not leaking Elasticsearch classes as the API.
This goes for removing Client as the return type in the call bridge interface. When re-working the RequestIndexer interface, we can even consider not directly exposing the `IndexRequest`, `DeleteRequest`, `UpdateRequest` classes as the API.
Instead, we maintain our own request class to abstract those classes away, and only create the actual Elasticsearch requests internally.

Cheers,
Gordon

[1] https://github.com/elastic/elasticsearch/blob/v5.2.2/core/src/main/java/org/elasticsearch/action/bulk/BulkRequest.java#L110
[2] https://github.com/elastic/elasticsearch/blob/v6.2.4/server/src/main/java/org/elasticsearch/action/bulk/BulkRequest.java#L128
On 16 May 2018 at 4:12:15 AM, Christophe Jolif ([hidden email]) wrote:

Hi Gordon,

On Tue, May 15, 2018 at 6:16 AM, Tzu-Li (Gordon) Tai <[hidden email]>
wrote:

> Hi,
>
> Let me first clarify a few things so that we are on the same page here:
>
> 1. The reason that we are looking into having a new base-module for
> versions 5.3+ is that
> new Elasticsearch BulkProcessor APIs are breaking some of our original
> base API assumptions.
> This is noticeable from the intended cast here [1].
>
> 2. Given that we are looking into a new base module, Christophe proposed
> that we focus on designing
> the new base module around Elasticsearch’s new REST API, so that we are
> more-future proof.
>
> 3. I proposed to remove / deprecate support for earlier versions, because
> once we introduce the new
> base module, we would be maintaining a lot of Elasticsearch connector
> modules (2 base modules, 4+ version specific).
> Of course, whether or not this really is a problem depends on how much
> capacity the community has to maintain them, as Steve mentioned.
>
>
> Now, moving on to another proposed solution that should work (at least for
> now, depends on how Elasticsearch changes their API in the future):
> The main problem we’ve bumped into is that Elasticsearch changed their
> BulkProcessor API from accepting `add(ActionRequest)`s to
> `add(DocWriteRequest)`s.
>
> The reason why our elasticsearch-base-module used the `ActionRequest` in
> the first place was so that sinks could have full functionality to perform
> both delete and write requests.
> We can actually just consider separating `RequestIndexer#add(ActionRequest
> …)` to `RequestIndexer#add(IndexRequest)` and `RequestIndexer#add(
> DeleteRequest)`.
> AFAIK, the IndexRequest class and DeleteRequest class has remained stable
> across all ES versions (1.x., 2.x, 5.x, 6.x), so we should be safe with
> that.
>
> If I’m not mistaken, this should allow us to avoid introducing a new
> module (no need for a new base AND a 5.3+ module), still have equal
> functionality, and continue to use the existing base module across all
> versions.
> The only downside would be that we’ll need to deprecate / break
> `RequestIndexer#add(ActionRequest …)` in favor of the new separated
> methods.
>
> Whether or not we move on to use the HighLevelRestClient for 6.x, would
> then be a completely orthogonal consideration. We can simply use it in 6.x
> as an implementation of the API call bridge.
>
> What do you think?
>

What if the user in a ES5.3+ case is calling the deprecated method? You
agree it will fail? I'm not necessarily against that. I just want to make
it clear that we don't have a perfect solution here either? If we ignore
that it should be working fine if indeed we always considered that among
ActionRequest only delete and index were actually supported (which makes
sense to me).

For the REST part, there is another incompatibility at the "Client" API
level. Indeed the RestClient is not implementing the "Client" interface
that is exposed in the bridge. So "just" doing the change above would still
not allow to provide a REST based implementation?

Best,

> Gordon
>
> [1] https://github.com/apache/flink/pull/4675/files#diff-
> 5539d0d57fa1ac17a36f08d1bb9a90b5R54
>
> On 15 May 2018 at 9:15:52 AM, Steve Blackmon ([hidden email]) wrote:
>
> It seems to me that if the transport client dependency is removed, the
> same
> module could perform inserts, updates, and deletes via the http bulk API,
> and whatever version differences exist with that API could be handled
> inside the module without any difference to the classpath of the
> pipeline.
>
> If that's true there's no need or benefit to deprecating support for
> earlier elastic version so long as someone is willing to implement test
> and
> maintain them.
>
> Steve
>
> On 5/13/18 at 2:00 PM, Christophe wrote:
>
> Hi Gordon,
>
> Thanks for your feedback (and Flavio for your support!)
>
> About your remarks/questions:
>
> - Maybe we can consider removing support for ES 1.x and 2.x starting from
>
> 1.6. Those are very old ES versions (considering that ES 6.x has already
> been out for a while). Do you think this would simply how our base module
> APIs are designed?
>
> I would tend to say it should not change drastically the picture but
> would
> have to look into it.
>
> - Wouldn't it be possible to have a REST implementation of the
>
> `ElasticsearchSinkCallBridge` for 5.x (covering both 5.2 and 5.3+)? If
> so,
> once we remove ES 1.x and 2.x, it might actually be possible to
> completely
> replace the current `elasticsearch-base` module.
>
> The High level REST API was introduced in Elasticsearch 5.6 so it is not
> possible to cover 5.5 and below with it.
>
> If all the necessary APIs are already here (to be double checked) it
> should
> be able cover 5.6. What I noticed when working on the PRs is that 6.2
> REST
> Level High Level client API was improved to be closer to original APIs,
> if
> we want to support 5.6 with it we might have to rely on APIs they already
> improved since then. Not dramatic. But does it worth it knowing this
> would
> just be giving us 5.6 not 5.2,3,4 and 5?
>
> Now on moving forward I read:
>
> I'm definitely a +1 to try to move this forward with a proper fix.
>
>
> and
>
> Working around that would require introducing a new base module
>
> specifically for 5.3+ and 6.x+, which we've also agreed on the PR isn't a
> nice way to go.
>
> So if I read you correctly you are ok moving with a proper fix but it
> should not introduce a new (REST based) base module? Then to be honest
> I'm
> not sure how to proceed :) Any more specific feedback on the direction to
> follow would be great!
>
> Thanks,
> --
> Christophe
>
> On Sun, May 13, 2018 at 5:39 AM, Tzu-Li (Gordon) Tai <[hidden email]>
>
> wrote:
>
> Hi Christophe,
>
> Thanks for bringing this up.
>
> Yes, the main issue with the existing PRs and preventing it from moving
> forward is how it currently breaks initial assumptions of APIs in the
> `elasticsearch-base` module.
> Working around that would require introducing a new base module
> specifically for 5.3+ and 6.x+, which we've also agreed on the PR isn't a
> nice way to go.
>
> I had a quick stab at the REST API, and it seems to be promising,
> especially given that you mentioned that starting from next versions, the
> current API we use will be fully removed.
> I'm definitely a +1 to try to move this forward with a proper fix.
>
> Some other remarks / questions I have:
> - Maybe we can consider removing support for ES 1.x and 2.x starting from
> 1.6. Those are very old ES versions (considering that ES 6.x has already
> been out for a while). Do you think this would simply how our base module
> APIs are designed?
> - Wouldn't it be possible to have a REST implementation of the
> `ElasticsearchSinkCallBridge` for 5.x (covering both 5.2 and 5.3+)? If
> so,
> once we remove ES 1.x and 2.x, it might actually be possible to
> completely
> replace the current `elasticsearch-base` module.
>
> Cheers,
> Gordon
>
>
> On Sun, May 13, 2018 at 12:36 AM, Flavio Pompermaier <[hidden email]
>
>
>
> wrote:
>
> +1. Torally agree
>
> On Sat, 12 May 2018, 18:14 Christophe Jolif, <[hidden email]> wrote:
>
> Hi all,
>
> There is quite some time Flink Elasticsearch sink is broken for
> Elastisearch 5.x (nearly a year):
>
> https://issues.apache.org/jira/browse/FLINK-7386
>
> And there is no support for Elasticsearch 6.x:
>
> https://issues.apache.org/jira/browse/FLINK-8101
>
> However several PRs were issued:
>
> https://github.com/apache/flink/pull/4675
> https://github.com/apache/flink/pull/5374
>
> I also raised the issue on the mailing list in the 1.5 timeframe:
>
> http://apache-flink-mailing-list-archive.1008284.n3.
> nabble.com/DISCUSS-Releasing-Flink-1-5-0-td20867.html#a20905
>
> But things are still not really moving. However this seems something
>
> people
>
> are looking for, so I would really like the community to consider that
>
> for
>
> 1.6.
>
> The problems I see from comments on the PRs:
>
> - getting something that is following the Flink APIs initially created
>
> is a
>
> nightmare because Elastic is pretty good at breaking compatibility the
>
> hard
>
> way (see in particular in the last PR the cast that have to be made to
>
> get
>
> an API that works in all cases)
> - Elasticsearch is moving away from their "native" API Flink is using
>
> to
>
> the REST APIs so there is little common ground between pre 6 and post
>
> 6
>
> even if Elasticsearch tried to get some level of compatibility in the
>
> APIs.
>
>
> My fear is that by trying to kill two birds with one stone, we actually
>
> get
>
> nothing done.
>
> In the hope of moving that forward I would like to propose for 1.6 a
>
> new
>
> Elasticsearch 6.x+ sink that would follow the design of the previous
>
> ones
>
> BUT only leverage the new REST API and not inherit from existing
>
> classes.
>
> It would really be close to what is in my previous PR:
> https://github.com/apache/flink/pull/5374 but just focus on E6+/REST
>
> and
>
> so
> avoid any "strange" cast.
>
> This would not fill the gap of the 5.2+ not working but at least we
>
> would
>
> be back on track with something for the future as REST API is where
>
> Elastic
>
> is going.
>
> If people feel there is actual interest and chances this can be merged
>
> I'll
>
> be working on issuing a new PR around that.
>
> Alternative is to get back working on the existing PR but it seems to
>
> be
>
> a
>
> dead-end at the moment and not necessarily the best option long term as
> anyway Elasticsearch is looking into promoting the REST API.
>
> Please let me know what you think?
>
> --
> Christophe
>


Reply | Threaded
Open this post in threaded view
|

Re: Elasticsearch Sink

Christophe Jolif
Hi Gordon,

A first step: https://github.com/apache/flink/pull/6043

--
Christophe

On Wed, May 16, 2018 at 12:16 PM, Tzu-Li (Gordon) Tai <[hidden email]>
wrote:

> Good to know! Thanks a lot for pushing this Christophe.
>
> Please ping me when the new PR is opened, and we can continue the
> discussion there.
> Ideally we have this in early in the 1.6 release cycle, so that the
> Elasticsearch e2e tests (will be merging a PR for that soon) can catching
> anything unexpected.
>
> Cheers,
> Gordon
>
>
> On 16 May 2018 at 4:26:36 PM, Christophe Jolif ([hidden email]) wrote:
>
> Ok thanks for the feedback.
>
> > I agree. IIRC, the ES PRs that were opened also did this by changing
> the return type from Client to AutoClosable, as well as letting the call
> bridge also handle creation of BulkProcessors, correct?
>
> Correct.
>
> > Instead, we maintain our own request class to abstract those classes
> away, and only create the actual Elasticsearch requests internally.
>
> I'm personally unsure I would like to have to use Flink specific request
> classes instead of Elastic ones but apart from that I think we are pretty
> much inline.
>
> I'm not exactly sure when I'll get the cycles but I will try to issue yet
> another PR along those lines so we can continue the discussion from there
> and hopefully we would get something in time for 1.6!
>
> Thanks again,
> --
> Christophe
>
> On Wed, May 16, 2018 at 7:19 AM, Tzu-Li (Gordon) Tai <[hidden email]>
> wrote:
>
>> Hi,
>>
>> What if the user in a ES5.3+ case is calling the deprecated method? You
>> agree it will fail? I'm not necessarily against that. I just want to make
>> it clear that we don't have a perfect solution here either?
>>
>>
>> I think what we could do here is at least in the deprecated
>> `add(ActionRequest)` method, try casting to either `IndexRequest` or
>> `DeleteRequest` and forward calls to the new methods.
>> As a matter of fact, this is exactly what the Elasticsearch BulkProcessor
>> API is doing internally [1] [2] [3], so we should be safe here.
>> Looking at the code in [1] [2] [3], we should actually also consider it
>> being a `UpdateRequest` (just to correct my previous statement that it can
>> only be either index or delete).
>> But yes, I think there wouldn’t be a perfect solution here; something has
>> to be broken / deprecated in order for our implementation to be more future
>> proof.
>>
>> indeed we always considered that among
>> ActionRequest only delete and index were actually supported (which makes
>> sense to me).
>>
>>
>> Looking at the code in [1] [2] [3], we should actually also consider it
>> being a `UpdateRequest` (just to correct my previous statement that it can
>> only be either index or delete).
>> And yes, the Elasticsearch Javadocs of the BulkProcessor also clearly
>> state this.
>>
>> For the REST part, there is another incompatibility at the "Client" API
>> level. Indeed the RestClient is not implementing the "Client" interface
>> that is exposed in the bridge. So "just" doing the change above would
>> still
>> not allow to provide a REST based implementation?
>>
>>
>> I agree. IIRC, the ES PRs that were opened also did this by changing the
>> return type from Client to AutoClosable, as well as letting the call bridge
>> also handle creation of BulkProcessors, correct?
>> I think this is definitely the way to go for us to be more future proof.
>> The general rule of thumb is that, for all APIs (either APIs of the base
>> module that the ES connectors internally use, or user-facing APIs of the
>> connector), we should move towards not leaking Elasticsearch classes as the
>> API.
>> This goes for removing Client as the return type in the call bridge
>> interface. When re-working the RequestIndexer interface, we can even
>> consider not directly exposing the `IndexRequest`, `DeleteRequest`,
>> `UpdateRequest` classes as the API.
>> Instead, we maintain our own request class to abstract those classes
>> away, and only create the actual Elasticsearch requests internally.
>>
>> Cheers,
>> Gordon
>>
>> [1] https://github.com/elastic/elasticsearch/blob/v5.2.2/cor
>> e/src/main/java/org/elasticsearch/action/bulk/BulkRequest.java#L110
>> [2] https://github.com/elastic/elasticsearch/blob/v6.2.4/ser
>> ver/src/main/java/org/elasticsearch/action/bulk/BulkRequest.java#L128
>>
>> On 16 May 2018 at 4:12:15 AM, Christophe Jolif ([hidden email]) wrote:
>>
>> Hi Gordon,
>>
>> On Tue, May 15, 2018 at 6:16 AM, Tzu-Li (Gordon) Tai <[hidden email]
>> >
>> wrote:
>>
>> > Hi,
>> >
>> > Let me first clarify a few things so that we are on the same page here:
>> >
>> > 1. The reason that we are looking into having a new base-module for
>> > versions 5.3+ is that
>> > new Elasticsearch BulkProcessor APIs are breaking some of our original
>> > base API assumptions.
>> > This is noticeable from the intended cast here [1].
>> >
>> > 2. Given that we are looking into a new base module, Christophe proposed
>> > that we focus on designing
>> > the new base module around Elasticsearch’s new REST API, so that we are
>> > more-future proof.
>> >
>> > 3. I proposed to remove / deprecate support for earlier versions,
>> because
>> > once we introduce the new
>> > base module, we would be maintaining a lot of Elasticsearch connector
>> > modules (2 base modules, 4+ version specific).
>> > Of course, whether or not this really is a problem depends on how much
>> > capacity the community has to maintain them, as Steve mentioned.
>> >
>> >
>> > Now, moving on to another proposed solution that should work (at least
>> for
>> > now, depends on how Elasticsearch changes their API in the future):
>> > The main problem we’ve bumped into is that Elasticsearch changed their
>> > BulkProcessor API from accepting `add(ActionRequest)`s to
>> > `add(DocWriteRequest)`s.
>> >
>> > The reason why our elasticsearch-base-module used the `ActionRequest` in
>> > the first place was so that sinks could have full functionality to
>> perform
>> > both delete and write requests.
>> > We can actually just consider separating `RequestIndexer#add(ActionRequ
>> est
>> > …)` to `RequestIndexer#add(IndexRequest)` and `RequestIndexer#add(
>> > DeleteRequest)`.
>> > AFAIK, the IndexRequest class and DeleteRequest class has remained
>> stable
>> > across all ES versions (1.x., 2.x, 5.x, 6.x), so we should be safe with
>> > that.
>> >
>> > If I’m not mistaken, this should allow us to avoid introducing a new
>> > module (no need for a new base AND a 5.3+ module), still have equal
>> > functionality, and continue to use the existing base module across all
>> > versions.
>> > The only downside would be that we’ll need to deprecate / break
>> > `RequestIndexer#add(ActionRequest …)` in favor of the new separated
>> > methods.
>> >
>> > Whether or not we move on to use the HighLevelRestClient for 6.x, would
>> > then be a completely orthogonal consideration. We can simply use it in
>> 6.x
>> > as an implementation of the API call bridge.
>> >
>> > What do you think?
>> >
>>
>> What if the user in a ES5.3+ case is calling the deprecated method? You
>> agree it will fail? I'm not necessarily against that. I just want to make
>> it clear that we don't have a perfect solution here either? If we ignore
>> that it should be working fine if indeed we always considered that among
>> ActionRequest only delete and index were actually supported (which makes
>> sense to me).
>>
>> For the REST part, there is another incompatibility at the "Client" API
>> level. Indeed the RestClient is not implementing the "Client" interface
>> that is exposed in the bridge. So "just" doing the change above would
>> still
>> not allow to provide a REST based implementation?
>>
>> Best,
>> > Gordon
>> >
>> > [1] https://github.com/apache/flink/pull/4675/files#diff-
>> > 5539d0d57fa1ac17a36f08d1bb9a90b5R54
>> >
>> > On 15 May 2018 at 9:15:52 AM, Steve Blackmon ([hidden email])
>> wrote:
>> >
>> > It seems to me that if the transport client dependency is removed, the
>> > same
>> > module could perform inserts, updates, and deletes via the http bulk
>> API,
>> > and whatever version differences exist with that API could be handled
>> > inside the module without any difference to the classpath of the
>> > pipeline.
>> >
>> > If that's true there's no need or benefit to deprecating support for
>> > earlier elastic version so long as someone is willing to implement test
>> > and
>> > maintain them.
>> >
>> > Steve
>> >
>> > On 5/13/18 at 2:00 PM, Christophe wrote:
>> >
>> > Hi Gordon,
>> >
>> > Thanks for your feedback (and Flavio for your support!)
>> >
>> > About your remarks/questions:
>> >
>> > - Maybe we can consider removing support for ES 1.x and 2.x starting
>> from
>> >
>> > 1.6. Those are very old ES versions (considering that ES 6.x has already
>> > been out for a while). Do you think this would simply how our base
>> module
>> > APIs are designed?
>> >
>> > I would tend to say it should not change drastically the picture but
>> > would
>> > have to look into it.
>> >
>> > - Wouldn't it be possible to have a REST implementation of the
>> >
>> > `ElasticsearchSinkCallBridge` for 5.x (covering both 5.2 and 5.3+)? If
>> > so,
>> > once we remove ES 1.x and 2.x, it might actually be possible to
>> > completely
>> > replace the current `elasticsearch-base` module.
>> >
>> > The High level REST API was introduced in Elasticsearch 5.6 so it is not
>> > possible to cover 5.5 and below with it.
>> >
>> > If all the necessary APIs are already here (to be double checked) it
>> > should
>> > be able cover 5.6. What I noticed when working on the PRs is that 6.2
>> > REST
>> > Level High Level client API was improved to be closer to original APIs,
>> > if
>> > we want to support 5.6 with it we might have to rely on APIs they
>> already
>> > improved since then. Not dramatic. But does it worth it knowing this
>> > would
>> > just be giving us 5.6 not 5.2,3,4 and 5?
>> >
>> > Now on moving forward I read:
>> >
>> > I'm definitely a +1 to try to move this forward with a proper fix.
>> >
>> >
>> > and
>> >
>> > Working around that would require introducing a new base module
>> >
>> > specifically for 5.3+ and 6.x+, which we've also agreed on the PR isn't
>> a
>> > nice way to go.
>> >
>> > So if I read you correctly you are ok moving with a proper fix but it
>> > should not introduce a new (REST based) base module? Then to be honest
>> > I'm
>> > not sure how to proceed :) Any more specific feedback on the direction
>> to
>> > follow would be great!
>> >
>> > Thanks,
>> > --
>> > Christophe
>> >
>> > On Sun, May 13, 2018 at 5:39 AM, Tzu-Li (Gordon) Tai <
>> [hidden email]>
>> >
>> > wrote:
>> >
>> > Hi Christophe,
>> >
>> > Thanks for bringing this up.
>> >
>> > Yes, the main issue with the existing PRs and preventing it from moving
>> > forward is how it currently breaks initial assumptions of APIs in the
>> > `elasticsearch-base` module.
>> > Working around that would require introducing a new base module
>> > specifically for 5.3+ and 6.x+, which we've also agreed on the PR isn't
>> a
>> > nice way to go.
>> >
>> > I had a quick stab at the REST API, and it seems to be promising,
>> > especially given that you mentioned that starting from next versions,
>> the
>> > current API we use will be fully removed.
>> > I'm definitely a +1 to try to move this forward with a proper fix.
>> >
>> > Some other remarks / questions I have:
>> > - Maybe we can consider removing support for ES 1.x and 2.x starting
>> from
>> > 1.6. Those are very old ES versions (considering that ES 6.x has already
>> > been out for a while). Do you think this would simply how our base
>> module
>> > APIs are designed?
>> > - Wouldn't it be possible to have a REST implementation of the
>> > `ElasticsearchSinkCallBridge` for 5.x (covering both 5.2 and 5.3+)? If
>> > so,
>> > once we remove ES 1.x and 2.x, it might actually be possible to
>> > completely
>> > replace the current `elasticsearch-base` module.
>> >
>> > Cheers,
>> > Gordon
>> >
>> >
>> > On Sun, May 13, 2018 at 12:36 AM, Flavio Pompermaier <
>> [hidden email]
>> >
>> >
>> >
>> > wrote:
>> >
>> > +1. Torally agree
>> >
>> > On Sat, 12 May 2018, 18:14 Christophe Jolif, <[hidden email]> wrote:
>> >
>> > Hi all,
>> >
>> > There is quite some time Flink Elasticsearch sink is broken for
>> > Elastisearch 5.x (nearly a year):
>> >
>> > https://issues.apache.org/jira/browse/FLINK-7386
>> >
>> > And there is no support for Elasticsearch 6.x:
>> >
>> > https://issues.apache.org/jira/browse/FLINK-8101
>> >
>> > However several PRs were issued:
>> >
>> > https://github.com/apache/flink/pull/4675
>> > https://github.com/apache/flink/pull/5374
>> >
>> > I also raised the issue on the mailing list in the 1.5 timeframe:
>> >
>> > http://apache-flink-mailing-list-archive.1008284.n3.
>> > nabble.com/DISCUSS-Releasing-Flink-1-5-0-td20867.html#a20905
>> >
>> > But things are still not really moving. However this seems something
>> >
>> > people
>> >
>> > are looking for, so I would really like the community to consider that
>> >
>> > for
>> >
>> > 1.6.
>> >
>> > The problems I see from comments on the PRs:
>> >
>> > - getting something that is following the Flink APIs initially created
>> >
>> > is a
>> >
>> > nightmare because Elastic is pretty good at breaking compatibility the
>> >
>> > hard
>> >
>> > way (see in particular in the last PR the cast that have to be made to
>> >
>> > get
>> >
>> > an API that works in all cases)
>> > - Elasticsearch is moving away from their "native" API Flink is using
>> >
>> > to
>> >
>> > the REST APIs so there is little common ground between pre 6 and post
>> >
>> > 6
>> >
>> > even if Elasticsearch tried to get some level of compatibility in the
>> >
>> > APIs.
>> >
>> >
>> > My fear is that by trying to kill two birds with one stone, we actually
>> >
>> > get
>> >
>> > nothing done.
>> >
>> > In the hope of moving that forward I would like to propose for 1.6 a
>> >
>> > new
>> >
>> > Elasticsearch 6.x+ sink that would follow the design of the previous
>> >
>> > ones
>> >
>> > BUT only leverage the new REST API and not inherit from existing
>> >
>> > classes.
>> >
>> > It would really be close to what is in my previous PR:
>> > https://github.com/apache/flink/pull/5374 but just focus on E6+/REST
>> >
>> > and
>> >
>> > so
>> > avoid any "strange" cast.
>> >
>> > This would not fill the gap of the 5.2+ not working but at least we
>> >
>> > would
>> >
>> > be back on track with something for the future as REST API is where
>> >
>> > Elastic
>> >
>> > is going.
>> >
>> > If people feel there is actual interest and chances this can be merged
>> >
>> > I'll
>> >
>> > be working on issuing a new PR around that.
>> >
>> > Alternative is to get back working on the existing PR but it seems to
>> >
>> > be
>> >
>> > a
>> >
>> > dead-end at the moment and not necessarily the best option long term as
>> > anyway Elasticsearch is looking into promoting the REST API.
>> >
>> > Please let me know what you think?
>> >
>> > --
>> > Christophe
>> >
>>
>>
>>