Adding the streaming project to the main repository

Gyula Fóra
Hey,

The package names in the streaming code have now been renamed to the proper
flink package names and it uses the latest flink snapshot as its
dependency.

Also, support for iterative streaming jobs has been added to the API and
we have also added support for directed emits to match the functionality
provided by other streaming frameworks. These new features are still under
development and testing.

Regards,
Gyula & Marton


On Mon, Jul 7, 2014 at 2:33 PM, Gyula Fóra <[hidden email]> wrote:

> The utilities that we used for performance measurements have no direct
> connections to this project. We thought it would make sense to move them
> out into a separate repo since we are constantly modifying the settings for
> the actual tests.
>
>
> On Mon, Jul 7, 2014 at 2:30 PM, Ufuk Celebi <[hidden email]> wrote:
>
>>
>> On 07 Jul 2014, at 12:06, Márton Balassi <[hidden email]>
>> wrote:
>>
>> > Yeah, this might be slightly confusing - for clarifying the situation:
>> >
>> >
>> >   - Right under the streaming-addons one can find basic connectors for
>> >   message queue services - at the moment Kafka and RabbitMQ. We considered
>> >   this "classical" addon functionality.
>> >   - Additionally the job used for performance measurements is also under
>> >   addons, but I'm removing it.
>> >   - As usually addons mean surplus dependencies I am for separating them.
>> >   Would you suggest another name then? Streaming-connectors e.g.?
>>
>> I also like connectors.
>>
>> Why are you removing the performance measurements stuff?
>
>
>

Re: Adding the streaming project to the main repository

Stephan Ewen
Very nice, thanks!

I'll try and merge the current state under "flink-addons/flink-streaming"
today.

Stephan



On Fri, Jul 11, 2014 at 7:15 PM, Gyula Fóra <[hidden email]> wrote:

> Hey,
>
> The package names in the streaming code have now been renamed to the proper
> flink package names and it uses the latest flink snapshot as its
> dependencies.
>
> Also, support for iterative streaming jobs has been added to the API and
> we have also added support for directed emits to match the functionality
> provided by other streaming frameworks. These new features are still under
> development and testing.
>
> Regards,
> Gyula & Marton
>
>

Re: Adding the streaming project to the main repository

Stephan Ewen
Hi folks!

I have made a version that added the code to the flink repository.

The thing is: all code is attributed to me (as the one who added the files).

If you do not mind, I can commit it like that. If you want the code to be
attributed to you, you need to make a pull request that puts the contents
of your "stratosphere-streaming" project under
"flink-addons/flink-streaming".

Let me know what your opinion on this is.

Stephan

Re: Adding the streaming project to the main repository

Robert Metzger
I think it is also possible to merge the streaming project keeping its
history: http://git-scm.com/book/en/Git-Tools-Subtree-Merging.
I saw this recently in Optiq's JIRA. They are doing something like:

git subtree add --prefix=example-csv https://github.com/julianhyde/optiq-csv.git master
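
For illustration, here is a minimal sketch of the subtree approach using two throwaway repositories in a temporary directory. The repository names ("main", "streaming"), the file names and the author identity are all hypothetical; only the "git subtree add --prefix=<dir> <repository> <ref>" form is taken from the command above. It assumes a Git installation that ships the subtree command.

```shell
#!/bin/bash
set -e
work=$(mktemp -d)
cd "$work"

# Hypothetical identity so the throwaway commits can be created anywhere.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.org"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.org"

# Stand-in for the external project, with two commits of history.
git init -q streaming
git -C streaming symbolic-ref HEAD refs/heads/master
echo "v1" > streaming/Job.java
git -C streaming add Job.java
git -C streaming commit -q -m "first streaming commit"
echo "v2" > streaming/Job.java
git -C streaming commit -q -am "second streaming commit"

# Stand-in for the main repository.
git init -q main
git -C main symbolic-ref HEAD refs/heads/master
echo "core" > main/README
git -C main add README
git -C main commit -q -m "initial main commit"

# Import the external history under a sub-directory of the main repository.
cd main
git subtree add --prefix=flink-addons/flink-streaming "$work/streaming" master

# The imported commits are now reachable from the main branch.
subtree_commits=$(git log --oneline | grep -c "streaming commit")
echo "imported streaming commits: $subtree_commits"
```

Without the --squash option, the original commits stay in the log with their authors, joined to the main history by a single commit that adds the directory.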




On Sun, Jul 13, 2014 at 3:30 PM, Stephan Ewen <[hidden email]> wrote:

> Hi folks!
>
> I have made a version that added the code to the flink repository.
>
> The thing is: all code is attributed to me (as the one who added the
> files).
>
> If you do not mind, I can commit it like that. If you want the code to be
> attributed to you, you need to make a pull request that puts the contents
> of your "stratosphere-streaming" project under
> "flink-addons/flink-streaming".
>
> Let me know what your opinion on this is.
>
> Stephan
>

Re: Adding the streaming project to the main repository

Márton Balassi
Thanks, Stephan & Robert.

We'd definitely vote for merging with history as we've invested 4 months of
work to reach the current stage. It is also beneficial for Flink, as the
merge will then add 7 contributors to the project.


On Sun, Jul 13, 2014 at 3:38 PM, Robert Metzger <[hidden email]> wrote:

> I think it is also possible to merge the streaming project keeping its
> history: http://git-scm.com/book/en/Git-Tools-Subtree-Merging.
> I saw this recently in Optiq's JIRA. They are doing something like:
>
> git subtree add --prefix=example-csv
> https://github.com/julianhyde/optiq-csv.git master
>
>
>
>

Re: Adding the streaming project to the main repository

Stephan Ewen
Okay. How do we do this, because it is a cross-repository merge? I'll look
into Robert's reference...

Re: Adding the streaming project to the main repository

Stephan Ewen
Okay, subtree merge looks promising:
http://stackoverflow.com/questions/1425892/how-do-you-merge-two-git-repositories

I'll give it a try...

Re: Adding the streaming project to the main repository

Márton Balassi
In reply to this post by Stephan Ewen
Let us know if we can assist the merge in any way.


On Sun, Jul 13, 2014 at 3:50 PM, Stephan Ewen <[hidden email]> wrote:

> Okay. How do we do this, because it is a cross-repository merge? I'll look
> into Robert's reference...
>

Re: Adding the streaming project to the main repository

Stephan Ewen
Okay, here is a try:
https://github.com/StephanEwen/incubator-flink/tree/streaming/flink-addons/flink-streaming

It attributes all files to my commit, but it preserves all authors in git
blame. It is a bit strange: the history is broken, but some author
information is preserved.

Not ideal. Hope we can do better...



On Sun, Jul 13, 2014 at 3:52 PM, Márton Balassi <[hidden email]>
wrote:

> Let us know if we can assist the merge in any way.
>
>

Re: Adding the streaming project to the main repository

Robert Metzger
Let's see if the variant with rewriting the history using git filter-branch
works better.


One other thing regarding the merge:
I'm not sure if we have to do any legal checks prior to merging the changes
into our project. Maybe we even need an SGA or CCLA if the code has been
written as part of an employment.
We certainly need to check the dependencies for incompatible licenses.



On Sun, Jul 13, 2014 at 4:16 PM, Stephan Ewen <[hidden email]> wrote:

> Okay, here is a try:
>
> https://github.com/StephanEwen/incubator-flink/tree/streaming/flink-addons/flink-streaming
>
> It attributes all files to my commit, but it preserves all authors in git
> blame. It is a bit strange, the history is broken, but some author
> information is preserved.
>
> Not ideal. Hope we can do better...
>
>
>

Re: Adding the streaming project to the main repository

Stephan Ewen
Good point! I will ping Marton and Gyula for that.


On Sun, Jul 13, 2014 at 4:22 PM, Robert Metzger <[hidden email]> wrote:

> Let's see if the variant with rewriting the history using git filter-branch
> works better.
>
>
> One other thing regarding the merge:
> I'm not sure if we have to do any legal checks prior to merging the changes
> into our project. Maybe we even need an SGA or CCLA if the code has been
> written as part of an employment.
> We certainly need to check the dependencies for incompatible licenses.
>
>
>

Re: Adding the streaming project to the main repository

Stephan Ewen
Hi everyone!

I have found a way to add the code into the main repository in a different
branch, preserving all history.
All code is rewritten (with history) to be in
"flink-addons/flink-streaming" and the commits are prefixed with
[streaming].
https://github.com/StephanEwen/incubator-flink/commits/streaming

What we can now do is rebase the branch on top of master and then just add
the commits.

For that, I would like to ask for your help:

The commit history is a bit messy, to be honest. A lot of stuff is in
multiple commits with identical messages. Some commits are called
"whatever". Can you clean up the commit history a bit by doing a "git
rebase -i 3b88e30924268799c96317fe1bf9f5b9c6bf6f80" and squashing some commits?

I think then we are good to do a merge.

Greetings,
Stephan

Re: Adding the streaming project to the main repository

Márton Balassi
Thanks for the effort. Sorry for the mess, I'll clean it up as soon as
possible.

Cheers,

Marton


On Sun, Jul 13, 2014 at 5:25 PM, Stephan Ewen <[hidden email]> wrote:

> Hi everyone!
>
> I have found a way to add the code into the main repository in a different
> branch, preserving all history.
> All code is rewritten (with history) to be in
> "flink-addons/flink-streaming" and the commits are prefixed with
> [streaming].
> https://github.com/StephanEwen/incubator-flink/commits/streaming
>
> What we can now do is rebase the branch on top of master and then just add
> the commits.
>
> For that, I would like to ask for your help:
>
> The commit history is a bit messy, to be honest. A lot of stuff is in
> multiple commits with identical messages. Some commits are called
> "whatever". Can you clean up the commit history a bit by doing a "git
> rebase -i 3b88e30924268799c96317fe1bf9f5b9c6bf6f80" and squashing some commits?
>
> I think then we are good to do a merge.
>
> Greetings,
> Stephan
>
>

Re: Adding the streaming project to the main repository

Robert Metzger
Regarding the dependencies, I found that they require "jblas", with this
license: https://github.com/mikiobraun/jblas/blob/master/COPYING
It seems to be a BSD license, which is compatible with ASF projects [1].

The connectors package depends on RabbitMQ, which is MPL Licensed:
http://www.rabbitmq.com/mpl.html.
We are able to include this into our dependencies, if we label them
appropriately (I guess that means adding a notice to the NOTICE file).

It also contains ZeroMQ, which is LGPL licensed and we cannot include!

Apache Kafka is not a problem ;)
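
The four cases above can be summarized in a small lookup. This is only a sketch: the function name and the three result words are made up for illustration, and the patterns cover just the licenses named in this message, not the full lists on the ASF policy page [1].

```shell
#!/bin/bash
# Map a license name to how it may be used, following the cases
# named in the message above. classify_license and its output words
# are hypothetical; consult the ASF policy page for the full lists.
classify_license() {
  case "$1" in
    BSD*|Apache*) echo "allowed" ;;
    MPL*)         echo "allowed-with-notice" ;;
    LGPL*)        echo "forbidden" ;;
    *)            echo "unknown" ;;
  esac
}

# The dependencies discussed in this thread, as name:license pairs.
for dep in "jblas:BSD" "RabbitMQ:MPL" "ZeroMQ:LGPL" "Kafka:Apache-2.0"; do
  name=${dep%%:*}
  license=${dep#*:}
  echo "$name ($license): $(classify_license "$license")"
done
```

Anything that comes back as "unknown" would need a manual look rather than a guess.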


Another issue that I found is that the streaming project does currently not
have our checkstyle rules enforced (I saw star-imports). It would be cool
if you could fix that as well.
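
As a quick way to find the offending files before wiring up checkstyle, a plain grep over the sources is enough. This is a rough sketch and not a substitute for the checkstyle rules themselves; find_star_imports is a hypothetical helper name and the demo files are made up.

```shell
#!/bin/bash
# List Java files under a directory that contain wildcard imports.
# find_star_imports is a hypothetical helper name.
find_star_imports() {
  find "$1" -name '*.java' -exec grep -l -E \
    '^[[:space:]]*import[[:space:]]+(static[[:space:]]+)?[A-Za-z0-9_.]+\.\*;' {} +
}

# Two throwaway files: one with a star import, one without.
demo=$(mktemp -d)
printf 'import java.util.*;\nclass A {}\n' > "$demo/A.java"
printf 'import java.util.List;\nclass B {}\n' > "$demo/B.java"
find_star_imports "$demo"
```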

[1] http://www.apache.org/legal/resolved.html



On Sun, Jul 13, 2014 at 5:41 PM, Márton Balassi <[hidden email]>
wrote:

> Thanks for the effort. Sorry for the mess, I'll clean it up as soon as
> possible.
>
> Cheers,
>
> Marton
>
>

Re: Adding the streaming project to the main repository

Henry Saputra
Thanks for the update, Robert. This needs some review, so let's wait before
merging to master or any branch.

On Sunday, July 13, 2014, Robert Metzger <[hidden email]> wrote:

> Regarding the dependencies, I found that they require "jblas", with this
> license: https://github.com/mikiobraun/jblas/blob/master/COPYING
> It seems to be a BSD license, which is compatible with ASF projects [1].
>
> The connectors package depends on RabbitMQ, which is MPL Licensed:
> http://www.rabbitmq.com/mpl.html.
> We are able to include this into our dependencies, if we label them
> appropriately (I guess that means adding a notice to the NOTICE file).
>
> It also contains ZeroMQ, which is LGPL licensed and we cannot include!
>
> Apache Kafka is not a problem ;)
>
>
> Another issue that I found is that the streaming project does currently not
> have our checkstyle rules enforced (I saw star-imports). It would be cool
> if you could fix that as well.
>
> [1] http://www.apache.org/legal/resolved.html
>
>
>

Re: Adding the streaming project to the main repository

Gyula Fóra
In reply to this post by Stephan Ewen
Hey,
As you have said, our commit history is indeed a little messy in some
places, especially regarding some duplicate commits.

We tried what you suggested and rebased with git rebase -i, but our
problem is that because -i ignores the merge commits, squashing and editing
names make pretty much all the commits conflict. Even if we just run a git
rebase -i and then do nothing it still gives merge conflicts (because the
merges are omitted).

I have read somewhere that I should try -i -p but typing "git rebase -i -p
3b88e30924268799c96317fe1bf9f5b9c6bf6f80" only shows the last 2 commits.

Do you have any suggestions on this? Trying to re-merge all the 600
commits seems a little desperate.

Regards,
Gyula



Re: Adding the streaming project to the main repository

Gyula Fóra
So what we have figured out so far is that git rebase "straightens" out
the history, so all the merges are omitted and need to be merged
again. Doing this with our 540 regular and 120 merge commits seems like
overkill. In light of this, adding the streaming files as new files to
the project seems a much better option now.
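
Counts of regular and merge commits like the ones above can be read off with git rev-list. The sketch below builds a tiny throwaway repository with exactly one merge to show the two commands; the repository, branch names and identity are hypothetical.

```shell
#!/bin/bash
set -e
repo=$(mktemp -d)
cd "$repo"

# Hypothetical identity so the throwaway commits can be created anywhere.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.org"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.org"

git init -q .
git symbolic-ref HEAD refs/heads/master
echo a > a.txt; git add a.txt; git commit -q -m "base"
git checkout -q -b feature
echo b > b.txt; git add b.txt; git commit -q -m "feature work"
git checkout -q master
echo c > c.txt; git add c.txt; git commit -q -m "master work"
# The branches have diverged, so this creates one merge commit.
git merge -q -m "merge feature" feature

regular=$(git rev-list --count --no-merges HEAD)
merges=$(git rev-list --count --merges HEAD)
echo "regular commits: $regular, merge commits: $merges"
```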

So that we do not completely lose our history, we could add the files
ourselves, so at least that one big commit would be attributed to us.

Or any other idea would be greatly appreciated :)

Regards,
Gyula


On Mon, Jul 14, 2014 at 11:40 AM, Gyula Fóra <[hidden email]> wrote:

> Hey,
> As you have said, our commit history is indeed a little messy in some
> places especially regarding some duplicate commits.
>
> We tried what you suggested to rebase it with git rebase -i , but our
> problem is that because -i ignores the merge commits, squashing and editing
> names make pretty much all the commits conflict. Even if we just run a git
> rebase -i and then do nothing it still gives merge conflicts (because the
> merges are omitted).
>
> I have read somewhere that I should try -i -p but typing  "git rebase -i
> -p 3b88e30924268799c96317fe1bf9f5b9c6bf6f80" only shows the last 2 commits.
>
> Do you have any suggestions on this because trying to remerge all the 600
> commits seems a little desperate?
>
> Regards,
> Gyula
>
>

Re: Adding the streaming project to the main repository

Stephan Ewen
Hi guys!

I made a scripted manual rebase of each commit (basically add the commit
not via its diff, but such that it reflects the code base after the commit)

https://github.com/StephanEwen/incubator-flink/commits/streamrebase

No more merge commits that mess things up. You should be able to squash
things easily via "git rebase -i 3002258f8a22a8adbdb230e57c972ad17910debf"

The commit diffs may be a bit different than before (not too much if I did
things correctly), but can you have a quick look at the commits to see
whether they make sense?

Stephan


BTW: I used this way to do it:

Have two repositories (clones)
  - /data/repositories/flink
  - /data/repositories/flinkbak

Then do the following for every non-merge commit:
 - Check out the state after a commit in the backup (detached head)
 - Remove current streaming directory (physically and from the index)
 - Add it again (files and index), with the state of the cloned repo
 - Commit (git recreates the diffs in a way that they reflect the original
commit plus any merges)

---------------------

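#!/bin/bash

# Replay every commit listed in the file "commits" (one hash per line).
for line in $(cat commits)
do
  cd /data/repositories/flinkbak || exit 1
  # Carry over the original author and subject line.
  author=$(git --no-pager show -s --format='%an <%ae>' "$line")
  message=$(git --no-pager show -s --format='%s' "$line")

  echo "picking commit $line from author $author"

  # Detached head: the backup now shows the tree as it was after this commit.
  git checkout "$line"
  cd /data/repositories/flink || exit 1
  # Remove the streaming directory from the working tree and from the index.
  rm -rf "/data/repositories/flink/flink-addons/flink-streaming"
  git rm -r "/data/repositories/flink/flink-addons/flink-streaming"
  # Copy the snapshot over and commit it under the original author.
  cp -r "/data/repositories/flinkbak/flink-addons/flink-streaming" \
    "/data/repositories/flink/flink-addons/flink-streaming"
  git add /data/repositories/flink/flink-addons/flink-streaming
  git commit --author "$author" -m "$message"

#  read -rsp $'Press any key to continue...\n' -n1 key
done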
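
The script reads its hashes from a file named "commits"; how that file was produced is not shown here. One plausible way (an assumption, not necessarily what was actually used) is git rev-list with --reverse and --no-merges, sketched below on a throwaway repository with hypothetical branch names and identity.

```shell
#!/bin/bash
set -e
repo=$(mktemp -d)
cd "$repo"

# Hypothetical identity so the throwaway commits can be created anywhere.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.org"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.org"

git init -q .
git symbolic-ref HEAD refs/heads/master
echo a > a.txt; git add a.txt; git commit -q -m "base"
base=$(git rev-parse HEAD)
git checkout -q -b streaming
echo b > b.txt; git add b.txt; git commit -q -m "streaming work 1"
git checkout -q master
echo c > c.txt; git add c.txt; git commit -q -m "master work"
git checkout -q streaming
git merge -q -m "Merge branch 'master' into streaming" master
echo d > d.txt; git add d.txt; git commit -q -m "streaming work 2"

# Oldest first, merge commits left out, one hash per line.
git rev-list --reverse --no-merges "$base"..streaming > commits
wc -l < commits
```

The merge commit itself is skipped, but because the script commits whole snapshots rather than diffs, its effect still shows up in the next replayed commit.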





On Mon, Jul 14, 2014 at 1:10 PM, Gyula Fóra <[hidden email]> wrote:

> By the way, I forked your repo, switched to the streaming branch, and then
> executed the commands (I think this is how it should have been done)
>
>
> On Mon, Jul 14, 2014 at 1:09 PM, Gyula Fóra <[hidden email]> wrote:
>
>> This is what I get with "rebase -i -p master":
>>
>> pick 9456624 Merge branch 'master' of file:///data/repositories/streamin
>> into streaming
>> pick 89299b8 [streaming] Post-merge cleanups
>>
>> #Rebase 1fd457d..89299b8 onto 1fd457d
>> #......
>>
>>
>> On Mon, Jul 14, 2014 at 12:47 PM, Stephan Ewen <[hidden email]> wrote:
>>
>>> Can you do "rebase -i -p master". That should include all commits and
>>> might save you the meeting hell.
>>>
>>
>>
>

Re: Adding the streaming project to the main repository

Stephan Ewen
Before adding this contribution to the project, there are some legal things
to do:

 - Obtain ICLAs from all major contributors. There are 7 in the streaming
code, out of which three did the largest portion of the work: Márton
Balassi, Gyula Fóra, Hermann Gábor
 - @mentors: Should the other 4 also sign and send ICLAs?

 - Licenses: Walk through the code, collect all dependencies and make sure
they are ASL compatible. Here are some links with information:
    - http://www.apache.org/legal/resolved.html
    - http://www.apache.org/foundation/license-faq.html#WhatDoesItMEAN

 - All used licenses must be mentioned in the LICENSE files
   - under ./LICENSE
   - under ./flink-dist/src/main/flink-bin/LICENSE

 - Check headers for ASF compliance.


This looks manageable. Anything I forgot?

Greetings,
Stephan




On Mon, Jul 14, 2014 at 4:43 PM, Stephan Ewen <[hidden email]> wrote:

> Hi guys!
>
> I made a scripted manual rebase of each commit (basically add the commit
> not via its diff, but such that it reflects the code base after the commit)
>
> https://github.com/StephanEwen/incubator-flink/commits/streamrebase
>
> No more merge commits that mess things up. You should be able to squash
> things easily via "git rebase -i 3002258f8a22a8adbdb230e57c972ad17910debf"
>
> The commit diffs may be a bit different than before (not too much if I did
> things correctly), but can you have a quick look at the commits to see
> whether they make sense?
>
> Stephan
>
>

Re: Adding the streaming project to the main repository

Henry Saputra
@Stephan, yes, unfortunately all the individuals who have contributed
code need to send their ICLAs.

Once we have resolved the open issues, we are ready to merge =)

- Henry

On Mon, Jul 14, 2014 at 7:58 AM, Stephan Ewen <[hidden email]> wrote:

> Before adding this contribution to the project, there are some legal things
> to do:
>
>  - Obtain ICLAs from all major contributors. There are 7 in the streaming
> code, out of which three did the largest portion of the work: Márton
> Balassi, Gyula Fóra, Hermann Gábor
>  - @mentors: Should the other 4 also sign and send ICLAs?
>
>  - Licenses: Walk through the code, collect all dependencies and make sure
> they are ASL compatible. Here are some links with information:
>     - http://www.apache.org/legal/resolved.html
>     - http://www.apache.org/foundation/license-faq.html#WhatDoesItMEAN
>
>  - All used licenses must be mentioned in the LICENSE files
>    - under ./LICENSE
>    - under ./flink-dist/src/main/flink-bin/LICENSE
>
>  - Check headers for ASF compliance.
>
>
> This looks manageable. Anything I forgot?
>
> Greetings,
> Stephan
>
>
>
>