(DEPRECATED) Apache Flink Mailing List archive.

[jira] [Comment Edited] (FLINK-838) GSoC Summer Project: Implement full Hadoop Compatibility Layer for Stratosphere

Shang Yuanchun (Jira)

[ https://issues.apache.org/jira/browse/FLINK-838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032168#comment-14032168 ]

Artem Tsikiridis edited comment on FLINK-838 at 6/16/14 7:08 AM:
-----------------------------------------------------------------

Hello,

Here is the report for the fourth week.

In short:

Worked more on the runtime environment of a Hadoop job (see point 2). Added support for custom partitioning and intermediate sorting (comparator, grouping comparator).
Prepared an environment for distributed testing.
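
To illustrate what the partitioner, the sort comparator and the grouping comparator each contribute, here is a minimal, self-contained sketch in plain Java. It does not use Hadoop's or Flink's classes, and every name in it (ShuffleSketch, partition, sortAndGroup) is hypothetical. In Hadoop's mapred API these three hooks are configured on JobConf via setPartitionerClass, setOutputKeyComparatorClass and setOutputValueGroupingComparator; the sketch only mimics their roles.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.function.ToIntBiFunction;

// Hypothetical illustration only: not the code of the abstraction layer.
public class ShuffleSketch {

    /** Partitioner role: route every key to one of numPartitions buckets. */
    public static <K> List<List<K>> partition(List<K> keys, int numPartitions,
                                              ToIntBiFunction<K, Integer> partitioner) {
        List<List<K>> buckets = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            buckets.add(new ArrayList<>());
        }
        for (K key : keys) {
            buckets.get(partitioner.applyAsInt(key, numPartitions)).add(key);
        }
        return buckets;
    }

    /**
     * Sort comparator role: order the keys of one partition.
     * Grouping comparator role: adjacent keys that compare as equal
     * end up in the same group (one reduce call per group).
     */
    public static <K> List<List<K>> sortAndGroup(List<K> partition,
                                                 Comparator<K> sortComparator,
                                                 Comparator<K> groupComparator) {
        List<K> sorted = new ArrayList<>(partition);
        sorted.sort(sortComparator);

        List<List<K>> groups = new ArrayList<>();
        List<K> current = null;
        K previous = null;
        for (K key : sorted) {
            if (current == null || groupComparator.compare(previous, key) != 0) {
                current = new ArrayList<>();
                groups.add(current);
            }
            current.add(key);
            previous = key;
        }
        return groups;
    }

    public static void main(String[] args) {
        // Composite keys of the form "<word>:<sequence>".
        List<String> keys = Arrays.asList("a:2", "b:1", "a:1");

        // Partition on the first character, so all "a:*" keys land together.
        List<List<String>> buckets =
            partition(keys, 2, (k, n) -> (k.charAt(0) & Integer.MAX_VALUE) % n);

        // Sort on the full key, but group on the part before ':'.
        Comparator<String> byPrefix =
            Comparator.comparing((String s) -> s.substring(0, s.indexOf(':')));
        for (List<String> bucket : buckets) {
            System.out.println(sortAndGroup(bucket, Comparator.naturalOrder(), byPrefix));
        }
    }
}
```

Sorting on the full key while grouping on a prefix is the usual "secondary sort" pattern, which is why the two comparators are kept separate.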

1)

We are reaching the midterm evaluation of the program in two weeks' time. As Robert suggested above, it would be nice to merge the first version
of the abstraction layer. That would cover support for the following Hadoop mapred interfaces: Mapper, Reducer, Combiner, a basic driver
(just parsing the conf and starting a job), and the comparator and partitioner interfaces which I worked on this week.

I am currently trying to improve test coverage for this branch and will try it on the cluster today. So in a few days (mid-week)
it will be virtually ready to be code-reviewed and merged. Moreover, I would be happy to assist with testing 777 if it is
needed.

2)

Then, as soon as 1 is being code-reviewed, I will work in parallel on the advanced features of a Hadoop driver. Here I have some issues, mostly because, to provide a working RunningJob for Hadoop's JobClient, I need to access information from Flink's Nephele cluster, which is abstracted away. As a result, I repeat myself a lot in the environment code. I was wondering if it is possible to refactor the environments (e.g. break submitJobandWait into submitJob and wait, and generally have a wait). This is the nature of the changes. However, I believe this discussion can take place after the midterm, when a first version of the project is already merged.
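
A minimal sketch of the kind of split meant in point 2, assuming a non-blocking submit that returns a handle and a separate blocking wait. It is plain Java built on CompletableFuture; the class and method names (JobSubmissionSketch, JobHandle, JobResult, submitJob, waitForCompletion, submitJobAndWait) are hypothetical and are not the actual Flink or Hadoop API.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only: not the existing environment code.
public class JobSubmissionSketch {

    /** Hypothetical result of a finished job. */
    public static final class JobResult {
        private final String jobName;
        private final long runtimeMillis;

        JobResult(String jobName, long runtimeMillis) {
            this.jobName = jobName;
            this.runtimeMillis = runtimeMillis;
        }

        public String getJobName() { return jobName; }
        public long getRuntimeMillis() { return runtimeMillis; }
    }

    /** Hypothetical handle that a RunningJob-like wrapper could poll or block on. */
    public static final class JobHandle {
        private final CompletableFuture<JobResult> future;

        JobHandle(CompletableFuture<JobResult> future) {
            this.future = future;
        }

        /** Non-blocking status check. */
        public boolean isComplete() { return future.isDone(); }

        /** Blocks until the job has finished and returns its result. */
        public JobResult waitForCompletion() { return future.join(); }
    }

    /** Starts the job asynchronously and returns immediately. */
    public static JobHandle submitJob(String jobName, Runnable work) {
        CompletableFuture<JobResult> future = CompletableFuture.supplyAsync(() -> {
            long start = System.nanoTime();
            work.run();
            long elapsed = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            return new JobResult(jobName, elapsed);
        });
        return new JobHandle(future);
    }

    /** The old blocking call, expressed as submit followed by wait. */
    public static JobResult submitJobAndWait(String jobName, Runnable work) {
        return submitJob(jobName, work).waitForCompletion();
    }

    public static void main(String[] args) {
        JobHandle handle = submitJob("wordcount", () -> { /* job body goes here */ });
        JobResult result = handle.waitForCompletion();
        System.out.println(result.getJobName() + " finished, complete=" + handle.isComplete());
    }
}
```

With this shape the blocking variant stays available for existing callers, while a driver that needs status information in between can hold on to the handle.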

was (Author: atsikiridis):
Hello,

Here is the report for the fourth week.

short

Worked more on the runtime environment of a Hadoop job (see point 2). Added support for custom partitioning and intermediate sorting (comparator, grouping comparator).
Prepared an environment for distributed testing.

1)

We are reaching the midterm evaluation of the program in two weeks' time. As Robert suggested above, it would be nice to merge the first version
of the abstraction layer. That would cover support for the following Hadoop mapred interfaces: Mapper, Reducer, Combiner, a basic driver
(just parsing the conf and starting a job), and the comparator and partitioner interfaces which I worked on this week.

I am currently trying to improve test coverage for this branch and will try it on the cluster today. So in a few days (mid-week)
it will be virtually ready to be code-reviewed and merged. Moreover, I would be happy to assist with testing 777 if it is
needed.

2)

Then, as soon as 1 is being code-reviewed, I will work in parallel on the advanced features of a Hadoop driver.

Here I have some issues, mostly because, to provide a working RunningJob for Hadoop's JobClient, I need to access information from Flink's Nephele cluster, which is abstracted away. As a result, I repeat myself a lot in the environment code. Is it possible to refactor the environments (e.g. break submitJobandWait into submitJob and wait, and generally have a wait)? This is the nature of the changes. However, I believe this discussion can take place after the midterm, when a first version of the project is already merged.

> GSoC Summer Project: Implement full Hadoop Compatibility Layer for Stratosphere
> -------------------------------------------------------------------------------
>
> Key: FLINK-838
> URL: https://issues.apache.org/jira/browse/FLINK-838
> Project: Flink
> Issue Type: Improvement
> Reporter: GitHub Import
> Labels: github-import
> Fix For: pre-apache
>
>
> This is a meta issue for tracking @atsikiridis's progress with implementing a full Hadoop Compatibility Layer for Stratosphere.
> Some documentation can be found in the Wiki: https://github.com/stratosphere/stratosphere/wiki/%5BGSoC-14%5D-A-Hadoop-abstraction-layer-for-Stratosphere-(Project-Map-and-Notes)
> As well as the project proposal: https://github.com/stratosphere/stratosphere/wiki/GSoC-2014-Project-Proposal-Draft-by-Artem-Tsikiridis
> Most importantly, there is the following **schedule**:
> *19 May - 27 June (Midterm)*
> 1) Work on the Hadoop tasks, their Context and the mapping of Hadoop's Configuration to the one of Stratosphere. By successfully bridging the Hadoop tasks with Stratosphere, we already cover the most basic Hadoop Jobs. This can be determined by running some popular Hadoop examples on Stratosphere (e.g. WordCount, k-means, join) (4 - 5 weeks)
> 2) Understand how running these jobs works (e.g. command line interface) for the wrapper. Implement how the user will run them. (1 - 2 weeks)
> *27 June - 11 August*
> 1) Continue wrapping more "advanced" Hadoop Interfaces (Comparators, Partitioners, Distributed Cache etc.) There are quite a few interfaces and it will be a challenge to support all of them. (5 full weeks)
> 2) Profiling of the application and optimizations (if applicable)
> *11 August - 18 August*
> Write documentation on code, write a README with care and add more unit-tests. (1 week)
> ---------------- Imported from GitHub ----------------
> Url: https://github.com/stratosphere/stratosphere/issues/838
> Created by: [rmetzger|https://github.com/rmetzger]
> Labels: core, enhancement, parent-for-major-feature,
> Milestone: Release 0.7 (unplanned)
> Created at: Tue May 20 10:11:34 CEST 2014
> State: open

--
This message was sent by Atlassian JIRA
(v6.2#6252)