[jira] [Created] (FLINK-16355) Inconsistent library versions notice.

Shang Yuanchun (Jira)
Kaifeng Huang created FLINK-16355:
-------------------------------------

             Summary: Inconsistent library versions notice.
                 Key: FLINK-16355
                 URL: https://issues.apache.org/jira/browse/FLINK-16355
             Project: Flink
          Issue Type: Improvement
            Reporter: Kaifeng Huang
         Attachments: apache flink.pdf

Hi. I have implemented a tool to detect library version inconsistencies. Your project has 9 inconsistent libraries and 9 libraries with false consistency.

Take org.apache.hadoop:hadoop-common for example: this library is declared as version 2.4.1 in flink-yarn-tests, 3.1.0 in flink-filesystems/flink-s3-fs-base, 2.7.5 in flink-table/flink-sql-client, and so on. Such version inconsistencies may cause unnecessary maintenance effort in the long run. For example, if two modules become inter-dependent, a library version conflict may arise. This has already become a common issue and hinders development progress, so a version harmonization is necessary.
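As a rough illustration of what such a harmonization could look like (a minimal sketch only; the hadoop.version property name and parent-POM layout are assumptions, not taken from the project), the version could be pinned once in dependencyManagement and inherited by all modules:

{code:xml}
<!-- Hypothetical fragment of a parent pom.xml: pin one hadoop-common version for every module -->
<properties>
  <hadoop.version>3.1.3</hadoop.version>
</properties>

<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
      <version>${hadoop.version}</version>
    </dependency>
  </dependencies>
</dependencyManagement>
{code}

Child modules would then declare org.apache.hadoop:hadoop-common without a version and pick up the managed one, so a future upgrade touches a single line.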

Assuming a version harmonization is applied, I calculated the cost of harmonizing to each of the higher versions, including the most up-to-date one. The cost refers to POM configuration changes and API invocation changes. Take org.apache.hadoop:hadoop-common for example: if we harmonize all the library versions to 3.1.3, the concern is how much the project code would have to adapt to the newer library version. We list an effort table to quantify the harmonization cost.

The effort table is listed below. It shows the overall harmonization effort by module. The columns give the number of library APIs and API calls (NA, NAC), of deleted APIs and API calls (NDA, NDAC), and of modified APIs and API calls (NMA, NMAC). Modified APIs are those whose call graph differs from the previous version. Take the first row for example: if the library is upgraded to version 3.1.3, then of the 103 APIs used in module flink-filesystems/flink-fs-hadoop-shaded, 0 are deleted in the recommended version (calling a deleted API would throw a NoSuchMethodError at runtime unless the project is re-compiled), and 55 are regarded as modified, which could break the former API contract.
||Index||Module||NA(NAC)||NDA(NDAC)||NMA(NMAC)||
|1|flink-filesystems/flink-fs-hadoop-shaded|103(223)|0(0)|55(115)|
|2|flink-filesystems/flink-s3-fs-base|2(4)|0(0)|1(1)|
|3|flink-yarn-tests|0(0)|0(0)|0(0)|
|4|..|..|..|..|


We also provide another table showing the files that may be affected by the library API changes, which could help to spot the affected API usages and to rerun the related test cases. The table is listed below.


||Module||File||Type||API||
|flink-filesystems/flink-s3-fs-base|flink-filesystems/flink-s3-fs-base/src/main/java/org/apache/flink/fs/s3/common/writer/S3RecoverableMultipartUploadFactory.java|modify|org.apache.hadoop.fs.Path.isAbsolute()|
|flink-filesystems/flink-fs-hadoop-shaded|flink-filesystems/flink-fs-hadoop-shaded/src/main/java/org/apache/hadoop/util/VersionInfo.java|modify|org.apache.hadoop.util.VersionInfo._getDate()|
|flink-filesystems/flink-fs-hadoop-shaded|flink-filesystems/flink-fs-hadoop-shaded/src/main/java/org/apache/hadoop/util/VersionInfo.java|modify|org.apache.hadoop.util.VersionInfo._getBuildVersion()|
|..|..|..|..|


As for false consistency, take log4j:log4j (jar) for example. The library is declared as version 1.2.17 in all modules, but the declarations are written in different ways. Since the modules are developed in parallel, updating the version in one declaration but not the others would reintroduce the inconsistency issues described above.
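To illustrate what differing declarations of the same version can look like (a hedged sketch of a typical case, not taken verbatim from the report), one module may hard-code the version while another resolves it through a property; the two agree only as long as both places are updated together:

{code:xml}
<!-- Hypothetical module A: literal version -->
<dependency>
  <groupId>log4j</groupId>
  <artifactId>log4j</artifactId>
  <version>1.2.17</version>
</dependency>

<!-- Hypothetical module B: version taken from a property that currently also resolves to 1.2.17 -->
<dependency>
  <groupId>log4j</groupId>
  <artifactId>log4j</artifactId>
  <version>${log4j.version}</version>
</dependency>
{code}

Bumping the property (or the literal) in only one place would silently put the modules on different versions again.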


If you are interested, you can find a more complete and detailed report in the attached PDF file.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)