Alejandro Sellero created FLINK-13292:
-----------------------------------------

             Summary: NullPointerException when reading a string field in a nested struct from an Orc file.
                 Key: FLINK-13292
                 URL: https://issues.apache.org/jira/browse/FLINK-13292
             Project: Flink
          Issue Type: Bug
          Components: Connectors / ORC
    Affects Versions: 1.8.0
            Reporter: Alejandro Sellero


When I try to read an Orc file using flink-orc, a NullPointerException is thrown. This issue may be related to the closed issue https://issues.apache.org/jira/browse/FLINK-8230

It happens when reading the string fields in a nested struct. This is my schema:

{code:java}
"struct<" +
  "operation:int," +
  "originalTransaction:bigInt," +
  "bucket:int," +
  "rowId:bigInt," +
  "currentTransaction:bigInt," +
  "row:struct<" +
    "id:int," +
    "headline:string," +
    "user_id:int," +
    "company_id:int," +
    "created_at:timestamp," +
    "updated_at:timestamp," +
    "link:string," +
    "is_html:tinyint," +
    "source:string," +
    "company_feed_id:int," +
    "editable:tinyint," +
    "body_clean:string," +
    "activitystream_activity_id:bigint," +
    "uniqueness_checksum:string," +
    "rating:string," +
    "kununu_review_id:int," +
    "soft_deleted:tinyint," +
    "type:string," +
    "metadata:string," +
    "url:string," +
    "imagecache_uuid:string," +
    "video_id:int" +
  ">>",
{code}

This is the stack trace:

{code:java}
[error] Caused by: java.lang.NullPointerException
[error]   at java.lang.String.checkBounds(String.java:384)
[error]   at java.lang.String.<init>(String.java:462)
[error]   at org.apache.flink.orc.OrcBatchReader.readString(OrcBatchReader.java:1216)
[error]   at org.apache.flink.orc.OrcBatchReader.readNonNullBytesColumnAsString(OrcBatchReader.java:328)
[error]   at org.apache.flink.orc.OrcBatchReader.readField(OrcBatchReader.java:215)
[error]   at org.apache.flink.orc.OrcBatchReader.readNonNullStructColumn(OrcBatchReader.java:453)
[error]   at org.apache.flink.orc.OrcBatchReader.readField(OrcBatchReader.java:250)
[error]   at org.apache.flink.orc.OrcBatchReader.fillRows(OrcBatchReader.java:143)
[error]   at org.apache.flink.orc.OrcRowInputFormat.ensureBatch(OrcRowInputFormat.java:333)
[error]   at org.apache.flink.orc.OrcRowInputFormat.reachedEnd(OrcRowInputFormat.java:313)
[error]   at org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:190)
[error]   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
[error]   at java.lang.Thread.run(Thread.java:748)
{code}

Instead of using the Table API, I am reading the Orc files in batch mode as follows:

{code:java}
env
  .readFile(
    new OrcRowInputFormat(
      "",
      "SCHEMA_GIVEN_BEFORE",
      new HadoopConfiguration()
    ),
    "PATH_TO_FOLDER"
  )
  .writeAsText("file:///tmp/test/fromOrc")
{code}

Thanks for your support.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
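For context (an editorial note, not part of the original report): the top two frames of the stack trace, {{String.checkBounds}} and {{String.<init>}}, are the behavior of {{java.lang.String}}'s byte-array constructor when it is handed a null array, which is consistent with OrcBatchReader passing an unset byte buffer for the nested string column. The class and method names below ({{NullBytesRepro}}, {{readString}}) are hypothetical, a minimal sketch of that failure mode only:

```java
// Hypothetical minimal reproduction of the NPE reported above:
// java.lang.String(byte[], int, int) throws a NullPointerException
// from its bounds check when the backing array is null.
public class NullBytesRepro {

    // Simulates a reader decoding a string slot whose byte buffer
    // was never populated (e.g. a null slot for a nested column).
    static String readString(byte[] bytes, int offset, int length) {
        return new String(bytes, offset, length);
    }

    public static void main(String[] args) {
        try {
            readString(null, 0, 0); // null buffer, as in the failing read
            System.out.println("no exception");
        } catch (NullPointerException e) {
            System.out.println("NullPointerException");
        }
    }
}
```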