[jira] [Created] (FLINK-13292) NullPointerException when reading a string field in a nested struct from an Orc file.

Shang Yuanchun (Jira)
Alejandro Sellero created FLINK-13292:
-----------------------------------------

             Summary: NullPointerException when reading a string field in a nested struct from an Orc file.
                 Key: FLINK-13292
                 URL: https://issues.apache.org/jira/browse/FLINK-13292
             Project: Flink
          Issue Type: Bug
          Components: Connectors / ORC
    Affects Versions: 1.8.0
            Reporter: Alejandro Sellero


When I try to read an ORC file using flink-orc, a NullPointerException is thrown.
I think this issue could be related to this closed issue: https://issues.apache.org/jira/browse/FLINK-8230

This happens when trying to read the string fields in a nested struct. This is my schema:
{code:java}
      "struct<" +
        "operation:int," +
        "originalTransaction:bigInt," +
        "bucket:int," +
        "rowId:bigInt," +
        "currentTransaction:bigInt," +
        "row:struct<" +
          "id:int," +
          "headline:string," +
          "user_id:int," +
          "company_id:int," +
          "created_at:timestamp," +
          "updated_at:timestamp," +
          "link:string," +
          "is_html:tinyint," +
          "source:string," +
          "company_feed_id:int," +
          "editable:tinyint," +
          "body_clean:string," +
          "activitystream_activity_id:bigint," +
          "uniqueness_checksum:string," +
          "rating:string," +
          "kununu_review_id:int," +
          "soft_deleted:tinyint," +
          "type:string," +
          "metadata:string," +
          "url:string," +
          "imagecache_uuid:string," +
          "video_id:int" +
        ">>",
{code}
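To narrow down which columns are involved, the schema can be walked to list every string field that sits inside the nested struct; presumably these are the fields that reach readNonNullBytesColumnAsString via readNonNullStructColumn, as in the trace below. OrcStringFields is a hypothetical helper written only for this report (it is not part of flink-orc or ORC), and it understands only struct<...> and primitive types. Incidentally, the top-level fields (operation, originalTransaction, bucket, rowId, currentTransaction, row) appear to match the layout Hive uses for transactional (ACID) ORC files.

{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of flink-orc or ORC: lists the dotted paths of string
// fields nested inside a struct within the top-level struct. Supports only struct<...>
// and primitive types (optionally parameterized, e.g. varchar(10)); array, map and
// uniontype are not handled.
public class OrcStringFields {
    private final String s;
    private int pos;
    private final List<String> nestedStrings = new ArrayList<>();

    private OrcStringFields(String typeString) {
        this.s = typeString.replaceAll("\\s+", ""); // ignore whitespace between tokens
    }

    public static List<String> stringFieldPaths(String typeString) {
        OrcStringFields p = new OrcStringFields(typeString);
        p.parseType("", 0);
        if (p.pos != p.s.length()) {
            throw new IllegalArgumentException("unexpected input at position " + p.pos);
        }
        return p.nestedStrings;
    }

    // depth 0 = the top-level type, 1 = its fields, 2 = fields of a nested struct, ...
    private void parseType(String path, int depth) {
        String type = readName().toLowerCase(); // type names compared case-insensitively
        if (type.equals("struct")) {
            expect('<');
            if (peek() != '>') {
                do {
                    String field = readName();
                    expect(':');
                    parseType(path.isEmpty() ? field : path + "." + field, depth + 1);
                } while (tryConsume(','));
            }
            expect('>');
            return;
        }
        if (tryConsume('(')) { // skip parameters such as decimal(10,2) or varchar(10)
            while (pos < s.length() && s.charAt(pos) != ')') pos++;
            expect(')');
        }
        if (type.equals("string") && depth >= 2) {
            nestedStrings.add(path);
        }
    }

    private String readName() {
        int start = pos;
        while (pos < s.length() && (Character.isLetterOrDigit(s.charAt(pos)) || s.charAt(pos) == '_')) {
            pos++;
        }
        if (start == pos) throw new IllegalArgumentException("expected a name at position " + pos);
        return s.substring(start, pos);
    }

    private char peek() { return pos < s.length() ? s.charAt(pos) : '\0'; }

    private boolean tryConsume(char c) {
        if (peek() == c) { pos++; return true; }
        return false;
    }

    private void expect(char c) {
        if (!tryConsume(c)) throw new IllegalArgumentException("expected '" + c + "' at position " + pos);
    }

    public static void main(String[] args) {
        String schema = "struct<operation:int,originalTransaction:bigInt,bucket:int,rowId:bigInt,"
            + "currentTransaction:bigInt,row:struct<id:int,headline:string,user_id:int,"
            + "company_id:int,created_at:timestamp,updated_at:timestamp,link:string,"
            + "is_html:tinyint,source:string,company_feed_id:int,editable:tinyint,"
            + "body_clean:string,activitystream_activity_id:bigint,uniqueness_checksum:string,"
            + "rating:string,kununu_review_id:int,soft_deleted:tinyint,type:string,"
            + "metadata:string,url:string,imagecache_uuid:string,video_id:int>>";
        // Prints the ten string fields of the nested "row" struct, starting with row.headline
        System.out.println(stringFieldPaths(schema));
    }
}
{code}

Only string fields at depth 2 or deeper are reported, because the top-level string fields (there are none in this schema) would not go through the nested-struct path shown in the trace.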
{code:java}
[error] Caused by: java.lang.NullPointerException
[error] at java.lang.String.checkBounds(String.java:384)
[error] at java.lang.String.<init>(String.java:462)
[error] at org.apache.flink.orc.OrcBatchReader.readString(OrcBatchReader.java:1216)
[error] at org.apache.flink.orc.OrcBatchReader.readNonNullBytesColumnAsString(OrcBatchReader.java:328)
[error] at org.apache.flink.orc.OrcBatchReader.readField(OrcBatchReader.java:215)
[error] at org.apache.flink.orc.OrcBatchReader.readNonNullStructColumn(OrcBatchReader.java:453)
[error] at org.apache.flink.orc.OrcBatchReader.readField(OrcBatchReader.java:250)
[error] at org.apache.flink.orc.OrcBatchReader.fillRows(OrcBatchReader.java:143)
[error] at org.apache.flink.orc.OrcRowInputFormat.ensureBatch(OrcRowInputFormat.java:333)
[error] at org.apache.flink.orc.OrcRowInputFormat.reachedEnd(OrcRowInputFormat.java:313)
[error] at org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:190)
[error] at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
[error] at java.lang.Thread.run(Thread.java:748){code}
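The top two frames show that the byte array passed to the String constructor was null: on Java 8, String.checkBounds reads bytes.length, which throws NullPointerException for a null array. The frame names (readNonNullStructColumn, readNonNullBytesColumnAsString) suggest the reader took a code path that assumes no null values. The sketch below reproduces the failure with a hypothetical BytesColumnSketch class whose fields mirror the public fields of ORC's BytesColumnVector (vector, start, length, isNull, noNulls, isRepeating), and contrasts it with a read that checks the null flags first. It is an illustration under those assumptions, not flink-orc's code or its fix. If these are Hive ACID files, the row struct can be null (for example for delete events), which is one possible source of null entries; the report does not say whether that applies here.

{code:java}
import java.nio.charset.StandardCharsets;

public class NullStringReadSketch {

    // Hypothetical stand-in for ORC's BytesColumnVector (same field names, not the real class).
    static final class BytesColumnSketch {
        byte[][] vector;   // one byte array per row; stays null for null entries
        int[] start;
        int[] length;
        boolean[] isNull;
        boolean noNulls;
        boolean isRepeating;

        BytesColumnSketch(int size) {
            vector = new byte[size][];
            start = new int[size];
            length = new int[size];
            isNull = new boolean[size];
            noNulls = true;
        }

        void set(int row, String value) {
            if (value == null) {
                isNull[row] = true;
                noNulls = false;
            } else {
                vector[row] = value.getBytes(StandardCharsets.UTF_8);
                length[row] = vector[row].length;
            }
        }
    }

    // Unchecked read: a null byte array makes the String constructor throw
    // NullPointerException, matching the top frames of the reported trace.
    static String readUnchecked(BytesColumnSketch col, int row) {
        return new String(col.vector[row], col.start[row], col.length[row], StandardCharsets.UTF_8);
    }

    // Guarded read: consult the repeating and null flags before touching the bytes.
    static String readGuarded(BytesColumnSketch col, int row) {
        int i = col.isRepeating ? 0 : row;
        if (!col.noNulls && col.isNull[i]) {
            return null;
        }
        return new String(col.vector[i], col.start[i], col.length[i], StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        BytesColumnSketch headline = new BytesColumnSketch(2);
        headline.set(0, "hello");
        headline.set(1, null); // e.g. a row whose field (or enclosing struct) is null
        System.out.println(readGuarded(headline, 0)); // hello
        System.out.println(readGuarded(headline, 1)); // null
        try {
            readUnchecked(headline, 1);
        } catch (NullPointerException e) {
            System.out.println("NullPointerException, as in the reported trace");
        }
    }
}
{code}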
Instead of using the Table API, I am reading the ORC files in batch mode as follows:
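{code:scala}
// Scala DataSet API; HadoopConfiguration is assumed to be an alias for org.apache.hadoop.conf.Configuration
import org.apache.flink.api.scala._
import org.apache.flink.orc.OrcRowInputFormat
import org.apache.hadoop.conf.{Configuration => HadoopConfiguration}

val env = ExecutionEnvironment.getExecutionEnvironment

env
  .readFile(
    new OrcRowInputFormat(
      "",                    // the path is supplied by readFile below
      "SCHEMA_GIVEN_BEFORE", // the ORC schema string shown above
      new HadoopConfiguration()
    ),
    "PATH_TO_FOLDER"
  )
  .writeAsText("file:///tmp/test/fromOrc")

env.execute()
{code}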


Thanks for your support



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)