Nicolas Ferrario created FLINK-21825:
---------------------------------------- Summary: AvroInputFormat doesn't honor -1 length on FileInputSplits Key: FLINK-21825 URL: https://issues.apache.org/jira/browse/FLINK-21825 Project: Flink Issue Type: Bug Components: API / Core Affects Versions: 1.12.2 Reporter: Nicolas Ferrario FileInputSplit documentation says a length of *-1* means that the whole file should be read, however AvroInputFormat expects the exact size. [https://github.com/apache/flink/blob/97bfd049951f8d52a2e0aed14265074c4255ead0/flink-core/src/main/java/org/apache/flink/core/fs/FileInputSplit.java#L50] {code:java} /** * Constructs a split with host information. * * @param num the number of this input split * @param file the file name * @param start the position of the first byte in the file to process * @param length the number of bytes in the file to process (-1 is flag for "read whole file") * @param hosts the list of hosts containing the block, possibly <code>null</code> */ public FileInputSplit(int num, Path file, long start, long length, String[] hosts){code} [https://github.com/apache/flink/blob/97bfd049951f8d52a2e0aed14265074c4255ead0/flink-formats/flink-avro/src/main/java/org/apache/flink/formats/avro/AvroInputFormat.java#L141] {code:java} private DataFileReader<E> initReader(FileInputSplit split) throws IOException { DatumReader<E> datumReader; if (org.apache.avro.generic.GenericRecord.class == avroValueType) { datumReader = new GenericDatumReader<E>(); } else { datumReader = org.apache.avro.specific.SpecificRecordBase.class.isAssignableFrom( avroValueType) ? new SpecificDatumReader<E>(avroValueType) : new ReflectDatumReader<E>(avroValueType); } if (LOG.isInfoEnabled()) { LOG.info("Opening split {}", split); } SeekableInput in = new FSDataInputStreamWrapper( stream, split.getPath().getFileSystem().getFileStatus(split.getPath()).getLen()); DataFileReader<E> dataFileReader = (DataFileReader) DataFileReader.openReader(in, datumReader); if (LOG.isDebugEnabled()) { LOG.debug("Loaded SCHEMA: {}", dataFileReader.getSchema()); } end = split.getStart() + split.getLength(); <--- THIS LINE recordsReadSinceLastSync = 0; return dataFileReader; } {code} This could be fixed either by updating the documentation, or by honoring -1 in AvroInputFormat. -- This message was sent by Atlassian Jira (v8.3.4#803005) |
Free forum by Nabble | Edit this page |