
Highlights from the new Apache Avro 1.9.0 release

14 May, 2019

The last release of Apache Avro, 1.8.2, was on May 31, 2017. Two years later, I’m thrilled to announce the release of Apache Avro 1.9.0.

Avro is a remote procedure call and data serialization framework developed within Apache’s Hadoop project. It uses JSON for defining data types and protocols, and serializes data in a compact binary format. If you’re unfamiliar with Avro, I highly recommend Dennis Vriend’s introduction to Avro at Binx.io.

Since 1.8.2, over 272 Jira tickets have been resolved and 844 pull requests have been merged. I’d like to point out several major changes.

Deprecate Joda-Time in favor of Java 8 JSR310

Before Avro 1.9, the Joda-Time library was used for handling logical date(time) values. Since Java 8, however, the Java Specification Request (JSR) 310 date/time API has been part of the JDK, which greatly improves native handling of dates and times. Avro is now built with Java 8 by default. To speed up the deprecation process, the JSR310 date/times are enabled by default, which might introduce some regressions if you’re upgrading from a version before 1.9.


<plugin>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-maven-plugin</artifactId>
  <version>1.9.0</version>
  <executions>
    <execution>
      <phase>generate-sources</phase>
      <goals>
        <goal>schema</goal>
      </goals>
      <configuration>
        <sourceDirectory>${project.basedir}/src/main/avro/</sourceDirectory>
        <outputDirectory>${project.basedir}/src/main/java/</outputDirectory>
        <dateTimeLogicalTypeImplementation>joda</dateTimeLogicalTypeImplementation>
      </configuration>
    </execution>
  </executions>
</plugin>

It is possible to fall back to Joda-Time by setting the configuration as shown above. Please note that this won’t break any compatibility, since it only affects the logical type: the physical representation stays the same, for example days since the epoch in the case of a date.
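
To make the difference concrete, here is a minimal sketch (the generated User class and its birthdate field are hypothetical, not taken from the release notes): with the JSR310 implementation, a field with the date logical type is exposed as java.time.LocalDate, where it used to be org.joda.time.LocalDate.

import java.time.LocalDate;

// Hypothetical class generated by the avro-maven-plugin from a schema with a
// date logical type field. With the default JSR310 setting the builder takes
// java.time.LocalDate; with the joda fallback above it would take
// org.joda.time.LocalDate instead.
User user = User.newBuilder()
    .setName("Alice")
    .setBirthdate(LocalDate.of(1990, 1, 1))
    .build();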

Move from Jackson 1.x to 2.9

The Avro schema is encoded as JSON. Reading that JSON is handled by Jackson, the popular Java library. The old Codehaus Jackson 1.x has been replaced by FasterXML’s Jackson 2.9. It is important to keep Jackson up to date, since it is prone to security issues. This is not because of bad design, but because of the nature of the library: it reads arbitrary input, potentially from unverified sources.

Besides updating Jackson to the latest version, the Jackson classes have also been removed from the public API. This might be a source of breaking API changes when upgrading. Previously, when you wanted to make a field default to null (which is important for schema evolution), you would pass Jackson’s NullNode:

import org.codehaus.jackson.node.NullNode;

new Schema.Field(name, schema, doc, NullNode.getInstance());

This has been replaced by Avro’s JsonProperties:

import org.apache.avro.JsonProperties;

new Schema.Field(name, schema, doc, JsonProperties.NULL_VALUE);
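
As a slightly fuller sketch (the record and field names are made up for illustration), a nullable field with a null default can be built against the new API like this:

import java.util.Arrays;
import java.util.Collections;

import org.apache.avro.JsonProperties;
import org.apache.avro.Schema;

// A union of null and string, so the field may hold either.
Schema nullableString = Schema.createUnion(
    Arrays.asList(Schema.create(Schema.Type.NULL), Schema.create(Schema.Type.STRING)));

// The null default is now expressed with Avro's own JsonProperties.NULL_VALUE.
Schema.Field nickname = new Schema.Field(
    "nickname", nullableString, "optional nickname", JsonProperties.NULL_VALUE);

Schema user = Schema.createRecord(
    "User", null, "com.example", false, Collections.singletonList(nickname));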

Support for ZStandard compression

Avro 1.9 adds support for Facebook’s Zstandard compression. Zstandard is a real-time compression algorithm providing high compression ratios. It offers a very wide range of compression/speed trade-offs, backed by a very fast decoder. Be careful when using it: if you write files with Zstandard compression, you also need to read them with Avro 1.9+.
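
As an illustration, here is a minimal sketch of writing a data file with the new codec (the helper class below and the codec name "zstandard" passed to CodecFactory.fromString are my assumptions; check the CodecFactory API in your Avro version):

import java.io.File;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.file.CodecFactory;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class ZstandardWriteExample {
    // Writes the given records to a Zstandard-compressed Avro data file.
    public static void write(Schema schema, Iterable<GenericRecord> records, File target)
            throws IOException {
        try (DataFileWriter<GenericRecord> writer =
                 new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(schema))) {
            // Assumes the codec is registered under the name "zstandard";
            // files written this way need Avro 1.9+ on the read side too.
            writer.setCodec(CodecFactory.fromString("zstandard"));
            writer.create(schema, target);
            for (GenericRecord record : records) {
                writer.append(record);
            }
        }
    }
}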

Remove support for Hadoop 1.x

If you’re still on Hadoop 1.x, you will need to upgrade Hadoop to use Avro 1.9+. Apache Avro MapReduce is compiled and tested with Hadoop 3, but not officially supported (yet).

Avro is now leaner

Multiple dependencies were removed: Guava, Paranamer, commons-codec, and commons-logging. This makes Avro’s footprint smaller and lowers the chance of classpath collisions with other libraries.

Java 11 support

Apache Avro is compiled and tested with Java 11 to guarantee compatibility. Java 8 is officially supported until March 2022, but it is good practice to already check compatibility with the new LTS release of Java.
