Advanced JSON parsing techniques using Moshi and Kotlin

A match made in parser heaven

Jul 30, 2018

Moshi is a modern JSON library for Android and Java from Square. It can be considered the successor to GSON, with a simpler and leaner API and an architecture that enables better performance through the use of the Okio library. It’s also the most Kotlin-friendly library you can use to parse JSON files, as it comes with Kotlin-aware extensions.

In this article I’m going to demonstrate how to take advantage of features from both the Moshi library and the Kotlin language itself in order to write efficient and robust JSON parsers.

The example model and JSON file

Consider the following model representing a person:

class Person(val id: Long, val name: String, val age: Int = -1)

id and name are mandatory properties, while age is optional with a default value of -1.

Our objective is to load a list of Person objects into our application from a JSON file or stream with the following contents:

[
  {
    "id": 1,
    "name": "John",
    "age": 38
  },
  {
    "id": 8,
    "name": "Lisa",
    "age": 23
  },
  {
    "id": 23,
    "name": "Karen"
  }
]

In this simple example, the JSON object key names exactly match the Person property names and the "age" key is also optional.

1. Fully manual parsing

The most basic way to parse JSON with Moshi is its streaming API, JsonReader, which is similar to the streaming APIs of GSON and of the Android framework. It gives you the most control over the parsing process, which is especially useful when the JSON source is messy or inconsistently structured.

Typical code would look like this:

import com.squareup.moshi.JsonDataException
import com.squareup.moshi.JsonReader

class ManualParser {
    fun parse(reader: JsonReader): List<Person> {
        val result = mutableListOf<Person>()

        reader.beginArray()
        while (reader.hasNext()) {
            var id: Long = -1L
            var name: String = ""
            var age: Int = -1

            reader.beginObject()
            while (reader.hasNext()) {
                when (reader.nextName()) {
                    "id" -> id = reader.nextLong()
                    "name" -> name = reader.nextString()
                    "age" -> age = reader.nextInt()
                    else -> reader.skipValue()
                }
            }
            reader.endObject()

            if (id == -1L || name == "") {
                throw JsonDataException("Missing required field")
            }
            val person = Person(id, name, age)
            result.add(person)
        }
        reader.endArray()

        return result
    }
}

Introducing `selectName()` for performance

We can further optimize the parser by leveraging a feature introduced in Moshi 1.5: the ability to compare the next bytes of the stream to expected JSON object key names or values. This is done using the JsonReader.selectName() and JsonReader.selectString() methods, respectively.

How does that improve performance? Looking at the JSON file, you can see that some strings are repeated many times: the key names of the JSON objects are known in advance and match the property names of the Person class ("id", "name", "age"). So instead of letting Moshi decode the UTF-8 string and allocate memory for it every time a key name is encountered in the stream, we can skip that step and simply compare the next bytes of the stream with a preloaded collection of known key names, already encoded in UTF-8. If the byte sequence matches one of them, selectName() consumes the name and returns its index in the collection; otherwise it returns -1 and leaves the name in the stream, which is why the unknown-key branch must call skipName() before skipValue(). This is made possible by some neat features of the lower-level Okio library.
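To make the idea concrete, here is a deliberately simplified sketch in plain Kotlin. It is not Moshi's implementation: Moshi delegates the real work to Okio's `Options` and `BufferedSource.select()`, which avoid the naive linear scan used here. The `NameMatcher` class and its `select()` function are hypothetical names chosen for illustration. Each candidate is encoded to UTF-8 once, including its surrounding quotes, so that `"id"` cannot falsely match the beginning of `"identifier"`.

```kotlin
// Hypothetical illustration of the idea behind JsonReader.selectName():
// compare raw bytes against pre-encoded candidates instead of decoding a String.
class NameMatcher(vararg names: String) {
    // Encoded once, up front, with the JSON quotes included.
    private val encoded: List<ByteArray> =
        names.map { "\"$it\"".toByteArray(Charsets.UTF_8) }

    /** Returns the index of the candidate starting at [offset] in [source], or -1. */
    fun select(source: ByteArray, offset: Int): Int =
        encoded.indexOfFirst { candidate ->
            offset + candidate.size <= source.size &&
                candidate.indices.all { i -> source[offset + i] == candidate[i] }
        }
}
```

For example, with the bytes of `{"name":"Karen","identifier":7}`, `select(json, 1)` finds `"name"` and returns 1, while `select(json, 16)` returns -1 for the unknown `"identifier"` key, without decoding either string.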

Note: Moshi is optimized for, and only works with, JSON files encoded in UTF-8. If you still use a different character encoding in 2018, shame on you.

After replacing nextName() with selectName() in the parser, the code now looks like this:

class ManualParser {
    companion object {
        val NAMES = JsonReader.Options.of("id", "name", "age")
    }
    
    fun parse(reader: JsonReader): List<Person> {
        val result = mutableListOf<Person>()

        reader.beginArray()
        while (reader.hasNext()) {
            var id: Long = -1L
            var name: String = ""
            var age: Int = -1

            reader.beginObject()
            while (reader.hasNext()) {
                when (reader.selectName(NAMES)) {
                    0 -> id = reader.nextLong()
                    1 -> name = reader.nextString()
                    2 -> age = reader.nextInt()
                    else -> {
                        reader.skipName()
                        reader.skipValue()
                    }
                }
            }
            reader.endObject()

            if (id == -1L || name == "") {
                throw JsonDataException("Missing required field")
            }
            val person = Person(id, name, age)
            result.add(person)
        }
        reader.endArray()

        return result
    }
}

It’s a bit less readable than before, since the when expression now mentions numeric option indices instead of the key names themselves. That can be improved by declaring extra constants for these indices in the companion object, at the expense of a few more lines of code.
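One way to do that is sketched below in plain Kotlin so it runs without Moshi. `NAMES.indexOf(key)` is a hypothetical stand-in for `reader.selectName(NAMES)` (both yield the option's index or -1), the `fields` list stands in for the key/value pairs read from the stream, and the `ID`, `NAME` and `AGE` constant names are illustrative. Because they are `const val`s, they can be used directly as `when` branch conditions.

```kotlin
class Person(val id: Long, val name: String, val age: Int = -1)

class KeyedParser {
    companion object {
        // With Moshi this would be JsonReader.Options.of("id", "name", "age").
        val NAMES = listOf("id", "name", "age")

        // Named indices; their order must match NAMES.
        const val ID = 0
        const val NAME = 1
        const val AGE = 2
    }

    // Hypothetical input: the key/value pairs of one JSON object, already read.
    fun parse(fields: List<Pair<String, Any>>): Person {
        var id = -1L
        var name = ""
        var age = -1
        for ((key, value) in fields) {
            // NAMES.indexOf(key) stands in for reader.selectName(NAMES).
            when (NAMES.indexOf(key)) {
                ID -> id = value as Long
                NAME -> name = value as String
                AGE -> age = value as Int
                else -> Unit // unknown key: ignored, like skipNameAndValue()
            }
        }
        if (id == -1L || name == "") {
            throw IllegalArgumentException("Missing required field")
        }
        return Person(id, name, age)
    }
}
```

The branches now read almost like the string version, while the lookup itself still works on indices.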

Reducing boilerplate code using extension functions

The code for reading a JSON array and a JSON object using a streaming API always follows the same patterns. These patterns can be extracted to extension functions to avoid the repetition. This especially improves readability and reduces the risk of errors for JSON files with multiple levels of nested arrays and objects.

import com.squareup.moshi.JsonReader

fun JsonReader.skipNameAndValue() {
    skipName()
    skipValue()
}

inline fun JsonReader.readObject(body: () -> Unit) {
    beginObject()
    while (hasNext()) {
        body()
    }
    endObject()
}

inline fun JsonReader.readArray(body: () -> Unit) {
    beginArray()
    while (hasNext()) {
        body()
    }
    endArray()
}

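It’s recommended to declare these higher-order functions as inline: the lambda bodies are then copied into the call sites, so calling them introduces no additional cost compared to the original code (no extra object allocations for the lambdas) and the resulting bytecode is essentially the same as with the hand-written loops.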

We can also create another version of readArray() which goes one step further and builds and returns the list as well.

inline fun <T : Any> JsonReader.readArrayToList(body: () -> T?): List<T> {
    val result = mutableListOf<T>()
    beginArray()
    while (hasNext()) {
        body()?.let { result.add(it) }
    }
    endArray()
    return result
}

When using that function, the body of the lambda must return the item to be added to the list, or null to skip it.
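This contract can be tried out without Moshi. The `ArrayCursor` class below is a hypothetical, minimal stand-in for the array-related part of `JsonReader` (`beginArray()`, `hasNext()`, `endArray()`), with `next()` playing the role of reading one value. Note that the lambda must consume exactly one element on every call, even when it returns null: otherwise `hasNext()` keeps returning true and the loop never ends, and the same holds with a real `JsonReader`.

```kotlin
// Hypothetical stand-in for JsonReader's array API, backed by a list.
class ArrayCursor<E>(private val items: List<E>) {
    private var position = -1

    fun beginArray() {
        check(position == -1) { "Array already started" }
        position = 0
    }

    fun hasNext(): Boolean = position >= 0 && position < items.size

    fun next(): E = items[position++]

    fun endArray() {
        check(position == items.size) { "Array not fully consumed" }
    }
}

// Same shape as the JsonReader extension: null results are dropped.
inline fun <E, T : Any> ArrayCursor<E>.readArrayToList(body: () -> T?): List<T> {
    val result = mutableListOf<T>()
    beginArray()
    while (hasNext()) {
        body()?.let { result.add(it) }
    }
    endArray()
    return result
}
```

With `val cursor = ArrayCursor(listOf(38, 23, 12))`, calling `cursor.readArrayToList { cursor.next().takeIf { it >= 18 } }` consumes all three values but keeps only 38 and 23.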

With these new functions the parser code becomes much shorter:

class ManualParser {
    companion object {
        val NAMES = JsonReader.Options.of("id", "name", "age")
    }

    fun parse(reader: JsonReader): List<Person> {
        return reader.readArrayToList {
            var id: Long = -1L
            var name: String = ""
            var age: Int = -1

            reader.readObject {
                when (reader.selectName(NAMES)) {
                    0 -> id = reader.nextLong()
                    1 -> name = reader.nextString()
                    2 -> age = reader.nextInt()
                    else -> reader.skipNameAndValue()
                }
            }

            if (id == -1L || name == "") {
                throw JsonDataException("Missing required field")
            }
            Person(id, name, age)
        }
    }
}

Immutability, default values and mandatory fields

To be able to enforce the immutability of the Person object (all its fields are vals), we need to pass all the field values at once in its constructor, which means that we need to declare one extra variable for each field.

var id: Long = -1L
var name: String = ""
var age: Int = -1

Variables for optional fields (age) must be initialized to the same default value as the corresponding property.
Variables for mandatory fields (id, name) are initialized to an arbitrary sentinel value outside the range of valid values. Alternatively, they can be declared nullable and initialized to null, even though the corresponding properties are non-null. Then, before instantiating the model object, we need to check that these variables have been reassigned valid values, and otherwise throw an exception or skip the item:

if (id == -1L || name == "") {
    throw JsonDataException("Missing required field")
}
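As a sketch of the null-based alternative, the mandatory locals below are declared nullable so that "not assigned" is unambiguous, and the elvis operator both validates them and narrows them to non-null types. The `fields` map is a hypothetical stand-in for the values read from the `JsonReader`, and `IllegalArgumentException` replaces Moshi's `JsonDataException` so the example runs on its own.

```kotlin
class Person(val id: Long, val name: String, val age: Int = -1)

// Hypothetical input: key/value pairs already read from one JSON object.
fun personFrom(fields: Map<String, Any>): Person {
    var id: Long? = null      // mandatory: null means "not seen yet"
    var name: String? = null  // mandatory
    var age = -1              // optional: same default as the model

    for ((key, value) in fields) {
        when (key) {
            "id" -> id = value as Long
            "name" -> name = value as String
            "age" -> age = value as Int
        }
    }

    return Person(
        id = id ?: throw IllegalArgumentException("Missing required field 'id'"),
        name = name ?: throw IllegalArgumentException("Missing required field 'name'"),
        age = age
    )
}
```

A side benefit is that values like an id of -1 or an empty name are no longer mistaken for missing fields. The optional field's default, however, is still duplicated from the model.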

Manual field validation makes the parser code both more verbose and more error-prone, because we need to remember to update the parser every time we change the mandatory fields or the default values in the model.

Note: We can avoid declaring these variables entirely in the parser by making the model fully mutable (declaring all fields as vars) with an empty constructor.

class Person {
    var id: Long = 0L
    var name: String = ""
    var age: Int = -1
}

Benefits:

Fewer lines of code in the parser;
We can rely on the model default values instead of duplicating them in the parser code.

Downsides:

We lose all the benefits of immutability: coherent states, no side effects, thread safety;
All fields are now declared as optional. We need to document which ones are in practice mandatory and the parser still needs to manually check if they have been assigned a valid value.

Because the downsides outweigh the benefits, I recommend always favoring immutability in your model objects when you can, especially since Kotlin makes this easier than Java.

2. Moshi’s Kotlin Code Gen

A better way to validate mandatory Kotlin fields and respect default values while keeping the objects immutable is to let Moshi do this work for us by using the automatic JSON to Kotlin converters it provides.

Moshi offers two ways to generate these converters, which are called JsonAdapters:

Using reflection, via the moshi-kotlin artifact. Adapters will be generated at runtime when needed, then reused. The main downside of this solution is that in order to understand metadata like nullability and default values of Kotlin properties, this artifact depends on the kotlin-reflect library, a 2.5 MiB jar file which will make your application code size grow significantly.
Since Moshi 1.6, adapters can also be generated at compile time using the moshi-kotlin-codegen annotation processor. This is a much better solution because it provides better performance while adding no extra dependency to your application. The only limitation of that solution compared to kotlin-reflect is that it is unable to initialize private or protected fields, but the same limitations apply when writing your parsers manually.

Setting up moshi-kotlin-codegen

Make sure you enable the kapt plugin in your application’s build.gradle file.

apply plugin: 'kotlin-kapt'

Then add the annotation processor to your dependencies.

dependencies {
    kapt 'com.squareup.moshi:moshi-kotlin-codegen:1.6.0'
}

Now for every Kotlin class for which you want to generate a JsonAdapter implementation at compile time, add the @JsonClass annotation on top of it with the generateAdapter element set to true. In this example, we’ll add it to the Person class.

@JsonClass(generateAdapter = true)
class Person(val id: Long, val name: String, val age: Int = -1)

By default, the generated parser will match the property names with the JSON object key names. To override that, add the @Json annotation on the property to specify its JSON key name.

Moshi also lets you fully customize the way each property is converted from or to JSON, using custom annotations associated with your own custom type adapters. For more information, check out the official documentation.

A look at the generated code

When building the project, the annotation processor will create a new Kotlin class called PersonJsonAdapter. Let’s evaluate its code quality.

class PersonJsonAdapter(moshi: Moshi) : JsonAdapter<Person>() {
    private val options: JsonReader.Options = JsonReader.Options.of("id", "name", "age")

    private val longAdapter: JsonAdapter<Long> = moshi.adapter(Long::class.java).nonNull()

    private val stringAdapter: JsonAdapter<String> = moshi.adapter(String::class.java).nonNull()

    private val intAdapter: JsonAdapter<Int> = moshi.adapter(Int::class.java).nonNull()

    override fun toString(): String = "GeneratedJsonAdapter(Person)"

    override fun fromJson(reader: JsonReader): Person {
        var id: Long? = null
        var name: String? = null
        var age: Int? = null
        reader.beginObject()
        while (reader.hasNext()) {
            when (reader.selectName(options)) {
                0 -> id = longAdapter.fromJson(reader) ?: throw JsonDataException("Non-null value 'id' was null at ${reader.path}")
                1 -> name = stringAdapter.fromJson(reader) ?: throw JsonDataException("Non-null value 'name' was null at ${reader.path}")
                2 -> age = intAdapter.fromJson(reader) ?: throw JsonDataException("Non-null value 'age' was null at ${reader.path}")
                -1 -> {
                    // Unknown name, skip it.
                    reader.skipName()
                    reader.skipValue()
                }
            }
        }
        reader.endObject()
        var result = Person(
                id = id
                        ?: throw JsonDataException("Required property 'id' missing at ${reader.path}"),
                name = name
                        ?: throw JsonDataException("Required property 'name' missing at ${reader.path}"))
        result = Person(
                id = id ?: result.id,
                name = name ?: result.name,
                age = age ?: result.age)
        return result
    }

    override fun toJson(writer: JsonWriter, value: Person?) {
        // Removed for brevity
    }
}

Notice that this code is quite similar to what we wrote in the manual parser.

It uses selectName() for better performance, which is good. The options are not stored inside a companion object but it doesn’t matter because a JsonAdapter is only instantiated once then reused, making it effectively a singleton. Non-null properties are enforced. Mandatory fields are properly checked and an exception is thrown when they are missing.

Only two things aren’t perfect compared to a smart manually-written parser:

Nullable variables are used to store values of non-null primitive types (Long, Int) and generic adapters are used to read them (longAdapter, intAdapter), which means unnecessary boxing will occur when reading these values.
For each item, two instances of Person are created instead of one. The reason behind this is to allow reading back all the default values of optional fields from the first instance when creating the second, final instance.
This should not cause any performance issue, unless your model class includes heavy initialization code that you want to avoid doing twice.
This also means that when all fields are mandatory, only one instance will be created by the adapter.
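The two-instance trick can be reproduced in plain Kotlin to see what it buys. The `buildPerson()` function and the `constructed` counter below are hypothetical, added only to mirror the structure of the generated `fromJson()`: the first constructor call passes only the mandatory arguments so that Kotlin fills in the declared default, and the second call reads that default back.

```kotlin
class Person(val id: Long, val name: String, val age: Int = -1) {
    init {
        constructed++ // counts instances to make the double construction visible
    }

    companion object {
        var constructed = 0
    }
}

// Mirrors the generated adapter: a parameter is null when its key was absent.
fun buildPerson(id: Long?, name: String?, age: Int?): Person {
    // First instance: mandatory values only, so Kotlin applies the default for age.
    val defaults = Person(
        id = id ?: throw IllegalArgumentException("Required property 'id' missing"),
        name = name ?: throw IllegalArgumentException("Required property 'name' missing")
    )
    // Second, final instance: absent optional values fall back to the defaults.
    return Person(
        id = defaults.id,
        name = defaults.name,
        age = age ?: defaults.age
    )
}
```

The -1 default lives only in the model, which is exactly the duplication the manual parsers could not avoid; the price is one extra constructor call per item.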

These are minor performance issues, so I recommend that you use the automatically generated Kotlin JsonAdapters when both your code and the JSON data source allow it.

Using the generated adapters

It’s best to retrieve adapter instances through the Moshi API, because it will automatically locate the generated class, create a single instance of it and cache it so it can be shared between parsers.

val adapter: JsonAdapter<Person> = moshi.adapter(Person::class.java)

The generated adapter converts JSON to a single Person object. But the JSON file contains an array of persons so we want to retrieve a List<Person> instead. There are two ways to achieve this.

First, we can ask Moshi to build a generic list adapter which will use the generated PersonJsonAdapter under the hood. This is done by using Moshi’s Types utility class to build a type representing List<Person>:

val listType = Types.newParameterizedType(List::class.java, Person::class.java)
val adapter: JsonAdapter<List<Person>> = moshi.adapter(listType)
val result = adapter.fromJson(reader)

But more interestingly, we can also use the generated adapters inside our own parsers, effectively mixing and matching manual and automatic parsing. This is especially useful when the JSON source is not very well structured or contains a lot of unnecessary layers that we want to skip, but at the same time we still want to leverage the automatic parsing in some parts of the file.

For our example, we can write a hybrid parser which builds a list of Person objects while filtering out children and people of unknown age:

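import com.squareup.moshi.JsonAdapter
import com.squareup.moshi.JsonReader
import com.squareup.moshi.Moshi

class HybridParser(moshi: Moshi) {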
    private val personAdapter: JsonAdapter<Person> = moshi.adapter(Person::class.java)

    fun parse(reader: JsonReader): List<Person> {
        return reader.readArrayToList {
            personAdapter.fromJson(reader)?.takeIf { it.age >= 18 }
        }
    }
}

We start by manually parsing the JSON array using the readArrayToList() extension function defined earlier. It was designed to not add the item to the list when the lambda returns null, so we can use takeIf() to filter out items.

Generated JsonAdapters are a great way to remove boilerplate code from manual JSON parsers.
Of course, you can also write your own custom JsonAdapters and reuse them in your code.

If you want to learn more about Moshi’s Kotlin Code Gen, you can read this excellent article from its main contributor, Zac Sweers.

3. Coroutines magic for big data sources

In the previous example we saw how to filter out items inside the parser. Now, what if we wanted to filter out items from outside the parser, or wanted to process parsed items one by one, without storing them all at once into memory?

We can achieve this by creating a lazy streaming parser which returns a Kotlin Sequence instead of a List. Sequences keep memory usage under control when dealing with potentially huge JSON lists, because they handle one item at a time and all processing is deferred until a terminal operation consumes them. A Sequence can also be converted to an Iterable for Java compatibility.

While it is possible to manually write a custom Sequence with a custom Iterator which will resume parsing every time next() is called, it is not an easy task. It requires some mental gymnastics because the parsing code, instead of being structured hierarchically like the JSON file in one single block, has to be split into separate parts sharing the same state.

Fortunately, there is an easier way thanks to Kotlin’s coroutines, which let us build a lazy sequence using a single block of synchronous code and the yield() suspending function.

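import com.squareup.moshi.JsonAdapter
import com.squareup.moshi.JsonReader
import com.squareup.moshi.Moshi

class LazyParser(moshi: Moshi) {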
    private val personAdapter: JsonAdapter<Person> = moshi.adapter(Person::class.java)

    fun parse(reader: JsonReader): Sequence<Person> {
        return sequence {
            reader.readArray {
                yield(personAdapter.fromJson(reader)!!)
            }
        }
    }
}

The code is very simple. It’s similar to building a list in one go, except that the calls to add an item to the list are replaced with yield(item). The parser will effectively stop at that point, preserving its state, then resume later when the next item of the Sequence is requested by the caller. That’s all the code needed to create a fully lazy JSON parser in Kotlin.
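To observe this suspension behaviour without a JSON file, the sketch below replaces the `JsonReader` with a plain list of ages; `readEach()` is a hypothetical inline helper playing the role of `readArray()`. Because it is `inline`, its lambda is copied into the `sequence {}` block, which is what allows `yield()` to be called from inside it (with a non-inline helper the compiler would reject that call). The `log` list records when each item is actually parsed.

```kotlin
// Hypothetical stand-in for JsonReader.readArray(); inline, so body may call yield().
inline fun <E> List<E>.readEach(body: (E) -> Unit) {
    for (item in this) body(item)
}

fun lazyParse(source: List<Int>, log: MutableList<String>): Sequence<Int> = sequence {
    source.readEach { age ->
        log.add("parsed $age") // the "parsing" work for one item
        yield(age)             // suspends here until the caller requests the next item
    }
}
```

Nothing is parsed when `lazyParse()` returns. Calling `first { it >= 18 }` on the result for the input `[12, 38, 23]` parses 12 and 38, then stops, so 23 is never read. `asIterable()` exposes the same sequence to Java callers.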

It’s then possible to filter out items, iterate over them or do anything else lazily outside of the parser:

val sequence = LazyParser(moshi).parse(reader)
val filteredList = sequence.filter { it.age >= 18 }.toList()

I hope this article was able to inspire you to make your JSON parsers cleaner, faster and fun. Feel free to share it and also ask questions or provide more examples in the comments. Happy coding!